[PATCH] proposal for regexp_count, regexp_instr, regexp_substr and regexp_replace
Oracle:
https://docs.oracle.com/en/database/oracle/oracle-database/18/adfns/regexp.html#GUID-F14733F3-B943-4BAD-8489-F9704986386B
IBM:
https://www.ibm.com/support/producthub/db2/docs/content/SSEPGG_11.5.0/com.ibm.db2.luw.sql.ref.doc/doc/r0061494.html?pos=2
Z/OS:
https://www.ibm.com/support/knowledgecenter/SSEPEK_12.0.0/sqlref/src/tpc/db2z_bif_regexplike.html
EDB:
https://www.enterprisedb.com/edb-docs/d/edb-postgres-advanced-server/reference/database-compatibility-for-oracle-developers-reference-guide/9.6/Database_Compatibility_for_Oracle_Developers_Reference_Guide.1.098.html
Hi,
I would like to suggest adding the $subject functions to PostgreSQL. We
can do lot of things using regexp_matches() and regexp_replace() but
some time it consist on building complex queries that these functions
can greatly simplify.
Look like all RDBMS that embedded a regexp engine implement these
functions (Oracle, DB2, MySQL, etc) but I don't know if they are part of
the SQL standard. Probably using regexp_matches() can be enough even if
it generates more complex statements but having these functions in
PostgreSQL could be useful for users and code coming from theses RDBMS.
- REGEXP_COUNT( string text, pattern text, [, position int] [, flags
text ] ) -> integer
Return the number of times a pattern occurs in a source string
after a certain position, default from beginning.
It can be implemented in PostgreSQL as a subquery using:
SELECT count(*) FROM regexp_matches('A1B2C3', '[A-Z][0-9]',
'g'); -> 3
To support positioning we have to use substr(), for example
starting at position 2:
SELECT count(*) FROM regexp_matches(substr('A1B2C3', 2),
'[A-Z][0-9]'); -> 2
With regexp_count() we can simply use it like this:
SELECT regexp_count('A1B2C3', '[A-Z][0-9]'); -> 3
SELECT regexp_count('A1B2C3', '[A-Z][0-9]', 2); -> 2
- REGEXP_INSTR( string text, pattern text, [, position int] [,
occurrence int] [, return_opt int ] [, flags text ] [, group int] ) ->
integer
Return the position in a string for a regular expression
pattern. It returns an integer indicating the beginning or ending
position of the matched substring, depending on the value of the
return_opt argument (default beginning). If no match is found, then the
function returns 0.
* position: indicates the character where the search should
begin.
* occurrence: indicates which occurrence of pattern found in
string should be search.
* return_opt: 0 mean returns the position of the first
character of the occurrence, 1 mean returns the position of the
character following the occurrence.
* flags: regular expression modifiers.
* group: indicates which subexpression in pattern is the
target of the function.
Example:
SELECT regexp_instr('1234567890', '(123)(4(56)(78))', 1, 1,
0, 'i', 4); -> 7
to obtain a PostgreSQL equivalent:
SELECT position((SELECT (regexp_matches('1234567890',
'(123)(4(56)(78))', 'ig'))[4] offset 0 limit 1) IN '1234567890');
- REGEXP_SUBSTR( string text, pattern text, [, position int] [,
occurrence int] [, flags text ] [, group int] ) -> text
It is similar to regexp_instr(), but instead of returning the
position of the substring, it returns the substring itself.
Example:
SELECT regexp_substr('500 gilles''s street, 38000 Grenoble,
FR', ',[^,]+,'); -> , 38000 Grenoble,
or with a more complex extraction:
SELECT regexp_substr('1234567890', '(123)(4(56)(78))', 1, 1,
'i', 4); -> 78
SELECT regexp_substr('1234567890 1234557890',
'(123)(4(5[56])(78))', 1, 2, 'i', 3); -> 55
To obtain the same result for the last example we have to use:
SELECT (SELECT * FROM regexp_matches('1234567890
1234557890', '(123)(4(5[56])(78))', 'g') offset 1 limit 2)[3];
I have not implemented the regexp_like() function, it is quite similar
than the ~ and ~* operators except that it can also support other
modifiers than 'i'. I can implement it easily and add it to the patch if
we want to supports all those common functions.
- REGEXP_LIKE( string text, pattern text, [, flags text ] ) -> boolean
Similar to the LIKE condition, except that it performs regular
expression matching instead of the simple pattern matching performed by
LIKE.
Example:
SELECT * FROM t1 WHERE regexp_like(col1, '^d$', 'm');
to obtain a PostgreSQL equivalent:
SELECT * FROM t1 WHERE regexp_match (col1, '^d$', 'm' ) IS
NOT NULL;
There is also a possible extension to regexp_replace() that I have not
implemented yet because it need more work than the previous functions.
- REGEXP_REPLACE( string text, pattern text, replace_string text, [,
position int] [, occurrence int] [, flags text ] )
Extend PostgreSQL regexp_replace() by adding position and occurrence
capabilities.
The patch is ready for testing with documentation and regression tests.
Best regards,
--
Gilles Darold
LzLabs GmbH
My apologies for the links in the head, the email formatting and the
missing patch, I accidently send the email too early.
--
Gilles
Attachments:
regexp_functions-v1.patchtext/x-patch; charset=UTF-8; name=regexp_functions-v1.patchDownload
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index bf99f82149..88e08b40d2 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -3097,6 +3097,47 @@ repeat('Pg', 4) <returnvalue>PgPgPgPg</returnvalue>
</para></entry>
</row>
+ <row>
+ <entry role="func_table_entry"><para role="func_signature">
+ <indexterm>
+ <primary>regexp_count</primary>
+ </indexterm>
+ <function>regexp_count</function> ( <parameter>string</parameter> <type>text</type>, <parameter>pattern</parameter> <type>text</type> [, <parameter>position</parameter> <type>integer</type> ] [, <parameter>flags</parameter> <type>text</type> ] )
+ <returnvalue>integer</returnvalue>
+ </para>
+ <para>
+ Return the number of times a pattern occurs for a match of a POSIX
+ regular expression to the <parameter>string</parameter>; see
+ <xref linkend="functions-posix-regexp"/>.
+ </para>
+ <para>
+ <literal>regexp_count('123456789012', '\d{3}', 3)</literal>
+ <returnvalue>3</returnvalue>
+ </para></entry>
+ </row>
+
+ <row>
+ <entry role="func_table_entry"><para role="func_signature">
+ <indexterm>
+ <primary>regexp_instr</primary>
+ </indexterm>
+ <function>regexp_instr</function> ( <parameter>string</parameter> <type>text</type>, <parameter>pattern</parameter> <type>text</type> [, <parameter>position</parameter> <type>integer</type> ] [, <parameter>occurence</parameter> <type>integer</type> ] [, <parameter>returnopt</parameter> <type>integer</type> ] [, <parameter>flags</parameter> <type>text</type> ] [, <parameter>group</parameter> <type>integer</type> ] )
+ <returnvalue>integer</returnvalue>
+ </para>
+ <para>
+ Return the position within <parameter>string</parameter> where the
+ match of a POSIX regular expression occurs. It returns an integer
+ indicating the beginning or ending position of the matched substring,
+ depending on the value of the <parameter>returnopt</parameter> argument
+ (default beginning). If no match is found, then the function returns 0;
+ see <xref linkend="functions-posix-regexp"/>.
+ </para>
+ <para>
+ <literal>regexp_instr('1234567890', '(123)(4(56)(78))', 1, 1, 0, 'i', 4)</literal>
+ <returnvalue>7</returnvalue>
+ </para></entry>
+ </row>
+
<row>
<entry role="func_table_entry"><para role="func_signature">
<indexterm>
@@ -3157,6 +3198,24 @@ repeat('Pg', 4) <returnvalue>PgPgPgPg</returnvalue>
</para></entry>
</row>
+ <row>
+ <entry role="func_table_entry"><para role="func_signature">
+ <indexterm>
+ <primary>regexp_substr</primary>
+ </indexterm>
+ <function>regexp_substr</function> ( <parameter>string</parameter> <type>text</type>, <parameter>pattern</parameter> <type>text</type> [, <parameter>position</parameter> <type>integer</type> ] [, <parameter>occurence</parameter> <type>integer</type> ] [, <parameter>flags</parameter> <type>text</type> ] [, <parameter>group</parameter> <type>integer</type> ] )
+ <returnvalue>text</returnvalue>
+ </para>
+ <para>
+ Return the substring within <parameter>string</parameter> corresponding to the
+ match of a POSIX regular expression; see <xref linkend="functions-posix-regexp"/>.
+ </para>
+ <para>
+ <literal>regexp_substr('1234567890 1234557890', '(123)(4(5[56])(78))', 1, 2, 'i', 3)</literal>
+ <returnvalue>55</returnvalue>
+ </para></entry>
+ </row>
+
<row>
<entry role="func_table_entry"><para role="func_signature">
<indexterm>
@@ -5295,6 +5354,15 @@ substring('foobar' similar '#"o_b#"%' escape '#') <lineannotation>NULL</linea
<indexterm>
<primary>regexp_split_to_array</primary>
</indexterm>
+ <indexterm>
+ <primary>regexp_count</primary>
+ </indexterm>
+ <indexterm>
+ <primary>regexp_instr</primary>
+ </indexterm>
+ <indexterm>
+ <primary>regexp_substr</primary>
+ </indexterm>
<para>
<xref linkend="functions-posix-table"/> lists the available
@@ -5669,6 +5737,132 @@ SELECT foo FROM regexp_split_to_table('the quick brown fox', '\s*') AS foo;
in practice. Other software systems such as Perl use similar definitions.
</para>
+ <para>
+ The <function>regexp_count</function> function returns the number of
+ captured substring(s) resulting from matching a POSIX regular
+ expression pattern to a string. It has the syntax <function>regexp_replace</function>(
+ <replaceable>string</replaceable>, <replaceable>pattern</replaceable> <optional>,
+ <replaceable>position</replaceable> </optional> <optional>, <replaceable>flags</replaceable> </optional>).
+ <replaceable>pattern</replaceable> is searched in <replaceable>string</replaceable> starting
+ from an optional <replaceable>position</replaceable> or from the beginning of <replaceable>string</replaceable>
+ by default. The <replaceable>flags</replaceable> parameter is an optional text string
+ containing zero or more single-letter flags that change the function's behavior.
+ <function>regexp_count</function> accepts all the flags
+ shown in <xref linkend="posix-embedded-options-table"/>.
+ The <literal>g</literal> flag is forced internally to count all matches.
+ This function returns 0 if there is no match or the number of match as
+ an integer.
+ </para>
+
+ <para>
+ Some examples:
+<programlisting>
+SELECT regexp_count('123123123123', '\d{3}', 1);
+ regexp_count
+--------------
+ 4
+(1 row)
+
+SELECT regexp_count('Hello'||CHR(10)||'world!', '^world!$', 1, 'm');
+ regexp_count
+--------------
+ 1
+(1 row)
+</programlisting>
+ </para>
+
+ <para>
+ The <function>regexp_instr</function> function returns the beginning or ending
+ position of the matched substring resulting from matching a POSIX regular
+ expression pattern to a string. It has the syntax <function>regexp_instr</function>(
+ <replaceable>string</replaceable>, <replaceable>pattern</replaceable> <optional>,
+ <replaceable>position</replaceable> </optional> <optional>, <replaceable>occurrence</replaceable> </optional>
+ <optional>, <replaceable>returnopt</replaceable> </optional>
+ <optional>, <replaceable>flags</replaceable> </optional>
+ <optional>, <replaceable>group</replaceable> </optional>).
+ <replaceable>pattern</replaceable> is searched in <replaceable>string</replaceable> starting
+ from an optional <replaceable>position</replaceable> or from the beginning
+ of <replaceable>string</replaceable> by default. <replaceable>occurrence</replaceable>
+ indicates which occurrence of <replaceable>pattern</replaceable> in <replaceable>string</replaceable>
+ should be searched. When <replaceable>returnopt</replaceable> is set to 0 (default) the function
+ returns the position of the first character of the occurrence. When set to 1 returns the position
+ of the character after the occurrence.
+ The <replaceable>flags</replaceable> parameter is an optional text string
+ containing zero or more single-letter flags that change the function's behavior.
+ <function>regexp_count</function> accepts all the flags
+ shown in <xref linkend="posix-embedded-options-table"/>.
+ The <literal>g</literal> flag is forced internally to track all matches.
+ For a pattern with capture groups, <replaceable>group</replaceable> is an integer indicating
+ which capture in pattern is the target of the function. A capture group is a part of the pattern
+ enclosed in parentheses. Capture groups can be nested. They are numbered in order in which their
+ left parentheses appear in pattern. If <replaceable>group</replaceable> is zero, then the position
+ of the entire substring that matches the pattern is returned. If <replaceable>pattern</replaceable>
+ does not have at least <replaceable>group</replaceable> capture group, the function returns zero.
+ This function returns 0 if there is no match or the starting or ending position
+ of match as an integer.
+ </para>
+
+ <para>
+ Some examples:
+<programlisting>
+SELECT regexp_instr('number of your street, zipcode thetown, FR', '[^ ]+', 1, 6);
+ regexp_instr
+--------------
+ 32
+(1 row)
+
+SELECT regexp_instr('1234567890', '(123)(4(56)(78))', 1, 1, 0, 'i', 3);
+ regexp_instr
+--------------
+ 5
+(1 row)
+</programlisting>
+ </para>
+
+ <para>
+ The <function>regexp_substr</function> function returns the
+ matched substring resulting from matching a POSIX regular
+ expression pattern to a string. It has the syntax <function>regexp_substr</function>(
+ <replaceable>string</replaceable>, <replaceable>pattern</replaceable> <optional>,
+ <replaceable>position</replaceable> </optional> <optional>, <replaceable>occurrence</replaceable> </optional>
+ <optional>, <replaceable>flags</replaceable> </optional>
+ <optional>, <replaceable>group</replaceable> </optional>).
+ <replaceable>pattern</replaceable> is searched in <replaceable>string</replaceable> starting
+ from an optional <replaceable>position</replaceable> or from the beginning
+ of <replaceable>string</replaceable> by default. <replaceable>occurrence</replaceable>
+ indicates which occurrence of <replaceable>pattern</replaceable> in <replaceable>string</replaceable>
+ should be searched. The <replaceable>flags</replaceable> parameter is an optional text string
+ containing zero or more single-letter flags that change the function's behavior.
+ <function>regexp_count</function> accepts all the flags
+ shown in <xref linkend="posix-embedded-options-table"/>.
+ The <literal>g</literal> flag is forced internally to track all matches.
+ For a pattern with capture groups, <replaceable>group</replaceable> is an integer indicating
+ which capture in pattern is the target of the function. A capture group is a part of the pattern
+ enclosed in parentheses. Capture groups can be nested. They are numbered in order in which their
+ left parentheses appear in pattern. If <replaceable>group</replaceable> is zero, then the position
+ of the entire substring that matches the pattern is returned. If <replaceable>pattern</replaceable>
+ does not have at least <replaceable>group</replaceable> capture group, the function returns zero.
+ This function returns NULL if there is no match or the substring of match.
+ </para>
+
+ <para>
+ Some examples:
+<programlisting>
+SELECT regexp_substr('number of your street, zipcode town, FR', ',[^,]+', 1, 2);
+ regexp_substr
+---------------
+ , FR
+(1 row)
+
+SELECT regexp_substr('1234567890 1234557890', '(123)(4(5[56])(78))', 1, 2, 'i', 3);
+ regexp_substr
+---------------
+ 55
+(1 row)
+</programlisting>
+ </para>
+
+
<!-- derived from the re_syntax.n man page -->
<sect3 id="posix-syntax-details">
diff --git a/src/backend/utils/adt/regexp.c b/src/backend/utils/adt/regexp.c
index a32c5c82ab..42c25cdd75 100644
--- a/src/backend/utils/adt/regexp.c
+++ b/src/backend/utils/adt/regexp.c
@@ -120,6 +120,7 @@ static regexp_matches_ctx *setup_regexp_matches(text *orig_str, text *pattern,
static ArrayType *build_regexp_match_result(regexp_matches_ctx *matchctx);
static Datum build_regexp_split_result(regexp_matches_ctx *splitctx);
+static text *enclose_with_parenthesis(text *str);
/*
* RE_compile_and_cache - compile a RE, caching if possible
@@ -555,7 +556,6 @@ texticregexne(PG_FUNCTION_ARGS)
0, NULL));
}
-
/*
* textregexsubstr()
* Return a substring matched by a regular expression.
@@ -1063,6 +1063,350 @@ regexp_matches_no_flags(PG_FUNCTION_ARGS)
return regexp_matches(fcinfo);
}
+/*
+ * regexp_count()
+ * Return the number of matches of a pattern within a string.
+ */
+Datum
+regexp_count(PG_FUNCTION_ARGS)
+{
+ text *orig_str = NULL;
+ text *pattern = PG_GETARG_TEXT_PP(1);
+ int start = 1;
+ text *flags = PG_GETARG_TEXT_PP_IF_EXISTS(3);
+
+ pg_re_flags re_flags;
+ regexp_matches_ctx *matchctx;
+
+ if (PG_NARGS() > 2)
+ start = PG_GETARG_INT32(2);
+
+ if (start < 0)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("negative start position not allowed.")));
+
+ /* regexp_count(string, pattern, start[, flags]) */
+ if (start > 1)
+ orig_str = DatumGetTextPP(DirectFunctionCall2(text_substr_no_len,
+ PG_GETARG_DATUM(0),
+ Int32GetDatum(start)
+ ));
+ else
+ orig_str = PG_GETARG_TEXT_PP(0);
+
+ /* Determine options */
+ parse_re_flags(&re_flags, flags);
+
+ /* this function require flag 'g' */
+ re_flags.glob = true;
+
+ matchctx = setup_regexp_matches(orig_str, pattern, &re_flags,
+ PG_GET_COLLATION(), true, false, false);
+
+ PG_RETURN_INT32(matchctx->nmatches);
+}
+
+/* This is separate to keep the opr_sanity regression test from complaining */
+Datum
+regexp_count_no_start(PG_FUNCTION_ARGS)
+{
+ return regexp_count(fcinfo);
+}
+
+
+/* This is separate to keep the opr_sanity regression test from complaining */
+Datum
+regexp_count_no_flags(PG_FUNCTION_ARGS)
+{
+ return regexp_count(fcinfo);
+}
+
+/*
+ * Return text string between parenthesis
+ */
+static text *
+enclose_with_parenthesis(text *str)
+{
+ int len = VARSIZE_ANY_EXHDR(str);
+ text *result;
+ char *ptr;
+
+ result = palloc(len + VARHDRSZ + 2);
+ SET_VARSIZE(result, len + VARHDRSZ + 2);
+ ptr = VARDATA(result);
+ memcpy(ptr, "(", 1);
+ memcpy(ptr+1, VARDATA_ANY(str), len);
+ memcpy(ptr+len+1, ")", 1);
+
+ return result;
+}
+
+/*
+ * regexp_instr()
+ * Return the position within the string where the match was located
+ */
+Datum
+regexp_instr(PG_FUNCTION_ARGS)
+{
+ text *orig_str = NULL;
+ text *pattern = NULL;
+ text *flags = PG_GETARG_TEXT_PP_IF_EXISTS(5);
+ int start = 1;
+ int occurrence = 1;
+ int return_opt = 0;
+ int subexpr = 0;
+ int pos;
+
+ pg_re_flags re_flags;
+ regexp_matches_ctx *matchctx;
+
+ /* regexp_instr(string, pattern, start) */
+ if (PG_NARGS() > 2)
+ start = PG_GETARG_INT32(2);
+
+ if (start < 0)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("negative start position not allowed.")));
+
+ if (start > 1)
+ orig_str = DatumGetTextPP(DirectFunctionCall2(text_substr_no_len,
+ PG_GETARG_DATUM(0),
+ Int32GetDatum(start)
+ ));
+ else
+ /* regexp_instr(string, pattern) */
+ orig_str = PG_GETARG_TEXT_PP(0);
+
+ if (orig_str == NULL)
+ PG_RETURN_NULL();
+
+ /* regexp_instr(string, pattern, start, occurrence) */
+ if (PG_NARGS() > 3)
+ occurrence = PG_GETARG_INT32(3);
+
+ if (occurrence <= 0)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("occurrence value must be greater than 0.")));
+
+ /* regexp_instr(string, pattern, start, occurrence, return_opt) */
+ if (PG_NARGS() > 4)
+ return_opt = PG_GETARG_INT32(4);
+
+ if (return_opt != 0 && return_opt != 1)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("return option value must be 0 or 1.")));
+
+ /* regexp_instr(string, pattern, start, occurrence, return_opt, flags, subexpr) */
+ if (PG_NARGS() > 6)
+ subexpr = PG_GETARG_INT32(6);
+ if (subexpr < 0)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("number of capture group must be a positive value.")));
+
+ /*
+ * If subexpr is zero (default), then the position of the entire
+ * substring that matches the pattern is returned. Otherwise we
+ * will exactly register the subexpressions given in the pattern.
+ * Enclose pattern between parenthesis to register the position
+ * of the entire substring.
+ */
+ pattern = enclose_with_parenthesis(PG_GETARG_TEXT_PP(1));
+
+ /* Determine options */
+ parse_re_flags(&re_flags, flags);
+ /* this function require flag 'g' */
+ re_flags.glob = true;
+
+ matchctx = setup_regexp_matches(orig_str, pattern, &re_flags,
+ PG_GET_COLLATION(), true, false, false);
+
+ /* If no match is found, then the function returns 0 */
+ if (matchctx->nmatches == 0)
+ PG_RETURN_INT32(0);
+
+ /* When occurrence exceed matches return 0 */
+ if (occurrence > matchctx->nmatches)
+ PG_RETURN_INT32(0);
+
+ /* When subexpr exceed number of subexpression return 0 */
+ if (subexpr > matchctx->npatterns - 1)
+ PG_RETURN_INT32(0);
+
+ /*
+ * Returns the position of the first character of the occurrence
+ * or for subexpression in this occurrence.
+ */
+ pos = (occurrence*matchctx->npatterns*2)-((matchctx->npatterns-subexpr)*2);
+ if (return_opt == 1)
+ pos += 1;
+
+ PG_RETURN_INT32(matchctx->match_locs[pos]+start);
+}
+
+/* This is separate to keep the opr_sanity regression test from complaining */
+Datum
+regexp_instr_no_start(PG_FUNCTION_ARGS)
+{
+ return regexp_instr(fcinfo);
+}
+
+/* This is separate to keep the opr_sanity regression test from complaining */
+Datum
+regexp_instr_no_occurrence(PG_FUNCTION_ARGS)
+{
+ return regexp_instr(fcinfo);
+}
+
+/* This is separate to keep the opr_sanity regression test from complaining */
+Datum
+regexp_instr_no_return_opt(PG_FUNCTION_ARGS)
+{
+ return regexp_instr(fcinfo);
+}
+
+/* This is separate to keep the opr_sanity regression test from complaining */
+Datum
+regexp_instr_no_flags(PG_FUNCTION_ARGS)
+{
+ return regexp_instr(fcinfo);
+}
+
+/* This is separate to keep the opr_sanity regression test from complaining */
+Datum
+regexp_instr_no_subexpr(PG_FUNCTION_ARGS)
+{
+ return regexp_instr(fcinfo);
+}
+
+/*
+ * regexp_substr()
+ * Return the substring within the string that match a regular
+ * expression pattern
+ */
+Datum
+regexp_substr(PG_FUNCTION_ARGS)
+{
+ text *orig_str = NULL;
+ text *pattern = NULL;
+ text *flags = PG_GETARG_TEXT_PP_IF_EXISTS(4);
+ int start = 1;
+ int occurrence = 1;
+ int subexpr = 0;
+ int so, eo, pos;
+
+ pg_re_flags re_flags;
+ regexp_matches_ctx *matchctx;
+
+ if (PG_NARGS() > 2)
+ start = PG_GETARG_INT32(2);
+
+ if (start < 0)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("negative start position not allowed.")));
+
+ /* regexp_substr(string, pattern, start) */
+ if (start > 1)
+ orig_str = DatumGetTextPP(DirectFunctionCall2(text_substr_no_len,
+ PG_GETARG_DATUM(0),
+ Int32GetDatum(start)
+ ));
+ else
+ /* regexp_substr(string, pattern) */
+ orig_str = PG_GETARG_TEXT_PP(0);
+
+ if (orig_str == NULL)
+ PG_RETURN_NULL();
+
+ /* regexp_substr(string, pattern, start, occurrence) */
+ if (PG_NARGS() > 3)
+ occurrence = PG_GETARG_INT32(3);
+ if (occurrence <= 0)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("occurrence value must be greater than 0.")));
+
+ /* regexp_substr(string, pattern, start, occurrence, flags, subexpr) */
+ if (PG_NARGS() > 5)
+ subexpr = PG_GETARG_INT32(5);
+ if (subexpr < 0)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("number of capture group must be a positive value.")));
+
+ /*
+ * If subexpr is zero (default), then the position of the entire
+ * substring that matches the pattern is returned. Otherwise we
+ * will exactly register the subexpressions given in the pattern.
+ * Enclose pattern between parenthesis to register the position
+ * of the entire substring.
+ */
+ pattern = enclose_with_parenthesis(PG_GETARG_TEXT_PP(1));
+
+ /* Determine options */
+ parse_re_flags(&re_flags, flags);
+ /* this function require flag 'g' */
+ re_flags.glob = true;
+
+ matchctx = setup_regexp_matches(orig_str, pattern, &re_flags,
+ PG_GET_COLLATION(), true, false, false);
+
+ /* If no match is found, then the function returns NULL */
+ if (matchctx->nmatches == 0)
+ PG_RETURN_NULL();
+
+ /* When occurrence exceed matches return NULL */
+ if (occurrence > matchctx->nmatches)
+ PG_RETURN_NULL();
+
+ /* When subexpr exceed number of subexpression return NULL */
+ if (subexpr > matchctx->npatterns - 1)
+ PG_RETURN_NULL();
+
+ /* Returns the substring corresponding to the occurrence. */
+ pos = (occurrence*matchctx->npatterns*2)-((matchctx->npatterns-subexpr)*2);
+ so = matchctx->match_locs[pos]+1;
+ eo = matchctx->match_locs[pos+1]+1;
+
+ PG_RETURN_DATUM(DirectFunctionCall3(text_substr,
+ PointerGetDatum(matchctx->orig_str),
+ Int32GetDatum(so),
+ Int32GetDatum(eo - so)));
+}
+
+/* This is separate to keep the opr_sanity regression test from complaining */
+Datum
+regexp_substr_no_start(PG_FUNCTION_ARGS)
+{
+ return regexp_substr(fcinfo);
+}
+
+/* This is separate to keep the opr_sanity regression test from complaining */
+Datum
+regexp_substr_no_occurrence(PG_FUNCTION_ARGS)
+{
+ return regexp_substr(fcinfo);
+}
+
+/* This is separate to keep the opr_sanity regression test from complaining */
+Datum
+regexp_substr_no_flags(PG_FUNCTION_ARGS)
+{
+ return regexp_substr(fcinfo);
+}
+
+/* This is separate to keep the opr_sanity regression test from complaining */
+Datum
+regexp_substr_no_subexpr(PG_FUNCTION_ARGS)
+{
+ return regexp_substr(fcinfo);
+}
+
/*
* setup_regexp_matches --- do the initial matching for regexp_match
* and regexp_split functions
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 3d3974f467..aae9226ec5 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -3557,6 +3557,48 @@
proname => 'regexp_matches', prorows => '10', proretset => 't',
prorettype => '_text', proargtypes => 'text text text',
prosrc => 'regexp_matches' },
+{ oid => '9614', descr => 'count match(es) for regexp',
+ proname => 'regexp_count', prorettype => 'int4', proargtypes => 'text text',
+ prosrc => 'regexp_count_no_start' },
+{ oid => '9615', descr => 'count match(es) for regexp',
+ proname => 'regexp_count', prorettype => 'int4', proargtypes => 'text text int4',
+ prosrc => 'regexp_count_no_flags' },
+{ oid => '9616', descr => 'count match(es) for regexp',
+ proname => 'regexp_count', prorettype => 'int4',
+ proargtypes => 'text text int4 text', prosrc => 'regexp_count' },
+{ oid => '9617', descr => 'position where the match for regexp was located',
+ proname => 'regexp_instr', prorettype => 'int4', proargtypes => 'text text',
+ prosrc => 'regexp_instr_no_start' },
+{ oid => '9618', descr => 'position where the match for regexp was located',
+ proname => 'regexp_instr', prorettype => 'int4', proargtypes => 'text text int4',
+ prosrc => 'regexp_instr_no_occurrence' },
+{ oid => '9619', descr => 'position where the match for regexp was located',
+ proname => 'regexp_instr', prorettype => 'int4', proargtypes => 'text text int4 int4',
+ prosrc => 'regexp_instr_no_return_opt' },
+{ oid => '9620', descr => 'position where the match for regexp was located',
+ proname => 'regexp_instr', prorettype => 'int4', proargtypes => 'text text int4 int4 int4',
+ prosrc => 'regexp_instr_no_flags' },
+{ oid => '9621', descr => 'position where the match for regexp was located',
+ proname => 'regexp_instr', prorettype => 'int4', proargtypes => 'text text int4 int4 int4 text',
+ prosrc => 'regexp_instr_no_subexpr' },
+{ oid => '9622', descr => 'position where the match for regexp was located',
+ proname => 'regexp_instr', prorettype => 'int4', proargtypes => 'text text int4 int4 int4 text int4',
+ prosrc => 'regexp_instr' },
+{ oid => '9623', descr => 'substring that match the regexp pattern',
+ proname => 'regexp_substr', prorettype => 'text', proargtypes => 'text text',
+ prosrc => 'regexp_substr_no_start' },
+{ oid => '9624', descr => 'substring that match the regexp pattern',
+ proname => 'regexp_substr', prorettype => 'text', proargtypes => 'text text int4',
+ prosrc => 'regexp_substr_no_occurrence' },
+{ oid => '9625', descr => 'substring that match the regexp pattern',
+ proname => 'regexp_substr', prorettype => 'text', proargtypes => 'text text int4 int4',
+ prosrc => 'regexp_substr_no_flags' },
+{ oid => '9626', descr => 'substring that match the regexp pattern',
+ proname => 'regexp_substr', prorettype => 'text', proargtypes => 'text text int4 int4 text',
+ prosrc => 'regexp_substr_no_subexpr' },
+{ oid => '9627', descr => 'substring that match the regexp pattern',
+ proname => 'regexp_substr', prorettype => 'text', proargtypes => 'text text int4 int4 text int4',
+ prosrc => 'regexp_substr' },
{ oid => '2088', descr => 'split string by field_sep and return field_num',
proname => 'split_part', prorettype => 'text',
proargtypes => 'text text int4', prosrc => 'split_part' },
diff --git a/src/test/regress/expected/strings.out b/src/test/regress/expected/strings.out
index fb4573d85f..99372be6f0 100644
--- a/src/test/regress/expected/strings.out
+++ b/src/test/regress/expected/strings.out
@@ -905,6 +905,393 @@ SELECT foo, length(foo) FROM regexp_split_to_table('thE QUick bROWn FOx jUMPs ov
ERROR: regexp_split_to_table() does not support the "global" option
SELECT regexp_split_to_array('thE QUick bROWn FOx jUMPs ovEr The lazy dOG', 'e', 'g');
ERROR: regexp_split_to_array() does not support the "global" option
+-- count all matches for regexp
+SELECT regexp_count('123123123123123', '(12)3');
+ regexp_count
+--------------
+ 5
+(1 row)
+
+-- count all matches with start position
+SELECT regexp_count('123123123123', '123', 0);
+ regexp_count
+--------------
+ 4
+(1 row)
+
+SELECT regexp_count('123123123123', '123', 1);
+ regexp_count
+--------------
+ 4
+(1 row)
+
+SELECT regexp_count('123123123123', '123', 3);
+ regexp_count
+--------------
+ 3
+(1 row)
+
+SELECT regexp_count('123123123123', '123', -3);
+ERROR: negative start position not allowed.
+-- count all matches in NULL string with a start position
+SELECT regexp_count(NULL, '123', 3);
+ regexp_count
+--------------
+ \N
+(1 row)
+
+-- count all matches with a start position greater than string length
+SELECT regexp_count('123', '123', 10);
+ regexp_count
+--------------
+ 0
+(1 row)
+
+-- count all matches from different regexp
+SELECT regexp_count('ABC123', '[A-Z]'), regexp_count('A1B2C3', '[A-Z]');
+ regexp_count | regexp_count
+--------------+--------------
+ 3 | 3
+(1 row)
+
+SELECT regexp_count('ABC123', '[A-Z][0-9]'), regexp_count('A1B2C3', '[A-Z][0-9]');
+ regexp_count | regexp_count
+--------------+--------------
+ 1 | 3
+(1 row)
+
+SELECT regexp_count('ABC123', '[A-Z][0-9]{2}'), regexp_count('A1B2C3', '[A-Z][0-9]{2}');
+ regexp_count | regexp_count
+--------------+--------------
+ 1 | 0
+(1 row)
+
+SELECT regexp_count('ABC123', '([A-Z][0-9]){2}'), regexp_count('A1B2C3', '([A-Z][0-9]){2}');
+ regexp_count | regexp_count
+--------------+--------------
+ 0 | 1
+(1 row)
+
+SELECT regexp_count('ABC123A5', '^[A-Z][0-9]'), regexp_count('A1B2C3', '^[A-Z][0-9]');
+ regexp_count | regexp_count
+--------------+--------------
+ 0 | 1
+(1 row)
+
+SELECT regexp_count('ABC123', '[A-Z][0-9]{2}'), regexp_count('A1B2C34', '[A-Z][0-9]{2}');
+ regexp_count | regexp_count
+--------------+--------------
+ 1 | 1
+(1 row)
+
+-- count matches with newline case insensivity
+SELECT regexp_count('a'||CHR(10)||'d', 'a.d');
+ regexp_count
+--------------
+ 1
+(1 row)
+
+SELECT regexp_count('a'||CHR(10)||'d', 'a.d', 1, 's');
+ regexp_count
+--------------
+ 1
+(1 row)
+
+-- count matches with newline case sensivity
+SELECT regexp_count('a'||CHR(10)||'d', 'a.d', 1, 'n');
+ regexp_count
+--------------
+ 0
+(1 row)
+
+-- count not multiline matches
+SELECT regexp_count('a'||CHR(10)||'d', '^d$');
+ regexp_count
+--------------
+ 0
+(1 row)
+
+-- count multiline matches
+SELECT regexp_count('a'||CHR(10)||'d', '^d$', 1, 'm');
+ regexp_count
+--------------
+ 1
+(1 row)
+
+SELECT regexp_count('Hello'||CHR(10)||'world!', '^world!$', 1, 'm'); -- 1
+ regexp_count
+--------------
+ 1
+(1 row)
+
+DROP TABLE IF EXISTS regexp_temp;
+NOTICE: table "regexp_temp" does not exist, skipping
+CREATE TABLE regexp_temp(fullname varchar(20), email varchar(20));
+INSERT INTO regexp_temp (fullname, email) VALUES ('John Doe', 'johndoe@example.com');
+INSERT INTO regexp_temp (fullname, email) VALUES ('Jane Doe', 'janedoe');
+-- count matches case sensitive
+SELECT fullname, regexp_count(fullname, 'e', 1, 'c') FROM regexp_temp;
+ fullname | regexp_count
+----------+--------------
+ John Doe | 1
+ Jane Doe | 2
+(2 rows)
+
+SELECT fullname, regexp_count(fullname, 'D', 1, 'c') FROM regexp_temp;
+ fullname | regexp_count
+----------+--------------
+ John Doe | 1
+ Jane Doe | 1
+(2 rows)
+
+SELECT fullname, regexp_count(fullname, 'd', 1, 'c') FROM regexp_temp;
+ fullname | regexp_count
+----------+--------------
+ John Doe | 0
+ Jane Doe | 0
+(2 rows)
+
+-- count matches case insensitive
+SELECT fullname, regexp_count(fullname, 'E', 1, 'i') FROM regexp_temp;
+ fullname | regexp_count
+----------+--------------
+ John Doe | 1
+ Jane Doe | 2
+(2 rows)
+
+SELECT fullname, regexp_count(fullname, 'do', 1, 'i') FROM regexp_temp;
+ fullname | regexp_count
+----------+--------------
+ John Doe | 1
+ Jane Doe | 1
+(2 rows)
+
+-- return the start position of the 6th occurence starting at beginning of the string
+SELECT regexp_instr('number of your street, zipcode thetown, FR', '[^ ]+', 1, 6);
+ regexp_instr
+--------------
+ 32
+(1 row)
+
+-- return the start position of the 5th occurence starting after the first word
+SELECT regexp_instr('number of your street, zipcode thetown, FR', '[^ ]+', 7, 5, 0);
+ regexp_instr
+--------------
+ 32
+(1 row)
+
+-- return the ending position of the 5th occurence starting after the first word
+SELECT regexp_instr('number of your street, zipcode thetown, FR', '[^ ]+', 7, 5, 0);
+ regexp_instr
+--------------
+ 32
+(1 row)
+
+-- return the ending position of the 2nd occurence starting after the first word
+SELECT regexp_instr('number of your street, zipcode thetown, FR', '[T|Z|S][[:alpha:]]{5}', 7, 2, 1, 'i');
+ regexp_instr
+--------------
+ 30
+(1 row)
+
+-- return the starting position corresponding to the different capture group
+SELECT regexp_instr('1234567890', '(123)(4(56)(78))', 1, 1, 0, 'i', 0);
+ regexp_instr
+--------------
+ 1
+(1 row)
+
+SELECT regexp_instr('1234567890', '(123)(4(56)(78))', 1, 1, 0, 'i', 1);
+ regexp_instr
+--------------
+ 1
+(1 row)
+
+SELECT regexp_instr('1234567890', '(123)(4(56)(78))', 1, 1, 0, 'i', 2);
+ regexp_instr
+--------------
+ 4
+(1 row)
+
+SELECT regexp_instr('1234567890', '(123)(4(56)(78))', 1, 1, 0, 'i', 3);
+ regexp_instr
+--------------
+ 5
+(1 row)
+
+SELECT regexp_instr('1234567890', '(123)(4(56)(78))', 1, 1, 0, 'i', 4);
+ regexp_instr
+--------------
+ 7
+(1 row)
+
+-- return the starting position corresponding to a non existant capture group
+SELECT regexp_instr('1234567890', '(123)(4(56)(78))', 1, 1, 0, 'i', 5);
+ regexp_instr
+--------------
+ 0
+(1 row)
+
+-- Same but with the ending position
+SELECT regexp_instr('1234567890', '(123)(4(56)(78))', 1, 1, 1, 'i', 0);
+ regexp_instr
+--------------
+ 9
+(1 row)
+
+SELECT regexp_instr('1234567890', '(123)(4(56)(78))', 1, 1, 1, 'i', 1);
+ regexp_instr
+--------------
+ 4
+(1 row)
+
+SELECT regexp_instr('1234567890', '(123)(4(56)(78))', 1, 1, 1, 'i', 2);
+ regexp_instr
+--------------
+ 9
+(1 row)
+
+SELECT regexp_instr('1234567890', '(123)(4(56)(78))', 1, 1, 1, 'i', 3);
+ regexp_instr
+--------------
+ 7
+(1 row)
+
+SELECT regexp_instr('1234567890', '(123)(4(56)(78))', 1, 1, 1, 'i', 4);
+ regexp_instr
+--------------
+ 9
+(1 row)
+
+SELECT regexp_instr('1234567890', '(123)(4(56)(78))', 1, 1, 1, 'i', 5);
+ regexp_instr
+--------------
+ 0
+(1 row)
+
+-- start position of a valid email
+SELECT email, regexp_instr(email, '\w+@\w+(\.\w+)+') "valid_email" FROM regexp_temp;
+ email | valid_email
+---------------------+-------------
+ johndoe@example.com | 1
+ janedoe | 0
+(2 rows)
+
+-- ending position of a valid email
+SELECT email, regexp_instr(email, '\w+@\w+(\.\w+)+', 1, 1, 1) "valid_email" FROM regexp_temp;
+ email | valid_email
+---------------------+-------------
+ johndoe@example.com | 20
+ janedoe | 0
+(2 rows)
+
+-- start position of first capture group in the email (the dot part)
+SELECT email, regexp_instr(email, '\w+@\w+(\.\w+)+', 1, 1, 0, 'i', 1) FROM regexp_temp;
+ email | regexp_instr
+---------------------+--------------
+ johndoe@example.com | 16
+ janedoe | 0
+(2 rows)
+
+-- ending position of first capture group in the email (the dot part)
+SELECT email, regexp_instr(email, '\w+@\w+(\.\w+)+', 1, 1, 1, 'i', 1) FROM regexp_temp;
+ email | regexp_instr
+---------------------+--------------
+ johndoe@example.com | 20
+ janedoe | 0
+(2 rows)
+
+-- test negative values
+SELECT regexp_instr('1234567890', '(123)(4(56)(78))', -1, 1, 1, 'i', 1);
+ERROR: negative start position not allowed.
+SELECT regexp_instr('1234567890', '(123)(4(56)(78))', 1, -1, 1, 'i', 1);
+ERROR: occurrence value must be greater than 0.
+SELECT regexp_instr('1234567890', '(123)(4(56)(78))', 1, 1, -1, 'i', 1);
+ERROR: return option value must be 0 or 1.
+SELECT regexp_instr('1234567890', '(123)(4(56)(78))', 1, 1, 1, 'i', -1);
+ERROR: number of capture group must be a positive value.
+-- return the substring matching the regexp
+SELECT regexp_substr('number of your street, zipcode town, FR', ',[^,]+');
+ regexp_substr
+----------------
+ , zipcode town
+(1 row)
+
+-- return the substring matching the regexp
+SELECT regexp_substr('number of your street, zipcode town, FR', ',[^,]+', 24);
+ regexp_substr
+---------------
+ , FR
+(1 row)
+
+-- return the substring matching the regexp at the first occurrence
+SELECT regexp_substr('number of your street, zipcode town, FR', ',[^,]+', 1, 1);
+ regexp_substr
+----------------
+ , zipcode town
+(1 row)
+
+-- return the substring matching the regexp at the second occurrence
+SELECT regexp_substr('number of your street, zipcode town, FR', ',[^,]+', 1, 2);
+ regexp_substr
+---------------
+ , FR
+(1 row)
+
+-- case sensitivity substring search
+SELECT regexp_substr('number of your street, zipcode town, FR', ',\s+[Zf][^,]+', 1, 1);
+ regexp_substr
+---------------
+ \N
+(1 row)
+
+SELECT regexp_substr('number of your street, zipcode town, FR', ',\s+[Zf][^,]+', 1, 1, 'i');
+ regexp_substr
+----------------
+ , zipcode town
+(1 row)
+
+-- case sensitivity substring search with no capture group
+SELECT regexp_substr('number of your street, zipcode town, FR', ',\s+[Zf][^,]+', 1, 1, 'i', 0);
+ regexp_substr
+----------------
+ , zipcode town
+(1 row)
+
+-- case sensitivity substring search with non existing capture group
+SELECT regexp_substr('number of your street, zipcode town, FR', ',\s+[Zf][^,]+', 1, 1, 'i', 0);
+ regexp_substr
+----------------
+ , zipcode town
+(1 row)
+
+-- return the substring matching the regexp at different occurrence and capture group
+SELECT regexp_substr('1234567890 1234567890', '(123)(4(56)(78))', 1, 1, 'i', 4);
+ regexp_substr
+---------------
+ 78
+(1 row)
+
+SELECT regexp_substr('1234567890 1234557890', '(123)(4(5[56])(78))', 1, 2, 'i', 3);
+ regexp_substr
+---------------
+ 55
+(1 row)
+
+SELECT regexp_substr('1234567890 1234567890', '(123)(4(56)(78))', 1, 1, 'i', 0);
+ regexp_substr
+---------------
+ 12345678
+(1 row)
+
+-- test negative values
+SELECT regexp_substr('1234567890 1234567890', '(123)(4(56)(78))', -1, 1, 'i', 4);
+ERROR: negative start position not allowed.
+SELECT regexp_substr('1234567890 1234567890', '(123)(4(56)(78))', 1, -1, 'i', 4);
+ERROR: occurrence value must be greater than 0.
+SELECT regexp_substr('1234567890 1234567890', '(123)(4(56)(78))', 1, 1, 'i', -4);
+ERROR: number of capture group must be a positive value.
+DROP TABLE IF EXISTS regexp_temp;
-- change NULL-display back
\pset null ''
-- E021-11 position expression
diff --git a/src/test/regress/sql/strings.sql b/src/test/regress/sql/strings.sql
index 57a48c9d0b..30d44cc6da 100644
--- a/src/test/regress/sql/strings.sql
+++ b/src/test/regress/sql/strings.sql
@@ -255,6 +255,110 @@ SELECT regexp_split_to_array('thE QUick bROWn FOx jUMPs ovEr The lazy dOG', 'e',
SELECT foo, length(foo) FROM regexp_split_to_table('thE QUick bROWn FOx jUMPs ovEr The lazy dOG', 'e', 'g') AS foo;
SELECT regexp_split_to_array('thE QUick bROWn FOx jUMPs ovEr The lazy dOG', 'e', 'g');
+-- count all matches for regexp
+SELECT regexp_count('123123123123123', '(12)3');
+-- count all matches with start position
+SELECT regexp_count('123123123123', '123', 0);
+SELECT regexp_count('123123123123', '123', 1);
+SELECT regexp_count('123123123123', '123', 3);
+SELECT regexp_count('123123123123', '123', -3);
+-- count all matches in NULL string with a start position
+SELECT regexp_count(NULL, '123', 3);
+-- count all matches with a start position greater than string length
+SELECT regexp_count('123', '123', 10);
+-- count all matches from different regexp
+SELECT regexp_count('ABC123', '[A-Z]'), regexp_count('A1B2C3', '[A-Z]');
+SELECT regexp_count('ABC123', '[A-Z][0-9]'), regexp_count('A1B2C3', '[A-Z][0-9]');
+SELECT regexp_count('ABC123', '[A-Z][0-9]{2}'), regexp_count('A1B2C3', '[A-Z][0-9]{2}');
+SELECT regexp_count('ABC123', '([A-Z][0-9]){2}'), regexp_count('A1B2C3', '([A-Z][0-9]){2}');
+SELECT regexp_count('ABC123A5', '^[A-Z][0-9]'), regexp_count('A1B2C3', '^[A-Z][0-9]');
+SELECT regexp_count('ABC123', '[A-Z][0-9]{2}'), regexp_count('A1B2C34', '[A-Z][0-9]{2}');
+-- count matches with newline case insensivity
+SELECT regexp_count('a'||CHR(10)||'d', 'a.d');
+SELECT regexp_count('a'||CHR(10)||'d', 'a.d', 1, 's');
+-- count matches with newline case sensivity
+SELECT regexp_count('a'||CHR(10)||'d', 'a.d', 1, 'n');
+-- count not multiline matches
+SELECT regexp_count('a'||CHR(10)||'d', '^d$');
+-- count multiline matches
+SELECT regexp_count('a'||CHR(10)||'d', '^d$', 1, 'm');
+SELECT regexp_count('Hello'||CHR(10)||'world!', '^world!$', 1, 'm'); -- 1
+
+DROP TABLE IF EXISTS regexp_temp;
+CREATE TABLE regexp_temp(fullname varchar(20), email varchar(20));
+INSERT INTO regexp_temp (fullname, email) VALUES ('John Doe', 'johndoe@example.com');
+INSERT INTO regexp_temp (fullname, email) VALUES ('Jane Doe', 'janedoe');
+-- count matches case sensitive
+SELECT fullname, regexp_count(fullname, 'e', 1, 'c') FROM regexp_temp;
+SELECT fullname, regexp_count(fullname, 'D', 1, 'c') FROM regexp_temp;
+SELECT fullname, regexp_count(fullname, 'd', 1, 'c') FROM regexp_temp;
+-- count matches case insensitive
+SELECT fullname, regexp_count(fullname, 'E', 1, 'i') FROM regexp_temp;
+SELECT fullname, regexp_count(fullname, 'do', 1, 'i') FROM regexp_temp;
+
+-- return the start position of the 6th occurence starting at beginning of the string
+SELECT regexp_instr('number of your street, zipcode thetown, FR', '[^ ]+', 1, 6);
+-- return the start position of the 5th occurence starting after the first word
+SELECT regexp_instr('number of your street, zipcode thetown, FR', '[^ ]+', 7, 5, 0);
+-- return the ending position of the 5th occurence starting after the first word
+SELECT regexp_instr('number of your street, zipcode thetown, FR', '[^ ]+', 7, 5, 0);
+-- return the ending position of the 2nd occurence starting after the first word
+SELECT regexp_instr('number of your street, zipcode thetown, FR', '[T|Z|S][[:alpha:]]{5}', 7, 2, 1, 'i');
+-- return the starting position corresponding to the different capture group
+SELECT regexp_instr('1234567890', '(123)(4(56)(78))', 1, 1, 0, 'i', 0);
+SELECT regexp_instr('1234567890', '(123)(4(56)(78))', 1, 1, 0, 'i', 1);
+SELECT regexp_instr('1234567890', '(123)(4(56)(78))', 1, 1, 0, 'i', 2);
+SELECT regexp_instr('1234567890', '(123)(4(56)(78))', 1, 1, 0, 'i', 3);
+SELECT regexp_instr('1234567890', '(123)(4(56)(78))', 1, 1, 0, 'i', 4);
+-- return the starting position corresponding to a non existant capture group
+SELECT regexp_instr('1234567890', '(123)(4(56)(78))', 1, 1, 0, 'i', 5);
+-- Same but with the ending position
+SELECT regexp_instr('1234567890', '(123)(4(56)(78))', 1, 1, 1, 'i', 0);
+SELECT regexp_instr('1234567890', '(123)(4(56)(78))', 1, 1, 1, 'i', 1);
+SELECT regexp_instr('1234567890', '(123)(4(56)(78))', 1, 1, 1, 'i', 2);
+SELECT regexp_instr('1234567890', '(123)(4(56)(78))', 1, 1, 1, 'i', 3);
+SELECT regexp_instr('1234567890', '(123)(4(56)(78))', 1, 1, 1, 'i', 4);
+SELECT regexp_instr('1234567890', '(123)(4(56)(78))', 1, 1, 1, 'i', 5);
+-- start position of a valid email
+SELECT email, regexp_instr(email, '\w+@\w+(\.\w+)+') "valid_email" FROM regexp_temp;
+-- ending position of a valid email
+SELECT email, regexp_instr(email, '\w+@\w+(\.\w+)+', 1, 1, 1) "valid_email" FROM regexp_temp;
+-- start position of first capture group in the email (the dot part)
+SELECT email, regexp_instr(email, '\w+@\w+(\.\w+)+', 1, 1, 0, 'i', 1) FROM regexp_temp;
+-- ending position of first capture group in the email (the dot part)
+SELECT email, regexp_instr(email, '\w+@\w+(\.\w+)+', 1, 1, 1, 'i', 1) FROM regexp_temp;
+-- test negative values
+SELECT regexp_instr('1234567890', '(123)(4(56)(78))', -1, 1, 1, 'i', 1);
+SELECT regexp_instr('1234567890', '(123)(4(56)(78))', 1, -1, 1, 'i', 1);
+SELECT regexp_instr('1234567890', '(123)(4(56)(78))', 1, 1, -1, 'i', 1);
+SELECT regexp_instr('1234567890', '(123)(4(56)(78))', 1, 1, 1, 'i', -1);
+
+-- return the substring matching the regexp
+SELECT regexp_substr('number of your street, zipcode town, FR', ',[^,]+');
+-- return the substring matching the regexp
+SELECT regexp_substr('number of your street, zipcode town, FR', ',[^,]+', 24);
+-- return the substring matching the regexp at the first occurrence
+SELECT regexp_substr('number of your street, zipcode town, FR', ',[^,]+', 1, 1);
+-- return the substring matching the regexp at the second occurrence
+SELECT regexp_substr('number of your street, zipcode town, FR', ',[^,]+', 1, 2);
+-- case sensitivity substring search
+SELECT regexp_substr('number of your street, zipcode town, FR', ',\s+[Zf][^,]+', 1, 1);
+SELECT regexp_substr('number of your street, zipcode town, FR', ',\s+[Zf][^,]+', 1, 1, 'i');
+-- case sensitivity substring search with no capture group
+SELECT regexp_substr('number of your street, zipcode town, FR', ',\s+[Zf][^,]+', 1, 1, 'i', 0);
+-- case sensitivity substring search with non existing capture group
+SELECT regexp_substr('number of your street, zipcode town, FR', ',\s+[Zf][^,]+', 1, 1, 'i', 0);
+-- return the substring matching the regexp at different occurrence and capture group
+SELECT regexp_substr('1234567890 1234567890', '(123)(4(56)(78))', 1, 1, 'i', 4);
+SELECT regexp_substr('1234567890 1234557890', '(123)(4(5[56])(78))', 1, 2, 'i', 3);
+SELECT regexp_substr('1234567890 1234567890', '(123)(4(56)(78))', 1, 1, 'i', 0);
+-- test negative values
+SELECT regexp_substr('1234567890 1234567890', '(123)(4(56)(78))', -1, 1, 'i', 4);
+SELECT regexp_substr('1234567890 1234567890', '(123)(4(56)(78))', 1, -1, 'i', 4);
+SELECT regexp_substr('1234567890 1234567890', '(123)(4(56)(78))', 1, 1, 'i', -4);
+
+DROP TABLE IF EXISTS regexp_temp;
+
-- change NULL-display back
\pset null ''
Hi,
This is a new version of the patch that now implements all the XQUERY
regexp functions as described in the standard, minus the differences of
PostgerSQL regular expression explain in [1]https://www.postgresql.org/docs/13/functions-matching.html#FUNCTIONS-POSIX-REGEXP.
The standard SQL describe functions like_regex(), occurrences_regex(),
position_regex(), substring_regex() and translate_regex() which
correspond to the commonly named functions regexp_like(),
regexp_count(), regexp_instr(), regexp_substr() and regexp_replace() as
reported by Chapman Flack in [2]/messages/by-id/bf2222d5-909d-408b-8531-95b32f18d4ab@www.fastmail.com. All these function are implemented in
the patch. Syntax of the functions are:
- regexp_like(string, pattern [, flags ])
- regexp_count( string, pattern [, position ] [, flags ])
- regexp_instr( string, pattern [, position ] [, occurrence ] [,
returnopt ] [, flags ] [, group ])
- regexp_substr( string, pattern [, position ] [, occurrence ] [, flags
] [, group ])
- regexp_replace(source, pattern, replacement [, position ] [,
occurrence ] [, flags ])
In addition to previous patch version I have added the regexp()_like
function and extended the existsing regex_replace() function. The patch
documents these functions and adds regression tests for all functions. I
will add it to the commitfest.
An other regexp functions regexp_positions() that returns all
occurrences that matched a POSIX regular expression is also developped
by Joel Jacobson, see [2]/messages/by-id/bf2222d5-909d-408b-8531-95b32f18d4ab@www.fastmail.com. This function expands the list of regexp
functions described in XQUERY.
[1]: https://www.postgresql.org/docs/13/functions-matching.html#FUNCTIONS-POSIX-REGEXP
https://www.postgresql.org/docs/13/functions-matching.html#FUNCTIONS-POSIX-REGEXP
[2]: /messages/by-id/bf2222d5-909d-408b-8531-95b32f18d4ab@www.fastmail.com
/messages/by-id/bf2222d5-909d-408b-8531-95b32f18d4ab@www.fastmail.com
--
Gilles Darold
LzLabs GmbH
http://www.lzlabs.com/
Attachments:
v2-0001-xquery-regexp-functions.patchtext/x-patch; charset=UTF-8; name=v2-0001-xquery-regexp-functions.patchDownload
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index fee0561961..f5f08f1509 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -3097,6 +3097,66 @@ repeat('Pg', 4) <returnvalue>PgPgPgPg</returnvalue>
</para></entry>
</row>
+ <row>
+ <entry role="func_table_entry"><para role="func_signature">
+ <indexterm>
+ <primary>regexp_count</primary>
+ </indexterm>
+ <function>regexp_count</function> ( <parameter>string</parameter> <type>text</type>, <parameter>pattern</parameter> <type>text</type> [, <parameter>position</parameter> <type>integer</type> ] [, <parameter>flags</parameter> <type>text</type> ] )
+ <returnvalue>integer</returnvalue>
+ </para>
+ <para>
+ Return the number of times a pattern occurs for a match of a POSIX
+ regular expression to the <parameter>string</parameter>; see
+ <xref linkend="functions-posix-regexp"/>.
+ </para>
+ <para>
+ <literal>regexp_count('123456789012', '\d{3}', 3)</literal>
+ <returnvalue>3</returnvalue>
+ </para></entry>
+ </row>
+
+ <row>
+ <entry role="func_table_entry"><para role="func_signature">
+ <indexterm>
+ <primary>regexp_instr</primary>
+ </indexterm>
+ <function>regexp_instr</function> ( <parameter>string</parameter> <type>text</type>, <parameter>pattern</parameter> <type>text</type> [, <parameter>position</parameter> <type>integer</type> ] [, <parameter>occurence</parameter> <type>integer</type> ] [, <parameter>returnopt</parameter> <type>integer</type> ] [, <parameter>flags</parameter> <type>text</type> ] [, <parameter>group</parameter> <type>integer</type> ] )
+ <returnvalue>integer</returnvalue>
+ </para>
+ <para>
+ Return the position within <parameter>string</parameter> where the
+ match of a POSIX regular expression occurs. It returns an integer
+ indicating the beginning or ending position of the matched substring,
+ depending on the value of the <parameter>returnopt</parameter> argument
+ (default beginning). If no match is found, then the function returns 0;
+ see <xref linkend="functions-posix-regexp"/>.
+ </para>
+ <para>
+ <literal>regexp_instr('1234567890', '(123)(4(56)(78))', 1, 1, 0, 'i', 4)</literal>
+ <returnvalue>7</returnvalue>
+ </para></entry>
+ </row>
+
+ <row>
+ <entry role="func_table_entry"><para role="func_signature">
+ <indexterm>
+ <primary>regexp_like</primary>
+ </indexterm>
+ <function>regexp_like</function> ( <parameter>string</parameter> <type>text</type>, <parameter>pattern</parameter> <type>text</type> [, <parameter>flags</parameter> <type>text</type> ] )
+ <returnvalue>boolean</returnvalue>
+ </para>
+ <para>
+ Evaluate the existence of a match to a POSIX regular expression
+ in <parameter>string</parameter>; see <xref linkend="functions-posix-regexp"/>.
+ </para>
+ <para>
+ <literal>regexp_like('Hello'||chr(10)||'world', '^world$', 'm')</literal>
+ <returnvalue>3</returnvalue>
+ </para></entry>
+ </row>
+
+
<row>
<entry role="func_table_entry"><para role="func_signature">
<indexterm>
@@ -3144,7 +3204,7 @@ repeat('Pg', 4) <returnvalue>PgPgPgPg</returnvalue>
<indexterm>
<primary>regexp_replace</primary>
</indexterm>
- <function>regexp_replace</function> ( <parameter>string</parameter> <type>text</type>, <parameter>pattern</parameter> <type>text</type>, <parameter>replacement</parameter> <type>text</type> [, <parameter>flags</parameter> <type>text</type> ] )
+ <function>regexp_replace</function> ( <parameter>string</parameter> <type>text</type>, <parameter>pattern</parameter> <type>text</type>, <parameter>replacement</parameter> <type>text</type> [, <parameter>position</parameter> <type>integer</type> ] [, <parameter>occurence</parameter> <type>integer</type> ] [, <parameter>flags</parameter> <type>text</type> ] )
<returnvalue>text</returnvalue>
</para>
<para>
@@ -3157,6 +3217,24 @@ repeat('Pg', 4) <returnvalue>PgPgPgPg</returnvalue>
</para></entry>
</row>
+ <row>
+ <entry role="func_table_entry"><para role="func_signature">
+ <indexterm>
+ <primary>regexp_substr</primary>
+ </indexterm>
+ <function>regexp_substr</function> ( <parameter>string</parameter> <type>text</type>, <parameter>pattern</parameter> <type>text</type> [, <parameter>position</parameter> <type>integer</type> ] [, <parameter>occurence</parameter> <type>integer</type> ] [, <parameter>flags</parameter> <type>text</type> ] [, <parameter>group</parameter> <type>integer</type> ] )
+ <returnvalue>text</returnvalue>
+ </para>
+ <para>
+ Return the substring within <parameter>string</parameter> corresponding to the
+ match of a POSIX regular expression; see <xref linkend="functions-posix-regexp"/>.
+ </para>
+ <para>
+ <literal>regexp_substr('1234567890 1234557890', '(123)(4(5[56])(78))', 1, 2, 'i', 3)</literal>
+ <returnvalue>55</returnvalue>
+ </para></entry>
+ </row>
+
<row>
<entry role="func_table_entry"><para role="func_signature">
<indexterm>
@@ -5295,6 +5373,18 @@ substring('foobar' similar '#"o_b#"%' escape '#') <lineannotation>NULL</linea
<indexterm>
<primary>regexp_split_to_array</primary>
</indexterm>
+ <indexterm>
+ <primary>regexp_like</primary>
+ </indexterm>
+ <indexterm>
+ <primary>regexp_count</primary>
+ </indexterm>
+ <indexterm>
+ <primary>regexp_instr</primary>
+ </indexterm>
+ <indexterm>
+ <primary>regexp_substr</primary>
+ </indexterm>
<para>
<xref linkend="functions-posix-table"/> lists the available
@@ -5451,6 +5541,8 @@ substring('foobar' from 'o(.)b') <lineannotation>o</lineannotation>
It has the syntax
<function>regexp_replace</function>(<replaceable>source</replaceable>,
<replaceable>pattern</replaceable>, <replaceable>replacement</replaceable>
+ <optional>, <replaceable>position</replaceable> </optional>
+ <optional>, <replaceable>occurrence</replaceable> </optional>
<optional>, <replaceable>flags</replaceable> </optional>).
The <replaceable>source</replaceable> string is returned unchanged if
there is no match to the <replaceable>pattern</replaceable>. If there is a
@@ -5464,12 +5556,19 @@ substring('foobar' from 'o(.)b') <lineannotation>o</lineannotation>
substring matching the entire pattern should be inserted. Write
<literal>\\</literal> if you need to put a literal backslash in the replacement
text.
- The <replaceable>flags</replaceable> parameter is an optional text
- string containing zero or more single-letter flags that change the
+ <replaceable>pattern</replaceable> is searched in <replaceable>string</replaceable> starting
+ from an optional <replaceable>position</replaceable> or from the beginning
+ of <replaceable>source</replaceable> by default. Optional <replaceable>occurrence</replaceable>
+ parameter indicates which occurrence of <replaceable>pattern</replaceable> in
+ <replaceable>source</replaceable> should be replaced. Default value for <replaceable>occurrence</replaceable>
+ is 1, replace only the first occurrence. The <replaceable>flags</replaceable> parameter
+ is an optional text string containing zero or more single-letter flags that change the
function's behavior. Flag <literal>i</literal> specifies case-insensitive
matching, while flag <literal>g</literal> specifies replacement of each matching
- substring rather than only the first one. Supported flags (though
- not <literal>g</literal>) are
+ substring rather than only the first one. When <replaceable>occurrence</replaceable>
+ is set flag <literal>g</literal> has no effect. If <replaceable>occurrence</replaceable>
+ is set to zero, all occurrences are replaced which is similar to flag <literal>g</literal>.
+ Supported flags (though not <literal>g</literal>) are
described in <xref linkend="posix-embedded-options-table"/>.
</para>
@@ -5482,6 +5581,10 @@ regexp_replace('foobarbaz', 'b..', 'X', 'g')
<lineannotation>fooXX</lineannotation>
regexp_replace('foobarbaz', 'b(..)', 'X\1Y', 'g')
<lineannotation>fooXarYXazY</lineannotation>
+regexp_replace('A PostgreSQL function', 'a|e|i|o|u', 'X', 1, 0, 'i')
+ <lineannotation>X PXstgrXSQL fXnctXXn</lineannotation>
+regexp_replace('A PostgreSQL function', 'a|e|i|o|u', 'X', 1, 3, 'i')
+ <lineannotation>A PostgrXSQL function</lineannotation>
</programlisting>
</para>
@@ -5669,6 +5772,161 @@ SELECT foo FROM regexp_split_to_table('the quick brown fox', '\s*') AS foo;
in practice. Other software systems such as Perl use similar definitions.
</para>
+ <para>
+ The <function>regexp_like</function> function evaluate the existence of a match
+ to a POSIX regular expression in <parameter>string</parameter>; returns a boolean
+ resulting from matching a POSIX regular expression pattern to a string. It has
+ the syntax <function>regexp_like</function>(<replaceable>string</replaceable>,
+ <replaceable>pattern</replaceable> <optional>, <replaceable>flags</replaceable> </optional>).
+ <replaceable>pattern</replaceable> is searched in <replaceable>string</replaceable> starting
+ from the beginning of <replaceable>string</replaceable>.
+ The <replaceable>flags</replaceable> parameter is an optional text string
+ containing zero or more single-letter flags that change the function's behavior.
+ <function>regexp_count</function> accepts all the flags
+ shown in <xref linkend="posix-embedded-options-table"/>.
+ This function is similar to regexp operator <literal>~</literal> when used without
+ <replaceable>flags</replaceable> and similar to operator <literal>~*</literal> when
+ used with <replaceable>flags</replaceable> <literal>'i'</literal> only.
+ </para>
+
+ <para>
+ Some examples:
+<programlisting>
+SELECT 'found' FROM t1 WHERE regexp_like('Hello'||chr(10)||'world', '^world$', 'm');
+ regexp_like
+--------------
+ found
+(1 row)
+</programlisting>
+ </para>
+
+
+ <para>
+ The <function>regexp_count</function> function returns the number of
+ captured substring(s) resulting from matching a POSIX regular
+ expression pattern to a string. It has the syntax <function>regexp_count</function>(
+ <replaceable>string</replaceable>, <replaceable>pattern</replaceable> <optional>,
+ <replaceable>position</replaceable> </optional> <optional>, <replaceable>flags</replaceable> </optional>).
+ <replaceable>pattern</replaceable> is searched in <replaceable>string</replaceable> starting
+ from an optional <replaceable>position</replaceable> or from the beginning of <replaceable>string</replaceable>
+ by default. The <replaceable>flags</replaceable> parameter is an optional text string
+ containing zero or more single-letter flags that change the function's behavior.
+ <function>regexp_count</function> accepts all the flags
+ shown in <xref linkend="posix-embedded-options-table"/>.
+ The <literal>g</literal> flag is forced internally to count all matches.
+ This function returns 0 if there is no match or the number of match as
+ an integer.
+ </para>
+
+ <para>
+ Some examples:
+<programlisting>
+SELECT regexp_count('123123123123', '\d{3}', 1);
+ regexp_count
+--------------
+ 4
+(1 row)
+
+SELECT regexp_count('Hello'||CHR(10)||'world!', '^world!$', 1, 'm');
+ regexp_count
+--------------
+ 1
+(1 row)
+</programlisting>
+ </para>
+
+ <para>
+ The <function>regexp_instr</function> function returns the beginning or ending
+ position of the matched substring resulting from matching a POSIX regular
+ expression pattern to a string. It has the syntax <function>regexp_instr</function>(
+ <replaceable>string</replaceable>, <replaceable>pattern</replaceable> <optional>,
+ <replaceable>position</replaceable> </optional> <optional>, <replaceable>occurrence</replaceable> </optional>
+ <optional>, <replaceable>returnopt</replaceable> </optional>
+ <optional>, <replaceable>flags</replaceable> </optional>
+ <optional>, <replaceable>group</replaceable> </optional>).
+ <replaceable>pattern</replaceable> is searched in <replaceable>string</replaceable> starting
+ from an optional <replaceable>position</replaceable> or from the beginning
+ of <replaceable>string</replaceable> by default. Optional <replaceable>occurrence</replaceable> parameter
+ indicates which occurrence of <replaceable>pattern</replaceable> in <replaceable>string</replaceable>
+ should be searched. When optional <replaceable>returnopt</replaceable> parameter is set to 0 (default)
+ the function returns the position of the first character of the occurrence. When set to 1 returns
+ the position of the character after the occurrence.
+ The <replaceable>flags</replaceable> parameter is an optional text string
+ containing zero or more single-letter flags that change the function's behavior.
+ <function>regexp_count</function> accepts all the flags
+ shown in <xref linkend="posix-embedded-options-table"/>.
+ The <literal>g</literal> flag is forced internally to track all matches.
+ For a pattern with capture groups, <replaceable>group</replaceable> is an integer indicating
+ which capture in pattern is the target of the function. A capture group is a part of the pattern
+ enclosed in parentheses. Capture groups can be nested. They are numbered in order in which their
+ left parentheses appear in pattern. If <replaceable>group</replaceable> is zero, then the position
+ of the entire substring that matches the pattern is returned. If <replaceable>pattern</replaceable>
+ does not have at least <replaceable>group</replaceable> capture group, the function returns zero.
+ This function returns 0 if there is no match or the starting or ending position
+ of match as an integer.
+ </para>
+
+ <para>
+ Some examples:
+<programlisting>
+SELECT regexp_instr('number of your street, zipcode thetown, FR', '[^ ]+', 1, 6);
+ regexp_instr
+--------------
+ 32
+(1 row)
+
+SELECT regexp_instr('1234567890', '(123)(4(56)(78))', 1, 1, 0, 'i', 3);
+ regexp_instr
+--------------
+ 5
+(1 row)
+</programlisting>
+ </para>
+
+ <para>
+ The <function>regexp_substr</function> function returns the
+ matched substring resulting from matching a POSIX regular
+ expression pattern to a string. It has the syntax <function>regexp_substr</function>(
+ <replaceable>string</replaceable>, <replaceable>pattern</replaceable> <optional>,
+ <replaceable>position</replaceable> </optional> <optional>, <replaceable>occurrence</replaceable> </optional>
+ <optional>, <replaceable>flags</replaceable> </optional>
+ <optional>, <replaceable>group</replaceable> </optional>).
+ <replaceable>pattern</replaceable> is searched in <replaceable>string</replaceable> starting
+ from an optional <replaceable>position</replaceable> or from the beginning
+ of <replaceable>string</replaceable> by default. Optional parameter <replaceable>occurrence</replaceable>
+ indicates which occurrence of <replaceable>pattern</replaceable> in <replaceable>string</replaceable>
+ should be searched. The <replaceable>flags</replaceable> parameter is an optional text string
+ containing zero or more single-letter flags that change the function's behavior.
+ <function>regexp_count</function> accepts all the flags
+ shown in <xref linkend="posix-embedded-options-table"/>.
+ The <literal>g</literal> flag is forced internally to track all matches.
+ For a pattern with capture groups, optional <replaceable>group</replaceable> is an integer indicating
+ which capture in pattern is the target of the function. A capture group is a part of the pattern
+ enclosed in parentheses. Capture groups can be nested. They are numbered in order in which their
+ left parentheses appear in pattern. If <replaceable>group</replaceable> is zero, then the position
+ of the entire substring that matches the pattern is returned. If <replaceable>pattern</replaceable>
+ does not have at least <replaceable>group</replaceable> capture group, the function returns zero.
+ This function returns NULL if there is no match or the substring of match.
+ </para>
+
+ <para>
+ Some examples:
+<programlisting>
+SELECT regexp_substr('number of your street, zipcode town, FR', ',[^,]+', 1, 2);
+ regexp_substr
+---------------
+ , FR
+(1 row)
+
+SELECT regexp_substr('1234567890 1234557890', '(123)(4(5[56])(78))', 1, 2, 'i', 3);
+ regexp_substr
+---------------
+ 55
+(1 row)
+</programlisting>
+ </para>
+
+
<!-- derived from the re_syntax.n man page -->
<sect3 id="posix-syntax-details">
diff --git a/src/backend/utils/adt/regexp.c b/src/backend/utils/adt/regexp.c
index a32c5c82ab..624c57cec9 100644
--- a/src/backend/utils/adt/regexp.c
+++ b/src/backend/utils/adt/regexp.c
@@ -120,6 +120,7 @@ static regexp_matches_ctx *setup_regexp_matches(text *orig_str, text *pattern,
static ArrayType *build_regexp_match_result(regexp_matches_ctx *matchctx);
static Datum build_regexp_split_result(regexp_matches_ctx *splitctx);
+static text *enclose_with_parenthesis(text *str);
/*
* RE_compile_and_cache - compile a RE, caching if possible
@@ -555,7 +556,6 @@ texticregexne(PG_FUNCTION_ARGS)
0, NULL));
}
-
/*
* textregexsubstr()
* Return a substring matched by a regular expression.
@@ -1063,6 +1063,508 @@ regexp_matches_no_flags(PG_FUNCTION_ARGS)
return regexp_matches(fcinfo);
}
+/*
+ * regexp_like()
+ * Return the true if a pattern match within a string.
+ */
+Datum
+regexp_like(PG_FUNCTION_ARGS)
+{
+ text *str = PG_GETARG_TEXT_PP(0);
+ text *pattern = PG_GETARG_TEXT_PP(1);
+ text *flags = PG_GETARG_TEXT_PP_IF_EXISTS(2);
+
+ pg_re_flags re_flags;
+ regexp_matches_ctx *matchctx;
+
+ /* Determine options */
+ parse_re_flags(&re_flags, flags);
+
+ matchctx = setup_regexp_matches(str, pattern, &re_flags,
+ PG_GET_COLLATION(), true, false, false);
+
+ if (matchctx->nmatches > 0)
+ PG_RETURN_BOOL(true);
+
+ PG_RETURN_BOOL(false);
+}
+
+/* This is separate to keep the opr_sanity regression test from complaining */
+Datum
+regexp_like_no_flags(PG_FUNCTION_ARGS)
+{
+ return regexp_like(fcinfo);
+}
+
+/*
+ * regexp_count()
+ * Return the number of matches of a pattern within a string.
+ */
+Datum
+regexp_count(PG_FUNCTION_ARGS)
+{
+ text *orig_str = NULL;
+ text *pattern = PG_GETARG_TEXT_PP(1);
+ int start = 1;
+ text *flags = PG_GETARG_TEXT_PP_IF_EXISTS(3);
+
+ pg_re_flags re_flags;
+ regexp_matches_ctx *matchctx;
+
+ if (PG_NARGS() > 2)
+ start = PG_GETARG_INT32(2);
+
+ if (start < 0)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("invalid value for parameter \"%s\": %d", "start_position", start)));
+
+ /* regexp_count(string, pattern, start[, flags]) */
+ if (start > 1)
+ orig_str = DatumGetTextPP(DirectFunctionCall2(text_substr_no_len,
+ PG_GETARG_DATUM(0),
+ Int32GetDatum(start)
+ ));
+ else
+ orig_str = PG_GETARG_TEXT_PP(0);
+
+ /* Determine options */
+ parse_re_flags(&re_flags, flags);
+
+ /* this function require flag 'g' */
+ re_flags.glob = true;
+
+ matchctx = setup_regexp_matches(orig_str, pattern, &re_flags,
+ PG_GET_COLLATION(), true, false, false);
+
+ PG_RETURN_INT32(matchctx->nmatches);
+}
+
+/* This is separate to keep the opr_sanity regression test from complaining */
+Datum
+regexp_count_no_start(PG_FUNCTION_ARGS)
+{
+ return regexp_count(fcinfo);
+}
+
+
+/* This is separate to keep the opr_sanity regression test from complaining */
+Datum
+regexp_count_no_flags(PG_FUNCTION_ARGS)
+{
+ return regexp_count(fcinfo);
+}
+
+/*
+ * textregexreplace_extended()
+ * Return a string matched by a regular expression, with replacement.
+ * Extends textregexreplace by allowing a start position and the
+ * choice of the occurrence to replace (0 means all occurrences).
+ */
+Datum
+textregexreplace_extended(PG_FUNCTION_ARGS)
+{
+ text *s = NULL;
+ text *p = PG_GETARG_TEXT_PP(1);
+ text *pattern = enclose_with_parenthesis(PG_GETARG_TEXT_PP(1));
+ text *r = PG_GETARG_TEXT_PP(2);
+ int start = 1;
+ int occurrence = 1;
+ int pos = 0;
+ text *flags = PG_GETARG_TEXT_PP_IF_EXISTS(5);
+ StringInfoData str;
+ regex_t *re;
+ pg_re_flags re_flags;
+ regexp_matches_ctx *matchctx;
+ text *after = NULL;
+
+ /* start position */
+ if (PG_NARGS() > 3)
+ start = PG_GETARG_INT32(3);
+ if (start < 0)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("invalid value for parameter \"%s\": %d", "start_position", start)));
+ if (start > 1)
+ s = DatumGetTextPP(DirectFunctionCall2(text_substr_no_len,
+ PG_GETARG_DATUM(0),
+ Int32GetDatum(start)
+ ));
+ else
+ s = PG_GETARG_TEXT_PP(0);
+
+ /* occurrence to replace */
+ if (PG_NARGS() > 4)
+ occurrence = PG_GETARG_INT32(4);
+
+ if (occurrence < 0)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("invalid value for parameter \"%s\": %d", "occurrence", occurrence)));
+
+ /* Determine options */
+ parse_re_flags(&re_flags, flags);
+ re_flags.glob = true;
+
+ /* lookup for pattern */
+ matchctx = setup_regexp_matches(s, pattern, &re_flags,
+ PG_GET_COLLATION(), true,
+ false, re_flags.glob);
+
+ /* If no match is found, then the function returns the original string */
+ if (matchctx->nmatches == 0)
+ PG_RETURN_DATUM(PG_GETARG_DATUM(0));
+
+ /* Get the position of the occurence to replace */
+ if (PG_NARGS() > 4 && occurrence > 0)
+ {
+ /* When occurrence exceed matches return the original string */
+ if (occurrence > matchctx->nmatches)
+ PG_RETURN_DATUM(PG_GETARG_DATUM(0));
+
+ pos = (occurrence*matchctx->npatterns*2)-(matchctx->npatterns*2);
+ pos = matchctx->match_locs[pos]+start;
+ }
+ else
+ pos = start;
+
+ /* Normal case without explicit VARIADIC marker */
+ initStringInfo(&str);
+
+ /* Get the string before the occurrence starting */
+ if (pos > 1)
+ {
+ text *before = DatumGetTextPP(DirectFunctionCall3(text_substr,
+ PG_GETARG_DATUM(0),
+ Int32GetDatum(1),
+ Int32GetDatum(pos - 1)));
+ appendStringInfoString(&str, TextDatumGetCString(before));
+ }
+
+ /* all occurences must be replaced? */
+ if (occurrence == 0)
+ re_flags.glob = true;
+ else
+ re_flags.glob = false;
+
+ /* Compile the regular expression */
+ re = RE_compile_and_cache(p, re_flags.cflags, PG_GET_COLLATION());
+
+ /* Get the substring starting at the right occurrence position */
+ after = DatumGetTextPP(DirectFunctionCall2(text_substr_no_len,
+ PG_GETARG_DATUM(0),
+ Int32GetDatum(pos)
+ ));
+
+ appendStringInfoString(&str, TextDatumGetCString(replace_text_regexp(
+ after,
+ (void *) re,
+ r,
+ re_flags.glob)));
+
+ PG_RETURN_TEXT_P(CStringGetTextDatum(str.data));
+}
+
+/* This is separate to keep the opr_sanity regression test from complaining */
+Datum
+textregexreplace_extended_no_occurrence(PG_FUNCTION_ARGS)
+{
+ return textregexreplace_extended(fcinfo);
+}
+
+/* This is separate to keep the opr_sanity regression test from complaining */
+Datum
+textregexreplace_extended_no_flags(PG_FUNCTION_ARGS)
+{
+ return textregexreplace_extended(fcinfo);
+}
+
+/*
+ * Return text string between parenthesis
+ */
+static text *
+enclose_with_parenthesis(text *str)
+{
+ int len = VARSIZE_ANY_EXHDR(str);
+ text *result;
+ char *ptr;
+
+ result = palloc(len + VARHDRSZ + 2);
+ SET_VARSIZE(result, len + VARHDRSZ + 2);
+ ptr = VARDATA(result);
+ memcpy(ptr, "(", 1);
+ memcpy(ptr+1, VARDATA_ANY(str), len);
+ memcpy(ptr+len+1, ")", 1);
+
+ return result;
+}
+
+/*
+ * regexp_instr()
+ * Return the position within the string where the match was located
+ */
+Datum
+regexp_instr(PG_FUNCTION_ARGS)
+{
+ text *orig_str = NULL;
+ text *pattern = NULL;
+ text *flags = PG_GETARG_TEXT_PP_IF_EXISTS(5);
+ int start = 1;
+ int occurrence = 1;
+ int return_opt = 0;
+ int subexpr = 0;
+ int pos;
+
+ pg_re_flags re_flags;
+ regexp_matches_ctx *matchctx;
+
+ /* regexp_instr(string, pattern, start) */
+ if (PG_NARGS() > 2)
+ start = PG_GETARG_INT32(2);
+
+ if (start < 0)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("invalid value for parameter \"%s\": %d", "start_position", start)));
+
+ if (start > 1)
+ orig_str = DatumGetTextPP(DirectFunctionCall2(text_substr_no_len,
+ PG_GETARG_DATUM(0),
+ Int32GetDatum(start)
+ ));
+ else
+ /* regexp_instr(string, pattern) */
+ orig_str = PG_GETARG_TEXT_PP(0);
+
+ if (orig_str == NULL)
+ PG_RETURN_NULL();
+
+ /* regexp_instr(string, pattern, start, occurrence) */
+ if (PG_NARGS() > 3)
+ occurrence = PG_GETARG_INT32(3);
+
+ if (occurrence <= 0)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("invalid value for parameter \"%s\": %d", "occurrence", occurrence)));
+
+ /* regexp_instr(string, pattern, start, occurrence, return_opt) */
+ if (PG_NARGS() > 4)
+ return_opt = PG_GETARG_INT32(4);
+
+ if (return_opt != 0 && return_opt != 1)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("invalid value for parameter \"%s\": %d", "return_option", return_opt)));
+
+ /* regexp_instr(string, pattern, start, occurrence, return_opt, flags, subexpr) */
+ if (PG_NARGS() > 6)
+ subexpr = PG_GETARG_INT32(6);
+ if (subexpr < 0)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("invalid value for parameter \"%s\": %d", "group", subexpr)));
+
+ /*
+ * If subexpr is zero (default), then the position of the entire
+ * substring that matches the pattern is returned. Otherwise we
+ * will exactly register the subexpressions given in the pattern.
+ * Enclose pattern between parenthesis to register the position
+ * of the entire substring.
+ */
+ pattern = enclose_with_parenthesis(PG_GETARG_TEXT_PP(1));
+
+ /* Determine options */
+ parse_re_flags(&re_flags, flags);
+ /* this function require flag 'g' */
+ re_flags.glob = true;
+
+ matchctx = setup_regexp_matches(orig_str, pattern, &re_flags,
+ PG_GET_COLLATION(), true, false, false);
+
+ /* If no match is found, then the function returns 0 */
+ if (matchctx->nmatches == 0)
+ PG_RETURN_INT32(0);
+
+ /* When occurrence exceed matches return 0 */
+ if (occurrence > matchctx->nmatches)
+ PG_RETURN_INT32(0);
+
+ /* When subexpr exceed number of subexpression return 0 */
+ if (subexpr > matchctx->npatterns - 1)
+ PG_RETURN_INT32(0);
+
+ /*
+ * Returns the position of the first character of the occurrence
+ * or for subexpression in this occurrence.
+ */
+ pos = (occurrence*matchctx->npatterns*2)-((matchctx->npatterns-subexpr)*2);
+ if (return_opt == 1)
+ pos += 1;
+
+ PG_RETURN_INT32(matchctx->match_locs[pos]+start);
+}
+
+/* This is separate to keep the opr_sanity regression test from complaining */
+Datum
+regexp_instr_no_start(PG_FUNCTION_ARGS)
+{
+ return regexp_instr(fcinfo);
+}
+
+/* This is separate to keep the opr_sanity regression test from complaining */
+Datum
+regexp_instr_no_occurrence(PG_FUNCTION_ARGS)
+{
+ return regexp_instr(fcinfo);
+}
+
+/* This is separate to keep the opr_sanity regression test from complaining */
+Datum
+regexp_instr_no_return_opt(PG_FUNCTION_ARGS)
+{
+ return regexp_instr(fcinfo);
+}
+
+/* This is separate to keep the opr_sanity regression test from complaining */
+Datum
+regexp_instr_no_flags(PG_FUNCTION_ARGS)
+{
+ return regexp_instr(fcinfo);
+}
+
+/* This is separate to keep the opr_sanity regression test from complaining */
+Datum
+regexp_instr_no_subexpr(PG_FUNCTION_ARGS)
+{
+ return regexp_instr(fcinfo);
+}
+
+/*
+ * regexp_substr()
+ * Return the substring within the string that match a regular
+ * expression pattern
+ */
+Datum
+regexp_substr(PG_FUNCTION_ARGS)
+{
+ text *orig_str = NULL;
+ text *pattern = NULL;
+ text *flags = PG_GETARG_TEXT_PP_IF_EXISTS(4);
+ int start = 1;
+ int occurrence = 1;
+ int subexpr = 0;
+ int so, eo, pos;
+
+ pg_re_flags re_flags;
+ regexp_matches_ctx *matchctx;
+
+ if (PG_NARGS() > 2)
+ start = PG_GETARG_INT32(2);
+
+ if (start < 0)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("invalid value for parameter \"%s\": %d", "start_position", start)));
+
+ /* regexp_substr(string, pattern, start) */
+ if (start > 1)
+ orig_str = DatumGetTextPP(DirectFunctionCall2(text_substr_no_len,
+ PG_GETARG_DATUM(0),
+ Int32GetDatum(start)
+ ));
+ else
+ /* regexp_substr(string, pattern) */
+ orig_str = PG_GETARG_TEXT_PP(0);
+
+ if (orig_str == NULL)
+ PG_RETURN_NULL();
+
+ /* regexp_substr(string, pattern, start, occurrence) */
+ if (PG_NARGS() > 3)
+ occurrence = PG_GETARG_INT32(3);
+
+ if (occurrence <= 0)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("invalid value for parameter \"%s\": %d", "occurrence", occurrence)));
+
+ /* regexp_substr(string, pattern, start, occurrence, flags, subexpr) */
+ if (PG_NARGS() > 5)
+ subexpr = PG_GETARG_INT32(5);
+ if (subexpr < 0)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("invalid value for parameter \"%s\": %d", "group", subexpr)));
+
+ /*
+ * If subexpr is zero (default), then the position of the entire
+ * substring that matches the pattern is returned. Otherwise we
+ * will exactly register the subexpressions given in the pattern.
+ * Enclose pattern between parenthesis to register the position
+ * of the entire substring.
+ */
+ pattern = enclose_with_parenthesis(PG_GETARG_TEXT_PP(1));
+
+ /* Determine options */
+ parse_re_flags(&re_flags, flags);
+ /* this function require flag 'g' */
+ re_flags.glob = true;
+
+ matchctx = setup_regexp_matches(orig_str, pattern, &re_flags,
+ PG_GET_COLLATION(), true, false, false);
+
+ /* If no match is found, then the function returns NULL */
+ if (matchctx->nmatches == 0)
+ PG_RETURN_NULL();
+
+ /* When occurrence exceed matches return NULL */
+ if (occurrence > matchctx->nmatches)
+ PG_RETURN_NULL();
+
+ /* When subexpr exceed number of subexpression return NULL */
+ if (subexpr > matchctx->npatterns - 1)
+ PG_RETURN_NULL();
+
+ /* Returns the substring corresponding to the occurrence. */
+ pos = (occurrence*matchctx->npatterns*2)-((matchctx->npatterns-subexpr)*2);
+ so = matchctx->match_locs[pos]+1;
+ eo = matchctx->match_locs[pos+1]+1;
+
+ PG_RETURN_DATUM(DirectFunctionCall3(text_substr,
+ PointerGetDatum(matchctx->orig_str),
+ Int32GetDatum(so),
+ Int32GetDatum(eo - so)));
+}
+
+/* This is separate to keep the opr_sanity regression test from complaining */
+Datum
+regexp_substr_no_start(PG_FUNCTION_ARGS)
+{
+ return regexp_substr(fcinfo);
+}
+
+/* This is separate to keep the opr_sanity regression test from complaining */
+Datum
+regexp_substr_no_occurrence(PG_FUNCTION_ARGS)
+{
+ return regexp_substr(fcinfo);
+}
+
+/* This is separate to keep the opr_sanity regression test from complaining */
+Datum
+regexp_substr_no_flags(PG_FUNCTION_ARGS)
+{
+ return regexp_substr(fcinfo);
+}
+
+/* This is separate to keep the opr_sanity regression test from complaining */
+Datum
+regexp_substr_no_subexpr(PG_FUNCTION_ARGS)
+{
+ return regexp_substr(fcinfo);
+}
+
/*
* setup_regexp_matches --- do the initial matching for regexp_match
* and regexp_split functions
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 59d2b71ca9..fd760ef861 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -3560,6 +3560,63 @@
proname => 'regexp_matches', prorows => '10', proretset => 't',
prorettype => '_text', proargtypes => 'text text text',
prosrc => 'regexp_matches' },
+{ oid => '9614', descr => 'count match(es) for regexp',
+ proname => 'regexp_count', prorettype => 'int4', proargtypes => 'text text',
+ prosrc => 'regexp_count_no_start' },
+{ oid => '9615', descr => 'count match(es) for regexp',
+ proname => 'regexp_count', prorettype => 'int4', proargtypes => 'text text int4',
+ prosrc => 'regexp_count_no_flags' },
+{ oid => '9616', descr => 'count match(es) for regexp',
+ proname => 'regexp_count', prorettype => 'int4',
+ proargtypes => 'text text int4 text', prosrc => 'regexp_count' },
+{ oid => '9617', descr => 'position where the match for regexp was located',
+ proname => 'regexp_instr', prorettype => 'int4', proargtypes => 'text text',
+ prosrc => 'regexp_instr_no_start' },
+{ oid => '9618', descr => 'position where the match for regexp was located',
+ proname => 'regexp_instr', prorettype => 'int4', proargtypes => 'text text int4',
+ prosrc => 'regexp_instr_no_occurrence' },
+{ oid => '9619', descr => 'position where the match for regexp was located',
+ proname => 'regexp_instr', prorettype => 'int4', proargtypes => 'text text int4 int4',
+ prosrc => 'regexp_instr_no_return_opt' },
+{ oid => '9620', descr => 'position where the match for regexp was located',
+ proname => 'regexp_instr', prorettype => 'int4', proargtypes => 'text text int4 int4 int4',
+ prosrc => 'regexp_instr_no_flags' },
+{ oid => '9621', descr => 'position where the match for regexp was located',
+ proname => 'regexp_instr', prorettype => 'int4', proargtypes => 'text text int4 int4 int4 text',
+ prosrc => 'regexp_instr_no_subexpr' },
+{ oid => '9622', descr => 'position where the match for regexp was located',
+ proname => 'regexp_instr', prorettype => 'int4', proargtypes => 'text text int4 int4 int4 text int4',
+ prosrc => 'regexp_instr' },
+{ oid => '9623', descr => 'substring that match the regexp pattern',
+ proname => 'regexp_substr', prorettype => 'text', proargtypes => 'text text',
+ prosrc => 'regexp_substr_no_start' },
+{ oid => '9624', descr => 'substring that match the regexp pattern',
+ proname => 'regexp_substr', prorettype => 'text', proargtypes => 'text text int4',
+ prosrc => 'regexp_substr_no_occurrence' },
+{ oid => '9625', descr => 'substring that match the regexp pattern',
+ proname => 'regexp_substr', prorettype => 'text', proargtypes => 'text text int4 int4',
+ prosrc => 'regexp_substr_no_flags' },
+{ oid => '9626', descr => 'substring that match the regexp pattern',
+ proname => 'regexp_substr', prorettype => 'text', proargtypes => 'text text int4 int4 text',
+ prosrc => 'regexp_substr_no_subexpr' },
+{ oid => '9627', descr => 'substring that match the regexp pattern',
+ proname => 'regexp_substr', prorettype => 'text', proargtypes => 'text text int4 int4 text int4',
+ prosrc => 'regexp_substr' },
+{ oid => '9628', descr => 'evaluate match(es) for regexp',
+ proname => 'regexp_like', prorettype => 'bool', proargtypes => 'text text',
+ prosrc => 'regexp_like_no_flags' },
+{ oid => '9629', descr => 'evaluate match(es) for regexp',
+ proname => 'regexp_like', prorettype => 'bool',
+ proargtypes => 'text text text', prosrc => 'regexp_like' },
+{ oid => '9606', descr => 'replace text using regexp',
+ proname => 'regexp_replace', prorettype => 'text',
+ proargtypes => 'text text text int4 int4 text', prosrc => 'textregexreplace_extended' },
+{ oid => '9607', descr => 'replace text using regexp',
+ proname => 'regexp_replace', prorettype => 'text',
+ proargtypes => 'text text text int4 int4', prosrc => 'textregexreplace_extended_no_flags' },
+{ oid => '9608', descr => 'replace text using regexp',
+ proname => 'regexp_replace', prorettype => 'text',
+ proargtypes => 'text text text int4', prosrc => 'textregexreplace_extended_no_occurrence' },
{ oid => '2088', descr => 'split string by field_sep and return field_num',
proname => 'split_part', prorettype => 'text',
proargtypes => 'text text int4', prosrc => 'split_part' },
diff --git a/src/test/regress/expected/strings.out b/src/test/regress/expected/strings.out
index fb4573d85f..b9cef32e9e 100644
--- a/src/test/regress/expected/strings.out
+++ b/src/test/regress/expected/strings.out
@@ -905,6 +905,751 @@ SELECT foo, length(foo) FROM regexp_split_to_table('thE QUick bROWn FOx jUMPs ov
ERROR: regexp_split_to_table() does not support the "global" option
SELECT regexp_split_to_array('thE QUick bROWn FOx jUMPs ovEr The lazy dOG', 'e', 'g');
ERROR: regexp_split_to_array() does not support the "global" option
+-- regexp_like tests
+SELECT regexp_like('a'||CHR(10)||'d', 'a.d');
+ regexp_like
+-------------
+ t
+(1 row)
+
+SELECT regexp_like('a'||CHR(10)||'d', 'a.d', 'm');
+ regexp_like
+-------------
+ f
+(1 row)
+
+SELECT regexp_like('a'||CHR(10)||'d', 'a.d', 'n');
+ regexp_like
+-------------
+ f
+(1 row)
+
+SELECT regexp_like('Steven', '^Ste(v|ph)en$');
+ regexp_like
+-------------
+ t
+(1 row)
+
+SELECT regexp_like('foo' || chr(10) || 'bar' || chr(10) || 'bequq' || chr(10) || 'baz', '^bar');
+ regexp_like
+-------------
+ f
+(1 row)
+
+SELECT regexp_like('foo' || chr(10) || 'bar' || chr(10) || 'bequq' || chr(10) || 'baz', 'bar');
+ regexp_like
+-------------
+ t
+(1 row)
+
+SELECT regexp_like('foo' || chr(10) || 'bar' || chr(10) || 'bequq' || chr(10) || 'baz', '^bar', 'm');
+ regexp_like
+-------------
+ t
+(1 row)
+
+SELECT regexp_like('foo' || chr(10) || 'bar' || chr(10) || 'bequq' || chr(10) || 'baz', '^bar', 'n');
+ regexp_like
+-------------
+ t
+(1 row)
+
+SELECT regexp_like('GREEN', '([aeiou])\1');
+ regexp_like
+-------------
+ f
+(1 row)
+
+SELECT regexp_like('GREEN', '([aeiou])\1', 'i');
+ regexp_like
+-------------
+ t
+(1 row)
+
+SELECT regexp_like('ORANGE' || chr(10) || 'GREEN', '([aeiou])\1', 'i');
+ regexp_like
+-------------
+ t
+(1 row)
+
+SELECT regexp_like('ORANGE' || chr(10) || 'GREEN', '^..([aeiou])\1', 'i');
+ regexp_like
+-------------
+ f
+(1 row)
+
+SELECT regexp_like('ORANGE' || chr(10) || 'GREEN', '([aeiou])\1', 'in');
+ regexp_like
+-------------
+ t
+(1 row)
+
+SELECT regexp_like('ORANGE' || chr(10) || 'GREEN', '^..([aeiou])\1', 'in');
+ regexp_like
+-------------
+ t
+(1 row)
+
+SELECT regexp_like('ORANGE' || chr(10) || 'GREEN', '^..([aeiou])\1', 'im');
+ regexp_like
+-------------
+ t
+(1 row)
+
+SELECT REGEXP_LIKE('abc', 'a b c');
+ regexp_like
+-------------
+ f
+(1 row)
+
+SELECT REGEXP_LIKE('abc', 'a b c','x');
+ regexp_like
+-------------
+ t
+(1 row)
+
+-- count all matches for regexp
+SELECT regexp_count('123123123123123', '(12)3');
+ regexp_count
+--------------
+ 5
+(1 row)
+
+-- count all matches with start position
+SELECT regexp_count('123123123123', '123', 0);
+ regexp_count
+--------------
+ 4
+(1 row)
+
+SELECT regexp_count('123123123123', '123', 1);
+ regexp_count
+--------------
+ 4
+(1 row)
+
+SELECT regexp_count('123123123123', '123', 3);
+ regexp_count
+--------------
+ 3
+(1 row)
+
+SELECT regexp_count('123123123123', '123', -3);
+ERROR: invalid value for parameter "start_position": -3
+-- count all matches in NULL string with a start position
+SELECT regexp_count(NULL, '123', 3);
+ regexp_count
+--------------
+ \N
+(1 row)
+
+-- count all matches with a start position greater than string length
+SELECT regexp_count('123', '123', 10);
+ regexp_count
+--------------
+ 0
+(1 row)
+
+-- count all matches from different regexp
+SELECT regexp_count('ABC123', '[A-Z]'), regexp_count('A1B2C3', '[A-Z]');
+ regexp_count | regexp_count
+--------------+--------------
+ 3 | 3
+(1 row)
+
+SELECT regexp_count('ABC123', '[A-Z][0-9]'), regexp_count('A1B2C3', '[A-Z][0-9]');
+ regexp_count | regexp_count
+--------------+--------------
+ 1 | 3
+(1 row)
+
+SELECT regexp_count('ABC123', '[A-Z][0-9]{2}'), regexp_count('A1B2C3', '[A-Z][0-9]{2}');
+ regexp_count | regexp_count
+--------------+--------------
+ 1 | 0
+(1 row)
+
+SELECT regexp_count('ABC123', '([A-Z][0-9]){2}'), regexp_count('A1B2C3', '([A-Z][0-9]){2}');
+ regexp_count | regexp_count
+--------------+--------------
+ 0 | 1
+(1 row)
+
+SELECT regexp_count('ABC123A5', '^[A-Z][0-9]'), regexp_count('A1B2C3', '^[A-Z][0-9]');
+ regexp_count | regexp_count
+--------------+--------------
+ 0 | 1
+(1 row)
+
+SELECT regexp_count('ABC123', '[A-Z][0-9]{2}'), regexp_count('A1B2C34', '[A-Z][0-9]{2}');
+ regexp_count | regexp_count
+--------------+--------------
+ 1 | 1
+(1 row)
+
+-- count matches with newline case insensivity
+SELECT regexp_count('a'||CHR(10)||'d', 'a.d');
+ regexp_count
+--------------
+ 1
+(1 row)
+
+SELECT regexp_count('a'||CHR(10)||'d', 'a.d', 1, 's');
+ regexp_count
+--------------
+ 1
+(1 row)
+
+-- count matches with newline case sensivity
+SELECT regexp_count('a'||CHR(10)||'d', 'a.d', 1, 'n');
+ regexp_count
+--------------
+ 0
+(1 row)
+
+-- count not multiline matches
+SELECT regexp_count('a'||CHR(10)||'d', '^d$');
+ regexp_count
+--------------
+ 0
+(1 row)
+
+-- count multiline matches
+SELECT regexp_count('a'||CHR(10)||'d', '^d$', 1, 'm');
+ regexp_count
+--------------
+ 1
+(1 row)
+
+SELECT regexp_count('Hello'||CHR(10)||'world!', '^world!$', 1, 'm'); -- 1
+ regexp_count
+--------------
+ 1
+(1 row)
+
+-- Count the number of occurrences of the substring an in the string.
+SELECT regexp_count('a man, a plan, a canal: Panama', 'an');
+ regexp_count
+--------------
+ 4
+(1 row)
+
+-- Find the number of occurrences of the substring an in the string starting with the fifth character.
+SELECT regexp_count('a man, a plan, a canal: Panama', 'an',5);
+ regexp_count
+--------------
+ 3
+(1 row)
+
+-- Find the number of occurrences of a substring containing a lower-case character
+-- followed by an. In the first example, do not use a modifier. In the second example,
+-- use the i modifier to force the regular expression to ignore case.
+SELECT regexp_count('a man, a plan, a canal: Panama', '[a-z]an');
+ regexp_count
+--------------
+ 3
+(1 row)
+
+SELECT regexp_count('a man, a plan, a canal: Panama', '[a-z]an', 1, 'i');
+ regexp_count
+--------------
+ 4
+(1 row)
+
+DROP TABLE IF EXISTS regexp_temp;
+NOTICE: table "regexp_temp" does not exist, skipping
+CREATE TABLE regexp_temp(fullname varchar(20), email varchar(20));
+INSERT INTO regexp_temp (fullname, email) VALUES ('John Doe', 'johndoe@example.com');
+INSERT INTO regexp_temp (fullname, email) VALUES ('Jane Doe', 'janedoe');
+-- count matches case sensitive
+SELECT fullname, regexp_count(fullname, 'e', 1, 'c') FROM regexp_temp;
+ fullname | regexp_count
+----------+--------------
+ John Doe | 1
+ Jane Doe | 2
+(2 rows)
+
+SELECT fullname, regexp_count(fullname, 'D', 1, 'c') FROM regexp_temp;
+ fullname | regexp_count
+----------+--------------
+ John Doe | 1
+ Jane Doe | 1
+(2 rows)
+
+SELECT fullname, regexp_count(fullname, 'd', 1, 'c') FROM regexp_temp;
+ fullname | regexp_count
+----------+--------------
+ John Doe | 0
+ Jane Doe | 0
+(2 rows)
+
+-- count matches case insensitive
+SELECT fullname, regexp_count(fullname, 'E', 1, 'i') FROM regexp_temp;
+ fullname | regexp_count
+----------+--------------
+ John Doe | 1
+ Jane Doe | 2
+(2 rows)
+
+SELECT fullname, regexp_count(fullname, 'do', 1, 'i') FROM regexp_temp;
+ fullname | regexp_count
+----------+--------------
+ John Doe | 1
+ Jane Doe | 1
+(2 rows)
+
+-- return the start position of the 6th occurence starting at beginning of the string
+SELECT regexp_instr('number of your street, zipcode thetown, FR', '[^ ]+', 1, 6);
+ regexp_instr
+--------------
+ 32
+(1 row)
+
+-- return the start position of the 5th occurence starting after the first word
+SELECT regexp_instr('number of your street, zipcode thetown, FR', '[^ ]+', 7, 5, 0);
+ regexp_instr
+--------------
+ 32
+(1 row)
+
+-- return the ending position of the 5th occurence starting after the first word
+SELECT regexp_instr('number of your street, zipcode thetown, FR', '[^ ]+', 7, 5, 0);
+ regexp_instr
+--------------
+ 32
+(1 row)
+
+-- return the ending position of the 2nd occurence starting after the first word
+SELECT regexp_instr('number of your street, zipcode thetown, FR', '[T|Z|S][[:alpha:]]{5}', 7, 2, 1, 'i');
+ regexp_instr
+--------------
+ 30
+(1 row)
+
+-- return the starting position corresponding to the different capture group
+SELECT regexp_instr('1234567890', '(123)(4(56)(78))', 1, 1, 0, 'i', 0);
+ regexp_instr
+--------------
+ 1
+(1 row)
+
+SELECT regexp_instr('1234567890', '(123)(4(56)(78))', 1, 1, 0, 'i', 1);
+ regexp_instr
+--------------
+ 1
+(1 row)
+
+SELECT regexp_instr('1234567890', '(123)(4(56)(78))', 1, 1, 0, 'i', 2);
+ regexp_instr
+--------------
+ 4
+(1 row)
+
+SELECT regexp_instr('1234567890', '(123)(4(56)(78))', 1, 1, 0, 'i', 3);
+ regexp_instr
+--------------
+ 5
+(1 row)
+
+SELECT regexp_instr('1234567890', '(123)(4(56)(78))', 1, 1, 0, 'i', 4);
+ regexp_instr
+--------------
+ 7
+(1 row)
+
+-- return the starting position corresponding to a non existant capture group
+SELECT regexp_instr('1234567890', '(123)(4(56)(78))', 1, 1, 0, 'i', 5);
+ regexp_instr
+--------------
+ 0
+(1 row)
+
+-- Same but with the ending position
+SELECT regexp_instr('1234567890', '(123)(4(56)(78))', 1, 1, 1, 'i', 0);
+ regexp_instr
+--------------
+ 9
+(1 row)
+
+SELECT regexp_instr('1234567890', '(123)(4(56)(78))', 1, 1, 1, 'i', 1);
+ regexp_instr
+--------------
+ 4
+(1 row)
+
+SELECT regexp_instr('1234567890', '(123)(4(56)(78))', 1, 1, 1, 'i', 2);
+ regexp_instr
+--------------
+ 9
+(1 row)
+
+SELECT regexp_instr('1234567890', '(123)(4(56)(78))', 1, 1, 1, 'i', 3);
+ regexp_instr
+--------------
+ 7
+(1 row)
+
+SELECT regexp_instr('1234567890', '(123)(4(56)(78))', 1, 1, 1, 'i', 4);
+ regexp_instr
+--------------
+ 9
+(1 row)
+
+SELECT regexp_instr('1234567890', '(123)(4(56)(78))', 1, 1, 1, 'i', 5);
+ regexp_instr
+--------------
+ 0
+(1 row)
+
+-- start position of a valid email
+SELECT email, regexp_instr(email, '\w+@\w+(\.\w+)+') "valid_email" FROM regexp_temp;
+ email | valid_email
+---------------------+-------------
+ johndoe@example.com | 1
+ janedoe | 0
+(2 rows)
+
+-- ending position of a valid email
+SELECT email, regexp_instr(email, '\w+@\w+(\.\w+)+', 1, 1, 1) "valid_email" FROM regexp_temp;
+ email | valid_email
+---------------------+-------------
+ johndoe@example.com | 20
+ janedoe | 0
+(2 rows)
+
+-- start position of first capture group in the email (the dot part)
+SELECT email, regexp_instr(email, '\w+@\w+(\.\w+)+', 1, 1, 0, 'i', 1) FROM regexp_temp;
+ email | regexp_instr
+---------------------+--------------
+ johndoe@example.com | 16
+ janedoe | 0
+(2 rows)
+
+-- ending position of first capture group in the email (the dot part)
+SELECT email, regexp_instr(email, '\w+@\w+(\.\w+)+', 1, 1, 1, 'i', 1) FROM regexp_temp;
+ email | regexp_instr
+---------------------+--------------
+ johndoe@example.com | 20
+ janedoe | 0
+(2 rows)
+
+-- test negative values
+SELECT regexp_instr('1234567890', '(123)(4(56)(78))', -1, 1, 1, 'i', 1);
+ERROR: invalid value for parameter "start_position": -1
+SELECT regexp_instr('1234567890', '(123)(4(56)(78))', 1, -1, 1, 'i', 1);
+ERROR: invalid value for parameter "occurrence": -1
+SELECT regexp_instr('1234567890', '(123)(4(56)(78))', 1, 1, -1, 'i', 1);
+ERROR: invalid value for parameter "return_option": -1
+SELECT regexp_instr('1234567890', '(123)(4(56)(78))', 1, 1, 1, 'i', -1);
+ERROR: invalid value for parameter "group": -1
+-- Find the first occurrence of a sequence of letters starting with the letter e
+-- and ending with the letter y in the phrase "easy come, easy go."
+SELECT regexp_instr('easy come, easy go','e\w*y');
+ regexp_instr
+--------------
+ 1
+(1 row)
+
+-- Find the first sequence of letters starting with the letter e and ending with
+-- the letter y in the string "easy come, easy go" starting at the second character
+SELECT regexp_instr('easy come, easy go','e\w*y',2);
+ regexp_instr
+--------------
+ 12
+(1 row)
+
+-- Find the second sequence of letters starting with the letter e and ending with
+-- the letter y in the string "easy come, easy go" starting at the first character.
+SELECT regexp_instr('easy come, easy go','e\w*y',1,2);
+ regexp_instr
+--------------
+ 12
+(1 row)
+
+-- Find the position of the first character after the first whitespace in the string "easy come, easy go."
+SELECT regexp_instr('easy come, easy go','\s',1,1,1);
+ regexp_instr
+--------------
+ 6
+(1 row)
+
+-- Find the position of the start of the third word in a string by capturing each
+-- word as a subexpression, and returning the third subexpression's start position.
+SELECT regexp_instr('one two three','(\w+)\s+(\w+)\s+(\w+)', 1,1,0,'',3);
+ regexp_instr
+--------------
+ 9
+(1 row)
+
+-- return the substring matching the regexp
+SELECT regexp_substr('number of your street, zipcode town, FR', ',[^,]+');
+ regexp_substr
+----------------
+ , zipcode town
+(1 row)
+
+-- return the substring matching the regexp
+SELECT regexp_substr('number of your street, zipcode town, FR', ',[^,]+', 24);
+ regexp_substr
+---------------
+ , FR
+(1 row)
+
+-- return the substring matching the regexp at the first occurrence
+SELECT regexp_substr('number of your street, zipcode town, FR', ',[^,]+', 1, 1);
+ regexp_substr
+----------------
+ , zipcode town
+(1 row)
+
+-- return the substring matching the regexp at the second occurrence
+SELECT regexp_substr('number of your street, zipcode town, FR', ',[^,]+', 1, 2);
+ regexp_substr
+---------------
+ , FR
+(1 row)
+
+-- case sensitivity substring search
+SELECT regexp_substr('number of your street, zipcode town, FR', ',\s+[Zf][^,]+', 1, 1);
+ regexp_substr
+---------------
+ \N
+(1 row)
+
+SELECT regexp_substr('number of your street, zipcode town, FR', ',\s+[Zf][^,]+', 1, 1, 'i');
+ regexp_substr
+----------------
+ , zipcode town
+(1 row)
+
+-- case sensitivity substring search with no capture group
+SELECT regexp_substr('number of your street, zipcode town, FR', ',\s+[Zf][^,]+', 1, 1, 'i', 0);
+ regexp_substr
+----------------
+ , zipcode town
+(1 row)
+
+-- case sensitivity substring search with non existing capture group
+SELECT regexp_substr('number of your street, zipcode town, FR', ',\s+[Zf][^,]+', 1, 1, 'i', 0);
+ regexp_substr
+----------------
+ , zipcode town
+(1 row)
+
+-- return the substring matching the regexp at different occurrence and capture group
+SELECT regexp_substr('1234567890 1234567890', '(123)(4(56)(78))', 1, 1, 'i', 4);
+ regexp_substr
+---------------
+ 78
+(1 row)
+
+SELECT regexp_substr('1234567890 1234557890', '(123)(4(5[56])(78))', 1, 2, 'i', 3);
+ regexp_substr
+---------------
+ 55
+(1 row)
+
+SELECT regexp_substr('1234567890 1234567890', '(123)(4(56)(78))', 1, 1, 'i', 0);
+ regexp_substr
+---------------
+ 12345678
+(1 row)
+
+-- test negative values
+SELECT regexp_substr('1234567890 1234567890', '(123)(4(56)(78))', -1, 1, 'i', 4);
+ERROR: invalid value for parameter "start_position": -1
+SELECT regexp_substr('1234567890 1234567890', '(123)(4(56)(78))', 1, -1, 'i', 4);
+ERROR: invalid value for parameter "occurrence": -1
+SELECT regexp_substr('1234567890 1234567890', '(123)(4(56)(78))', 1, 1, 'i', -4);
+ERROR: invalid value for parameter "group": -4
+-- Select the first substring of letters that end with "thy."
+SELECT regexp_substr('healthy, wealthy, and wise','\w+thy');
+ regexp_substr
+---------------
+ healthy
+(1 row)
+
+-- Select the first substring of letters that ends with "thy" starting at the second character in the string.
+SELECT regexp_substr('healthy, wealthy, and wise','\w+thy',2);
+ regexp_substr
+---------------
+ ealthy
+(1 row)
+
+-- Return the contents of the third captured subexpression, which captures the third word in the string.
+SELECT regexp_substr('one two three', '(\w+)\s+(\w+)\s+(\w+)', 1, 1, '', 3);
+ regexp_substr
+---------------
+ three
+(1 row)
+
+DROP TABLE IF EXISTS regexp_temp;
+-- Regression tests for extended regexp_replace() function with start position and occurrence
+SELECT regexp_replace('512.123.4567', '([[:digit:]]{3})\.([[:digit:]]{3})\.([[:digit:]]{4})', '(\1) \2-\3', 1);
+ regexp_replace
+----------------
+ (512) 123-4567
+(1 row)
+
+SELECT regexp_replace('512.123.4567 612.123.4567', '([[:digit:]]{3})\.([[:digit:]]{3})\.([[:digit:]]{4})', '(\1) \2-\3', 1, 0);
+ regexp_replace
+-------------------------------
+ (512) 123-4567 (612) 123-4567
+(1 row)
+
+SELECT regexp_replace('number your street,'||CHR(10)||' zipcode town, FR', '( ){2,}', ' ', 1, 0);
+ regexp_replace
+---------------------
+ number your street,+
+ zipcode town, FR
+(1 row)
+
+SELECT regexp_replace('number your street, zipcode town, FR', '( ){2,}', ' ', 9);
+ regexp_replace
+--------------------------------------------
+ number your street, zipcode town, FR
+(1 row)
+
+SELECT regexp_replace('number your street, zipcode town, FR', '( ){2,}', ' ', 9, 0);
+ regexp_replace
+----------------------------------------
+ number your street, zipcode town, FR
+(1 row)
+
+SELECT regexp_replace('number your street, zipcode town, FR', '( ){2,}', ' ', 9, 2);
+ regexp_replace
+---------------------------------------------
+ number your street, zipcode town, FR
+(1 row)
+
+SELECT regexp_replace('number your street, zipcode town, FR', '( ){2,}', ' ', 9, 2, 'm');
+ regexp_replace
+---------------------------------------------
+ number your street, zipcode town, FR
+(1 row)
+
+SELECT regexp_replace('number your street, zipcode town, FR', '([EURT]){2,}', '[\1]', 9, 1, 'i');
+ regexp_replace
+-------------------------------------------------
+ number yo[r] street, zipcode town, FR
+(1 row)
+
+SELECT regexp_replace('number your street, zipcode town, FR', '([EURT]){2,}', '[\1]', 9, 2, 'i');
+ regexp_replace
+----------------------------------------------
+ number your s[t], zipcode town, FR
+(1 row)
+
+SELECT regexp_replace ('A PostgreSQL function', 'A|e|i|o|u', 'X', 1, 2);
+ regexp_replace
+-----------------------
+ A PXstgreSQL function
+(1 row)
+
+SELECT regexp_replace ('A PostgreSQL function', 'a|e|i|o|u', 'X', 1, 0, 'i');
+ regexp_replace
+-----------------------
+ X PXstgrXSQL fXnctXXn
+(1 row)
+
+SELECT regexp_replace ('A PostgreSQL function', 'a|e|i|o|u', 'X', 1, 1, 'i');
+ regexp_replace
+-----------------------
+ X PostgreSQL function
+(1 row)
+
+SELECT regexp_replace ('A PostgreSQL function', 'a|e|i|o|u', 'X', 1, 2, 'i');
+ regexp_replace
+-----------------------
+ A PXstgreSQL function
+(1 row)
+
+SELECT regexp_replace ('A PostgreSQL function', 'a|e|i|o|u', 'X', 1, 3, 'i');
+ regexp_replace
+-----------------------
+ A PostgrXSQL function
+(1 row)
+
+SELECT regexp_replace ('A PostgreSQL function', 'a|e|i|o|u', 'X', 1, 9, 'i');
+ regexp_replace
+-----------------------
+ A PostgreSQL function
+(1 row)
+
+SELECT regexp_replace ('A PostgreSQL function', 'A|e|i|o|u', 'X', 1, 9);
+ regexp_replace
+-----------------------
+ A PostgreSQL function
+(1 row)
+
+-- Invalid parameter values
+SELECT regexp_replace ('A PostgreSQL function', 'a|e|i|o|u', 'X', -1, 0, 'i');
+ERROR: invalid value for parameter "start_position": -1
+SELECT regexp_replace ('A PostgreSQL function', 'a|e|i|o|u', 'X', 1, -1, 'i');
+ERROR: invalid value for parameter "occurrence": -1
+-- Modifier 'g' should not be taken in account, we have an occurrence to replace
+SELECT regexp_replace ('A PostgreSQL function', 'a|e|i|o|u', 'X', 1, 1, 'g');
+ regexp_replace
+-----------------------
+ A PXstgreSQL function
+(1 row)
+
+-- Find groups of "word characters" (letters, numbers and underscore) ending with
+-- "thy" in the string "healthy, wealthy, and wise" and replace them with nothing.
+SELECT regexp_replace('healthy, wealthy, and wise','\w+thy', '', 'g');
+ regexp_replace
+----------------
+ , , and wise
+(1 row)
+
+SELECT regexp_replace('healthy, wealthy, and wise','\w+thy', '', 1, 0);
+ regexp_replace
+----------------
+ , , and wise
+(1 row)
+
+SELECT regexp_replace('healthy, wealthy, and wise','\w+thy', '', 1, 0, 'g');
+ regexp_replace
+----------------
+ , , and wise
+(1 row)
+
+-- Find groups of word characters ending with "thy" and replace with "something."
+SELECT regexp_replace('healthy, wealthy, and wise','\w+thy', 'something', 'g');
+ regexp_replace
+--------------------------------
+ something, something, and wise
+(1 row)
+
+-- Find groups of word characters ending with "thy" and replace with the string
+-- "something" starting at the third character in the string.
+SELECT regexp_replace('healthy, wealthy, and wise','\w+thy', 'something', 3, 0);
+ regexp_replace
+----------------------------------
+ hesomething, something, and wise
+(1 row)
+
+-- Replace the second group of word characters ending with "thy" with "something."
+SELECT regexp_replace('healthy, wealthy, and wise','\w+thy', 'something', 1, 2);
+ regexp_replace
+------------------------------
+ healthy, something, and wise
+(1 row)
+
+-- Find groups of word characters ending with "thy" capturing the letters before
+-- the "thy", and replace with the captured letters plus the letters "ish."
+SELECT regexp_replace('healthy, wealthy, and wise','(\w+)thy', '\1ish', 'g');
+ regexp_replace
+----------------------------
+ healish, wealish, and wise
+(1 row)
+
+SELECT regexp_replace('healthy, wealthy, and wise','(\w+)thy', '\1ish', 1, 0);
+ regexp_replace
+----------------------------
+ healish, wealish, and wise
+(1 row)
+
-- change NULL-display back
\pset null ''
-- E021-11 position expression
diff --git a/src/test/regress/sql/strings.sql b/src/test/regress/sql/strings.sql
index 57a48c9d0b..7623e85f1e 100644
--- a/src/test/regress/sql/strings.sql
+++ b/src/test/regress/sql/strings.sql
@@ -255,6 +255,199 @@ SELECT regexp_split_to_array('thE QUick bROWn FOx jUMPs ovEr The lazy dOG', 'e',
SELECT foo, length(foo) FROM regexp_split_to_table('thE QUick bROWn FOx jUMPs ovEr The lazy dOG', 'e', 'g') AS foo;
SELECT regexp_split_to_array('thE QUick bROWn FOx jUMPs ovEr The lazy dOG', 'e', 'g');
+-- regexp_like tests
+SELECT regexp_like('a'||CHR(10)||'d', 'a.d');
+SELECT regexp_like('a'||CHR(10)||'d', 'a.d', 'm');
+SELECT regexp_like('a'||CHR(10)||'d', 'a.d', 'n');
+SELECT regexp_like('Steven', '^Ste(v|ph)en$');
+SELECT regexp_like('foo' || chr(10) || 'bar' || chr(10) || 'bequq' || chr(10) || 'baz', '^bar');
+SELECT regexp_like('foo' || chr(10) || 'bar' || chr(10) || 'bequq' || chr(10) || 'baz', 'bar');
+SELECT regexp_like('foo' || chr(10) || 'bar' || chr(10) || 'bequq' || chr(10) || 'baz', '^bar', 'm');
+SELECT regexp_like('foo' || chr(10) || 'bar' || chr(10) || 'bequq' || chr(10) || 'baz', '^bar', 'n');
+SELECT regexp_like('GREEN', '([aeiou])\1');
+SELECT regexp_like('GREEN', '([aeiou])\1', 'i');
+SELECT regexp_like('ORANGE' || chr(10) || 'GREEN', '([aeiou])\1', 'i');
+SELECT regexp_like('ORANGE' || chr(10) || 'GREEN', '^..([aeiou])\1', 'i');
+SELECT regexp_like('ORANGE' || chr(10) || 'GREEN', '([aeiou])\1', 'in');
+SELECT regexp_like('ORANGE' || chr(10) || 'GREEN', '^..([aeiou])\1', 'in');
+SELECT regexp_like('ORANGE' || chr(10) || 'GREEN', '^..([aeiou])\1', 'im');
+SELECT REGEXP_LIKE('abc', 'a b c');
+SELECT REGEXP_LIKE('abc', 'a b c','x');
+
+-- count all matches for regexp
+SELECT regexp_count('123123123123123', '(12)3');
+-- count all matches with start position
+SELECT regexp_count('123123123123', '123', 0);
+SELECT regexp_count('123123123123', '123', 1);
+SELECT regexp_count('123123123123', '123', 3);
+SELECT regexp_count('123123123123', '123', -3);
+-- count all matches in NULL string with a start position
+SELECT regexp_count(NULL, '123', 3);
+-- count all matches with a start position greater than string length
+SELECT regexp_count('123', '123', 10);
+-- count all matches from different regexp
+SELECT regexp_count('ABC123', '[A-Z]'), regexp_count('A1B2C3', '[A-Z]');
+SELECT regexp_count('ABC123', '[A-Z][0-9]'), regexp_count('A1B2C3', '[A-Z][0-9]');
+SELECT regexp_count('ABC123', '[A-Z][0-9]{2}'), regexp_count('A1B2C3', '[A-Z][0-9]{2}');
+SELECT regexp_count('ABC123', '([A-Z][0-9]){2}'), regexp_count('A1B2C3', '([A-Z][0-9]){2}');
+SELECT regexp_count('ABC123A5', '^[A-Z][0-9]'), regexp_count('A1B2C3', '^[A-Z][0-9]');
+SELECT regexp_count('ABC123', '[A-Z][0-9]{2}'), regexp_count('A1B2C34', '[A-Z][0-9]{2}');
+-- count matches with newline case insensivity
+SELECT regexp_count('a'||CHR(10)||'d', 'a.d');
+SELECT regexp_count('a'||CHR(10)||'d', 'a.d', 1, 's');
+-- count matches with newline case sensivity
+SELECT regexp_count('a'||CHR(10)||'d', 'a.d', 1, 'n');
+-- count not multiline matches
+SELECT regexp_count('a'||CHR(10)||'d', '^d$');
+-- count multiline matches
+SELECT regexp_count('a'||CHR(10)||'d', '^d$', 1, 'm');
+SELECT regexp_count('Hello'||CHR(10)||'world!', '^world!$', 1, 'm'); -- 1
+-- Count the number of occurrences of the substring an in the string.
+SELECT regexp_count('a man, a plan, a canal: Panama', 'an');
+-- Find the number of occurrences of the substring an in the string starting with the fifth character.
+SELECT regexp_count('a man, a plan, a canal: Panama', 'an',5);
+-- Find the number of occurrences of a substring containing a lower-case character
+-- followed by an. In the first example, do not use a modifier. In the second example,
+-- use the i modifier to force the regular expression to ignore case.
+SELECT regexp_count('a man, a plan, a canal: Panama', '[a-z]an');
+SELECT regexp_count('a man, a plan, a canal: Panama', '[a-z]an', 1, 'i');
+
+DROP TABLE IF EXISTS regexp_temp;
+CREATE TABLE regexp_temp(fullname varchar(20), email varchar(20));
+INSERT INTO regexp_temp (fullname, email) VALUES ('John Doe', 'johndoe@example.com');
+INSERT INTO regexp_temp (fullname, email) VALUES ('Jane Doe', 'janedoe');
+-- count matches case sensitive
+SELECT fullname, regexp_count(fullname, 'e', 1, 'c') FROM regexp_temp;
+SELECT fullname, regexp_count(fullname, 'D', 1, 'c') FROM regexp_temp;
+SELECT fullname, regexp_count(fullname, 'd', 1, 'c') FROM regexp_temp;
+-- count matches case insensitive
+SELECT fullname, regexp_count(fullname, 'E', 1, 'i') FROM regexp_temp;
+SELECT fullname, regexp_count(fullname, 'do', 1, 'i') FROM regexp_temp;
+
+-- return the start position of the 6th occurence starting at beginning of the string
+SELECT regexp_instr('number of your street, zipcode thetown, FR', '[^ ]+', 1, 6);
+-- return the start position of the 5th occurence starting after the first word
+SELECT regexp_instr('number of your street, zipcode thetown, FR', '[^ ]+', 7, 5, 0);
+-- return the ending position of the 5th occurence starting after the first word
+SELECT regexp_instr('number of your street, zipcode thetown, FR', '[^ ]+', 7, 5, 0);
+-- return the ending position of the 2nd occurence starting after the first word
+SELECT regexp_instr('number of your street, zipcode thetown, FR', '[T|Z|S][[:alpha:]]{5}', 7, 2, 1, 'i');
+-- return the starting position corresponding to the different capture group
+SELECT regexp_instr('1234567890', '(123)(4(56)(78))', 1, 1, 0, 'i', 0);
+SELECT regexp_instr('1234567890', '(123)(4(56)(78))', 1, 1, 0, 'i', 1);
+SELECT regexp_instr('1234567890', '(123)(4(56)(78))', 1, 1, 0, 'i', 2);
+SELECT regexp_instr('1234567890', '(123)(4(56)(78))', 1, 1, 0, 'i', 3);
+SELECT regexp_instr('1234567890', '(123)(4(56)(78))', 1, 1, 0, 'i', 4);
+-- return the starting position corresponding to a non existant capture group
+SELECT regexp_instr('1234567890', '(123)(4(56)(78))', 1, 1, 0, 'i', 5);
+-- Same but with the ending position
+SELECT regexp_instr('1234567890', '(123)(4(56)(78))', 1, 1, 1, 'i', 0);
+SELECT regexp_instr('1234567890', '(123)(4(56)(78))', 1, 1, 1, 'i', 1);
+SELECT regexp_instr('1234567890', '(123)(4(56)(78))', 1, 1, 1, 'i', 2);
+SELECT regexp_instr('1234567890', '(123)(4(56)(78))', 1, 1, 1, 'i', 3);
+SELECT regexp_instr('1234567890', '(123)(4(56)(78))', 1, 1, 1, 'i', 4);
+SELECT regexp_instr('1234567890', '(123)(4(56)(78))', 1, 1, 1, 'i', 5);
+-- start position of a valid email
+SELECT email, regexp_instr(email, '\w+@\w+(\.\w+)+') "valid_email" FROM regexp_temp;
+-- ending position of a valid email
+SELECT email, regexp_instr(email, '\w+@\w+(\.\w+)+', 1, 1, 1) "valid_email" FROM regexp_temp;
+-- start position of first capture group in the email (the dot part)
+SELECT email, regexp_instr(email, '\w+@\w+(\.\w+)+', 1, 1, 0, 'i', 1) FROM regexp_temp;
+-- ending position of first capture group in the email (the dot part)
+SELECT email, regexp_instr(email, '\w+@\w+(\.\w+)+', 1, 1, 1, 'i', 1) FROM regexp_temp;
+-- test negative values
+SELECT regexp_instr('1234567890', '(123)(4(56)(78))', -1, 1, 1, 'i', 1);
+SELECT regexp_instr('1234567890', '(123)(4(56)(78))', 1, -1, 1, 'i', 1);
+SELECT regexp_instr('1234567890', '(123)(4(56)(78))', 1, 1, -1, 'i', 1);
+SELECT regexp_instr('1234567890', '(123)(4(56)(78))', 1, 1, 1, 'i', -1);
+
+-- Find the first occurrence of a sequence of letters starting with the letter e
+-- and ending with the letter y in the phrase "easy come, easy go."
+SELECT regexp_instr('easy come, easy go','e\w*y');
+-- Find the first sequence of letters starting with the letter e and ending with
+-- the letter y in the string "easy come, easy go" starting at the second character
+SELECT regexp_instr('easy come, easy go','e\w*y',2);
+-- Find the second sequence of letters starting with the letter e and ending with
+-- the letter y in the string "easy come, easy go" starting at the first character.
+SELECT regexp_instr('easy come, easy go','e\w*y',1,2);
+-- Find the position of the first character after the first whitespace in the string "easy come, easy go."
+SELECT regexp_instr('easy come, easy go','\s',1,1,1);
+-- Find the position of the start of the third word in a string by capturing each
+-- word as a subexpression, and returning the third subexpression's start position.
+SELECT regexp_instr('one two three','(\w+)\s+(\w+)\s+(\w+)', 1,1,0,'',3);
+
+-- return the substring matching the regexp
+SELECT regexp_substr('number of your street, zipcode town, FR', ',[^,]+');
+-- return the substring matching the regexp
+SELECT regexp_substr('number of your street, zipcode town, FR', ',[^,]+', 24);
+-- return the substring matching the regexp at the first occurrence
+SELECT regexp_substr('number of your street, zipcode town, FR', ',[^,]+', 1, 1);
+-- return the substring matching the regexp at the second occurrence
+SELECT regexp_substr('number of your street, zipcode town, FR', ',[^,]+', 1, 2);
+-- case sensitivity substring search
+SELECT regexp_substr('number of your street, zipcode town, FR', ',\s+[Zf][^,]+', 1, 1);
+SELECT regexp_substr('number of your street, zipcode town, FR', ',\s+[Zf][^,]+', 1, 1, 'i');
+-- case sensitivity substring search with no capture group
+SELECT regexp_substr('number of your street, zipcode town, FR', ',\s+[Zf][^,]+', 1, 1, 'i', 0);
+-- case sensitivity substring search with non existing capture group
+SELECT regexp_substr('number of your street, zipcode town, FR', ',\s+[Zf][^,]+', 1, 1, 'i', 0);
+-- return the substring matching the regexp at different occurrence and capture group
+SELECT regexp_substr('1234567890 1234567890', '(123)(4(56)(78))', 1, 1, 'i', 4);
+SELECT regexp_substr('1234567890 1234557890', '(123)(4(5[56])(78))', 1, 2, 'i', 3);
+SELECT regexp_substr('1234567890 1234567890', '(123)(4(56)(78))', 1, 1, 'i', 0);
+-- test negative values
+SELECT regexp_substr('1234567890 1234567890', '(123)(4(56)(78))', -1, 1, 'i', 4);
+SELECT regexp_substr('1234567890 1234567890', '(123)(4(56)(78))', 1, -1, 'i', 4);
+SELECT regexp_substr('1234567890 1234567890', '(123)(4(56)(78))', 1, 1, 'i', -4);
+-- Select the first substring of letters that end with "thy."
+SELECT regexp_substr('healthy, wealthy, and wise','\w+thy');
+-- Select the first substring of letters that ends with "thy" starting at the second character in the string.
+SELECT regexp_substr('healthy, wealthy, and wise','\w+thy',2);
+-- Return the contents of the third captured subexpression, which captures the third word in the string.
+SELECT regexp_substr('one two three', '(\w+)\s+(\w+)\s+(\w+)', 1, 1, '', 3);
+
+DROP TABLE IF EXISTS regexp_temp;
+
+-- Regression tests for extended regexp_replace() function with start position and occurrence
+SELECT regexp_replace('512.123.4567', '([[:digit:]]{3})\.([[:digit:]]{3})\.([[:digit:]]{4})', '(\1) \2-\3', 1);
+SELECT regexp_replace('512.123.4567 612.123.4567', '([[:digit:]]{3})\.([[:digit:]]{3})\.([[:digit:]]{4})', '(\1) \2-\3', 1, 0);
+SELECT regexp_replace('number your street,'||CHR(10)||' zipcode town, FR', '( ){2,}', ' ', 1, 0);
+SELECT regexp_replace('number your street, zipcode town, FR', '( ){2,}', ' ', 9);
+SELECT regexp_replace('number your street, zipcode town, FR', '( ){2,}', ' ', 9, 0);
+SELECT regexp_replace('number your street, zipcode town, FR', '( ){2,}', ' ', 9, 2);
+SELECT regexp_replace('number your street, zipcode town, FR', '( ){2,}', ' ', 9, 2, 'm');
+SELECT regexp_replace('number your street, zipcode town, FR', '([EURT]){2,}', '[\1]', 9, 1, 'i');
+SELECT regexp_replace('number your street, zipcode town, FR', '([EURT]){2,}', '[\1]', 9, 2, 'i');
+SELECT regexp_replace ('A PostgreSQL function', 'A|e|i|o|u', 'X', 1, 2);
+SELECT regexp_replace ('A PostgreSQL function', 'a|e|i|o|u', 'X', 1, 0, 'i');
+SELECT regexp_replace ('A PostgreSQL function', 'a|e|i|o|u', 'X', 1, 1, 'i');
+SELECT regexp_replace ('A PostgreSQL function', 'a|e|i|o|u', 'X', 1, 2, 'i');
+SELECT regexp_replace ('A PostgreSQL function', 'a|e|i|o|u', 'X', 1, 3, 'i');
+SELECT regexp_replace ('A PostgreSQL function', 'a|e|i|o|u', 'X', 1, 9, 'i');
+SELECT regexp_replace ('A PostgreSQL function', 'A|e|i|o|u', 'X', 1, 9);
+-- Invalid parameter values
+SELECT regexp_replace ('A PostgreSQL function', 'a|e|i|o|u', 'X', -1, 0, 'i');
+SELECT regexp_replace ('A PostgreSQL function', 'a|e|i|o|u', 'X', 1, -1, 'i');
+-- Modifier 'g' should not be taken in account, we have an occurrence to replace
+SELECT regexp_replace ('A PostgreSQL function', 'a|e|i|o|u', 'X', 1, 1, 'g');
+-- Find groups of "word characters" (letters, numbers and underscore) ending with
+-- "thy" in the string "healthy, wealthy, and wise" and replace them with nothing.
+SELECT regexp_replace('healthy, wealthy, and wise','\w+thy', '', 'g');
+SELECT regexp_replace('healthy, wealthy, and wise','\w+thy', '', 1, 0);
+SELECT regexp_replace('healthy, wealthy, and wise','\w+thy', '', 1, 0, 'g');
+-- Find groups of word characters ending with "thy" and replace with "something."
+SELECT regexp_replace('healthy, wealthy, and wise','\w+thy', 'something', 'g');
+-- Find groups of word characters ending with "thy" and replace with the string
+-- "something" starting at the third character in the string.
+SELECT regexp_replace('healthy, wealthy, and wise','\w+thy', 'something', 3, 0);
+-- Replace the second group of word characters ending with "thy" with "something."
+SELECT regexp_replace('healthy, wealthy, and wise','\w+thy', 'something', 1, 2);
+-- Find groups of word characters ending with "thy" capturing the letters before
+-- the "thy", and replace with the captured letters plus the letters "ish."
+SELECT regexp_replace('healthy, wealthy, and wise','(\w+)thy', '\1ish', 'g');
+SELECT regexp_replace('healthy, wealthy, and wise','(\w+)thy', '\1ish', 1, 0);
+
+
-- change NULL-display back
\pset null ''
Le 20/03/2021 à 19:48, Gilles Darold a écrit :
Hi,
This is a new version of the patch that now implements all the XQUERY
regexp functions as described in the standard, minus the differences
of PostgerSQL regular expression explain in [1].The standard SQL describe functions like_regex(), occurrences_regex(),
position_regex(), substring_regex() and translate_regex() which
correspond to the commonly named functions regexp_like(),
regexp_count(), regexp_instr(), regexp_substr() and regexp_replace()
as reported by Chapman Flack in [2]. All these function are
implemented in the patch. Syntax of the functions are:- regexp_like(string, pattern [, flags ])
- regexp_count( string, pattern [, position ] [, flags ])
- regexp_instr( string, pattern [, position ] [, occurrence ] [,
returnopt ] [, flags ] [, group ])- regexp_substr( string, pattern [, position ] [, occurrence ] [,
flags ] [, group ])- regexp_replace(source, pattern, replacement [, position ] [,
occurrence ] [, flags ])In addition to previous patch version I have added the regexp()_like
function and extended the existsing regex_replace() function. The
patch documents these functions and adds regression tests for all
functions. I will add it to the commitfest.An other regexp functions regexp_positions() that returns all
occurrences that matched a POSIX regular expression is also developped
by Joel Jacobson, see [2]. This function expands the list of regexp
functions described in XQUERY.[1]
https://www.postgresql.org/docs/13/functions-matching.html#FUNCTIONS-POSIX-REGEXP[2]
/messages/by-id/bf2222d5-909d-408b-8531-95b32f18d4ab@www.fastmail.com
I would like to see these functions in PG 14 but it is a bit too late,
added to commitfest 2021-07.
--
Gilles Darold
LzLabs GmbH
http://www.lzlabs.com/
On 2021.03.20. 19:48 Gilles Darold <gilles@darold.net> wrote:
This is a new version of the patch that now implements all the XQUERY
regexp functions as described in the standard, minus the differences of
PostgerSQL regular expression explain in [1].The standard SQL describe functions like_regex(), occurrences_regex(),
position_regex(), substring_regex() and translate_regex() which
correspond to the commonly named functions regexp_like(),
regexp_count(), regexp_instr(), regexp_substr() and regexp_replace() as
reported by Chapman Flack in [2]. All these function are implemented in
[v2-0001-xquery-regexp-functions.patch]
Hi,
Apply, compile and (world)check are fine. I haven't found errors in functionality.
I went through the docs, and came up with these changes in func.sgml, and pg_proc.dat.
Useful functions - thanks!
Erik Rijkers
Attachments:
func.sgml.20210321.difftext/x-patch; charset=ISO-8859-1; name=func.sgml.20210321.diffDownload
--- doc/src/sgml/func.sgml.orig 2021-03-21 03:59:37.884365465 +0100
+++ doc/src/sgml/func.sgml 2021-03-21 11:37:46.880644051 +0100
@@ -3106,7 +3106,7 @@
<returnvalue>integer</returnvalue>
</para>
<para>
- Return the number of times a pattern occurs for a match of a POSIX
+ Returns the number of times a pattern occurs for a match of a POSIX
regular expression to the <parameter>string</parameter>; see
<xref linkend="functions-posix-regexp"/>.
</para>
@@ -3125,11 +3125,11 @@
<returnvalue>integer</returnvalue>
</para>
<para>
- Return the position within <parameter>string</parameter> where the
+ Returns the position within <parameter>string</parameter> where the
match of a POSIX regular expression occurs. It returns an integer
indicating the beginning or ending position of the matched substring,
depending on the value of the <parameter>returnopt</parameter> argument
- (default beginning). If no match is found, then the function returns 0;
+ (default beginning). If no match is found the function returns 0;
see <xref linkend="functions-posix-regexp"/>.
</para>
<para>
@@ -3147,12 +3147,12 @@
<returnvalue>boolean</returnvalue>
</para>
<para>
- Evaluate the existence of a match to a POSIX regular expression
+ Evaluates the existence of a match to a POSIX regular expression
in <parameter>string</parameter>; see <xref linkend="functions-posix-regexp"/>.
</para>
<para>
<literal>regexp_like('Hello'||chr(10)||'world', '^world$', 'm')</literal>
- <returnvalue>3</returnvalue>
+ <returnvalue>t</returnvalue>
</para></entry>
</row>
@@ -5773,7 +5773,7 @@
</para>
<para>
- The <function>regexp_like</function> function evaluate the existence of a match
+ The <function>regexp_like</function> function evaluates the existence of a match
to a POSIX regular expression in <parameter>string</parameter>; returns a boolean
resulting from matching a POSIX regular expression pattern to a string. It has
the syntax <function>regexp_like</function>(<replaceable>string</replaceable>,
@@ -5782,7 +5782,7 @@
from the beginning of <replaceable>string</replaceable>.
The <replaceable>flags</replaceable> parameter is an optional text string
containing zero or more single-letter flags that change the function's behavior.
- <function>regexp_count</function> accepts all the flags
+ <function>regexp_like</function> accepts all the flags
shown in <xref linkend="posix-embedded-options-table"/>.
This function is similar to regexp operator <literal>~</literal> when used without
<replaceable>flags</replaceable> and similar to operator <literal>~*</literal> when
@@ -5792,9 +5792,9 @@
<para>
Some examples:
<programlisting>
-SELECT 'found' FROM t1 WHERE regexp_like('Hello'||chr(10)||'world', '^world$', 'm');
- regexp_like
---------------
+SELECT 'found' FROM (values('Hello'||chr(10)||'world') as f(col) WHERE regexp_like(col, '^world$', 'm');
+ ?column?
+----------
found
(1 row)
</programlisting>
@@ -5814,7 +5814,7 @@
<function>regexp_count</function> accepts all the flags
shown in <xref linkend="posix-embedded-options-table"/>.
The <literal>g</literal> flag is forced internally to count all matches.
- This function returns 0 if there is no match or the number of match as
+ This function returns 0 if there is no match or the number of matches as
an integer.
</para>
@@ -5853,17 +5853,17 @@
the position of the character after the occurrence.
The <replaceable>flags</replaceable> parameter is an optional text string
containing zero or more single-letter flags that change the function's behavior.
- <function>regexp_count</function> accepts all the flags
+ <function>regexp_instr</function> accepts all the flags
shown in <xref linkend="posix-embedded-options-table"/>.
The <literal>g</literal> flag is forced internally to track all matches.
For a pattern with capture groups, <replaceable>group</replaceable> is an integer indicating
- which capture in pattern is the target of the function. A capture group is a part of the pattern
- enclosed in parentheses. Capture groups can be nested. They are numbered in order in which their
- left parentheses appear in pattern. If <replaceable>group</replaceable> is zero, then the position
+ which capture in <replaceable>pattern</replaceable> is the target of the function. A capture group is a part of the pattern
+ enclosed in parentheses. Capture groups can be nested. They are numbered in the order in which their
+ left parentheses appear in <replaceable>pattern</replaceable>. If <replaceable>group</replaceable> is zero, then the position
of the entire substring that matches the pattern is returned. If <replaceable>pattern</replaceable>
- does not have at least <replaceable>group</replaceable> capture group, the function returns zero.
+ does not have at least <replaceable>group</replaceable> capture groups, the function returns zero.
This function returns 0 if there is no match or the starting or ending position
- of match as an integer.
+ of a match as an integer.
</para>
<para>
@@ -5897,16 +5897,16 @@
indicates which occurrence of <replaceable>pattern</replaceable> in <replaceable>string</replaceable>
should be searched. The <replaceable>flags</replaceable> parameter is an optional text string
containing zero or more single-letter flags that change the function's behavior.
- <function>regexp_count</function> accepts all the flags
+ <function>regexp_substr</function> accepts all the flags
shown in <xref linkend="posix-embedded-options-table"/>.
The <literal>g</literal> flag is forced internally to track all matches.
For a pattern with capture groups, optional <replaceable>group</replaceable> is an integer indicating
- which capture in pattern is the target of the function. A capture group is a part of the pattern
- enclosed in parentheses. Capture groups can be nested. They are numbered in order in which their
- left parentheses appear in pattern. If <replaceable>group</replaceable> is zero, then the position
+ which capture in <replaceable>pattern</replaceable> is the target of the function. A capture group is a part of the pattern
+ enclosed in parentheses. Capture groups can be nested. They are numbered in the order in which their
+ left parentheses appear in <replaceable>pattern</replaceable>. If <replaceable>group</replaceable> is zero, then the position
of the entire substring that matches the pattern is returned. If <replaceable>pattern</replaceable>
- does not have at least <replaceable>group</replaceable> capture group, the function returns zero.
- This function returns NULL if there is no match or the substring of match.
+ does not have at least <replaceable>group</replaceable> capture groups, the function returns zero.
+ This function returns NULL if there is no match or the substring of the match.
</para>
<para>
pg_proc.dat.20210321.difftext/x-patch; charset=ISO-8859-1; name=pg_proc.dat.20210321.diffDownload
--- src/include/catalog/pg_proc.dat.orig 2021-03-21 11:59:04.107454798 +0100
+++ src/include/catalog/pg_proc.dat 2021-03-21 12:02:24.006401415 +0100
@@ -3587,19 +3587,19 @@
{ oid => '9622', descr => 'position where the match for regexp was located',
proname => 'regexp_instr', prorettype => 'int4', proargtypes => 'text text int4 int4 int4 text int4',
prosrc => 'regexp_instr' },
-{ oid => '9623', descr => 'substring that match the regexp pattern',
+{ oid => '9623', descr => 'substring that matches the regexp pattern',
proname => 'regexp_substr', prorettype => 'text', proargtypes => 'text text',
prosrc => 'regexp_substr_no_start' },
-{ oid => '9624', descr => 'substring that match the regexp pattern',
+{ oid => '9624', descr => 'substring that matches the regexp pattern',
proname => 'regexp_substr', prorettype => 'text', proargtypes => 'text text int4',
prosrc => 'regexp_substr_no_occurrence' },
-{ oid => '9625', descr => 'substring that match the regexp pattern',
+{ oid => '9625', descr => 'substring that matches the regexp pattern',
proname => 'regexp_substr', prorettype => 'text', proargtypes => 'text text int4 int4',
prosrc => 'regexp_substr_no_flags' },
-{ oid => '9626', descr => 'substring that match the regexp pattern',
+{ oid => '9626', descr => 'substring that matches the regexp pattern',
proname => 'regexp_substr', prorettype => 'text', proargtypes => 'text text int4 int4 text',
prosrc => 'regexp_substr_no_subexpr' },
-{ oid => '9627', descr => 'substring that match the regexp pattern',
+{ oid => '9627', descr => 'substring that matches the regexp pattern',
proname => 'regexp_substr', prorettype => 'text', proargtypes => 'text text int4 int4 text int4',
prosrc => 'regexp_substr' },
{ oid => '9628', descr => 'evaluate match(es) for regexp',
Le 21/03/2021 à 12:07, er@xs4all.nl a écrit :
On 2021.03.20. 19:48 Gilles Darold <gilles@darold.net> wrote:
This is a new version of the patch that now implements all the XQUERY
regexp functions as described in the standard, minus the differences of
PostgerSQL regular expression explain in [1].The standard SQL describe functions like_regex(), occurrences_regex(),
position_regex(), substring_regex() and translate_regex() which
correspond to the commonly named functions regexp_like(),
regexp_count(), regexp_instr(), regexp_substr() and regexp_replace() as
reported by Chapman Flack in [2]. All these function are implemented in
[v2-0001-xquery-regexp-functions.patch]Hi,
Apply, compile and (world)check are fine. I haven't found errors in functionality.
I went through the docs, and came up with these changes in func.sgml, and pg_proc.dat.
Useful functions - thanks!
Erik Rijkers
Thanks a lot Erik, here is a version of the patch with your corrections.
--
Gilles Darold
LzLabs GmbH
http://www.lzlabs.com/
Attachments:
v3-0001-xquery-regexp-functions.patchtext/x-patch; charset=UTF-8; name=v3-0001-xquery-regexp-functions.patchDownload
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index fee0561961..36e446cb7b 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -3097,6 +3097,66 @@ repeat('Pg', 4) <returnvalue>PgPgPgPg</returnvalue>
</para></entry>
</row>
+ <row>
+ <entry role="func_table_entry"><para role="func_signature">
+ <indexterm>
+ <primary>regexp_count</primary>
+ </indexterm>
+ <function>regexp_count</function> ( <parameter>string</parameter> <type>text</type>, <parameter>pattern</parameter> <type>text</type> [, <parameter>position</parameter> <type>integer</type> ] [, <parameter>flags</parameter> <type>text</type> ] )
+ <returnvalue>integer</returnvalue>
+ </para>
+ <para>
+ Returns the number of times a pattern occurs for a match of a POSIX
+ regular expression to the <parameter>string</parameter>; see
+ <xref linkend="functions-posix-regexp"/>.
+ </para>
+ <para>
+ <literal>regexp_count('123456789012', '\d{3}', 3)</literal>
+ <returnvalue>3</returnvalue>
+ </para></entry>
+ </row>
+
+ <row>
+ <entry role="func_table_entry"><para role="func_signature">
+ <indexterm>
+ <primary>regexp_instr</primary>
+ </indexterm>
+ <function>regexp_instr</function> ( <parameter>string</parameter> <type>text</type>, <parameter>pattern</parameter> <type>text</type> [, <parameter>position</parameter> <type>integer</type> ] [, <parameter>occurence</parameter> <type>integer</type> ] [, <parameter>returnopt</parameter> <type>integer</type> ] [, <parameter>flags</parameter> <type>text</type> ] [, <parameter>group</parameter> <type>integer</type> ] )
+ <returnvalue>integer</returnvalue>
+ </para>
+ <para>
+ Returns the position within <parameter>string</parameter> where the
+ match of a POSIX regular expression occurs. It returns an integer
+ indicating the beginning or ending position of the matched substring,
+ depending on the value of the <parameter>returnopt</parameter> argument
+ (default beginning). If no match is found the function returns 0;
+ see <xref linkend="functions-posix-regexp"/>.
+ </para>
+ <para>
+ <literal>regexp_instr('1234567890', '(123)(4(56)(78))', 1, 1, 0, 'i', 4)</literal>
+ <returnvalue>7</returnvalue>
+ </para></entry>
+ </row>
+
+ <row>
+ <entry role="func_table_entry"><para role="func_signature">
+ <indexterm>
+ <primary>regexp_like</primary>
+ </indexterm>
+ <function>regexp_like</function> ( <parameter>string</parameter> <type>text</type>, <parameter>pattern</parameter> <type>text</type> [, <parameter>flags</parameter> <type>text</type> ] )
+ <returnvalue>boolean</returnvalue>
+ </para>
+ <para>
+ Evaluates the existence of a match to a POSIX regular expression
+ in <parameter>string</parameter>; see <xref linkend="functions-posix-regexp"/>.
+ </para>
+ <para>
+ <literal>regexp_like('Hello'||chr(10)||'world', '^world$', 'm')</literal>
+ <returnvalue>t</returnvalue>
+ </para></entry>
+ </row>
+
+
<row>
<entry role="func_table_entry"><para role="func_signature">
<indexterm>
@@ -3144,7 +3204,7 @@ repeat('Pg', 4) <returnvalue>PgPgPgPg</returnvalue>
<indexterm>
<primary>regexp_replace</primary>
</indexterm>
- <function>regexp_replace</function> ( <parameter>string</parameter> <type>text</type>, <parameter>pattern</parameter> <type>text</type>, <parameter>replacement</parameter> <type>text</type> [, <parameter>flags</parameter> <type>text</type> ] )
+ <function>regexp_replace</function> ( <parameter>string</parameter> <type>text</type>, <parameter>pattern</parameter> <type>text</type>, <parameter>replacement</parameter> <type>text</type> [, <parameter>position</parameter> <type>integer</type> ] [, <parameter>occurence</parameter> <type>integer</type> ] [, <parameter>flags</parameter> <type>text</type> ] )
<returnvalue>text</returnvalue>
</para>
<para>
@@ -3157,6 +3217,24 @@ repeat('Pg', 4) <returnvalue>PgPgPgPg</returnvalue>
</para></entry>
</row>
+ <row>
+ <entry role="func_table_entry"><para role="func_signature">
+ <indexterm>
+ <primary>regexp_substr</primary>
+ </indexterm>
+ <function>regexp_substr</function> ( <parameter>string</parameter> <type>text</type>, <parameter>pattern</parameter> <type>text</type> [, <parameter>position</parameter> <type>integer</type> ] [, <parameter>occurence</parameter> <type>integer</type> ] [, <parameter>flags</parameter> <type>text</type> ] [, <parameter>group</parameter> <type>integer</type> ] )
+ <returnvalue>text</returnvalue>
+ </para>
+ <para>
+ Return the substring within <parameter>string</parameter> corresponding to the
+ match of a POSIX regular expression; see <xref linkend="functions-posix-regexp"/>.
+ </para>
+ <para>
+ <literal>regexp_substr('1234567890 1234557890', '(123)(4(5[56])(78))', 1, 2, 'i', 3)</literal>
+ <returnvalue>55</returnvalue>
+ </para></entry>
+ </row>
+
<row>
<entry role="func_table_entry"><para role="func_signature">
<indexterm>
@@ -5295,6 +5373,18 @@ substring('foobar' similar '#"o_b#"%' escape '#') <lineannotation>NULL</linea
<indexterm>
<primary>regexp_split_to_array</primary>
</indexterm>
+ <indexterm>
+ <primary>regexp_like</primary>
+ </indexterm>
+ <indexterm>
+ <primary>regexp_count</primary>
+ </indexterm>
+ <indexterm>
+ <primary>regexp_instr</primary>
+ </indexterm>
+ <indexterm>
+ <primary>regexp_substr</primary>
+ </indexterm>
<para>
<xref linkend="functions-posix-table"/> lists the available
@@ -5451,6 +5541,8 @@ substring('foobar' from 'o(.)b') <lineannotation>o</lineannotation>
It has the syntax
<function>regexp_replace</function>(<replaceable>source</replaceable>,
<replaceable>pattern</replaceable>, <replaceable>replacement</replaceable>
+ <optional>, <replaceable>position</replaceable> </optional>
+ <optional>, <replaceable>occurrence</replaceable> </optional>
<optional>, <replaceable>flags</replaceable> </optional>).
The <replaceable>source</replaceable> string is returned unchanged if
there is no match to the <replaceable>pattern</replaceable>. If there is a
@@ -5464,12 +5556,19 @@ substring('foobar' from 'o(.)b') <lineannotation>o</lineannotation>
substring matching the entire pattern should be inserted. Write
<literal>\\</literal> if you need to put a literal backslash in the replacement
text.
- The <replaceable>flags</replaceable> parameter is an optional text
- string containing zero or more single-letter flags that change the
+ <replaceable>pattern</replaceable> is searched in <replaceable>string</replaceable> starting
+ from an optional <replaceable>position</replaceable> or from the beginning
+ of <replaceable>source</replaceable> by default. Optional <replaceable>occurrence</replaceable>
+ parameter indicates which occurrence of <replaceable>pattern</replaceable> in
+ <replaceable>source</replaceable> should be replaced. Default value for <replaceable>occurrence</replaceable>
+ is 1, replace only the first occurrence. The <replaceable>flags</replaceable> parameter
+ is an optional text string containing zero or more single-letter flags that change the
function's behavior. Flag <literal>i</literal> specifies case-insensitive
matching, while flag <literal>g</literal> specifies replacement of each matching
- substring rather than only the first one. Supported flags (though
- not <literal>g</literal>) are
+ substring rather than only the first one. When <replaceable>occurrence</replaceable>
+ is set flag <literal>g</literal> has no effect. If <replaceable>occurrence</replaceable>
+ is set to zero, all occurrences are replaced which is similar to flag <literal>g</literal>.
+ Supported flags (though not <literal>g</literal>) are
described in <xref linkend="posix-embedded-options-table"/>.
</para>
@@ -5482,6 +5581,10 @@ regexp_replace('foobarbaz', 'b..', 'X', 'g')
<lineannotation>fooXX</lineannotation>
regexp_replace('foobarbaz', 'b(..)', 'X\1Y', 'g')
<lineannotation>fooXarYXazY</lineannotation>
+regexp_replace('A PostgreSQL function', 'a|e|i|o|u', 'X', 1, 0, 'i')
+ <lineannotation>X PXstgrXSQL fXnctXXn</lineannotation>
+regexp_replace('A PostgreSQL function', 'a|e|i|o|u', 'X', 1, 3, 'i')
+ <lineannotation>A PostgrXSQL function</lineannotation>
</programlisting>
</para>
@@ -5669,6 +5772,163 @@ SELECT foo FROM regexp_split_to_table('the quick brown fox', '\s*') AS foo;
in practice. Other software systems such as Perl use similar definitions.
</para>
+ <para>
+ The <function>regexp_like</function> function evaluates the existence of a match
+ to a POSIX regular expression in <parameter>string</parameter>; returns a boolean
+ resulting from matching a POSIX regular expression pattern to a string. It has
+ the syntax <function>regexp_like</function>(<replaceable>string</replaceable>,
+ <replaceable>pattern</replaceable> <optional>, <replaceable>flags</replaceable> </optional>).
+ <replaceable>pattern</replaceable> is searched in <replaceable>string</replaceable> starting
+ from the beginning of <replaceable>string</replaceable>.
+ The <replaceable>flags</replaceable> parameter is an optional text string
+ containing zero or more single-letter flags that change the function's behavior.
+ <function>regexp_like</function> accepts all the flags
+ shown in <xref linkend="posix-embedded-options-table"/>.
+ This function is similar to regexp operator <literal>~</literal> when used without
+ <replaceable>flags</replaceable> and similar to operator <literal>~*</literal> when
+ used with <replaceable>flags</replaceable> <literal>'i'</literal> only.
+ </para>
+
+ <para>
+ Some examples:
+<programlisting>
+SELECT 'found' FROM (values('Hello'||chr(10)||'world') as f(col) WHERE regexp_like(col, '^world$', 'm');
+ ?column?
+----------
+ found
+(1 row)
+</programlisting>
+ </para>
+
+
+ <para>
+ The <function>regexp_count</function> function returns the number of
+ captured substring(s) resulting from matching a POSIX regular
+ expression pattern to a string. It has the syntax <function>regexp_count</function>(
+ <replaceable>string</replaceable>, <replaceable>pattern</replaceable> <optional>,
+ <replaceable>position</replaceable> </optional> <optional>, <replaceable>flags</replaceable> </optional>).
+ <replaceable>pattern</replaceable> is searched in <replaceable>string</replaceable> starting
+ from an optional <replaceable>position</replaceable> or from the beginning of <replaceable>string</replaceable>
+ by default. The <replaceable>flags</replaceable> parameter is an optional text string
+ containing zero or more single-letter flags that change the function's behavior.
+ <function>regexp_count</function> accepts all the flags
+ shown in <xref linkend="posix-embedded-options-table"/>.
+ The <literal>g</literal> flag is forced internally to count all matches.
+ This function returns 0 if there is no match or the number of matches as
+ an integer.
+ </para>
+
+ <para>
+ Some examples:
+<programlisting>
+SELECT regexp_count('123123123123', '\d{3}', 1);
+ regexp_count
+--------------
+ 4
+(1 row)
+
+SELECT regexp_count('Hello'||CHR(10)||'world!', '^world!$', 1, 'm');
+ regexp_count
+--------------
+ 1
+(1 row)
+</programlisting>
+ </para>
+
+ <para>
+ The <function>regexp_instr</function> function returns the beginning or ending
+ position of the matched substring resulting from matching a POSIX regular
+ expression pattern to a string. It has the syntax <function>regexp_instr</function>(
+ <replaceable>string</replaceable>, <replaceable>pattern</replaceable> <optional>,
+ <replaceable>position</replaceable> </optional> <optional>, <replaceable>occurrence</replaceable> </optional>
+ <optional>, <replaceable>returnopt</replaceable> </optional>
+ <optional>, <replaceable>flags</replaceable> </optional>
+ <optional>, <replaceable>group</replaceable> </optional>).
+ <replaceable>pattern</replaceable> is searched in <replaceable>string</replaceable> starting
+ from an optional <replaceable>position</replaceable> or from the beginning
+ of <replaceable>string</replaceable> by default. Optional <replaceable>occurrence</replaceable> parameter
+ indicates which occurrence of <replaceable>pattern</replaceable> in <replaceable>string</replaceable>
+ should be searched. When optional <replaceable>returnopt</replaceable> parameter is set to 0 (default)
+ the function returns the position of the first character of the occurrence. When set to 1 returns
+ the position of the character after the occurrence.
+ The <replaceable>flags</replaceable> parameter is an optional text string
+ containing zero or more single-letter flags that change the function's behavior.
+ <function>regexp_instr</function> accepts all the flags
+ shown in <xref linkend="posix-embedded-options-table"/>.
+ The <literal>g</literal> flag is forced internally to track all matches.
+ For a pattern with capture groups, <replaceable>group</replaceable> is an integer indicating
+ which capture in <replaceable>pattern</replaceable> is the target of the function.
+ A capture group is a part of the pattern enclosed in parentheses. Capture groups can be nested.
+ They are numbered in the order in which their left parentheses appear in <replaceable>pattern</replaceable>.
+ If <replaceable>group</replaceable> is zero, then the position
+ of the entire substring that matches the pattern is returned. If <replaceable>pattern</replaceable>
+ does not have at least <replaceable>group</replaceable> capture groups, the function returns zero.
+ This function returns 0 if there is no match or the starting or ending position
+ of a match as an integer.
+ </para>
+
+ <para>
+ Some examples:
+<programlisting>
+SELECT regexp_instr('number of your street, zipcode thetown, FR', '[^ ]+', 1, 6);
+ regexp_instr
+--------------
+ 32
+(1 row)
+
+SELECT regexp_instr('1234567890', '(123)(4(56)(78))', 1, 1, 0, 'i', 3);
+ regexp_instr
+--------------
+ 5
+(1 row)
+</programlisting>
+ </para>
+
+ <para>
+ The <function>regexp_substr</function> function returns the
+ matched substring resulting from matching a POSIX regular
+ expression pattern to a string. It has the syntax <function>regexp_substr</function>(
+ <replaceable>string</replaceable>, <replaceable>pattern</replaceable> <optional>,
+ <replaceable>position</replaceable> </optional> <optional>, <replaceable>occurrence</replaceable> </optional>
+ <optional>, <replaceable>flags</replaceable> </optional>
+ <optional>, <replaceable>group</replaceable> </optional>).
+ <replaceable>pattern</replaceable> is searched in <replaceable>string</replaceable> starting
+ from an optional <replaceable>position</replaceable> or from the beginning
+ of <replaceable>string</replaceable> by default. Optional parameter <replaceable>occurrence</replaceable>
+ indicates which occurrence of <replaceable>pattern</replaceable> in <replaceable>string</replaceable>
+ should be searched. The <replaceable>flags</replaceable> parameter is an optional text string
+ containing zero or more single-letter flags that change the function's behavior.
+ <function>regexp_substr</function> accepts all the flags
+ shown in <xref linkend="posix-embedded-options-table"/>.
+ The <literal>g</literal> flag is forced internally to track all matches.
+ For a pattern with capture groups, optional <replaceable>group</replaceable> is an integer indicating
+ which capture in <replaceable>pattern</replaceable> is the target of the function. A capture group
+ is a part of the pattern enclosed in parentheses. Capture groups can be nested. They are numbered in
+ the order in which their left parentheses appear in <replaceable>pattern</replaceable>.
+ If <replaceable>group</replaceable> is zero, then the position
+ of the entire substring that matches the pattern is returned. If <replaceable>pattern</replaceable>
+ does not have at least <replaceable>group</replaceable> capture groups, the function returns zero.
+ This function returns NULL if there is no match or the substring of the match.
+ </para>
+
+ <para>
+ Some examples:
+<programlisting>
+SELECT regexp_substr('number of your street, zipcode town, FR', ',[^,]+', 1, 2);
+ regexp_substr
+---------------
+ , FR
+(1 row)
+
+SELECT regexp_substr('1234567890 1234557890', '(123)(4(5[56])(78))', 1, 2, 'i', 3);
+ regexp_substr
+---------------
+ 55
+(1 row)
+</programlisting>
+ </para>
+
+
<!-- derived from the re_syntax.n man page -->
<sect3 id="posix-syntax-details">
diff --git a/src/backend/utils/adt/regexp.c b/src/backend/utils/adt/regexp.c
index a32c5c82ab..624c57cec9 100644
--- a/src/backend/utils/adt/regexp.c
+++ b/src/backend/utils/adt/regexp.c
@@ -120,6 +120,7 @@ static regexp_matches_ctx *setup_regexp_matches(text *orig_str, text *pattern,
static ArrayType *build_regexp_match_result(regexp_matches_ctx *matchctx);
static Datum build_regexp_split_result(regexp_matches_ctx *splitctx);
+static text *enclose_with_parenthesis(text *str);
/*
* RE_compile_and_cache - compile a RE, caching if possible
@@ -555,7 +556,6 @@ texticregexne(PG_FUNCTION_ARGS)
0, NULL));
}
-
/*
* textregexsubstr()
* Return a substring matched by a regular expression.
@@ -1063,6 +1063,508 @@ regexp_matches_no_flags(PG_FUNCTION_ARGS)
return regexp_matches(fcinfo);
}
+/*
+ * regexp_like()
+ * Return the true if a pattern match within a string.
+ */
+Datum
+regexp_like(PG_FUNCTION_ARGS)
+{
+ text *str = PG_GETARG_TEXT_PP(0);
+ text *pattern = PG_GETARG_TEXT_PP(1);
+ text *flags = PG_GETARG_TEXT_PP_IF_EXISTS(2);
+
+ pg_re_flags re_flags;
+ regexp_matches_ctx *matchctx;
+
+ /* Determine options */
+ parse_re_flags(&re_flags, flags);
+
+ matchctx = setup_regexp_matches(str, pattern, &re_flags,
+ PG_GET_COLLATION(), true, false, false);
+
+ if (matchctx->nmatches > 0)
+ PG_RETURN_BOOL(true);
+
+ PG_RETURN_BOOL(false);
+}
+
+/* This is separate to keep the opr_sanity regression test from complaining */
+Datum
+regexp_like_no_flags(PG_FUNCTION_ARGS)
+{
+ return regexp_like(fcinfo);
+}
+
+/*
+ * regexp_count()
+ * Return the number of matches of a pattern within a string.
+ */
+Datum
+regexp_count(PG_FUNCTION_ARGS)
+{
+ text *orig_str = NULL;
+ text *pattern = PG_GETARG_TEXT_PP(1);
+ int start = 1;
+ text *flags = PG_GETARG_TEXT_PP_IF_EXISTS(3);
+
+ pg_re_flags re_flags;
+ regexp_matches_ctx *matchctx;
+
+ if (PG_NARGS() > 2)
+ start = PG_GETARG_INT32(2);
+
+ if (start < 0)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("invalid value for parameter \"%s\": %d", "start_position", start)));
+
+ /* regexp_count(string, pattern, start[, flags]) */
+ if (start > 1)
+ orig_str = DatumGetTextPP(DirectFunctionCall2(text_substr_no_len,
+ PG_GETARG_DATUM(0),
+ Int32GetDatum(start)
+ ));
+ else
+ orig_str = PG_GETARG_TEXT_PP(0);
+
+ /* Determine options */
+ parse_re_flags(&re_flags, flags);
+
+ /* this function require flag 'g' */
+ re_flags.glob = true;
+
+ matchctx = setup_regexp_matches(orig_str, pattern, &re_flags,
+ PG_GET_COLLATION(), true, false, false);
+
+ PG_RETURN_INT32(matchctx->nmatches);
+}
+
+/* This is separate to keep the opr_sanity regression test from complaining */
+Datum
+regexp_count_no_start(PG_FUNCTION_ARGS)
+{
+ return regexp_count(fcinfo);
+}
+
+
+/* This is separate to keep the opr_sanity regression test from complaining */
+Datum
+regexp_count_no_flags(PG_FUNCTION_ARGS)
+{
+ return regexp_count(fcinfo);
+}
+
+/*
+ * textregexreplace_extended()
+ * Return a string matched by a regular expression, with replacement.
+ * Extends textregexreplace by allowing a start position and the
+ * choice of the occurrence to replace (0 means all occurrences).
+ */
+Datum
+textregexreplace_extended(PG_FUNCTION_ARGS)
+{
+ text *s = NULL;
+ text *p = PG_GETARG_TEXT_PP(1);
+ text *pattern = enclose_with_parenthesis(PG_GETARG_TEXT_PP(1));
+ text *r = PG_GETARG_TEXT_PP(2);
+ int start = 1;
+ int occurrence = 1;
+ int pos = 0;
+ text *flags = PG_GETARG_TEXT_PP_IF_EXISTS(5);
+ StringInfoData str;
+ regex_t *re;
+ pg_re_flags re_flags;
+ regexp_matches_ctx *matchctx;
+ text *after = NULL;
+
+ /* start position */
+ if (PG_NARGS() > 3)
+ start = PG_GETARG_INT32(3);
+ if (start < 0)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("invalid value for parameter \"%s\": %d", "start_position", start)));
+ if (start > 1)
+ s = DatumGetTextPP(DirectFunctionCall2(text_substr_no_len,
+ PG_GETARG_DATUM(0),
+ Int32GetDatum(start)
+ ));
+ else
+ s = PG_GETARG_TEXT_PP(0);
+
+ /* occurrence to replace */
+ if (PG_NARGS() > 4)
+ occurrence = PG_GETARG_INT32(4);
+
+ if (occurrence < 0)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("invalid value for parameter \"%s\": %d", "occurrence", occurrence)));
+
+ /* Determine options */
+ parse_re_flags(&re_flags, flags);
+ re_flags.glob = true;
+
+ /* lookup for pattern */
+ matchctx = setup_regexp_matches(s, pattern, &re_flags,
+ PG_GET_COLLATION(), true,
+ false, re_flags.glob);
+
+ /* If no match is found, then the function returns the original string */
+ if (matchctx->nmatches == 0)
+ PG_RETURN_DATUM(PG_GETARG_DATUM(0));
+
+ /* Get the position of the occurence to replace */
+ if (PG_NARGS() > 4 && occurrence > 0)
+ {
+ /* When occurrence exceed matches return the original string */
+ if (occurrence > matchctx->nmatches)
+ PG_RETURN_DATUM(PG_GETARG_DATUM(0));
+
+ pos = (occurrence*matchctx->npatterns*2)-(matchctx->npatterns*2);
+ pos = matchctx->match_locs[pos]+start;
+ }
+ else
+ pos = start;
+
+ /* Normal case without explicit VARIADIC marker */
+ initStringInfo(&str);
+
+ /* Get the string before the occurrence starting */
+ if (pos > 1)
+ {
+ text *before = DatumGetTextPP(DirectFunctionCall3(text_substr,
+ PG_GETARG_DATUM(0),
+ Int32GetDatum(1),
+ Int32GetDatum(pos - 1)));
+ appendStringInfoString(&str, TextDatumGetCString(before));
+ }
+
+ /* all occurences must be replaced? */
+ if (occurrence == 0)
+ re_flags.glob = true;
+ else
+ re_flags.glob = false;
+
+ /* Compile the regular expression */
+ re = RE_compile_and_cache(p, re_flags.cflags, PG_GET_COLLATION());
+
+ /* Get the substring starting at the right occurrence position */
+ after = DatumGetTextPP(DirectFunctionCall2(text_substr_no_len,
+ PG_GETARG_DATUM(0),
+ Int32GetDatum(pos)
+ ));
+
+ appendStringInfoString(&str, TextDatumGetCString(replace_text_regexp(
+ after,
+ (void *) re,
+ r,
+ re_flags.glob)));
+
+ PG_RETURN_TEXT_P(CStringGetTextDatum(str.data));
+}
+
+/* This is separate to keep the opr_sanity regression test from complaining */
+Datum
+textregexreplace_extended_no_occurrence(PG_FUNCTION_ARGS)
+{
+ return textregexreplace_extended(fcinfo);
+}
+
+/* This is separate to keep the opr_sanity regression test from complaining */
+Datum
+textregexreplace_extended_no_flags(PG_FUNCTION_ARGS)
+{
+ return textregexreplace_extended(fcinfo);
+}
+
+/*
+ * Return text string between parenthesis
+ */
+static text *
+enclose_with_parenthesis(text *str)
+{
+ int len = VARSIZE_ANY_EXHDR(str);
+ text *result;
+ char *ptr;
+
+ result = palloc(len + VARHDRSZ + 2);
+ SET_VARSIZE(result, len + VARHDRSZ + 2);
+ ptr = VARDATA(result);
+ memcpy(ptr, "(", 1);
+ memcpy(ptr+1, VARDATA_ANY(str), len);
+ memcpy(ptr+len+1, ")", 1);
+
+ return result;
+}
+
+/*
+ * regexp_instr()
+ * Return the position within the string where the match was located
+ */
+Datum
+regexp_instr(PG_FUNCTION_ARGS)
+{
+ text *orig_str = NULL;
+ text *pattern = NULL;
+ text *flags = PG_GETARG_TEXT_PP_IF_EXISTS(5);
+ int start = 1;
+ int occurrence = 1;
+ int return_opt = 0;
+ int subexpr = 0;
+ int pos;
+
+ pg_re_flags re_flags;
+ regexp_matches_ctx *matchctx;
+
+ /* regexp_instr(string, pattern, start) */
+ if (PG_NARGS() > 2)
+ start = PG_GETARG_INT32(2);
+
+ if (start < 0)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("invalid value for parameter \"%s\": %d", "start_position", start)));
+
+ if (start > 1)
+ orig_str = DatumGetTextPP(DirectFunctionCall2(text_substr_no_len,
+ PG_GETARG_DATUM(0),
+ Int32GetDatum(start)
+ ));
+ else
+ /* regexp_instr(string, pattern) */
+ orig_str = PG_GETARG_TEXT_PP(0);
+
+ if (orig_str == NULL)
+ PG_RETURN_NULL();
+
+ /* regexp_instr(string, pattern, start, occurrence) */
+ if (PG_NARGS() > 3)
+ occurrence = PG_GETARG_INT32(3);
+
+ if (occurrence <= 0)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("invalid value for parameter \"%s\": %d", "occurrence", occurrence)));
+
+ /* regexp_instr(string, pattern, start, occurrence, return_opt) */
+ if (PG_NARGS() > 4)
+ return_opt = PG_GETARG_INT32(4);
+
+ if (return_opt != 0 && return_opt != 1)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("invalid value for parameter \"%s\": %d", "return_option", return_opt)));
+
+ /* regexp_instr(string, pattern, start, occurrence, return_opt, flags, subexpr) */
+ if (PG_NARGS() > 6)
+ subexpr = PG_GETARG_INT32(6);
+ if (subexpr < 0)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("invalid value for parameter \"%s\": %d", "group", subexpr)));
+
+ /*
+ * If subexpr is zero (default), then the position of the entire
+ * substring that matches the pattern is returned. Otherwise we
+ * will exactly register the subexpressions given in the pattern.
+ * Enclose pattern between parenthesis to register the position
+ * of the entire substring.
+ */
+ pattern = enclose_with_parenthesis(PG_GETARG_TEXT_PP(1));
+
+ /* Determine options */
+ parse_re_flags(&re_flags, flags);
+ /* this function require flag 'g' */
+ re_flags.glob = true;
+
+ matchctx = setup_regexp_matches(orig_str, pattern, &re_flags,
+ PG_GET_COLLATION(), true, false, false);
+
+ /* If no match is found, then the function returns 0 */
+ if (matchctx->nmatches == 0)
+ PG_RETURN_INT32(0);
+
+ /* When occurrence exceed matches return 0 */
+ if (occurrence > matchctx->nmatches)
+ PG_RETURN_INT32(0);
+
+ /* When subexpr exceed number of subexpression return 0 */
+ if (subexpr > matchctx->npatterns - 1)
+ PG_RETURN_INT32(0);
+
+ /*
+ * Returns the position of the first character of the occurrence
+ * or for subexpression in this occurrence.
+ */
+ pos = (occurrence*matchctx->npatterns*2)-((matchctx->npatterns-subexpr)*2);
+ if (return_opt == 1)
+ pos += 1;
+
+ PG_RETURN_INT32(matchctx->match_locs[pos]+start);
+}
+
+/* This is separate to keep the opr_sanity regression test from complaining */
+Datum
+regexp_instr_no_start(PG_FUNCTION_ARGS)
+{
+ return regexp_instr(fcinfo);
+}
+
+/* This is separate to keep the opr_sanity regression test from complaining */
+Datum
+regexp_instr_no_occurrence(PG_FUNCTION_ARGS)
+{
+ return regexp_instr(fcinfo);
+}
+
+/* This is separate to keep the opr_sanity regression test from complaining */
+Datum
+regexp_instr_no_return_opt(PG_FUNCTION_ARGS)
+{
+ return regexp_instr(fcinfo);
+}
+
+/* This is separate to keep the opr_sanity regression test from complaining */
+Datum
+regexp_instr_no_flags(PG_FUNCTION_ARGS)
+{
+ return regexp_instr(fcinfo);
+}
+
+/* This is separate to keep the opr_sanity regression test from complaining */
+Datum
+regexp_instr_no_subexpr(PG_FUNCTION_ARGS)
+{
+ return regexp_instr(fcinfo);
+}
+
+/*
+ * regexp_substr()
+ * Return the substring within the string that match a regular
+ * expression pattern
+ */
+Datum
+regexp_substr(PG_FUNCTION_ARGS)
+{
+ text *orig_str = NULL;
+ text *pattern = NULL;
+ text *flags = PG_GETARG_TEXT_PP_IF_EXISTS(4);
+ int start = 1;
+ int occurrence = 1;
+ int subexpr = 0;
+ int so, eo, pos;
+
+ pg_re_flags re_flags;
+ regexp_matches_ctx *matchctx;
+
+ if (PG_NARGS() > 2)
+ start = PG_GETARG_INT32(2);
+
+ if (start < 0)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("invalid value for parameter \"%s\": %d", "start_position", start)));
+
+ /* regexp_substr(string, pattern, start) */
+ if (start > 1)
+ orig_str = DatumGetTextPP(DirectFunctionCall2(text_substr_no_len,
+ PG_GETARG_DATUM(0),
+ Int32GetDatum(start)
+ ));
+ else
+ /* regexp_substr(string, pattern) */
+ orig_str = PG_GETARG_TEXT_PP(0);
+
+ if (orig_str == NULL)
+ PG_RETURN_NULL();
+
+ /* regexp_substr(string, pattern, start, occurrence) */
+ if (PG_NARGS() > 3)
+ occurrence = PG_GETARG_INT32(3);
+
+ if (occurrence <= 0)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("invalid value for parameter \"%s\": %d", "occurrence", occurrence)));
+
+ /* regexp_substr(string, pattern, start, occurrence, flags, subexpr) */
+ if (PG_NARGS() > 5)
+ subexpr = PG_GETARG_INT32(5);
+ if (subexpr < 0)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("invalid value for parameter \"%s\": %d", "group", subexpr)));
+
+ /*
+ * If subexpr is zero (default), then the position of the entire
+ * substring that matches the pattern is returned. Otherwise we
+ * will exactly register the subexpressions given in the pattern.
+ * Enclose pattern between parenthesis to register the position
+ * of the entire substring.
+ */
+ pattern = enclose_with_parenthesis(PG_GETARG_TEXT_PP(1));
+
+ /* Determine options */
+ parse_re_flags(&re_flags, flags);
+ /* this function require flag 'g' */
+ re_flags.glob = true;
+
+ matchctx = setup_regexp_matches(orig_str, pattern, &re_flags,
+ PG_GET_COLLATION(), true, false, false);
+
+ /* If no match is found, then the function returns NULL */
+ if (matchctx->nmatches == 0)
+ PG_RETURN_NULL();
+
+ /* When occurrence exceed matches return NULL */
+ if (occurrence > matchctx->nmatches)
+ PG_RETURN_NULL();
+
+ /* When subexpr exceed number of subexpression return NULL */
+ if (subexpr > matchctx->npatterns - 1)
+ PG_RETURN_NULL();
+
+ /* Returns the substring corresponding to the occurrence. */
+ pos = (occurrence*matchctx->npatterns*2)-((matchctx->npatterns-subexpr)*2);
+ so = matchctx->match_locs[pos]+1;
+ eo = matchctx->match_locs[pos+1]+1;
+
+ PG_RETURN_DATUM(DirectFunctionCall3(text_substr,
+ PointerGetDatum(matchctx->orig_str),
+ Int32GetDatum(so),
+ Int32GetDatum(eo - so)));
+}
+
+/* This is separate to keep the opr_sanity regression test from complaining */
+Datum
+regexp_substr_no_start(PG_FUNCTION_ARGS)
+{
+ return regexp_substr(fcinfo);
+}
+
+/* This is separate to keep the opr_sanity regression test from complaining */
+Datum
+regexp_substr_no_occurrence(PG_FUNCTION_ARGS)
+{
+ return regexp_substr(fcinfo);
+}
+
+/* This is separate to keep the opr_sanity regression test from complaining */
+Datum
+regexp_substr_no_flags(PG_FUNCTION_ARGS)
+{
+ return regexp_substr(fcinfo);
+}
+
+/* This is separate to keep the opr_sanity regression test from complaining */
+Datum
+regexp_substr_no_subexpr(PG_FUNCTION_ARGS)
+{
+ return regexp_substr(fcinfo);
+}
+
/*
* setup_regexp_matches --- do the initial matching for regexp_match
* and regexp_split functions
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 59d2b71ca9..fb91c54afe 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -3560,6 +3560,63 @@
proname => 'regexp_matches', prorows => '10', proretset => 't',
prorettype => '_text', proargtypes => 'text text text',
prosrc => 'regexp_matches' },
+{ oid => '9614', descr => 'count match(es) for regexp',
+ proname => 'regexp_count', prorettype => 'int4', proargtypes => 'text text',
+ prosrc => 'regexp_count_no_start' },
+{ oid => '9615', descr => 'count match(es) for regexp',
+ proname => 'regexp_count', prorettype => 'int4', proargtypes => 'text text int4',
+ prosrc => 'regexp_count_no_flags' },
+{ oid => '9616', descr => 'count match(es) for regexp',
+ proname => 'regexp_count', prorettype => 'int4',
+ proargtypes => 'text text int4 text', prosrc => 'regexp_count' },
+{ oid => '9617', descr => 'position where the match for regexp was located',
+ proname => 'regexp_instr', prorettype => 'int4', proargtypes => 'text text',
+ prosrc => 'regexp_instr_no_start' },
+{ oid => '9618', descr => 'position where the match for regexp was located',
+ proname => 'regexp_instr', prorettype => 'int4', proargtypes => 'text text int4',
+ prosrc => 'regexp_instr_no_occurrence' },
+{ oid => '9619', descr => 'position where the match for regexp was located',
+ proname => 'regexp_instr', prorettype => 'int4', proargtypes => 'text text int4 int4',
+ prosrc => 'regexp_instr_no_return_opt' },
+{ oid => '9620', descr => 'position where the match for regexp was located',
+ proname => 'regexp_instr', prorettype => 'int4', proargtypes => 'text text int4 int4 int4',
+ prosrc => 'regexp_instr_no_flags' },
+{ oid => '9621', descr => 'position where the match for regexp was located',
+ proname => 'regexp_instr', prorettype => 'int4', proargtypes => 'text text int4 int4 int4 text',
+ prosrc => 'regexp_instr_no_subexpr' },
+{ oid => '9622', descr => 'position where the match for regexp was located',
+ proname => 'regexp_instr', prorettype => 'int4', proargtypes => 'text text int4 int4 int4 text int4',
+ prosrc => 'regexp_instr' },
+{ oid => '9623', descr => 'substring that matches the regexp pattern',
+ proname => 'regexp_substr', prorettype => 'text', proargtypes => 'text text',
+ prosrc => 'regexp_substr_no_start' },
+{ oid => '9624', descr => 'substring that matchies the regexp pattern',
+ proname => 'regexp_substr', prorettype => 'text', proargtypes => 'text text int4',
+ prosrc => 'regexp_substr_no_occurrence' },
+{ oid => '9625', descr => 'substring that matches the regexp pattern',
+ proname => 'regexp_substr', prorettype => 'text', proargtypes => 'text text int4 int4',
+ prosrc => 'regexp_substr_no_flags' },
+{ oid => '9626', descr => 'substring that matches the regexp pattern',
+ proname => 'regexp_substr', prorettype => 'text', proargtypes => 'text text int4 int4 text',
+ prosrc => 'regexp_substr_no_subexpr' },
+{ oid => '9627', descr => 'substring that matches the regexp pattern',
+ proname => 'regexp_substr', prorettype => 'text', proargtypes => 'text text int4 int4 text int4',
+ prosrc => 'regexp_substr' },
+{ oid => '9628', descr => 'evaluate match(es) for regexp',
+ proname => 'regexp_like', prorettype => 'bool', proargtypes => 'text text',
+ prosrc => 'regexp_like_no_flags' },
+{ oid => '9629', descr => 'evaluate match(es) for regexp',
+ proname => 'regexp_like', prorettype => 'bool',
+ proargtypes => 'text text text', prosrc => 'regexp_like' },
+{ oid => '9606', descr => 'replace text using regexp',
+ proname => 'regexp_replace', prorettype => 'text',
+ proargtypes => 'text text text int4 int4 text', prosrc => 'textregexreplace_extended' },
+{ oid => '9607', descr => 'replace text using regexp',
+ proname => 'regexp_replace', prorettype => 'text',
+ proargtypes => 'text text text int4 int4', prosrc => 'textregexreplace_extended_no_flags' },
+{ oid => '9608', descr => 'replace text using regexp',
+ proname => 'regexp_replace', prorettype => 'text',
+ proargtypes => 'text text text int4', prosrc => 'textregexreplace_extended_no_occurrence' },
{ oid => '2088', descr => 'split string by field_sep and return field_num',
proname => 'split_part', prorettype => 'text',
proargtypes => 'text text int4', prosrc => 'split_part' },
diff --git a/src/test/regress/expected/strings.out b/src/test/regress/expected/strings.out
index fb4573d85f..b9cef32e9e 100644
--- a/src/test/regress/expected/strings.out
+++ b/src/test/regress/expected/strings.out
@@ -905,6 +905,751 @@ SELECT foo, length(foo) FROM regexp_split_to_table('thE QUick bROWn FOx jUMPs ov
ERROR: regexp_split_to_table() does not support the "global" option
SELECT regexp_split_to_array('thE QUick bROWn FOx jUMPs ovEr The lazy dOG', 'e', 'g');
ERROR: regexp_split_to_array() does not support the "global" option
+-- regexp_like tests
+SELECT regexp_like('a'||CHR(10)||'d', 'a.d');
+ regexp_like
+-------------
+ t
+(1 row)
+
+SELECT regexp_like('a'||CHR(10)||'d', 'a.d', 'm');
+ regexp_like
+-------------
+ f
+(1 row)
+
+SELECT regexp_like('a'||CHR(10)||'d', 'a.d', 'n');
+ regexp_like
+-------------
+ f
+(1 row)
+
+SELECT regexp_like('Steven', '^Ste(v|ph)en$');
+ regexp_like
+-------------
+ t
+(1 row)
+
+SELECT regexp_like('foo' || chr(10) || 'bar' || chr(10) || 'bequq' || chr(10) || 'baz', '^bar');
+ regexp_like
+-------------
+ f
+(1 row)
+
+SELECT regexp_like('foo' || chr(10) || 'bar' || chr(10) || 'bequq' || chr(10) || 'baz', 'bar');
+ regexp_like
+-------------
+ t
+(1 row)
+
+SELECT regexp_like('foo' || chr(10) || 'bar' || chr(10) || 'bequq' || chr(10) || 'baz', '^bar', 'm');
+ regexp_like
+-------------
+ t
+(1 row)
+
+SELECT regexp_like('foo' || chr(10) || 'bar' || chr(10) || 'bequq' || chr(10) || 'baz', '^bar', 'n');
+ regexp_like
+-------------
+ t
+(1 row)
+
+SELECT regexp_like('GREEN', '([aeiou])\1');
+ regexp_like
+-------------
+ f
+(1 row)
+
+SELECT regexp_like('GREEN', '([aeiou])\1', 'i');
+ regexp_like
+-------------
+ t
+(1 row)
+
+SELECT regexp_like('ORANGE' || chr(10) || 'GREEN', '([aeiou])\1', 'i');
+ regexp_like
+-------------
+ t
+(1 row)
+
+SELECT regexp_like('ORANGE' || chr(10) || 'GREEN', '^..([aeiou])\1', 'i');
+ regexp_like
+-------------
+ f
+(1 row)
+
+SELECT regexp_like('ORANGE' || chr(10) || 'GREEN', '([aeiou])\1', 'in');
+ regexp_like
+-------------
+ t
+(1 row)
+
+SELECT regexp_like('ORANGE' || chr(10) || 'GREEN', '^..([aeiou])\1', 'in');
+ regexp_like
+-------------
+ t
+(1 row)
+
+SELECT regexp_like('ORANGE' || chr(10) || 'GREEN', '^..([aeiou])\1', 'im');
+ regexp_like
+-------------
+ t
+(1 row)
+
+SELECT REGEXP_LIKE('abc', 'a b c');
+ regexp_like
+-------------
+ f
+(1 row)
+
+SELECT REGEXP_LIKE('abc', 'a b c','x');
+ regexp_like
+-------------
+ t
+(1 row)
+
+-- count all matches for regexp
+SELECT regexp_count('123123123123123', '(12)3');
+ regexp_count
+--------------
+ 5
+(1 row)
+
+-- count all matches with start position
+SELECT regexp_count('123123123123', '123', 0);
+ regexp_count
+--------------
+ 4
+(1 row)
+
+SELECT regexp_count('123123123123', '123', 1);
+ regexp_count
+--------------
+ 4
+(1 row)
+
+SELECT regexp_count('123123123123', '123', 3);
+ regexp_count
+--------------
+ 3
+(1 row)
+
+SELECT regexp_count('123123123123', '123', -3);
+ERROR: invalid value for parameter "start_position": -3
+-- count all matches in NULL string with a start position
+SELECT regexp_count(NULL, '123', 3);
+ regexp_count
+--------------
+ \N
+(1 row)
+
+-- count all matches with a start position greater than string length
+SELECT regexp_count('123', '123', 10);
+ regexp_count
+--------------
+ 0
+(1 row)
+
+-- count all matches from different regexp
+SELECT regexp_count('ABC123', '[A-Z]'), regexp_count('A1B2C3', '[A-Z]');
+ regexp_count | regexp_count
+--------------+--------------
+ 3 | 3
+(1 row)
+
+SELECT regexp_count('ABC123', '[A-Z][0-9]'), regexp_count('A1B2C3', '[A-Z][0-9]');
+ regexp_count | regexp_count
+--------------+--------------
+ 1 | 3
+(1 row)
+
+SELECT regexp_count('ABC123', '[A-Z][0-9]{2}'), regexp_count('A1B2C3', '[A-Z][0-9]{2}');
+ regexp_count | regexp_count
+--------------+--------------
+ 1 | 0
+(1 row)
+
+SELECT regexp_count('ABC123', '([A-Z][0-9]){2}'), regexp_count('A1B2C3', '([A-Z][0-9]){2}');
+ regexp_count | regexp_count
+--------------+--------------
+ 0 | 1
+(1 row)
+
+SELECT regexp_count('ABC123A5', '^[A-Z][0-9]'), regexp_count('A1B2C3', '^[A-Z][0-9]');
+ regexp_count | regexp_count
+--------------+--------------
+ 0 | 1
+(1 row)
+
+SELECT regexp_count('ABC123', '[A-Z][0-9]{2}'), regexp_count('A1B2C34', '[A-Z][0-9]{2}');
+ regexp_count | regexp_count
+--------------+--------------
+ 1 | 1
+(1 row)
+
+-- count matches with newline case insensivity
+SELECT regexp_count('a'||CHR(10)||'d', 'a.d');
+ regexp_count
+--------------
+ 1
+(1 row)
+
+SELECT regexp_count('a'||CHR(10)||'d', 'a.d', 1, 's');
+ regexp_count
+--------------
+ 1
+(1 row)
+
+-- count matches with newline case sensivity
+SELECT regexp_count('a'||CHR(10)||'d', 'a.d', 1, 'n');
+ regexp_count
+--------------
+ 0
+(1 row)
+
+-- count not multiline matches
+SELECT regexp_count('a'||CHR(10)||'d', '^d$');
+ regexp_count
+--------------
+ 0
+(1 row)
+
+-- count multiline matches
+SELECT regexp_count('a'||CHR(10)||'d', '^d$', 1, 'm');
+ regexp_count
+--------------
+ 1
+(1 row)
+
+SELECT regexp_count('Hello'||CHR(10)||'world!', '^world!$', 1, 'm'); -- 1
+ regexp_count
+--------------
+ 1
+(1 row)
+
+-- Count the number of occurrences of the substring an in the string.
+SELECT regexp_count('a man, a plan, a canal: Panama', 'an');
+ regexp_count
+--------------
+ 4
+(1 row)
+
+-- Find the number of occurrences of the substring an in the string starting with the fifth character.
+SELECT regexp_count('a man, a plan, a canal: Panama', 'an',5);
+ regexp_count
+--------------
+ 3
+(1 row)
+
+-- Find the number of occurrences of a substring containing a lower-case character
+-- followed by an. In the first example, do not use a modifier. In the second example,
+-- use the i modifier to force the regular expression to ignore case.
+SELECT regexp_count('a man, a plan, a canal: Panama', '[a-z]an');
+ regexp_count
+--------------
+ 3
+(1 row)
+
+SELECT regexp_count('a man, a plan, a canal: Panama', '[a-z]an', 1, 'i');
+ regexp_count
+--------------
+ 4
+(1 row)
+
+DROP TABLE IF EXISTS regexp_temp;
+NOTICE: table "regexp_temp" does not exist, skipping
+CREATE TABLE regexp_temp(fullname varchar(20), email varchar(20));
+INSERT INTO regexp_temp (fullname, email) VALUES ('John Doe', 'johndoe@example.com');
+INSERT INTO regexp_temp (fullname, email) VALUES ('Jane Doe', 'janedoe');
+-- count matches case sensitive
+SELECT fullname, regexp_count(fullname, 'e', 1, 'c') FROM regexp_temp;
+ fullname | regexp_count
+----------+--------------
+ John Doe | 1
+ Jane Doe | 2
+(2 rows)
+
+SELECT fullname, regexp_count(fullname, 'D', 1, 'c') FROM regexp_temp;
+ fullname | regexp_count
+----------+--------------
+ John Doe | 1
+ Jane Doe | 1
+(2 rows)
+
+SELECT fullname, regexp_count(fullname, 'd', 1, 'c') FROM regexp_temp;
+ fullname | regexp_count
+----------+--------------
+ John Doe | 0
+ Jane Doe | 0
+(2 rows)
+
+-- count matches case insensitive
+SELECT fullname, regexp_count(fullname, 'E', 1, 'i') FROM regexp_temp;
+ fullname | regexp_count
+----------+--------------
+ John Doe | 1
+ Jane Doe | 2
+(2 rows)
+
+SELECT fullname, regexp_count(fullname, 'do', 1, 'i') FROM regexp_temp;
+ fullname | regexp_count
+----------+--------------
+ John Doe | 1
+ Jane Doe | 1
+(2 rows)
+
+-- return the start position of the 6th occurence starting at beginning of the string
+SELECT regexp_instr('number of your street, zipcode thetown, FR', '[^ ]+', 1, 6);
+ regexp_instr
+--------------
+ 32
+(1 row)
+
+-- return the start position of the 5th occurence starting after the first word
+SELECT regexp_instr('number of your street, zipcode thetown, FR', '[^ ]+', 7, 5, 0);
+ regexp_instr
+--------------
+ 32
+(1 row)
+
+-- return the ending position of the 5th occurence starting after the first word
+SELECT regexp_instr('number of your street, zipcode thetown, FR', '[^ ]+', 7, 5, 0);
+ regexp_instr
+--------------
+ 32
+(1 row)
+
+-- return the ending position of the 2nd occurence starting after the first word
+SELECT regexp_instr('number of your street, zipcode thetown, FR', '[T|Z|S][[:alpha:]]{5}', 7, 2, 1, 'i');
+ regexp_instr
+--------------
+ 30
+(1 row)
+
+-- return the starting position corresponding to the different capture group
+SELECT regexp_instr('1234567890', '(123)(4(56)(78))', 1, 1, 0, 'i', 0);
+ regexp_instr
+--------------
+ 1
+(1 row)
+
+SELECT regexp_instr('1234567890', '(123)(4(56)(78))', 1, 1, 0, 'i', 1);
+ regexp_instr
+--------------
+ 1
+(1 row)
+
+SELECT regexp_instr('1234567890', '(123)(4(56)(78))', 1, 1, 0, 'i', 2);
+ regexp_instr
+--------------
+ 4
+(1 row)
+
+SELECT regexp_instr('1234567890', '(123)(4(56)(78))', 1, 1, 0, 'i', 3);
+ regexp_instr
+--------------
+ 5
+(1 row)
+
+SELECT regexp_instr('1234567890', '(123)(4(56)(78))', 1, 1, 0, 'i', 4);
+ regexp_instr
+--------------
+ 7
+(1 row)
+
+-- return the starting position corresponding to a non existant capture group
+SELECT regexp_instr('1234567890', '(123)(4(56)(78))', 1, 1, 0, 'i', 5);
+ regexp_instr
+--------------
+ 0
+(1 row)
+
+-- Same but with the ending position
+SELECT regexp_instr('1234567890', '(123)(4(56)(78))', 1, 1, 1, 'i', 0);
+ regexp_instr
+--------------
+ 9
+(1 row)
+
+SELECT regexp_instr('1234567890', '(123)(4(56)(78))', 1, 1, 1, 'i', 1);
+ regexp_instr
+--------------
+ 4
+(1 row)
+
+SELECT regexp_instr('1234567890', '(123)(4(56)(78))', 1, 1, 1, 'i', 2);
+ regexp_instr
+--------------
+ 9
+(1 row)
+
+SELECT regexp_instr('1234567890', '(123)(4(56)(78))', 1, 1, 1, 'i', 3);
+ regexp_instr
+--------------
+ 7
+(1 row)
+
+SELECT regexp_instr('1234567890', '(123)(4(56)(78))', 1, 1, 1, 'i', 4);
+ regexp_instr
+--------------
+ 9
+(1 row)
+
+SELECT regexp_instr('1234567890', '(123)(4(56)(78))', 1, 1, 1, 'i', 5);
+ regexp_instr
+--------------
+ 0
+(1 row)
+
+-- start position of a valid email
+SELECT email, regexp_instr(email, '\w+@\w+(\.\w+)+') "valid_email" FROM regexp_temp;
+ email | valid_email
+---------------------+-------------
+ johndoe@example.com | 1
+ janedoe | 0
+(2 rows)
+
+-- ending position of a valid email
+SELECT email, regexp_instr(email, '\w+@\w+(\.\w+)+', 1, 1, 1) "valid_email" FROM regexp_temp;
+ email | valid_email
+---------------------+-------------
+ johndoe@example.com | 20
+ janedoe | 0
+(2 rows)
+
+-- start position of first capture group in the email (the dot part)
+SELECT email, regexp_instr(email, '\w+@\w+(\.\w+)+', 1, 1, 0, 'i', 1) FROM regexp_temp;
+ email | regexp_instr
+---------------------+--------------
+ johndoe@example.com | 16
+ janedoe | 0
+(2 rows)
+
+-- ending position of first capture group in the email (the dot part)
+SELECT email, regexp_instr(email, '\w+@\w+(\.\w+)+', 1, 1, 1, 'i', 1) FROM regexp_temp;
+ email | regexp_instr
+---------------------+--------------
+ johndoe@example.com | 20
+ janedoe | 0
+(2 rows)
+
+-- test negative values
+SELECT regexp_instr('1234567890', '(123)(4(56)(78))', -1, 1, 1, 'i', 1);
+ERROR: invalid value for parameter "start_position": -1
+SELECT regexp_instr('1234567890', '(123)(4(56)(78))', 1, -1, 1, 'i', 1);
+ERROR: invalid value for parameter "occurrence": -1
+SELECT regexp_instr('1234567890', '(123)(4(56)(78))', 1, 1, -1, 'i', 1);
+ERROR: invalid value for parameter "return_option": -1
+SELECT regexp_instr('1234567890', '(123)(4(56)(78))', 1, 1, 1, 'i', -1);
+ERROR: invalid value for parameter "group": -1
+-- Find the first occurrence of a sequence of letters starting with the letter e
+-- and ending with the letter y in the phrase "easy come, easy go."
+SELECT regexp_instr('easy come, easy go','e\w*y');
+ regexp_instr
+--------------
+ 1
+(1 row)
+
+-- Find the first sequence of letters starting with the letter e and ending with
+-- the letter y in the string "easy come, easy go" starting at the second character
+SELECT regexp_instr('easy come, easy go','e\w*y',2);
+ regexp_instr
+--------------
+ 12
+(1 row)
+
+-- Find the second sequence of letters starting with the letter e and ending with
+-- the letter y in the string "easy come, easy go" starting at the first character.
+SELECT regexp_instr('easy come, easy go','e\w*y',1,2);
+ regexp_instr
+--------------
+ 12
+(1 row)
+
+-- Find the position of the first character after the first whitespace in the string "easy come, easy go."
+SELECT regexp_instr('easy come, easy go','\s',1,1,1);
+ regexp_instr
+--------------
+ 6
+(1 row)
+
+-- Find the position of the start of the third word in a string by capturing each
+-- word as a subexpression, and returning the third subexpression's start position.
+SELECT regexp_instr('one two three','(\w+)\s+(\w+)\s+(\w+)', 1,1,0,'',3);
+ regexp_instr
+--------------
+ 9
+(1 row)
+
+-- return the substring matching the regexp
+SELECT regexp_substr('number of your street, zipcode town, FR', ',[^,]+');
+ regexp_substr
+----------------
+ , zipcode town
+(1 row)
+
+-- return the substring matching the regexp
+SELECT regexp_substr('number of your street, zipcode town, FR', ',[^,]+', 24);
+ regexp_substr
+---------------
+ , FR
+(1 row)
+
+-- return the substring matching the regexp at the first occurrence
+SELECT regexp_substr('number of your street, zipcode town, FR', ',[^,]+', 1, 1);
+ regexp_substr
+----------------
+ , zipcode town
+(1 row)
+
+-- return the substring matching the regexp at the second occurrence
+SELECT regexp_substr('number of your street, zipcode town, FR', ',[^,]+', 1, 2);
+ regexp_substr
+---------------
+ , FR
+(1 row)
+
+-- case sensitivity substring search
+SELECT regexp_substr('number of your street, zipcode town, FR', ',\s+[Zf][^,]+', 1, 1);
+ regexp_substr
+---------------
+ \N
+(1 row)
+
+SELECT regexp_substr('number of your street, zipcode town, FR', ',\s+[Zf][^,]+', 1, 1, 'i');
+ regexp_substr
+----------------
+ , zipcode town
+(1 row)
+
+-- case sensitivity substring search with no capture group
+SELECT regexp_substr('number of your street, zipcode town, FR', ',\s+[Zf][^,]+', 1, 1, 'i', 0);
+ regexp_substr
+----------------
+ , zipcode town
+(1 row)
+
+-- case sensitivity substring search with non existing capture group
+SELECT regexp_substr('number of your street, zipcode town, FR', ',\s+[Zf][^,]+', 1, 1, 'i', 0);
+ regexp_substr
+----------------
+ , zipcode town
+(1 row)
+
+-- return the substring matching the regexp at different occurrence and capture group
+SELECT regexp_substr('1234567890 1234567890', '(123)(4(56)(78))', 1, 1, 'i', 4);
+ regexp_substr
+---------------
+ 78
+(1 row)
+
+SELECT regexp_substr('1234567890 1234557890', '(123)(4(5[56])(78))', 1, 2, 'i', 3);
+ regexp_substr
+---------------
+ 55
+(1 row)
+
+SELECT regexp_substr('1234567890 1234567890', '(123)(4(56)(78))', 1, 1, 'i', 0);
+ regexp_substr
+---------------
+ 12345678
+(1 row)
+
+-- test negative values
+SELECT regexp_substr('1234567890 1234567890', '(123)(4(56)(78))', -1, 1, 'i', 4);
+ERROR: invalid value for parameter "start_position": -1
+SELECT regexp_substr('1234567890 1234567890', '(123)(4(56)(78))', 1, -1, 'i', 4);
+ERROR: invalid value for parameter "occurrence": -1
+SELECT regexp_substr('1234567890 1234567890', '(123)(4(56)(78))', 1, 1, 'i', -4);
+ERROR: invalid value for parameter "group": -4
+-- Select the first substring of letters that end with "thy."
+SELECT regexp_substr('healthy, wealthy, and wise','\w+thy');
+ regexp_substr
+---------------
+ healthy
+(1 row)
+
+-- Select the first substring of letters that ends with "thy" starting at the second character in the string.
+SELECT regexp_substr('healthy, wealthy, and wise','\w+thy',2);
+ regexp_substr
+---------------
+ ealthy
+(1 row)
+
+-- Return the contents of the third captured subexpression, which captures the third word in the string.
+SELECT regexp_substr('one two three', '(\w+)\s+(\w+)\s+(\w+)', 1, 1, '', 3);
+ regexp_substr
+---------------
+ three
+(1 row)
+
+DROP TABLE IF EXISTS regexp_temp;
+-- Regression tests for extended regexp_replace() function with start position and occurrence
+SELECT regexp_replace('512.123.4567', '([[:digit:]]{3})\.([[:digit:]]{3})\.([[:digit:]]{4})', '(\1) \2-\3', 1);
+ regexp_replace
+----------------
+ (512) 123-4567
+(1 row)
+
+SELECT regexp_replace('512.123.4567 612.123.4567', '([[:digit:]]{3})\.([[:digit:]]{3})\.([[:digit:]]{4})', '(\1) \2-\3', 1, 0);
+ regexp_replace
+-------------------------------
+ (512) 123-4567 (612) 123-4567
+(1 row)
+
+SELECT regexp_replace('number your street,'||CHR(10)||' zipcode town, FR', '( ){2,}', ' ', 1, 0);
+ regexp_replace
+---------------------
+ number your street,+
+ zipcode town, FR
+(1 row)
+
+SELECT regexp_replace('number your street, zipcode town, FR', '( ){2,}', ' ', 9);
+ regexp_replace
+--------------------------------------------
+ number your street, zipcode town, FR
+(1 row)
+
+SELECT regexp_replace('number your street, zipcode town, FR', '( ){2,}', ' ', 9, 0);
+ regexp_replace
+----------------------------------------
+ number your street, zipcode town, FR
+(1 row)
+
+SELECT regexp_replace('number your street, zipcode town, FR', '( ){2,}', ' ', 9, 2);
+ regexp_replace
+---------------------------------------------
+ number your street, zipcode town, FR
+(1 row)
+
+SELECT regexp_replace('number your street, zipcode town, FR', '( ){2,}', ' ', 9, 2, 'm');
+ regexp_replace
+---------------------------------------------
+ number your street, zipcode town, FR
+(1 row)
+
+SELECT regexp_replace('number your street, zipcode town, FR', '([EURT]){2,}', '[\1]', 9, 1, 'i');
+ regexp_replace
+-------------------------------------------------
+ number yo[r] street, zipcode town, FR
+(1 row)
+
+SELECT regexp_replace('number your street, zipcode town, FR', '([EURT]){2,}', '[\1]', 9, 2, 'i');
+ regexp_replace
+----------------------------------------------
+ number your s[t], zipcode town, FR
+(1 row)
+
+SELECT regexp_replace ('A PostgreSQL function', 'A|e|i|o|u', 'X', 1, 2);
+ regexp_replace
+-----------------------
+ A PXstgreSQL function
+(1 row)
+
+SELECT regexp_replace ('A PostgreSQL function', 'a|e|i|o|u', 'X', 1, 0, 'i');
+ regexp_replace
+-----------------------
+ X PXstgrXSQL fXnctXXn
+(1 row)
+
+SELECT regexp_replace ('A PostgreSQL function', 'a|e|i|o|u', 'X', 1, 1, 'i');
+ regexp_replace
+-----------------------
+ X PostgreSQL function
+(1 row)
+
+SELECT regexp_replace ('A PostgreSQL function', 'a|e|i|o|u', 'X', 1, 2, 'i');
+ regexp_replace
+-----------------------
+ A PXstgreSQL function
+(1 row)
+
+SELECT regexp_replace ('A PostgreSQL function', 'a|e|i|o|u', 'X', 1, 3, 'i');
+ regexp_replace
+-----------------------
+ A PostgrXSQL function
+(1 row)
+
+SELECT regexp_replace ('A PostgreSQL function', 'a|e|i|o|u', 'X', 1, 9, 'i');
+ regexp_replace
+-----------------------
+ A PostgreSQL function
+(1 row)
+
+SELECT regexp_replace ('A PostgreSQL function', 'A|e|i|o|u', 'X', 1, 9);
+ regexp_replace
+-----------------------
+ A PostgreSQL function
+(1 row)
+
+-- Invalid parameter values
+SELECT regexp_replace ('A PostgreSQL function', 'a|e|i|o|u', 'X', -1, 0, 'i');
+ERROR: invalid value for parameter "start_position": -1
+SELECT regexp_replace ('A PostgreSQL function', 'a|e|i|o|u', 'X', 1, -1, 'i');
+ERROR: invalid value for parameter "occurrence": -1
+-- Modifier 'g' should not be taken in account, we have an occurrence to replace
+SELECT regexp_replace ('A PostgreSQL function', 'a|e|i|o|u', 'X', 1, 1, 'g');
+ regexp_replace
+-----------------------
+ A PXstgreSQL function
+(1 row)
+
+-- Find groups of "word characters" (letters, numbers and underscore) ending with
+-- "thy" in the string "healthy, wealthy, and wise" and replace them with nothing.
+SELECT regexp_replace('healthy, wealthy, and wise','\w+thy', '', 'g');
+ regexp_replace
+----------------
+ , , and wise
+(1 row)
+
+SELECT regexp_replace('healthy, wealthy, and wise','\w+thy', '', 1, 0);
+ regexp_replace
+----------------
+ , , and wise
+(1 row)
+
+SELECT regexp_replace('healthy, wealthy, and wise','\w+thy', '', 1, 0, 'g');
+ regexp_replace
+----------------
+ , , and wise
+(1 row)
+
+-- Find groups of word characters ending with "thy" and replace with "something."
+SELECT regexp_replace('healthy, wealthy, and wise','\w+thy', 'something', 'g');
+ regexp_replace
+--------------------------------
+ something, something, and wise
+(1 row)
+
+-- Find groups of word characters ending with "thy" and replace with the string
+-- "something" starting at the third character in the string.
+SELECT regexp_replace('healthy, wealthy, and wise','\w+thy', 'something', 3, 0);
+ regexp_replace
+----------------------------------
+ hesomething, something, and wise
+(1 row)
+
+-- Replace the second group of word characters ending with "thy" with "something."
+SELECT regexp_replace('healthy, wealthy, and wise','\w+thy', 'something', 1, 2);
+ regexp_replace
+------------------------------
+ healthy, something, and wise
+(1 row)
+
+-- Find groups of word characters ending with "thy" capturing the letters before
+-- the "thy", and replace with the captured letters plus the letters "ish."
+SELECT regexp_replace('healthy, wealthy, and wise','(\w+)thy', '\1ish', 'g');
+ regexp_replace
+----------------------------
+ healish, wealish, and wise
+(1 row)
+
+SELECT regexp_replace('healthy, wealthy, and wise','(\w+)thy', '\1ish', 1, 0);
+ regexp_replace
+----------------------------
+ healish, wealish, and wise
+(1 row)
+
-- change NULL-display back
\pset null ''
-- E021-11 position expression
diff --git a/src/test/regress/sql/strings.sql b/src/test/regress/sql/strings.sql
index 57a48c9d0b..7623e85f1e 100644
--- a/src/test/regress/sql/strings.sql
+++ b/src/test/regress/sql/strings.sql
@@ -255,6 +255,199 @@ SELECT regexp_split_to_array('thE QUick bROWn FOx jUMPs ovEr The lazy dOG', 'e',
SELECT foo, length(foo) FROM regexp_split_to_table('thE QUick bROWn FOx jUMPs ovEr The lazy dOG', 'e', 'g') AS foo;
SELECT regexp_split_to_array('thE QUick bROWn FOx jUMPs ovEr The lazy dOG', 'e', 'g');
+-- regexp_like tests
+SELECT regexp_like('a'||CHR(10)||'d', 'a.d');
+SELECT regexp_like('a'||CHR(10)||'d', 'a.d', 'm');
+SELECT regexp_like('a'||CHR(10)||'d', 'a.d', 'n');
+SELECT regexp_like('Steven', '^Ste(v|ph)en$');
+SELECT regexp_like('foo' || chr(10) || 'bar' || chr(10) || 'bequq' || chr(10) || 'baz', '^bar');
+SELECT regexp_like('foo' || chr(10) || 'bar' || chr(10) || 'bequq' || chr(10) || 'baz', 'bar');
+SELECT regexp_like('foo' || chr(10) || 'bar' || chr(10) || 'bequq' || chr(10) || 'baz', '^bar', 'm');
+SELECT regexp_like('foo' || chr(10) || 'bar' || chr(10) || 'bequq' || chr(10) || 'baz', '^bar', 'n');
+SELECT regexp_like('GREEN', '([aeiou])\1');
+SELECT regexp_like('GREEN', '([aeiou])\1', 'i');
+SELECT regexp_like('ORANGE' || chr(10) || 'GREEN', '([aeiou])\1', 'i');
+SELECT regexp_like('ORANGE' || chr(10) || 'GREEN', '^..([aeiou])\1', 'i');
+SELECT regexp_like('ORANGE' || chr(10) || 'GREEN', '([aeiou])\1', 'in');
+SELECT regexp_like('ORANGE' || chr(10) || 'GREEN', '^..([aeiou])\1', 'in');
+SELECT regexp_like('ORANGE' || chr(10) || 'GREEN', '^..([aeiou])\1', 'im');
+SELECT REGEXP_LIKE('abc', 'a b c');
+SELECT REGEXP_LIKE('abc', 'a b c','x');
+
+-- count all matches for regexp
+SELECT regexp_count('123123123123123', '(12)3');
+-- count all matches with start position
+SELECT regexp_count('123123123123', '123', 0);
+SELECT regexp_count('123123123123', '123', 1);
+SELECT regexp_count('123123123123', '123', 3);
+SELECT regexp_count('123123123123', '123', -3);
+-- count all matches in NULL string with a start position
+SELECT regexp_count(NULL, '123', 3);
+-- count all matches with a start position greater than string length
+SELECT regexp_count('123', '123', 10);
+-- count all matches from different regexp
+SELECT regexp_count('ABC123', '[A-Z]'), regexp_count('A1B2C3', '[A-Z]');
+SELECT regexp_count('ABC123', '[A-Z][0-9]'), regexp_count('A1B2C3', '[A-Z][0-9]');
+SELECT regexp_count('ABC123', '[A-Z][0-9]{2}'), regexp_count('A1B2C3', '[A-Z][0-9]{2}');
+SELECT regexp_count('ABC123', '([A-Z][0-9]){2}'), regexp_count('A1B2C3', '([A-Z][0-9]){2}');
+SELECT regexp_count('ABC123A5', '^[A-Z][0-9]'), regexp_count('A1B2C3', '^[A-Z][0-9]');
+SELECT regexp_count('ABC123', '[A-Z][0-9]{2}'), regexp_count('A1B2C34', '[A-Z][0-9]{2}');
+-- count matches with newline case insensivity
+SELECT regexp_count('a'||CHR(10)||'d', 'a.d');
+SELECT regexp_count('a'||CHR(10)||'d', 'a.d', 1, 's');
+-- count matches with newline case sensivity
+SELECT regexp_count('a'||CHR(10)||'d', 'a.d', 1, 'n');
+-- count not multiline matches
+SELECT regexp_count('a'||CHR(10)||'d', '^d$');
+-- count multiline matches
+SELECT regexp_count('a'||CHR(10)||'d', '^d$', 1, 'm');
+SELECT regexp_count('Hello'||CHR(10)||'world!', '^world!$', 1, 'm'); -- 1
+-- Count the number of occurrences of the substring an in the string.
+SELECT regexp_count('a man, a plan, a canal: Panama', 'an');
+-- Find the number of occurrences of the substring an in the string starting with the fifth character.
+SELECT regexp_count('a man, a plan, a canal: Panama', 'an',5);
+-- Find the number of occurrences of a substring containing a lower-case character
+-- followed by an. In the first example, do not use a modifier. In the second example,
+-- use the i modifier to force the regular expression to ignore case.
+SELECT regexp_count('a man, a plan, a canal: Panama', '[a-z]an');
+SELECT regexp_count('a man, a plan, a canal: Panama', '[a-z]an', 1, 'i');
+
+DROP TABLE IF EXISTS regexp_temp;
+CREATE TABLE regexp_temp(fullname varchar(20), email varchar(20));
+INSERT INTO regexp_temp (fullname, email) VALUES ('John Doe', 'johndoe@example.com');
+INSERT INTO regexp_temp (fullname, email) VALUES ('Jane Doe', 'janedoe');
+-- count matches case sensitive
+SELECT fullname, regexp_count(fullname, 'e', 1, 'c') FROM regexp_temp;
+SELECT fullname, regexp_count(fullname, 'D', 1, 'c') FROM regexp_temp;
+SELECT fullname, regexp_count(fullname, 'd', 1, 'c') FROM regexp_temp;
+-- count matches case insensitive
+SELECT fullname, regexp_count(fullname, 'E', 1, 'i') FROM regexp_temp;
+SELECT fullname, regexp_count(fullname, 'do', 1, 'i') FROM regexp_temp;
+
+-- return the start position of the 6th occurence starting at beginning of the string
+SELECT regexp_instr('number of your street, zipcode thetown, FR', '[^ ]+', 1, 6);
+-- return the start position of the 5th occurence starting after the first word
+SELECT regexp_instr('number of your street, zipcode thetown, FR', '[^ ]+', 7, 5, 0);
+-- return the ending position of the 5th occurence starting after the first word
+SELECT regexp_instr('number of your street, zipcode thetown, FR', '[^ ]+', 7, 5, 0);
+-- return the ending position of the 2nd occurence starting after the first word
+SELECT regexp_instr('number of your street, zipcode thetown, FR', '[T|Z|S][[:alpha:]]{5}', 7, 2, 1, 'i');
+-- return the starting position corresponding to the different capture group
+SELECT regexp_instr('1234567890', '(123)(4(56)(78))', 1, 1, 0, 'i', 0);
+SELECT regexp_instr('1234567890', '(123)(4(56)(78))', 1, 1, 0, 'i', 1);
+SELECT regexp_instr('1234567890', '(123)(4(56)(78))', 1, 1, 0, 'i', 2);
+SELECT regexp_instr('1234567890', '(123)(4(56)(78))', 1, 1, 0, 'i', 3);
+SELECT regexp_instr('1234567890', '(123)(4(56)(78))', 1, 1, 0, 'i', 4);
+-- return the starting position corresponding to a non existant capture group
+SELECT regexp_instr('1234567890', '(123)(4(56)(78))', 1, 1, 0, 'i', 5);
+-- Same but with the ending position
+SELECT regexp_instr('1234567890', '(123)(4(56)(78))', 1, 1, 1, 'i', 0);
+SELECT regexp_instr('1234567890', '(123)(4(56)(78))', 1, 1, 1, 'i', 1);
+SELECT regexp_instr('1234567890', '(123)(4(56)(78))', 1, 1, 1, 'i', 2);
+SELECT regexp_instr('1234567890', '(123)(4(56)(78))', 1, 1, 1, 'i', 3);
+SELECT regexp_instr('1234567890', '(123)(4(56)(78))', 1, 1, 1, 'i', 4);
+SELECT regexp_instr('1234567890', '(123)(4(56)(78))', 1, 1, 1, 'i', 5);
+-- start position of a valid email
+SELECT email, regexp_instr(email, '\w+@\w+(\.\w+)+') "valid_email" FROM regexp_temp;
+-- ending position of a valid email
+SELECT email, regexp_instr(email, '\w+@\w+(\.\w+)+', 1, 1, 1) "valid_email" FROM regexp_temp;
+-- start position of first capture group in the email (the dot part)
+SELECT email, regexp_instr(email, '\w+@\w+(\.\w+)+', 1, 1, 0, 'i', 1) FROM regexp_temp;
+-- ending position of first capture group in the email (the dot part)
+SELECT email, regexp_instr(email, '\w+@\w+(\.\w+)+', 1, 1, 1, 'i', 1) FROM regexp_temp;
+-- test negative values
+SELECT regexp_instr('1234567890', '(123)(4(56)(78))', -1, 1, 1, 'i', 1);
+SELECT regexp_instr('1234567890', '(123)(4(56)(78))', 1, -1, 1, 'i', 1);
+SELECT regexp_instr('1234567890', '(123)(4(56)(78))', 1, 1, -1, 'i', 1);
+SELECT regexp_instr('1234567890', '(123)(4(56)(78))', 1, 1, 1, 'i', -1);
+
+-- Find the first occurrence of a sequence of letters starting with the letter e
+-- and ending with the letter y in the phrase "easy come, easy go."
+SELECT regexp_instr('easy come, easy go','e\w*y');
+-- Find the first sequence of letters starting with the letter e and ending with
+-- the letter y in the string "easy come, easy go" starting at the second character
+SELECT regexp_instr('easy come, easy go','e\w*y',2);
+-- Find the second sequence of letters starting with the letter e and ending with
+-- the letter y in the string "easy come, easy go" starting at the first character.
+SELECT regexp_instr('easy come, easy go','e\w*y',1,2);
+-- Find the position of the first character after the first whitespace in the string "easy come, easy go."
+SELECT regexp_instr('easy come, easy go','\s',1,1,1);
+-- Find the position of the start of the third word in a string by capturing each
+-- word as a subexpression, and returning the third subexpression's start position.
+SELECT regexp_instr('one two three','(\w+)\s+(\w+)\s+(\w+)', 1,1,0,'',3);
+
+-- return the substring matching the regexp
+SELECT regexp_substr('number of your street, zipcode town, FR', ',[^,]+');
+-- return the substring matching the regexp
+SELECT regexp_substr('number of your street, zipcode town, FR', ',[^,]+', 24);
+-- return the substring matching the regexp at the first occurrence
+SELECT regexp_substr('number of your street, zipcode town, FR', ',[^,]+', 1, 1);
+-- return the substring matching the regexp at the second occurrence
+SELECT regexp_substr('number of your street, zipcode town, FR', ',[^,]+', 1, 2);
+-- case sensitivity substring search
+SELECT regexp_substr('number of your street, zipcode town, FR', ',\s+[Zf][^,]+', 1, 1);
+SELECT regexp_substr('number of your street, zipcode town, FR', ',\s+[Zf][^,]+', 1, 1, 'i');
+-- case sensitivity substring search with no capture group
+SELECT regexp_substr('number of your street, zipcode town, FR', ',\s+[Zf][^,]+', 1, 1, 'i', 0);
+-- case sensitivity substring search with non existing capture group
+SELECT regexp_substr('number of your street, zipcode town, FR', ',\s+[Zf][^,]+', 1, 1, 'i', 0);
+-- return the substring matching the regexp at different occurrence and capture group
+SELECT regexp_substr('1234567890 1234567890', '(123)(4(56)(78))', 1, 1, 'i', 4);
+SELECT regexp_substr('1234567890 1234557890', '(123)(4(5[56])(78))', 1, 2, 'i', 3);
+SELECT regexp_substr('1234567890 1234567890', '(123)(4(56)(78))', 1, 1, 'i', 0);
+-- test negative values
+SELECT regexp_substr('1234567890 1234567890', '(123)(4(56)(78))', -1, 1, 'i', 4);
+SELECT regexp_substr('1234567890 1234567890', '(123)(4(56)(78))', 1, -1, 'i', 4);
+SELECT regexp_substr('1234567890 1234567890', '(123)(4(56)(78))', 1, 1, 'i', -4);
+-- Select the first substring of letters that end with "thy."
+SELECT regexp_substr('healthy, wealthy, and wise','\w+thy');
+-- Select the first substring of letters that ends with "thy" starting at the second character in the string.
+SELECT regexp_substr('healthy, wealthy, and wise','\w+thy',2);
+-- Return the contents of the third captured subexpression, which captures the third word in the string.
+SELECT regexp_substr('one two three', '(\w+)\s+(\w+)\s+(\w+)', 1, 1, '', 3);
+
+DROP TABLE IF EXISTS regexp_temp;
+
+-- Regression tests for extended regexp_replace() function with start position and occurrence
+SELECT regexp_replace('512.123.4567', '([[:digit:]]{3})\.([[:digit:]]{3})\.([[:digit:]]{4})', '(\1) \2-\3', 1);
+SELECT regexp_replace('512.123.4567 612.123.4567', '([[:digit:]]{3})\.([[:digit:]]{3})\.([[:digit:]]{4})', '(\1) \2-\3', 1, 0);
+SELECT regexp_replace('number your street,'||CHR(10)||' zipcode town, FR', '( ){2,}', ' ', 1, 0);
+SELECT regexp_replace('number your street, zipcode town, FR', '( ){2,}', ' ', 9);
+SELECT regexp_replace('number your street, zipcode town, FR', '( ){2,}', ' ', 9, 0);
+SELECT regexp_replace('number your street, zipcode town, FR', '( ){2,}', ' ', 9, 2);
+SELECT regexp_replace('number your street, zipcode town, FR', '( ){2,}', ' ', 9, 2, 'm');
+SELECT regexp_replace('number your street, zipcode town, FR', '([EURT]){2,}', '[\1]', 9, 1, 'i');
+SELECT regexp_replace('number your street, zipcode town, FR', '([EURT]){2,}', '[\1]', 9, 2, 'i');
+SELECT regexp_replace ('A PostgreSQL function', 'A|e|i|o|u', 'X', 1, 2);
+SELECT regexp_replace ('A PostgreSQL function', 'a|e|i|o|u', 'X', 1, 0, 'i');
+SELECT regexp_replace ('A PostgreSQL function', 'a|e|i|o|u', 'X', 1, 1, 'i');
+SELECT regexp_replace ('A PostgreSQL function', 'a|e|i|o|u', 'X', 1, 2, 'i');
+SELECT regexp_replace ('A PostgreSQL function', 'a|e|i|o|u', 'X', 1, 3, 'i');
+SELECT regexp_replace ('A PostgreSQL function', 'a|e|i|o|u', 'X', 1, 9, 'i');
+SELECT regexp_replace ('A PostgreSQL function', 'A|e|i|o|u', 'X', 1, 9);
+-- Invalid parameter values
+SELECT regexp_replace ('A PostgreSQL function', 'a|e|i|o|u', 'X', -1, 0, 'i');
+SELECT regexp_replace ('A PostgreSQL function', 'a|e|i|o|u', 'X', 1, -1, 'i');
+-- Modifier 'g' should not be taken in account, we have an occurrence to replace
+SELECT regexp_replace ('A PostgreSQL function', 'a|e|i|o|u', 'X', 1, 1, 'g');
+-- Find groups of "word characters" (letters, numbers and underscore) ending with
+-- "thy" in the string "healthy, wealthy, and wise" and replace them with nothing.
+SELECT regexp_replace('healthy, wealthy, and wise','\w+thy', '', 'g');
+SELECT regexp_replace('healthy, wealthy, and wise','\w+thy', '', 1, 0);
+SELECT regexp_replace('healthy, wealthy, and wise','\w+thy', '', 1, 0, 'g');
+-- Find groups of word characters ending with "thy" and replace with "something."
+SELECT regexp_replace('healthy, wealthy, and wise','\w+thy', 'something', 'g');
+-- Find groups of word characters ending with "thy" and replace with the string
+-- "something" starting at the third character in the string.
+SELECT regexp_replace('healthy, wealthy, and wise','\w+thy', 'something', 3, 0);
+-- Replace the second group of word characters ending with "thy" with "something."
+SELECT regexp_replace('healthy, wealthy, and wise','\w+thy', 'something', 1, 2);
+-- Find groups of word characters ending with "thy" capturing the letters before
+-- the "thy", and replace with the captured letters plus the letters "ish."
+SELECT regexp_replace('healthy, wealthy, and wise','(\w+)thy', '\1ish', 'g');
+SELECT regexp_replace('healthy, wealthy, and wise','(\w+)thy', '\1ish', 1, 0);
+
+
-- change NULL-display back
\pset null ''
On 03/21/21 09:19, Gilles Darold wrote:
On 2021.03.20. 19:48 Gilles Darold <gilles@darold.net> wrote:
This is a new version of the patch that now implements all the XQUERY
regexp functions as described in the standard, minus the differences of
PostgerSQL regular expression explain in [1].The standard SQL describe functions like_regex(), occurrences_regex(),
position_regex(), substring_regex() and translate_regex() which
correspond to the commonly named functions regexp_like(),
regexp_count(), regexp_instr(), regexp_substr() and regexp_replace() as
reported by Chapman Flack in [2]. All these function are implemented in
[v2-0001-xquery-regexp-functions.patch]
I quickly looked over this patch preparing to object if it actually
purported to implement the ISO foo_regex() named functions without
the ISO semantics, but a quick grep reassured me that it doesn't
implement any of those functions. It only supplies functions in
the alternative, apparently common de facto naming scheme regexp_foo().
To be clear, I think that's the right call. I do not think it would be
a good idea to supply functions that have the ISO names but not the
specified regex dialect.
A set of functions analogous to the ISO ones but differently named and
with a different regex dialect seems fine to me, especially if these
different names are de facto common, and as far as I can tell, that is
what this patch provides. So I have no objection to that. :)
It might then be fair to say that the /description/ of the patch as
implementing the XQuery-based foo_regex functions isn't quite right,
or at least carries a risk of jarring some readers into hasty
double-takes on Sunday mornings before coffee.
It might be clearer to just mention the close correspondence between
the functions in this differently-named set and the corresponding ISO ones.
If this turns out to be a case of "attached the wrong patch, here's
the one that does implement foo_regex functions!" then I reserve an
objection to that. :)
Regards,
-Chap
Chapman Flack <chap@anastigmatix.net> writes:
If this turns out to be a case of "attached the wrong patch, here's
the one that does implement foo_regex functions!" then I reserve an
objection to that. :)
+1 to that. Just to add a note, I do have some ideas about extending
our regex parser so that it could duplicate the XQuery syntax --- none
of the points we mention in 9.7.3.8 seem insurmountable. I'm not
planning to work on that in the near future, mind you, but I definitely
think that we don't want to paint ourselves into a corner where we've
already implemented the XQuery regex functions with the wrong behavior.
regards, tom lane
Le 21/03/2021 à 15:53, Tom Lane a écrit :
Chapman Flack <chap@anastigmatix.net> writes:
If this turns out to be a case of "attached the wrong patch, here's
the one that does implement foo_regex functions!" then I reserve an
objection to that. :)+1 to that. Just to add a note, I do have some ideas about extending
our regex parser so that it could duplicate the XQuery syntax --- none
of the points we mention in 9.7.3.8 seem insurmountable. I'm not
planning to work on that in the near future, mind you, but I definitely
think that we don't want to paint ourselves into a corner where we've
already implemented the XQuery regex functions with the wrong behavior.regards, tom lane
I apologize for confusing with the words and phrases I have used. This
patch implements the regexp_foo () functions which are available in most
RDBMS with the behavior described in the documentation. I have modified
the title of the patch in the commitfest to removed wrong use of XQUERY.
I don't know too if the other RDBMS respect the XQUERY behavior but for
what I've seen for Oracle they are using limited regexp modifiers with
sometime not the same letter than PostgreSQL for the same behavior. I
have implemented these functions with the Oracle behavior in Orafce [1]https://github.com/orafce/orafce/blob/master/orafce--3.14--3.15.sql
with a function that checks the modifiers used. This patch doesn't mimic
the Oracle behavior, it use the PostgreSQL behavior with regexp, the one
used by regex_replace() and regex_matches(). All regexp modifiers can be
used.
[1]: https://github.com/orafce/orafce/blob/master/orafce--3.14--3.15.sql
--
Gilles Darold
http://www.darold.net/
Le 21/03/2021 à 15:53, Tom Lane a écrit :
Chapman Flack <chap@anastigmatix.net> writes:
If this turns out to be a case of "attached the wrong patch, here's
the one that does implement foo_regex functions!" then I reserve an
objection to that. :)
And the patch renamed.
Attachments:
v4-0001-regexp-foo-functions.patchtext/x-patch; charset=UTF-8; name=v4-0001-regexp-foo-functions.patchDownload
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index fee0561961..36e446cb7b 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -3097,6 +3097,66 @@ repeat('Pg', 4) <returnvalue>PgPgPgPg</returnvalue>
</para></entry>
</row>
+ <row>
+ <entry role="func_table_entry"><para role="func_signature">
+ <indexterm>
+ <primary>regexp_count</primary>
+ </indexterm>
+ <function>regexp_count</function> ( <parameter>string</parameter> <type>text</type>, <parameter>pattern</parameter> <type>text</type> [, <parameter>position</parameter> <type>integer</type> ] [, <parameter>flags</parameter> <type>text</type> ] )
+ <returnvalue>integer</returnvalue>
+ </para>
+ <para>
+ Returns the number of times a pattern occurs for a match of a POSIX
+ regular expression to the <parameter>string</parameter>; see
+ <xref linkend="functions-posix-regexp"/>.
+ </para>
+ <para>
+ <literal>regexp_count('123456789012', '\d{3}', 3)</literal>
+ <returnvalue>3</returnvalue>
+ </para></entry>
+ </row>
+
+ <row>
+ <entry role="func_table_entry"><para role="func_signature">
+ <indexterm>
+ <primary>regexp_instr</primary>
+ </indexterm>
+ <function>regexp_instr</function> ( <parameter>string</parameter> <type>text</type>, <parameter>pattern</parameter> <type>text</type> [, <parameter>position</parameter> <type>integer</type> ] [, <parameter>occurence</parameter> <type>integer</type> ] [, <parameter>returnopt</parameter> <type>integer</type> ] [, <parameter>flags</parameter> <type>text</type> ] [, <parameter>group</parameter> <type>integer</type> ] )
+ <returnvalue>integer</returnvalue>
+ </para>
+ <para>
+ Returns the position within <parameter>string</parameter> where the
+ match of a POSIX regular expression occurs. It returns an integer
+ indicating the beginning or ending position of the matched substring,
+ depending on the value of the <parameter>returnopt</parameter> argument
+ (default beginning). If no match is found the function returns 0;
+ see <xref linkend="functions-posix-regexp"/>.
+ </para>
+ <para>
+ <literal>regexp_instr('1234567890', '(123)(4(56)(78))', 1, 1, 0, 'i', 4)</literal>
+ <returnvalue>7</returnvalue>
+ </para></entry>
+ </row>
+
+ <row>
+ <entry role="func_table_entry"><para role="func_signature">
+ <indexterm>
+ <primary>regexp_like</primary>
+ </indexterm>
+ <function>regexp_like</function> ( <parameter>string</parameter> <type>text</type>, <parameter>pattern</parameter> <type>text</type> [, <parameter>flags</parameter> <type>text</type> ] )
+ <returnvalue>boolean</returnvalue>
+ </para>
+ <para>
+ Evaluates the existence of a match to a POSIX regular expression
+ in <parameter>string</parameter>; see <xref linkend="functions-posix-regexp"/>.
+ </para>
+ <para>
+ <literal>regexp_like('Hello'||chr(10)||'world', '^world$', 'm')</literal>
+ <returnvalue>t</returnvalue>
+ </para></entry>
+ </row>
+
+
<row>
<entry role="func_table_entry"><para role="func_signature">
<indexterm>
@@ -3144,7 +3204,7 @@ repeat('Pg', 4) <returnvalue>PgPgPgPg</returnvalue>
<indexterm>
<primary>regexp_replace</primary>
</indexterm>
- <function>regexp_replace</function> ( <parameter>string</parameter> <type>text</type>, <parameter>pattern</parameter> <type>text</type>, <parameter>replacement</parameter> <type>text</type> [, <parameter>flags</parameter> <type>text</type> ] )
+ <function>regexp_replace</function> ( <parameter>string</parameter> <type>text</type>, <parameter>pattern</parameter> <type>text</type>, <parameter>replacement</parameter> <type>text</type> [, <parameter>position</parameter> <type>integer</type> ] [, <parameter>occurence</parameter> <type>integer</type> ] [, <parameter>flags</parameter> <type>text</type> ] )
<returnvalue>text</returnvalue>
</para>
<para>
@@ -3157,6 +3217,24 @@ repeat('Pg', 4) <returnvalue>PgPgPgPg</returnvalue>
</para></entry>
</row>
+ <row>
+ <entry role="func_table_entry"><para role="func_signature">
+ <indexterm>
+ <primary>regexp_substr</primary>
+ </indexterm>
+ <function>regexp_substr</function> ( <parameter>string</parameter> <type>text</type>, <parameter>pattern</parameter> <type>text</type> [, <parameter>position</parameter> <type>integer</type> ] [, <parameter>occurence</parameter> <type>integer</type> ] [, <parameter>flags</parameter> <type>text</type> ] [, <parameter>group</parameter> <type>integer</type> ] )
+ <returnvalue>text</returnvalue>
+ </para>
+ <para>
+ Return the substring within <parameter>string</parameter> corresponding to the
+ match of a POSIX regular expression; see <xref linkend="functions-posix-regexp"/>.
+ </para>
+ <para>
+ <literal>regexp_substr('1234567890 1234557890', '(123)(4(5[56])(78))', 1, 2, 'i', 3)</literal>
+ <returnvalue>55</returnvalue>
+ </para></entry>
+ </row>
+
<row>
<entry role="func_table_entry"><para role="func_signature">
<indexterm>
@@ -5295,6 +5373,18 @@ substring('foobar' similar '#"o_b#"%' escape '#') <lineannotation>NULL</linea
<indexterm>
<primary>regexp_split_to_array</primary>
</indexterm>
+ <indexterm>
+ <primary>regexp_like</primary>
+ </indexterm>
+ <indexterm>
+ <primary>regexp_count</primary>
+ </indexterm>
+ <indexterm>
+ <primary>regexp_instr</primary>
+ </indexterm>
+ <indexterm>
+ <primary>regexp_substr</primary>
+ </indexterm>
<para>
<xref linkend="functions-posix-table"/> lists the available
@@ -5451,6 +5541,8 @@ substring('foobar' from 'o(.)b') <lineannotation>o</lineannotation>
It has the syntax
<function>regexp_replace</function>(<replaceable>source</replaceable>,
<replaceable>pattern</replaceable>, <replaceable>replacement</replaceable>
+ <optional>, <replaceable>position</replaceable> </optional>
+ <optional>, <replaceable>occurrence</replaceable> </optional>
<optional>, <replaceable>flags</replaceable> </optional>).
The <replaceable>source</replaceable> string is returned unchanged if
there is no match to the <replaceable>pattern</replaceable>. If there is a
@@ -5464,12 +5556,19 @@ substring('foobar' from 'o(.)b') <lineannotation>o</lineannotation>
substring matching the entire pattern should be inserted. Write
<literal>\\</literal> if you need to put a literal backslash in the replacement
text.
- The <replaceable>flags</replaceable> parameter is an optional text
- string containing zero or more single-letter flags that change the
+ <replaceable>pattern</replaceable> is searched in <replaceable>string</replaceable> starting
+ from an optional <replaceable>position</replaceable> or from the beginning
+ of <replaceable>source</replaceable> by default. Optional <replaceable>occurrence</replaceable>
+ parameter indicates which occurrence of <replaceable>pattern</replaceable> in
+ <replaceable>source</replaceable> should be replaced. Default value for <replaceable>occurrence</replaceable>
+ is 1, replace only the first occurrence. The <replaceable>flags</replaceable> parameter
+ is an optional text string containing zero or more single-letter flags that change the
function's behavior. Flag <literal>i</literal> specifies case-insensitive
matching, while flag <literal>g</literal> specifies replacement of each matching
- substring rather than only the first one. Supported flags (though
- not <literal>g</literal>) are
+ substring rather than only the first one. When <replaceable>occurrence</replaceable>
+ is set flag <literal>g</literal> has no effect. If <replaceable>occurrence</replaceable>
+ is set to zero, all occurrences are replaced which is similar to flag <literal>g</literal>.
+ Supported flags (though not <literal>g</literal>) are
described in <xref linkend="posix-embedded-options-table"/>.
</para>
@@ -5482,6 +5581,10 @@ regexp_replace('foobarbaz', 'b..', 'X', 'g')
<lineannotation>fooXX</lineannotation>
regexp_replace('foobarbaz', 'b(..)', 'X\1Y', 'g')
<lineannotation>fooXarYXazY</lineannotation>
+regexp_replace('A PostgreSQL function', 'a|e|i|o|u', 'X', 1, 0, 'i')
+ <lineannotation>X PXstgrXSQL fXnctXXn</lineannotation>
+regexp_replace('A PostgreSQL function', 'a|e|i|o|u', 'X', 1, 3, 'i')
+ <lineannotation>A PostgrXSQL function</lineannotation>
</programlisting>
</para>
@@ -5669,6 +5772,163 @@ SELECT foo FROM regexp_split_to_table('the quick brown fox', '\s*') AS foo;
in practice. Other software systems such as Perl use similar definitions.
</para>
+ <para>
+ The <function>regexp_like</function> function evaluates the existence of a match
+ to a POSIX regular expression in <parameter>string</parameter>; returns a boolean
+ resulting from matching a POSIX regular expression pattern to a string. It has
+ the syntax <function>regexp_like</function>(<replaceable>string</replaceable>,
+ <replaceable>pattern</replaceable> <optional>, <replaceable>flags</replaceable> </optional>).
+ <replaceable>pattern</replaceable> is searched in <replaceable>string</replaceable> starting
+ from the beginning of <replaceable>string</replaceable>.
+ The <replaceable>flags</replaceable> parameter is an optional text string
+ containing zero or more single-letter flags that change the function's behavior.
+ <function>regexp_like</function> accepts all the flags
+ shown in <xref linkend="posix-embedded-options-table"/>.
+ This function is similar to regexp operator <literal>~</literal> when used without
+ <replaceable>flags</replaceable> and similar to operator <literal>~*</literal> when
+ used with <replaceable>flags</replaceable> <literal>'i'</literal> only.
+ </para>
+
+ <para>
+ Some examples:
+<programlisting>
+SELECT 'found' FROM (values('Hello'||chr(10)||'world') as f(col) WHERE regexp_like(col, '^world$', 'm');
+ ?column?
+----------
+ found
+(1 row)
+</programlisting>
+ </para>
+
+
+ <para>
+ The <function>regexp_count</function> function returns the number of
+ captured substring(s) resulting from matching a POSIX regular
+ expression pattern to a string. It has the syntax <function>regexp_count</function>(
+ <replaceable>string</replaceable>, <replaceable>pattern</replaceable> <optional>,
+ <replaceable>position</replaceable> </optional> <optional>, <replaceable>flags</replaceable> </optional>).
+ <replaceable>pattern</replaceable> is searched in <replaceable>string</replaceable> starting
+ from an optional <replaceable>position</replaceable> or from the beginning of <replaceable>string</replaceable>
+ by default. The <replaceable>flags</replaceable> parameter is an optional text string
+ containing zero or more single-letter flags that change the function's behavior.
+ <function>regexp_count</function> accepts all the flags
+ shown in <xref linkend="posix-embedded-options-table"/>.
+ The <literal>g</literal> flag is forced internally to count all matches.
+ This function returns 0 if there is no match or the number of matches as
+ an integer.
+ </para>
+
+ <para>
+ Some examples:
+<programlisting>
+SELECT regexp_count('123123123123', '\d{3}', 1);
+ regexp_count
+--------------
+ 4
+(1 row)
+
+SELECT regexp_count('Hello'||CHR(10)||'world!', '^world!$', 1, 'm');
+ regexp_count
+--------------
+ 1
+(1 row)
+</programlisting>
+ </para>
+
+ <para>
+ The <function>regexp_instr</function> function returns the beginning or ending
+ position of the matched substring resulting from matching a POSIX regular
+ expression pattern to a string. It has the syntax <function>regexp_instr</function>(
+ <replaceable>string</replaceable>, <replaceable>pattern</replaceable> <optional>,
+ <replaceable>position</replaceable> </optional> <optional>, <replaceable>occurrence</replaceable> </optional>
+ <optional>, <replaceable>returnopt</replaceable> </optional>
+ <optional>, <replaceable>flags</replaceable> </optional>
+ <optional>, <replaceable>group</replaceable> </optional>).
+ <replaceable>pattern</replaceable> is searched in <replaceable>string</replaceable> starting
+ from an optional <replaceable>position</replaceable> or from the beginning
+ of <replaceable>string</replaceable> by default. Optional <replaceable>occurrence</replaceable> parameter
+ indicates which occurrence of <replaceable>pattern</replaceable> in <replaceable>string</replaceable>
+ should be searched. When optional <replaceable>returnopt</replaceable> parameter is set to 0 (default)
+ the function returns the position of the first character of the occurrence. When set to 1 returns
+ the position of the character after the occurrence.
+ The <replaceable>flags</replaceable> parameter is an optional text string
+ containing zero or more single-letter flags that change the function's behavior.
+ <function>regexp_instr</function> accepts all the flags
+ shown in <xref linkend="posix-embedded-options-table"/>.
+ The <literal>g</literal> flag is forced internally to track all matches.
+ For a pattern with capture groups, <replaceable>group</replaceable> is an integer indicating
+ which capture in <replaceable>pattern</replaceable> is the target of the function.
+ A capture group is a part of the pattern enclosed in parentheses. Capture groups can be nested.
+ They are numbered in the order in which their left parentheses appear in <replaceable>pattern</replaceable>.
+ If <replaceable>group</replaceable> is zero, then the position
+ of the entire substring that matches the pattern is returned. If <replaceable>pattern</replaceable>
+ does not have at least <replaceable>group</replaceable> capture groups, the function returns zero.
+ This function returns 0 if there is no match or the starting or ending position
+ of a match as an integer.
+ </para>
+
+ <para>
+ Some examples:
+<programlisting>
+SELECT regexp_instr('number of your street, zipcode thetown, FR', '[^ ]+', 1, 6);
+ regexp_instr
+--------------
+ 32
+(1 row)
+
+SELECT regexp_instr('1234567890', '(123)(4(56)(78))', 1, 1, 0, 'i', 3);
+ regexp_instr
+--------------
+ 5
+(1 row)
+</programlisting>
+ </para>
+
+ <para>
+ The <function>regexp_substr</function> function returns the
+ matched substring resulting from matching a POSIX regular
+ expression pattern to a string. It has the syntax <function>regexp_substr</function>(
+ <replaceable>string</replaceable>, <replaceable>pattern</replaceable> <optional>,
+ <replaceable>position</replaceable> </optional> <optional>, <replaceable>occurrence</replaceable> </optional>
+ <optional>, <replaceable>flags</replaceable> </optional>
+ <optional>, <replaceable>group</replaceable> </optional>).
+ <replaceable>pattern</replaceable> is searched in <replaceable>string</replaceable> starting
+ from an optional <replaceable>position</replaceable> or from the beginning
+ of <replaceable>string</replaceable> by default. Optional parameter <replaceable>occurrence</replaceable>
+ indicates which occurrence of <replaceable>pattern</replaceable> in <replaceable>string</replaceable>
+ should be searched. The <replaceable>flags</replaceable> parameter is an optional text string
+ containing zero or more single-letter flags that change the function's behavior.
+ <function>regexp_substr</function> accepts all the flags
+ shown in <xref linkend="posix-embedded-options-table"/>.
+ The <literal>g</literal> flag is forced internally to track all matches.
+ For a pattern with capture groups, optional <replaceable>group</replaceable> is an integer indicating
+ which capture in <replaceable>pattern</replaceable> is the target of the function. A capture group
+ is a part of the pattern enclosed in parentheses. Capture groups can be nested. They are numbered in
+ the order in which their left parentheses appear in <replaceable>pattern</replaceable>.
+ If <replaceable>group</replaceable> is zero, then the position
+ of the entire substring that matches the pattern is returned. If <replaceable>pattern</replaceable>
+ does not have at least <replaceable>group</replaceable> capture groups, the function returns zero.
+ This function returns NULL if there is no match or the substring of the match.
+ </para>
+
+ <para>
+ Some examples:
+<programlisting>
+SELECT regexp_substr('number of your street, zipcode town, FR', ',[^,]+', 1, 2);
+ regexp_substr
+---------------
+ , FR
+(1 row)
+
+SELECT regexp_substr('1234567890 1234557890', '(123)(4(5[56])(78))', 1, 2, 'i', 3);
+ regexp_substr
+---------------
+ 55
+(1 row)
+</programlisting>
+ </para>
+
+
<!-- derived from the re_syntax.n man page -->
<sect3 id="posix-syntax-details">
diff --git a/src/backend/utils/adt/regexp.c b/src/backend/utils/adt/regexp.c
index a32c5c82ab..624c57cec9 100644
--- a/src/backend/utils/adt/regexp.c
+++ b/src/backend/utils/adt/regexp.c
@@ -120,6 +120,7 @@ static regexp_matches_ctx *setup_regexp_matches(text *orig_str, text *pattern,
static ArrayType *build_regexp_match_result(regexp_matches_ctx *matchctx);
static Datum build_regexp_split_result(regexp_matches_ctx *splitctx);
+static text *enclose_with_parenthesis(text *str);
/*
* RE_compile_and_cache - compile a RE, caching if possible
@@ -555,7 +556,6 @@ texticregexne(PG_FUNCTION_ARGS)
0, NULL));
}
-
/*
* textregexsubstr()
* Return a substring matched by a regular expression.
@@ -1063,6 +1063,508 @@ regexp_matches_no_flags(PG_FUNCTION_ARGS)
return regexp_matches(fcinfo);
}
+/*
+ * regexp_like()
+ * Return the true if a pattern match within a string.
+ */
+Datum
+regexp_like(PG_FUNCTION_ARGS)
+{
+ text *str = PG_GETARG_TEXT_PP(0);
+ text *pattern = PG_GETARG_TEXT_PP(1);
+ text *flags = PG_GETARG_TEXT_PP_IF_EXISTS(2);
+
+ pg_re_flags re_flags;
+ regexp_matches_ctx *matchctx;
+
+ /* Determine options */
+ parse_re_flags(&re_flags, flags);
+
+ matchctx = setup_regexp_matches(str, pattern, &re_flags,
+ PG_GET_COLLATION(), true, false, false);
+
+ if (matchctx->nmatches > 0)
+ PG_RETURN_BOOL(true);
+
+ PG_RETURN_BOOL(false);
+}
+
+/* This is separate to keep the opr_sanity regression test from complaining */
+Datum
+regexp_like_no_flags(PG_FUNCTION_ARGS)
+{
+ return regexp_like(fcinfo);
+}
+
+/*
+ * regexp_count()
+ * Return the number of matches of a pattern within a string.
+ */
+Datum
+regexp_count(PG_FUNCTION_ARGS)
+{
+ text *orig_str = NULL;
+ text *pattern = PG_GETARG_TEXT_PP(1);
+ int start = 1;
+ text *flags = PG_GETARG_TEXT_PP_IF_EXISTS(3);
+
+ pg_re_flags re_flags;
+ regexp_matches_ctx *matchctx;
+
+ if (PG_NARGS() > 2)
+ start = PG_GETARG_INT32(2);
+
+ if (start < 0)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("invalid value for parameter \"%s\": %d", "start_position", start)));
+
+ /* regexp_count(string, pattern, start[, flags]) */
+ if (start > 1)
+ orig_str = DatumGetTextPP(DirectFunctionCall2(text_substr_no_len,
+ PG_GETARG_DATUM(0),
+ Int32GetDatum(start)
+ ));
+ else
+ orig_str = PG_GETARG_TEXT_PP(0);
+
+ /* Determine options */
+ parse_re_flags(&re_flags, flags);
+
+ /* this function require flag 'g' */
+ re_flags.glob = true;
+
+ matchctx = setup_regexp_matches(orig_str, pattern, &re_flags,
+ PG_GET_COLLATION(), true, false, false);
+
+ PG_RETURN_INT32(matchctx->nmatches);
+}
+
+/* This is separate to keep the opr_sanity regression test from complaining */
+Datum
+regexp_count_no_start(PG_FUNCTION_ARGS)
+{
+ return regexp_count(fcinfo);
+}
+
+
+/* This is separate to keep the opr_sanity regression test from complaining */
+Datum
+regexp_count_no_flags(PG_FUNCTION_ARGS)
+{
+ return regexp_count(fcinfo);
+}
+
+/*
+ * textregexreplace_extended()
+ * Return a string matched by a regular expression, with replacement.
+ * Extends textregexreplace by allowing a start position and the
+ * choice of the occurrence to replace (0 means all occurrences).
+ */
+Datum
+textregexreplace_extended(PG_FUNCTION_ARGS)
+{
+ text *s = NULL;
+ text *p = PG_GETARG_TEXT_PP(1);
+ text *pattern = enclose_with_parenthesis(PG_GETARG_TEXT_PP(1));
+ text *r = PG_GETARG_TEXT_PP(2);
+ int start = 1;
+ int occurrence = 1;
+ int pos = 0;
+ text *flags = PG_GETARG_TEXT_PP_IF_EXISTS(5);
+ StringInfoData str;
+ regex_t *re;
+ pg_re_flags re_flags;
+ regexp_matches_ctx *matchctx;
+ text *after = NULL;
+
+ /* start position */
+ if (PG_NARGS() > 3)
+ start = PG_GETARG_INT32(3);
+ if (start < 0)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("invalid value for parameter \"%s\": %d", "start_position", start)));
+ if (start > 1)
+ s = DatumGetTextPP(DirectFunctionCall2(text_substr_no_len,
+ PG_GETARG_DATUM(0),
+ Int32GetDatum(start)
+ ));
+ else
+ s = PG_GETARG_TEXT_PP(0);
+
+ /* occurrence to replace */
+ if (PG_NARGS() > 4)
+ occurrence = PG_GETARG_INT32(4);
+
+ if (occurrence < 0)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("invalid value for parameter \"%s\": %d", "occurrence", occurrence)));
+
+ /* Determine options */
+ parse_re_flags(&re_flags, flags);
+ re_flags.glob = true;
+
+ /* lookup for pattern */
+ matchctx = setup_regexp_matches(s, pattern, &re_flags,
+ PG_GET_COLLATION(), true,
+ false, re_flags.glob);
+
+ /* If no match is found, then the function returns the original string */
+ if (matchctx->nmatches == 0)
+ PG_RETURN_DATUM(PG_GETARG_DATUM(0));
+
+ /* Get the position of the occurence to replace */
+ if (PG_NARGS() > 4 && occurrence > 0)
+ {
+ /* When occurrence exceed matches return the original string */
+ if (occurrence > matchctx->nmatches)
+ PG_RETURN_DATUM(PG_GETARG_DATUM(0));
+
+ pos = (occurrence*matchctx->npatterns*2)-(matchctx->npatterns*2);
+ pos = matchctx->match_locs[pos]+start;
+ }
+ else
+ pos = start;
+
+ /* Normal case without explicit VARIADIC marker */
+ initStringInfo(&str);
+
+ /* Get the string before the occurrence starting */
+ if (pos > 1)
+ {
+ text *before = DatumGetTextPP(DirectFunctionCall3(text_substr,
+ PG_GETARG_DATUM(0),
+ Int32GetDatum(1),
+ Int32GetDatum(pos - 1)));
+ appendStringInfoString(&str, TextDatumGetCString(before));
+ }
+
+ /* all occurences must be replaced? */
+ if (occurrence == 0)
+ re_flags.glob = true;
+ else
+ re_flags.glob = false;
+
+ /* Compile the regular expression */
+ re = RE_compile_and_cache(p, re_flags.cflags, PG_GET_COLLATION());
+
+ /* Get the substring starting at the right occurrence position */
+ after = DatumGetTextPP(DirectFunctionCall2(text_substr_no_len,
+ PG_GETARG_DATUM(0),
+ Int32GetDatum(pos)
+ ));
+
+ appendStringInfoString(&str, TextDatumGetCString(replace_text_regexp(
+ after,
+ (void *) re,
+ r,
+ re_flags.glob)));
+
+ PG_RETURN_TEXT_P(CStringGetTextDatum(str.data));
+}
+
+/* This is separate to keep the opr_sanity regression test from complaining */
+Datum
+textregexreplace_extended_no_occurrence(PG_FUNCTION_ARGS)
+{
+ return textregexreplace_extended(fcinfo);
+}
+
+/* This is separate to keep the opr_sanity regression test from complaining */
+Datum
+textregexreplace_extended_no_flags(PG_FUNCTION_ARGS)
+{
+ return textregexreplace_extended(fcinfo);
+}
+
+/*
+ * Return text string between parenthesis
+ */
+static text *
+enclose_with_parenthesis(text *str)
+{
+ int len = VARSIZE_ANY_EXHDR(str);
+ text *result;
+ char *ptr;
+
+ result = palloc(len + VARHDRSZ + 2);
+ SET_VARSIZE(result, len + VARHDRSZ + 2);
+ ptr = VARDATA(result);
+ memcpy(ptr, "(", 1);
+ memcpy(ptr+1, VARDATA_ANY(str), len);
+ memcpy(ptr+len+1, ")", 1);
+
+ return result;
+}
+
+/*
+ * regexp_instr()
+ * Return the position within the string where the match was located
+ */
+Datum
+regexp_instr(PG_FUNCTION_ARGS)
+{
+ text *orig_str = NULL;
+ text *pattern = NULL;
+ text *flags = PG_GETARG_TEXT_PP_IF_EXISTS(5);
+ int start = 1;
+ int occurrence = 1;
+ int return_opt = 0;
+ int subexpr = 0;
+ int pos;
+
+ pg_re_flags re_flags;
+ regexp_matches_ctx *matchctx;
+
+ /* regexp_instr(string, pattern, start) */
+ if (PG_NARGS() > 2)
+ start = PG_GETARG_INT32(2);
+
+ if (start < 0)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("invalid value for parameter \"%s\": %d", "start_position", start)));
+
+ if (start > 1)
+ orig_str = DatumGetTextPP(DirectFunctionCall2(text_substr_no_len,
+ PG_GETARG_DATUM(0),
+ Int32GetDatum(start)
+ ));
+ else
+ /* regexp_instr(string, pattern) */
+ orig_str = PG_GETARG_TEXT_PP(0);
+
+ if (orig_str == NULL)
+ PG_RETURN_NULL();
+
+ /* regexp_instr(string, pattern, start, occurrence) */
+ if (PG_NARGS() > 3)
+ occurrence = PG_GETARG_INT32(3);
+
+ if (occurrence <= 0)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("invalid value for parameter \"%s\": %d", "occurrence", occurrence)));
+
+ /* regexp_instr(string, pattern, start, occurrence, return_opt) */
+ if (PG_NARGS() > 4)
+ return_opt = PG_GETARG_INT32(4);
+
+ if (return_opt != 0 && return_opt != 1)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("invalid value for parameter \"%s\": %d", "return_option", return_opt)));
+
+ /* regexp_instr(string, pattern, start, occurrence, return_opt, flags, subexpr) */
+ if (PG_NARGS() > 6)
+ subexpr = PG_GETARG_INT32(6);
+ if (subexpr < 0)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("invalid value for parameter \"%s\": %d", "group", subexpr)));
+
+ /*
+ * If subexpr is zero (default), then the position of the entire
+ * substring that matches the pattern is returned. Otherwise we
+ * will exactly register the subexpressions given in the pattern.
+ * Enclose pattern between parenthesis to register the position
+ * of the entire substring.
+ */
+ pattern = enclose_with_parenthesis(PG_GETARG_TEXT_PP(1));
+
+ /* Determine options */
+ parse_re_flags(&re_flags, flags);
+ /* this function require flag 'g' */
+ re_flags.glob = true;
+
+ matchctx = setup_regexp_matches(orig_str, pattern, &re_flags,
+ PG_GET_COLLATION(), true, false, false);
+
+ /* If no match is found, then the function returns 0 */
+ if (matchctx->nmatches == 0)
+ PG_RETURN_INT32(0);
+
+ /* When occurrence exceed matches return 0 */
+ if (occurrence > matchctx->nmatches)
+ PG_RETURN_INT32(0);
+
+ /* When subexpr exceed number of subexpression return 0 */
+ if (subexpr > matchctx->npatterns - 1)
+ PG_RETURN_INT32(0);
+
+ /*
+ * Returns the position of the first character of the occurrence
+ * or for subexpression in this occurrence.
+ */
+ pos = (occurrence*matchctx->npatterns*2)-((matchctx->npatterns-subexpr)*2);
+ if (return_opt == 1)
+ pos += 1;
+
+ PG_RETURN_INT32(matchctx->match_locs[pos]+start);
+}
+
+/* This is separate to keep the opr_sanity regression test from complaining */
+Datum
+regexp_instr_no_start(PG_FUNCTION_ARGS)
+{
+ return regexp_instr(fcinfo);
+}
+
+/* This is separate to keep the opr_sanity regression test from complaining */
+Datum
+regexp_instr_no_occurrence(PG_FUNCTION_ARGS)
+{
+ return regexp_instr(fcinfo);
+}
+
+/* This is separate to keep the opr_sanity regression test from complaining */
+Datum
+regexp_instr_no_return_opt(PG_FUNCTION_ARGS)
+{
+ return regexp_instr(fcinfo);
+}
+
+/* This is separate to keep the opr_sanity regression test from complaining */
+Datum
+regexp_instr_no_flags(PG_FUNCTION_ARGS)
+{
+ return regexp_instr(fcinfo);
+}
+
+/* This is separate to keep the opr_sanity regression test from complaining */
+Datum
+regexp_instr_no_subexpr(PG_FUNCTION_ARGS)
+{
+ return regexp_instr(fcinfo);
+}
+
+/*
+ * regexp_substr()
+ * Return the substring within the string that match a regular
+ * expression pattern
+ */
+Datum
+regexp_substr(PG_FUNCTION_ARGS)
+{
+ text *orig_str = NULL;
+ text *pattern = NULL;
+ text *flags = PG_GETARG_TEXT_PP_IF_EXISTS(4);
+ int start = 1;
+ int occurrence = 1;
+ int subexpr = 0;
+ int so, eo, pos;
+
+ pg_re_flags re_flags;
+ regexp_matches_ctx *matchctx;
+
+ if (PG_NARGS() > 2)
+ start = PG_GETARG_INT32(2);
+
+ if (start < 0)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("invalid value for parameter \"%s\": %d", "start_position", start)));
+
+ /* regexp_substr(string, pattern, start) */
+ if (start > 1)
+ orig_str = DatumGetTextPP(DirectFunctionCall2(text_substr_no_len,
+ PG_GETARG_DATUM(0),
+ Int32GetDatum(start)
+ ));
+ else
+ /* regexp_substr(string, pattern) */
+ orig_str = PG_GETARG_TEXT_PP(0);
+
+ if (orig_str == NULL)
+ PG_RETURN_NULL();
+
+ /* regexp_substr(string, pattern, start, occurrence) */
+ if (PG_NARGS() > 3)
+ occurrence = PG_GETARG_INT32(3);
+
+ if (occurrence <= 0)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("invalid value for parameter \"%s\": %d", "occurrence", occurrence)));
+
+ /* regexp_substr(string, pattern, start, occurrence, flags, subexpr) */
+ if (PG_NARGS() > 5)
+ subexpr = PG_GETARG_INT32(5);
+ if (subexpr < 0)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("invalid value for parameter \"%s\": %d", "group", subexpr)));
+
+ /*
+ * If subexpr is zero (default), then the position of the entire
+ * substring that matches the pattern is returned. Otherwise we
+ * will exactly register the subexpressions given in the pattern.
+ * Enclose pattern between parenthesis to register the position
+ * of the entire substring.
+ */
+ pattern = enclose_with_parenthesis(PG_GETARG_TEXT_PP(1));
+
+ /* Determine options */
+ parse_re_flags(&re_flags, flags);
+ /* this function require flag 'g' */
+ re_flags.glob = true;
+
+ matchctx = setup_regexp_matches(orig_str, pattern, &re_flags,
+ PG_GET_COLLATION(), true, false, false);
+
+ /* If no match is found, then the function returns NULL */
+ if (matchctx->nmatches == 0)
+ PG_RETURN_NULL();
+
+ /* When occurrence exceed matches return NULL */
+ if (occurrence > matchctx->nmatches)
+ PG_RETURN_NULL();
+
+ /* When subexpr exceed number of subexpression return NULL */
+ if (subexpr > matchctx->npatterns - 1)
+ PG_RETURN_NULL();
+
+ /* Returns the substring corresponding to the occurrence. */
+ pos = (occurrence*matchctx->npatterns*2)-((matchctx->npatterns-subexpr)*2);
+ so = matchctx->match_locs[pos]+1;
+ eo = matchctx->match_locs[pos+1]+1;
+
+ PG_RETURN_DATUM(DirectFunctionCall3(text_substr,
+ PointerGetDatum(matchctx->orig_str),
+ Int32GetDatum(so),
+ Int32GetDatum(eo - so)));
+}
+
+/* This is separate to keep the opr_sanity regression test from complaining */
+Datum
+regexp_substr_no_start(PG_FUNCTION_ARGS)
+{
+ return regexp_substr(fcinfo);
+}
+
+/* This is separate to keep the opr_sanity regression test from complaining */
+Datum
+regexp_substr_no_occurrence(PG_FUNCTION_ARGS)
+{
+ return regexp_substr(fcinfo);
+}
+
+/* This is separate to keep the opr_sanity regression test from complaining */
+Datum
+regexp_substr_no_flags(PG_FUNCTION_ARGS)
+{
+ return regexp_substr(fcinfo);
+}
+
+/* This is separate to keep the opr_sanity regression test from complaining */
+Datum
+regexp_substr_no_subexpr(PG_FUNCTION_ARGS)
+{
+ return regexp_substr(fcinfo);
+}
+
/*
* setup_regexp_matches --- do the initial matching for regexp_match
* and regexp_split functions
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 59d2b71ca9..fb91c54afe 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -3560,6 +3560,63 @@
proname => 'regexp_matches', prorows => '10', proretset => 't',
prorettype => '_text', proargtypes => 'text text text',
prosrc => 'regexp_matches' },
+{ oid => '9614', descr => 'count match(es) for regexp',
+ proname => 'regexp_count', prorettype => 'int4', proargtypes => 'text text',
+ prosrc => 'regexp_count_no_start' },
+{ oid => '9615', descr => 'count match(es) for regexp',
+ proname => 'regexp_count', prorettype => 'int4', proargtypes => 'text text int4',
+ prosrc => 'regexp_count_no_flags' },
+{ oid => '9616', descr => 'count match(es) for regexp',
+ proname => 'regexp_count', prorettype => 'int4',
+ proargtypes => 'text text int4 text', prosrc => 'regexp_count' },
+{ oid => '9617', descr => 'position where the match for regexp was located',
+ proname => 'regexp_instr', prorettype => 'int4', proargtypes => 'text text',
+ prosrc => 'regexp_instr_no_start' },
+{ oid => '9618', descr => 'position where the match for regexp was located',
+ proname => 'regexp_instr', prorettype => 'int4', proargtypes => 'text text int4',
+ prosrc => 'regexp_instr_no_occurrence' },
+{ oid => '9619', descr => 'position where the match for regexp was located',
+ proname => 'regexp_instr', prorettype => 'int4', proargtypes => 'text text int4 int4',
+ prosrc => 'regexp_instr_no_return_opt' },
+{ oid => '9620', descr => 'position where the match for regexp was located',
+ proname => 'regexp_instr', prorettype => 'int4', proargtypes => 'text text int4 int4 int4',
+ prosrc => 'regexp_instr_no_flags' },
+{ oid => '9621', descr => 'position where the match for regexp was located',
+ proname => 'regexp_instr', prorettype => 'int4', proargtypes => 'text text int4 int4 int4 text',
+ prosrc => 'regexp_instr_no_subexpr' },
+{ oid => '9622', descr => 'position where the match for regexp was located',
+ proname => 'regexp_instr', prorettype => 'int4', proargtypes => 'text text int4 int4 int4 text int4',
+ prosrc => 'regexp_instr' },
+{ oid => '9623', descr => 'substring that matches the regexp pattern',
+ proname => 'regexp_substr', prorettype => 'text', proargtypes => 'text text',
+ prosrc => 'regexp_substr_no_start' },
+{ oid => '9624', descr => 'substring that matchies the regexp pattern',
+ proname => 'regexp_substr', prorettype => 'text', proargtypes => 'text text int4',
+ prosrc => 'regexp_substr_no_occurrence' },
+{ oid => '9625', descr => 'substring that matches the regexp pattern',
+ proname => 'regexp_substr', prorettype => 'text', proargtypes => 'text text int4 int4',
+ prosrc => 'regexp_substr_no_flags' },
+{ oid => '9626', descr => 'substring that matches the regexp pattern',
+ proname => 'regexp_substr', prorettype => 'text', proargtypes => 'text text int4 int4 text',
+ prosrc => 'regexp_substr_no_subexpr' },
+{ oid => '9627', descr => 'substring that matches the regexp pattern',
+ proname => 'regexp_substr', prorettype => 'text', proargtypes => 'text text int4 int4 text int4',
+ prosrc => 'regexp_substr' },
+{ oid => '9628', descr => 'evaluate match(es) for regexp',
+ proname => 'regexp_like', prorettype => 'bool', proargtypes => 'text text',
+ prosrc => 'regexp_like_no_flags' },
+{ oid => '9629', descr => 'evaluate match(es) for regexp',
+ proname => 'regexp_like', prorettype => 'bool',
+ proargtypes => 'text text text', prosrc => 'regexp_like' },
+{ oid => '9606', descr => 'replace text using regexp',
+ proname => 'regexp_replace', prorettype => 'text',
+ proargtypes => 'text text text int4 int4 text', prosrc => 'textregexreplace_extended' },
+{ oid => '9607', descr => 'replace text using regexp',
+ proname => 'regexp_replace', prorettype => 'text',
+ proargtypes => 'text text text int4 int4', prosrc => 'textregexreplace_extended_no_flags' },
+{ oid => '9608', descr => 'replace text using regexp',
+ proname => 'regexp_replace', prorettype => 'text',
+ proargtypes => 'text text text int4', prosrc => 'textregexreplace_extended_no_occurrence' },
{ oid => '2088', descr => 'split string by field_sep and return field_num',
proname => 'split_part', prorettype => 'text',
proargtypes => 'text text int4', prosrc => 'split_part' },
diff --git a/src/test/regress/expected/strings.out b/src/test/regress/expected/strings.out
index fb4573d85f..b9cef32e9e 100644
--- a/src/test/regress/expected/strings.out
+++ b/src/test/regress/expected/strings.out
@@ -905,6 +905,751 @@ SELECT foo, length(foo) FROM regexp_split_to_table('thE QUick bROWn FOx jUMPs ov
ERROR: regexp_split_to_table() does not support the "global" option
SELECT regexp_split_to_array('thE QUick bROWn FOx jUMPs ovEr The lazy dOG', 'e', 'g');
ERROR: regexp_split_to_array() does not support the "global" option
+-- regexp_like tests
+SELECT regexp_like('a'||CHR(10)||'d', 'a.d');
+ regexp_like
+-------------
+ t
+(1 row)
+
+SELECT regexp_like('a'||CHR(10)||'d', 'a.d', 'm');
+ regexp_like
+-------------
+ f
+(1 row)
+
+SELECT regexp_like('a'||CHR(10)||'d', 'a.d', 'n');
+ regexp_like
+-------------
+ f
+(1 row)
+
+SELECT regexp_like('Steven', '^Ste(v|ph)en$');
+ regexp_like
+-------------
+ t
+(1 row)
+
+SELECT regexp_like('foo' || chr(10) || 'bar' || chr(10) || 'bequq' || chr(10) || 'baz', '^bar');
+ regexp_like
+-------------
+ f
+(1 row)
+
+SELECT regexp_like('foo' || chr(10) || 'bar' || chr(10) || 'bequq' || chr(10) || 'baz', 'bar');
+ regexp_like
+-------------
+ t
+(1 row)
+
+SELECT regexp_like('foo' || chr(10) || 'bar' || chr(10) || 'bequq' || chr(10) || 'baz', '^bar', 'm');
+ regexp_like
+-------------
+ t
+(1 row)
+
+SELECT regexp_like('foo' || chr(10) || 'bar' || chr(10) || 'bequq' || chr(10) || 'baz', '^bar', 'n');
+ regexp_like
+-------------
+ t
+(1 row)
+
+SELECT regexp_like('GREEN', '([aeiou])\1');
+ regexp_like
+-------------
+ f
+(1 row)
+
+SELECT regexp_like('GREEN', '([aeiou])\1', 'i');
+ regexp_like
+-------------
+ t
+(1 row)
+
+SELECT regexp_like('ORANGE' || chr(10) || 'GREEN', '([aeiou])\1', 'i');
+ regexp_like
+-------------
+ t
+(1 row)
+
+SELECT regexp_like('ORANGE' || chr(10) || 'GREEN', '^..([aeiou])\1', 'i');
+ regexp_like
+-------------
+ f
+(1 row)
+
+SELECT regexp_like('ORANGE' || chr(10) || 'GREEN', '([aeiou])\1', 'in');
+ regexp_like
+-------------
+ t
+(1 row)
+
+SELECT regexp_like('ORANGE' || chr(10) || 'GREEN', '^..([aeiou])\1', 'in');
+ regexp_like
+-------------
+ t
+(1 row)
+
+SELECT regexp_like('ORANGE' || chr(10) || 'GREEN', '^..([aeiou])\1', 'im');
+ regexp_like
+-------------
+ t
+(1 row)
+
+SELECT REGEXP_LIKE('abc', 'a b c');
+ regexp_like
+-------------
+ f
+(1 row)
+
+SELECT REGEXP_LIKE('abc', 'a b c','x');
+ regexp_like
+-------------
+ t
+(1 row)
+
+-- count all matches for regexp
+SELECT regexp_count('123123123123123', '(12)3');
+ regexp_count
+--------------
+ 5
+(1 row)
+
+-- count all matches with start position
+SELECT regexp_count('123123123123', '123', 0);
+ regexp_count
+--------------
+ 4
+(1 row)
+
+SELECT regexp_count('123123123123', '123', 1);
+ regexp_count
+--------------
+ 4
+(1 row)
+
+SELECT regexp_count('123123123123', '123', 3);
+ regexp_count
+--------------
+ 3
+(1 row)
+
+SELECT regexp_count('123123123123', '123', -3);
+ERROR: invalid value for parameter "start_position": -3
+-- count all matches in NULL string with a start position
+SELECT regexp_count(NULL, '123', 3);
+ regexp_count
+--------------
+ \N
+(1 row)
+
+-- count all matches with a start position greater than string length
+SELECT regexp_count('123', '123', 10);
+ regexp_count
+--------------
+ 0
+(1 row)
+
+-- count all matches from different regexp
+SELECT regexp_count('ABC123', '[A-Z]'), regexp_count('A1B2C3', '[A-Z]');
+ regexp_count | regexp_count
+--------------+--------------
+ 3 | 3
+(1 row)
+
+SELECT regexp_count('ABC123', '[A-Z][0-9]'), regexp_count('A1B2C3', '[A-Z][0-9]');
+ regexp_count | regexp_count
+--------------+--------------
+ 1 | 3
+(1 row)
+
+SELECT regexp_count('ABC123', '[A-Z][0-9]{2}'), regexp_count('A1B2C3', '[A-Z][0-9]{2}');
+ regexp_count | regexp_count
+--------------+--------------
+ 1 | 0
+(1 row)
+
+SELECT regexp_count('ABC123', '([A-Z][0-9]){2}'), regexp_count('A1B2C3', '([A-Z][0-9]){2}');
+ regexp_count | regexp_count
+--------------+--------------
+ 0 | 1
+(1 row)
+
+SELECT regexp_count('ABC123A5', '^[A-Z][0-9]'), regexp_count('A1B2C3', '^[A-Z][0-9]');
+ regexp_count | regexp_count
+--------------+--------------
+ 0 | 1
+(1 row)
+
+SELECT regexp_count('ABC123', '[A-Z][0-9]{2}'), regexp_count('A1B2C34', '[A-Z][0-9]{2}');
+ regexp_count | regexp_count
+--------------+--------------
+ 1 | 1
+(1 row)
+
+-- count matches with newline case insensivity
+SELECT regexp_count('a'||CHR(10)||'d', 'a.d');
+ regexp_count
+--------------
+ 1
+(1 row)
+
+SELECT regexp_count('a'||CHR(10)||'d', 'a.d', 1, 's');
+ regexp_count
+--------------
+ 1
+(1 row)
+
+-- count matches with newline case sensivity
+SELECT regexp_count('a'||CHR(10)||'d', 'a.d', 1, 'n');
+ regexp_count
+--------------
+ 0
+(1 row)
+
+-- count not multiline matches
+SELECT regexp_count('a'||CHR(10)||'d', '^d$');
+ regexp_count
+--------------
+ 0
+(1 row)
+
+-- count multiline matches
+SELECT regexp_count('a'||CHR(10)||'d', '^d$', 1, 'm');
+ regexp_count
+--------------
+ 1
+(1 row)
+
+SELECT regexp_count('Hello'||CHR(10)||'world!', '^world!$', 1, 'm'); -- 1
+ regexp_count
+--------------
+ 1
+(1 row)
+
+-- Count the number of occurrences of the substring an in the string.
+SELECT regexp_count('a man, a plan, a canal: Panama', 'an');
+ regexp_count
+--------------
+ 4
+(1 row)
+
+-- Find the number of occurrences of the substring an in the string starting with the fifth character.
+SELECT regexp_count('a man, a plan, a canal: Panama', 'an',5);
+ regexp_count
+--------------
+ 3
+(1 row)
+
+-- Find the number of occurrences of a substring containing a lower-case character
+-- followed by an. In the first example, do not use a modifier. In the second example,
+-- use the i modifier to force the regular expression to ignore case.
+SELECT regexp_count('a man, a plan, a canal: Panama', '[a-z]an');
+ regexp_count
+--------------
+ 3
+(1 row)
+
+SELECT regexp_count('a man, a plan, a canal: Panama', '[a-z]an', 1, 'i');
+ regexp_count
+--------------
+ 4
+(1 row)
+
+DROP TABLE IF EXISTS regexp_temp;
+NOTICE: table "regexp_temp" does not exist, skipping
+CREATE TABLE regexp_temp(fullname varchar(20), email varchar(20));
+INSERT INTO regexp_temp (fullname, email) VALUES ('John Doe', 'johndoe@example.com');
+INSERT INTO regexp_temp (fullname, email) VALUES ('Jane Doe', 'janedoe');
+-- count matches case sensitive
+SELECT fullname, regexp_count(fullname, 'e', 1, 'c') FROM regexp_temp;
+ fullname | regexp_count
+----------+--------------
+ John Doe | 1
+ Jane Doe | 2
+(2 rows)
+
+SELECT fullname, regexp_count(fullname, 'D', 1, 'c') FROM regexp_temp;
+ fullname | regexp_count
+----------+--------------
+ John Doe | 1
+ Jane Doe | 1
+(2 rows)
+
+SELECT fullname, regexp_count(fullname, 'd', 1, 'c') FROM regexp_temp;
+ fullname | regexp_count
+----------+--------------
+ John Doe | 0
+ Jane Doe | 0
+(2 rows)
+
+-- count matches case insensitive
+SELECT fullname, regexp_count(fullname, 'E', 1, 'i') FROM regexp_temp;
+ fullname | regexp_count
+----------+--------------
+ John Doe | 1
+ Jane Doe | 2
+(2 rows)
+
+SELECT fullname, regexp_count(fullname, 'do', 1, 'i') FROM regexp_temp;
+ fullname | regexp_count
+----------+--------------
+ John Doe | 1
+ Jane Doe | 1
+(2 rows)
+
+-- return the start position of the 6th occurence starting at beginning of the string
+SELECT regexp_instr('number of your street, zipcode thetown, FR', '[^ ]+', 1, 6);
+ regexp_instr
+--------------
+ 32
+(1 row)
+
+-- return the start position of the 5th occurence starting after the first word
+SELECT regexp_instr('number of your street, zipcode thetown, FR', '[^ ]+', 7, 5, 0);
+ regexp_instr
+--------------
+ 32
+(1 row)
+
+-- return the ending position of the 5th occurence starting after the first word
+SELECT regexp_instr('number of your street, zipcode thetown, FR', '[^ ]+', 7, 5, 0);
+ regexp_instr
+--------------
+ 32
+(1 row)
+
+-- return the ending position of the 2nd occurence starting after the first word
+SELECT regexp_instr('number of your street, zipcode thetown, FR', '[T|Z|S][[:alpha:]]{5}', 7, 2, 1, 'i');
+ regexp_instr
+--------------
+ 30
+(1 row)
+
+-- return the starting position corresponding to the different capture group
+SELECT regexp_instr('1234567890', '(123)(4(56)(78))', 1, 1, 0, 'i', 0);
+ regexp_instr
+--------------
+ 1
+(1 row)
+
+SELECT regexp_instr('1234567890', '(123)(4(56)(78))', 1, 1, 0, 'i', 1);
+ regexp_instr
+--------------
+ 1
+(1 row)
+
+SELECT regexp_instr('1234567890', '(123)(4(56)(78))', 1, 1, 0, 'i', 2);
+ regexp_instr
+--------------
+ 4
+(1 row)
+
+SELECT regexp_instr('1234567890', '(123)(4(56)(78))', 1, 1, 0, 'i', 3);
+ regexp_instr
+--------------
+ 5
+(1 row)
+
+SELECT regexp_instr('1234567890', '(123)(4(56)(78))', 1, 1, 0, 'i', 4);
+ regexp_instr
+--------------
+ 7
+(1 row)
+
+-- return the starting position corresponding to a non existant capture group
+SELECT regexp_instr('1234567890', '(123)(4(56)(78))', 1, 1, 0, 'i', 5);
+ regexp_instr
+--------------
+ 0
+(1 row)
+
+-- Same but with the ending position
+SELECT regexp_instr('1234567890', '(123)(4(56)(78))', 1, 1, 1, 'i', 0);
+ regexp_instr
+--------------
+ 9
+(1 row)
+
+SELECT regexp_instr('1234567890', '(123)(4(56)(78))', 1, 1, 1, 'i', 1);
+ regexp_instr
+--------------
+ 4
+(1 row)
+
+SELECT regexp_instr('1234567890', '(123)(4(56)(78))', 1, 1, 1, 'i', 2);
+ regexp_instr
+--------------
+ 9
+(1 row)
+
+SELECT regexp_instr('1234567890', '(123)(4(56)(78))', 1, 1, 1, 'i', 3);
+ regexp_instr
+--------------
+ 7
+(1 row)
+
+SELECT regexp_instr('1234567890', '(123)(4(56)(78))', 1, 1, 1, 'i', 4);
+ regexp_instr
+--------------
+ 9
+(1 row)
+
+SELECT regexp_instr('1234567890', '(123)(4(56)(78))', 1, 1, 1, 'i', 5);
+ regexp_instr
+--------------
+ 0
+(1 row)
+
+-- start position of a valid email
+SELECT email, regexp_instr(email, '\w+@\w+(\.\w+)+') "valid_email" FROM regexp_temp;
+ email | valid_email
+---------------------+-------------
+ johndoe@example.com | 1
+ janedoe | 0
+(2 rows)
+
+-- ending position of a valid email
+SELECT email, regexp_instr(email, '\w+@\w+(\.\w+)+', 1, 1, 1) "valid_email" FROM regexp_temp;
+ email | valid_email
+---------------------+-------------
+ johndoe@example.com | 20
+ janedoe | 0
+(2 rows)
+
+-- start position of first capture group in the email (the dot part)
+SELECT email, regexp_instr(email, '\w+@\w+(\.\w+)+', 1, 1, 0, 'i', 1) FROM regexp_temp;
+ email | regexp_instr
+---------------------+--------------
+ johndoe@example.com | 16
+ janedoe | 0
+(2 rows)
+
+-- ending position of first capture group in the email (the dot part)
+SELECT email, regexp_instr(email, '\w+@\w+(\.\w+)+', 1, 1, 1, 'i', 1) FROM regexp_temp;
+ email | regexp_instr
+---------------------+--------------
+ johndoe@example.com | 20
+ janedoe | 0
+(2 rows)
+
+-- test negative values
+SELECT regexp_instr('1234567890', '(123)(4(56)(78))', -1, 1, 1, 'i', 1);
+ERROR: invalid value for parameter "start_position": -1
+SELECT regexp_instr('1234567890', '(123)(4(56)(78))', 1, -1, 1, 'i', 1);
+ERROR: invalid value for parameter "occurrence": -1
+SELECT regexp_instr('1234567890', '(123)(4(56)(78))', 1, 1, -1, 'i', 1);
+ERROR: invalid value for parameter "return_option": -1
+SELECT regexp_instr('1234567890', '(123)(4(56)(78))', 1, 1, 1, 'i', -1);
+ERROR: invalid value for parameter "group": -1
+-- Find the first occurrence of a sequence of letters starting with the letter e
+-- and ending with the letter y in the phrase "easy come, easy go."
+SELECT regexp_instr('easy come, easy go','e\w*y');
+ regexp_instr
+--------------
+ 1
+(1 row)
+
+-- Find the first sequence of letters starting with the letter e and ending with
+-- the letter y in the string "easy come, easy go" starting at the second character
+SELECT regexp_instr('easy come, easy go','e\w*y',2);
+ regexp_instr
+--------------
+ 12
+(1 row)
+
+-- Find the second sequence of letters starting with the letter e and ending with
+-- the letter y in the string "easy come, easy go" starting at the first character.
+SELECT regexp_instr('easy come, easy go','e\w*y',1,2);
+ regexp_instr
+--------------
+ 12
+(1 row)
+
+-- Find the position of the first character after the first whitespace in the string "easy come, easy go."
+SELECT regexp_instr('easy come, easy go','\s',1,1,1);
+ regexp_instr
+--------------
+ 6
+(1 row)
+
+-- Find the position of the start of the third word in a string by capturing each
+-- word as a subexpression, and returning the third subexpression's start position.
+SELECT regexp_instr('one two three','(\w+)\s+(\w+)\s+(\w+)', 1,1,0,'',3);
+ regexp_instr
+--------------
+ 9
+(1 row)
+
+-- return the substring matching the regexp
+SELECT regexp_substr('number of your street, zipcode town, FR', ',[^,]+');
+ regexp_substr
+----------------
+ , zipcode town
+(1 row)
+
+-- return the substring matching the regexp
+SELECT regexp_substr('number of your street, zipcode town, FR', ',[^,]+', 24);
+ regexp_substr
+---------------
+ , FR
+(1 row)
+
+-- return the substring matching the regexp at the first occurrence
+SELECT regexp_substr('number of your street, zipcode town, FR', ',[^,]+', 1, 1);
+ regexp_substr
+----------------
+ , zipcode town
+(1 row)
+
+-- return the substring matching the regexp at the second occurrence
+SELECT regexp_substr('number of your street, zipcode town, FR', ',[^,]+', 1, 2);
+ regexp_substr
+---------------
+ , FR
+(1 row)
+
+-- case sensitivity substring search
+SELECT regexp_substr('number of your street, zipcode town, FR', ',\s+[Zf][^,]+', 1, 1);
+ regexp_substr
+---------------
+ \N
+(1 row)
+
+SELECT regexp_substr('number of your street, zipcode town, FR', ',\s+[Zf][^,]+', 1, 1, 'i');
+ regexp_substr
+----------------
+ , zipcode town
+(1 row)
+
+-- case sensitivity substring search with no capture group
+SELECT regexp_substr('number of your street, zipcode town, FR', ',\s+[Zf][^,]+', 1, 1, 'i', 0);
+ regexp_substr
+----------------
+ , zipcode town
+(1 row)
+
+-- case sensitivity substring search with non existing capture group
+SELECT regexp_substr('number of your street, zipcode town, FR', ',\s+[Zf][^,]+', 1, 1, 'i', 0);
+ regexp_substr
+----------------
+ , zipcode town
+(1 row)
+
+-- return the substring matching the regexp at different occurrence and capture group
+SELECT regexp_substr('1234567890 1234567890', '(123)(4(56)(78))', 1, 1, 'i', 4);
+ regexp_substr
+---------------
+ 78
+(1 row)
+
+SELECT regexp_substr('1234567890 1234557890', '(123)(4(5[56])(78))', 1, 2, 'i', 3);
+ regexp_substr
+---------------
+ 55
+(1 row)
+
+SELECT regexp_substr('1234567890 1234567890', '(123)(4(56)(78))', 1, 1, 'i', 0);
+ regexp_substr
+---------------
+ 12345678
+(1 row)
+
+-- test negative values
+SELECT regexp_substr('1234567890 1234567890', '(123)(4(56)(78))', -1, 1, 'i', 4);
+ERROR: invalid value for parameter "start_position": -1
+SELECT regexp_substr('1234567890 1234567890', '(123)(4(56)(78))', 1, -1, 'i', 4);
+ERROR: invalid value for parameter "occurrence": -1
+SELECT regexp_substr('1234567890 1234567890', '(123)(4(56)(78))', 1, 1, 'i', -4);
+ERROR: invalid value for parameter "group": -4
+-- Select the first substring of letters that end with "thy."
+SELECT regexp_substr('healthy, wealthy, and wise','\w+thy');
+ regexp_substr
+---------------
+ healthy
+(1 row)
+
+-- Select the first substring of letters that ends with "thy" starting at the second character in the string.
+SELECT regexp_substr('healthy, wealthy, and wise','\w+thy',2);
+ regexp_substr
+---------------
+ ealthy
+(1 row)
+
+-- Return the contents of the third captured subexpression, which captures the third word in the string.
+SELECT regexp_substr('one two three', '(\w+)\s+(\w+)\s+(\w+)', 1, 1, '', 3);
+ regexp_substr
+---------------
+ three
+(1 row)
+
+DROP TABLE IF EXISTS regexp_temp;
+-- Regression tests for extended regexp_replace() function with start position and occurrence
+SELECT regexp_replace('512.123.4567', '([[:digit:]]{3})\.([[:digit:]]{3})\.([[:digit:]]{4})', '(\1) \2-\3', 1);
+ regexp_replace
+----------------
+ (512) 123-4567
+(1 row)
+
+SELECT regexp_replace('512.123.4567 612.123.4567', '([[:digit:]]{3})\.([[:digit:]]{3})\.([[:digit:]]{4})', '(\1) \2-\3', 1, 0);
+ regexp_replace
+-------------------------------
+ (512) 123-4567 (612) 123-4567
+(1 row)
+
+SELECT regexp_replace('number your street,'||CHR(10)||' zipcode town, FR', '( ){2,}', ' ', 1, 0);
+ regexp_replace
+---------------------
+ number your street,+
+ zipcode town, FR
+(1 row)
+
+SELECT regexp_replace('number your street, zipcode town, FR', '( ){2,}', ' ', 9);
+ regexp_replace
+--------------------------------------------
+ number your street, zipcode town, FR
+(1 row)
+
+SELECT regexp_replace('number your street, zipcode town, FR', '( ){2,}', ' ', 9, 0);
+ regexp_replace
+----------------------------------------
+ number your street, zipcode town, FR
+(1 row)
+
+SELECT regexp_replace('number your street, zipcode town, FR', '( ){2,}', ' ', 9, 2);
+ regexp_replace
+---------------------------------------------
+ number your street, zipcode town, FR
+(1 row)
+
+SELECT regexp_replace('number your street, zipcode town, FR', '( ){2,}', ' ', 9, 2, 'm');
+ regexp_replace
+---------------------------------------------
+ number your street, zipcode town, FR
+(1 row)
+
+SELECT regexp_replace('number your street, zipcode town, FR', '([EURT]){2,}', '[\1]', 9, 1, 'i');
+ regexp_replace
+-------------------------------------------------
+ number yo[r] street, zipcode town, FR
+(1 row)
+
+SELECT regexp_replace('number your street, zipcode town, FR', '([EURT]){2,}', '[\1]', 9, 2, 'i');
+ regexp_replace
+----------------------------------------------
+ number your s[t], zipcode town, FR
+(1 row)
+
+SELECT regexp_replace ('A PostgreSQL function', 'A|e|i|o|u', 'X', 1, 2);
+ regexp_replace
+-----------------------
+ A PXstgreSQL function
+(1 row)
+
+SELECT regexp_replace ('A PostgreSQL function', 'a|e|i|o|u', 'X', 1, 0, 'i');
+ regexp_replace
+-----------------------
+ X PXstgrXSQL fXnctXXn
+(1 row)
+
+SELECT regexp_replace ('A PostgreSQL function', 'a|e|i|o|u', 'X', 1, 1, 'i');
+ regexp_replace
+-----------------------
+ X PostgreSQL function
+(1 row)
+
+SELECT regexp_replace ('A PostgreSQL function', 'a|e|i|o|u', 'X', 1, 2, 'i');
+ regexp_replace
+-----------------------
+ A PXstgreSQL function
+(1 row)
+
+SELECT regexp_replace ('A PostgreSQL function', 'a|e|i|o|u', 'X', 1, 3, 'i');
+ regexp_replace
+-----------------------
+ A PostgrXSQL function
+(1 row)
+
+SELECT regexp_replace ('A PostgreSQL function', 'a|e|i|o|u', 'X', 1, 9, 'i');
+ regexp_replace
+-----------------------
+ A PostgreSQL function
+(1 row)
+
+SELECT regexp_replace ('A PostgreSQL function', 'A|e|i|o|u', 'X', 1, 9);
+ regexp_replace
+-----------------------
+ A PostgreSQL function
+(1 row)
+
+-- Invalid parameter values
+SELECT regexp_replace ('A PostgreSQL function', 'a|e|i|o|u', 'X', -1, 0, 'i');
+ERROR: invalid value for parameter "start_position": -1
+SELECT regexp_replace ('A PostgreSQL function', 'a|e|i|o|u', 'X', 1, -1, 'i');
+ERROR: invalid value for parameter "occurrence": -1
+-- Modifier 'g' should not be taken in account, we have an occurrence to replace
+SELECT regexp_replace ('A PostgreSQL function', 'a|e|i|o|u', 'X', 1, 1, 'g');
+ regexp_replace
+-----------------------
+ A PXstgreSQL function
+(1 row)
+
+-- Find groups of "word characters" (letters, numbers and underscore) ending with
+-- "thy" in the string "healthy, wealthy, and wise" and replace them with nothing.
+SELECT regexp_replace('healthy, wealthy, and wise','\w+thy', '', 'g');
+ regexp_replace
+----------------
+ , , and wise
+(1 row)
+
+SELECT regexp_replace('healthy, wealthy, and wise','\w+thy', '', 1, 0);
+ regexp_replace
+----------------
+ , , and wise
+(1 row)
+
+SELECT regexp_replace('healthy, wealthy, and wise','\w+thy', '', 1, 0, 'g');
+ regexp_replace
+----------------
+ , , and wise
+(1 row)
+
+-- Find groups of word characters ending with "thy" and replace with "something."
+SELECT regexp_replace('healthy, wealthy, and wise','\w+thy', 'something', 'g');
+ regexp_replace
+--------------------------------
+ something, something, and wise
+(1 row)
+
+-- Find groups of word characters ending with "thy" and replace with the string
+-- "something" starting at the third character in the string.
+SELECT regexp_replace('healthy, wealthy, and wise','\w+thy', 'something', 3, 0);
+ regexp_replace
+----------------------------------
+ hesomething, something, and wise
+(1 row)
+
+-- Replace the second group of word characters ending with "thy" with "something."
+SELECT regexp_replace('healthy, wealthy, and wise','\w+thy', 'something', 1, 2);
+ regexp_replace
+------------------------------
+ healthy, something, and wise
+(1 row)
+
+-- Find groups of word characters ending with "thy" capturing the letters before
+-- the "thy", and replace with the captured letters plus the letters "ish."
+SELECT regexp_replace('healthy, wealthy, and wise','(\w+)thy', '\1ish', 'g');
+ regexp_replace
+----------------------------
+ healish, wealish, and wise
+(1 row)
+
+SELECT regexp_replace('healthy, wealthy, and wise','(\w+)thy', '\1ish', 1, 0);
+ regexp_replace
+----------------------------
+ healish, wealish, and wise
+(1 row)
+
-- change NULL-display back
\pset null ''
-- E021-11 position expression
diff --git a/src/test/regress/sql/strings.sql b/src/test/regress/sql/strings.sql
index 57a48c9d0b..7623e85f1e 100644
--- a/src/test/regress/sql/strings.sql
+++ b/src/test/regress/sql/strings.sql
@@ -255,6 +255,199 @@ SELECT regexp_split_to_array('thE QUick bROWn FOx jUMPs ovEr The lazy dOG', 'e',
SELECT foo, length(foo) FROM regexp_split_to_table('thE QUick bROWn FOx jUMPs ovEr The lazy dOG', 'e', 'g') AS foo;
SELECT regexp_split_to_array('thE QUick bROWn FOx jUMPs ovEr The lazy dOG', 'e', 'g');
+-- regexp_like tests
+SELECT regexp_like('a'||CHR(10)||'d', 'a.d');
+SELECT regexp_like('a'||CHR(10)||'d', 'a.d', 'm');
+SELECT regexp_like('a'||CHR(10)||'d', 'a.d', 'n');
+SELECT regexp_like('Steven', '^Ste(v|ph)en$');
+SELECT regexp_like('foo' || chr(10) || 'bar' || chr(10) || 'bequq' || chr(10) || 'baz', '^bar');
+SELECT regexp_like('foo' || chr(10) || 'bar' || chr(10) || 'bequq' || chr(10) || 'baz', 'bar');
+SELECT regexp_like('foo' || chr(10) || 'bar' || chr(10) || 'bequq' || chr(10) || 'baz', '^bar', 'm');
+SELECT regexp_like('foo' || chr(10) || 'bar' || chr(10) || 'bequq' || chr(10) || 'baz', '^bar', 'n');
+SELECT regexp_like('GREEN', '([aeiou])\1');
+SELECT regexp_like('GREEN', '([aeiou])\1', 'i');
+SELECT regexp_like('ORANGE' || chr(10) || 'GREEN', '([aeiou])\1', 'i');
+SELECT regexp_like('ORANGE' || chr(10) || 'GREEN', '^..([aeiou])\1', 'i');
+SELECT regexp_like('ORANGE' || chr(10) || 'GREEN', '([aeiou])\1', 'in');
+SELECT regexp_like('ORANGE' || chr(10) || 'GREEN', '^..([aeiou])\1', 'in');
+SELECT regexp_like('ORANGE' || chr(10) || 'GREEN', '^..([aeiou])\1', 'im');
+SELECT REGEXP_LIKE('abc', 'a b c');
+SELECT REGEXP_LIKE('abc', 'a b c','x');
+
+-- count all matches for regexp
+SELECT regexp_count('123123123123123', '(12)3');
+-- count all matches with start position
+SELECT regexp_count('123123123123', '123', 0);
+SELECT regexp_count('123123123123', '123', 1);
+SELECT regexp_count('123123123123', '123', 3);
+SELECT regexp_count('123123123123', '123', -3);
+-- count all matches in NULL string with a start position
+SELECT regexp_count(NULL, '123', 3);
+-- count all matches with a start position greater than string length
+SELECT regexp_count('123', '123', 10);
+-- count all matches from different regexp
+SELECT regexp_count('ABC123', '[A-Z]'), regexp_count('A1B2C3', '[A-Z]');
+SELECT regexp_count('ABC123', '[A-Z][0-9]'), regexp_count('A1B2C3', '[A-Z][0-9]');
+SELECT regexp_count('ABC123', '[A-Z][0-9]{2}'), regexp_count('A1B2C3', '[A-Z][0-9]{2}');
+SELECT regexp_count('ABC123', '([A-Z][0-9]){2}'), regexp_count('A1B2C3', '([A-Z][0-9]){2}');
+SELECT regexp_count('ABC123A5', '^[A-Z][0-9]'), regexp_count('A1B2C3', '^[A-Z][0-9]');
+SELECT regexp_count('ABC123', '[A-Z][0-9]{2}'), regexp_count('A1B2C34', '[A-Z][0-9]{2}');
+-- count matches with newline case insensivity
+SELECT regexp_count('a'||CHR(10)||'d', 'a.d');
+SELECT regexp_count('a'||CHR(10)||'d', 'a.d', 1, 's');
+-- count matches with newline case sensivity
+SELECT regexp_count('a'||CHR(10)||'d', 'a.d', 1, 'n');
+-- count not multiline matches
+SELECT regexp_count('a'||CHR(10)||'d', '^d$');
+-- count multiline matches
+SELECT regexp_count('a'||CHR(10)||'d', '^d$', 1, 'm');
+SELECT regexp_count('Hello'||CHR(10)||'world!', '^world!$', 1, 'm'); -- 1
+-- Count the number of occurrences of the substring an in the string.
+SELECT regexp_count('a man, a plan, a canal: Panama', 'an');
+-- Find the number of occurrences of the substring an in the string starting with the fifth character.
+SELECT regexp_count('a man, a plan, a canal: Panama', 'an',5);
+-- Find the number of occurrences of a substring containing a lower-case character
+-- followed by an. In the first example, do not use a modifier. In the second example,
+-- use the i modifier to force the regular expression to ignore case.
+SELECT regexp_count('a man, a plan, a canal: Panama', '[a-z]an');
+SELECT regexp_count('a man, a plan, a canal: Panama', '[a-z]an', 1, 'i');
+
+DROP TABLE IF EXISTS regexp_temp;
+CREATE TABLE regexp_temp(fullname varchar(20), email varchar(20));
+INSERT INTO regexp_temp (fullname, email) VALUES ('John Doe', 'johndoe@example.com');
+INSERT INTO regexp_temp (fullname, email) VALUES ('Jane Doe', 'janedoe');
+-- count matches case sensitive
+SELECT fullname, regexp_count(fullname, 'e', 1, 'c') FROM regexp_temp;
+SELECT fullname, regexp_count(fullname, 'D', 1, 'c') FROM regexp_temp;
+SELECT fullname, regexp_count(fullname, 'd', 1, 'c') FROM regexp_temp;
+-- count matches case insensitive
+SELECT fullname, regexp_count(fullname, 'E', 1, 'i') FROM regexp_temp;
+SELECT fullname, regexp_count(fullname, 'do', 1, 'i') FROM regexp_temp;
+
+-- return the start position of the 6th occurence starting at beginning of the string
+SELECT regexp_instr('number of your street, zipcode thetown, FR', '[^ ]+', 1, 6);
+-- return the start position of the 5th occurence starting after the first word
+SELECT regexp_instr('number of your street, zipcode thetown, FR', '[^ ]+', 7, 5, 0);
+-- return the ending position of the 5th occurence starting after the first word
+SELECT regexp_instr('number of your street, zipcode thetown, FR', '[^ ]+', 7, 5, 0);
+-- return the ending position of the 2nd occurence starting after the first word
+SELECT regexp_instr('number of your street, zipcode thetown, FR', '[T|Z|S][[:alpha:]]{5}', 7, 2, 1, 'i');
+-- return the starting position corresponding to the different capture group
+SELECT regexp_instr('1234567890', '(123)(4(56)(78))', 1, 1, 0, 'i', 0);
+SELECT regexp_instr('1234567890', '(123)(4(56)(78))', 1, 1, 0, 'i', 1);
+SELECT regexp_instr('1234567890', '(123)(4(56)(78))', 1, 1, 0, 'i', 2);
+SELECT regexp_instr('1234567890', '(123)(4(56)(78))', 1, 1, 0, 'i', 3);
+SELECT regexp_instr('1234567890', '(123)(4(56)(78))', 1, 1, 0, 'i', 4);
+-- return the starting position corresponding to a non existant capture group
+SELECT regexp_instr('1234567890', '(123)(4(56)(78))', 1, 1, 0, 'i', 5);
+-- Same but with the ending position
+SELECT regexp_instr('1234567890', '(123)(4(56)(78))', 1, 1, 1, 'i', 0);
+SELECT regexp_instr('1234567890', '(123)(4(56)(78))', 1, 1, 1, 'i', 1);
+SELECT regexp_instr('1234567890', '(123)(4(56)(78))', 1, 1, 1, 'i', 2);
+SELECT regexp_instr('1234567890', '(123)(4(56)(78))', 1, 1, 1, 'i', 3);
+SELECT regexp_instr('1234567890', '(123)(4(56)(78))', 1, 1, 1, 'i', 4);
+SELECT regexp_instr('1234567890', '(123)(4(56)(78))', 1, 1, 1, 'i', 5);
+-- start position of a valid email
+SELECT email, regexp_instr(email, '\w+@\w+(\.\w+)+') "valid_email" FROM regexp_temp;
+-- ending position of a valid email
+SELECT email, regexp_instr(email, '\w+@\w+(\.\w+)+', 1, 1, 1) "valid_email" FROM regexp_temp;
+-- start position of first capture group in the email (the dot part)
+SELECT email, regexp_instr(email, '\w+@\w+(\.\w+)+', 1, 1, 0, 'i', 1) FROM regexp_temp;
+-- ending position of first capture group in the email (the dot part)
+SELECT email, regexp_instr(email, '\w+@\w+(\.\w+)+', 1, 1, 1, 'i', 1) FROM regexp_temp;
+-- test negative values
+SELECT regexp_instr('1234567890', '(123)(4(56)(78))', -1, 1, 1, 'i', 1);
+SELECT regexp_instr('1234567890', '(123)(4(56)(78))', 1, -1, 1, 'i', 1);
+SELECT regexp_instr('1234567890', '(123)(4(56)(78))', 1, 1, -1, 'i', 1);
+SELECT regexp_instr('1234567890', '(123)(4(56)(78))', 1, 1, 1, 'i', -1);
+
+-- Find the first occurrence of a sequence of letters starting with the letter e
+-- and ending with the letter y in the phrase "easy come, easy go."
+SELECT regexp_instr('easy come, easy go','e\w*y');
+-- Find the first sequence of letters starting with the letter e and ending with
+-- the letter y in the string "easy come, easy go" starting at the second character
+SELECT regexp_instr('easy come, easy go','e\w*y',2);
+-- Find the second sequence of letters starting with the letter e and ending with
+-- the letter y in the string "easy come, easy go" starting at the first character.
+SELECT regexp_instr('easy come, easy go','e\w*y',1,2);
+-- Find the position of the first character after the first whitespace in the string "easy come, easy go."
+SELECT regexp_instr('easy come, easy go','\s',1,1,1);
+-- Find the position of the start of the third word in a string by capturing each
+-- word as a subexpression, and returning the third subexpression's start position.
+SELECT regexp_instr('one two three','(\w+)\s+(\w+)\s+(\w+)', 1,1,0,'',3);
+
+-- return the substring matching the regexp
+SELECT regexp_substr('number of your street, zipcode town, FR', ',[^,]+');
+-- return the substring matching the regexp
+SELECT regexp_substr('number of your street, zipcode town, FR', ',[^,]+', 24);
+-- return the substring matching the regexp at the first occurrence
+SELECT regexp_substr('number of your street, zipcode town, FR', ',[^,]+', 1, 1);
+-- return the substring matching the regexp at the second occurrence
+SELECT regexp_substr('number of your street, zipcode town, FR', ',[^,]+', 1, 2);
+-- case sensitivity substring search
+SELECT regexp_substr('number of your street, zipcode town, FR', ',\s+[Zf][^,]+', 1, 1);
+SELECT regexp_substr('number of your street, zipcode town, FR', ',\s+[Zf][^,]+', 1, 1, 'i');
+-- case sensitivity substring search with no capture group
+SELECT regexp_substr('number of your street, zipcode town, FR', ',\s+[Zf][^,]+', 1, 1, 'i', 0);
+-- case sensitivity substring search with non existing capture group
+SELECT regexp_substr('number of your street, zipcode town, FR', ',\s+[Zf][^,]+', 1, 1, 'i', 0);
+-- return the substring matching the regexp at different occurrence and capture group
+SELECT regexp_substr('1234567890 1234567890', '(123)(4(56)(78))', 1, 1, 'i', 4);
+SELECT regexp_substr('1234567890 1234557890', '(123)(4(5[56])(78))', 1, 2, 'i', 3);
+SELECT regexp_substr('1234567890 1234567890', '(123)(4(56)(78))', 1, 1, 'i', 0);
+-- test negative values
+SELECT regexp_substr('1234567890 1234567890', '(123)(4(56)(78))', -1, 1, 'i', 4);
+SELECT regexp_substr('1234567890 1234567890', '(123)(4(56)(78))', 1, -1, 'i', 4);
+SELECT regexp_substr('1234567890 1234567890', '(123)(4(56)(78))', 1, 1, 'i', -4);
+-- Select the first substring of letters that end with "thy."
+SELECT regexp_substr('healthy, wealthy, and wise','\w+thy');
+-- Select the first substring of letters that ends with "thy" starting at the second character in the string.
+SELECT regexp_substr('healthy, wealthy, and wise','\w+thy',2);
+-- Return the contents of the third captured subexpression, which captures the third word in the string.
+SELECT regexp_substr('one two three', '(\w+)\s+(\w+)\s+(\w+)', 1, 1, '', 3);
+
+DROP TABLE IF EXISTS regexp_temp;
+
+-- Regression tests for extended regexp_replace() function with start position and occurrence
+SELECT regexp_replace('512.123.4567', '([[:digit:]]{3})\.([[:digit:]]{3})\.([[:digit:]]{4})', '(\1) \2-\3', 1);
+SELECT regexp_replace('512.123.4567 612.123.4567', '([[:digit:]]{3})\.([[:digit:]]{3})\.([[:digit:]]{4})', '(\1) \2-\3', 1, 0);
+SELECT regexp_replace('number your street,'||CHR(10)||' zipcode town, FR', '( ){2,}', ' ', 1, 0);
+SELECT regexp_replace('number your street, zipcode town, FR', '( ){2,}', ' ', 9);
+SELECT regexp_replace('number your street, zipcode town, FR', '( ){2,}', ' ', 9, 0);
+SELECT regexp_replace('number your street, zipcode town, FR', '( ){2,}', ' ', 9, 2);
+SELECT regexp_replace('number your street, zipcode town, FR', '( ){2,}', ' ', 9, 2, 'm');
+SELECT regexp_replace('number your street, zipcode town, FR', '([EURT]){2,}', '[\1]', 9, 1, 'i');
+SELECT regexp_replace('number your street, zipcode town, FR', '([EURT]){2,}', '[\1]', 9, 2, 'i');
+SELECT regexp_replace ('A PostgreSQL function', 'A|e|i|o|u', 'X', 1, 2);
+SELECT regexp_replace ('A PostgreSQL function', 'a|e|i|o|u', 'X', 1, 0, 'i');
+SELECT regexp_replace ('A PostgreSQL function', 'a|e|i|o|u', 'X', 1, 1, 'i');
+SELECT regexp_replace ('A PostgreSQL function', 'a|e|i|o|u', 'X', 1, 2, 'i');
+SELECT regexp_replace ('A PostgreSQL function', 'a|e|i|o|u', 'X', 1, 3, 'i');
+SELECT regexp_replace ('A PostgreSQL function', 'a|e|i|o|u', 'X', 1, 9, 'i');
+SELECT regexp_replace ('A PostgreSQL function', 'A|e|i|o|u', 'X', 1, 9);
+-- Invalid parameter values
+SELECT regexp_replace ('A PostgreSQL function', 'a|e|i|o|u', 'X', -1, 0, 'i');
+SELECT regexp_replace ('A PostgreSQL function', 'a|e|i|o|u', 'X', 1, -1, 'i');
+-- Modifier 'g' should not be taken in account, we have an occurrence to replace
+SELECT regexp_replace ('A PostgreSQL function', 'a|e|i|o|u', 'X', 1, 1, 'g');
+-- Find groups of "word characters" (letters, numbers and underscore) ending with
+-- "thy" in the string "healthy, wealthy, and wise" and replace them with nothing.
+SELECT regexp_replace('healthy, wealthy, and wise','\w+thy', '', 'g');
+SELECT regexp_replace('healthy, wealthy, and wise','\w+thy', '', 1, 0);
+SELECT regexp_replace('healthy, wealthy, and wise','\w+thy', '', 1, 0, 'g');
+-- Find groups of word characters ending with "thy" and replace with "something."
+SELECT regexp_replace('healthy, wealthy, and wise','\w+thy', 'something', 'g');
+-- Find groups of word characters ending with "thy" and replace with the string
+-- "something" starting at the third character in the string.
+SELECT regexp_replace('healthy, wealthy, and wise','\w+thy', 'something', 3, 0);
+-- Replace the second group of word characters ending with "thy" with "something."
+SELECT regexp_replace('healthy, wealthy, and wise','\w+thy', 'something', 1, 2);
+-- Find groups of word characters ending with "thy" capturing the letters before
+-- the "thy", and replace with the captured letters plus the letters "ish."
+SELECT regexp_replace('healthy, wealthy, and wise','(\w+)thy', '\1ish', 'g');
+SELECT regexp_replace('healthy, wealthy, and wise','(\w+)thy', '\1ish', 1, 0);
+
+
-- change NULL-display back
\pset null ''
Gilles Darold <gillesdarold@gmail.com> writes:
[ v4-0001-regexp-foo-functions.patch ]
I started to work through this and was distressed to realize that
it's trying to redefine regexp_replace() in an incompatible way.
We already have
regression=# \df regexp_replace
List of functions
Schema | Name | Result data type | Argument data types | Type
------------+----------------+------------------+------------------------+------
pg_catalog | regexp_replace | text | text, text, text | func
pg_catalog | regexp_replace | text | text, text, text, text | func
(2 rows)
The patch proposes to add (among other alternatives)
+{ oid => '9608', descr => 'replace text using regexp',
+ proname => 'regexp_replace', prorettype => 'text',
+ proargtypes => 'text text text int4', prosrc => 'textregexreplace_extended_no_occurrence' },
which is going to be impossibly confusing for both humans and machines.
I don't think we should go there. Even if you managed to construct
examples that didn't result in "ambiguous function" failures, that
doesn't mean that ordinary mortals won't get bit that way.
I'm inclined to just drop the regexp_replace additions. I don't think
that the extra parameters Oracle provides here are especially useful.
They're definitely not useful enough to justify creating compatibility
hazards for.
regards, tom lane
Le 26/07/2021 à 21:56, Tom Lane a écrit :
Gilles Darold <gillesdarold@gmail.com> writes:
[ v4-0001-regexp-foo-functions.patch ]
I started to work through this and was distressed to realize that
it's trying to redefine regexp_replace() in an incompatible way.
We already haveregression=# \df regexp_replace
List of functions
Schema | Name | Result data type | Argument data types | Type
------------+----------------+------------------+------------------------+------
pg_catalog | regexp_replace | text | text, text, text | func
pg_catalog | regexp_replace | text | text, text, text, text | func
(2 rows)The patch proposes to add (among other alternatives)
+{ oid => '9608', descr => 'replace text using regexp', + proname => 'regexp_replace', prorettype => 'text', + proargtypes => 'text text text int4', prosrc => 'textregexreplace_extended_no_occurrence' },which is going to be impossibly confusing for both humans and machines.
I don't think we should go there. Even if you managed to construct
examples that didn't result in "ambiguous function" failures, that
doesn't mean that ordinary mortals won't get bit that way.I'm inclined to just drop the regexp_replace additions. I don't think
that the extra parameters Oracle provides here are especially useful.
They're definitely not useful enough to justify creating compatibility
hazards for.
I would not say that being able to replace the Nth occurrence of a
pattern matching is not useful but i agree that this is not a common
case with replacement. Both Oracle [1]https://docs.oracle.com/database/121/SQLRF/functions163.htm#SQLRF06302 and IBM DB2 [2]https://www.ibm.com/docs/en/db2oc?topic=functions-regexp-replace propose this form
and I have though that we can not have compatibility issues because of
the different data type at the 4th parameter. Anyway, maybe we can just
rename the function even if I would prefer that regexp_replace() be
extended. For example:
regexp_replace(source, pattern, replacement [, flags ]);
regexp_substitute(source, pattern, replacement [, position ] [,
occurrence ] [, flags ]);
of course with only 3 parameters the two functions are the same.
What do you think about the renaming proposal instead of simply drop the
extended form of the function?
Best regards,
[1]: https://docs.oracle.com/database/121/SQLRF/functions163.htm#SQLRF06302
[2]: https://www.ibm.com/docs/en/db2oc?topic=functions-regexp-replace
--
Gilles Darold
http://www.darold.net/
Gilles Darold <gilles@darold.net> writes:
Le 26/07/2021 à 21:56, Tom Lane a écrit :
I'm inclined to just drop the regexp_replace additions. I don't think
that the extra parameters Oracle provides here are especially useful.
They're definitely not useful enough to justify creating compatibility
hazards for.
I would not say that being able to replace the Nth occurrence of a
pattern matching is not useful but i agree that this is not a common
case with replacement. Both Oracle [1] and IBM DB2 [2] propose this form
and I have though that we can not have compatibility issues because of
the different data type at the 4th parameter.
Well, here's an example of the potential issues:
regression=# create function rr(text,text,text,text) returns text
regression-# language sql as $$select 'text'$$;
CREATE FUNCTION
regression=# create function rr(text,text,text,int4) returns text
language sql as $$select 'int4'$$;
CREATE FUNCTION
regression=# select rr('a','b','c','d');
rr
------
text
(1 row)
regression=# select rr('a','b','c',42);
rr
------
int4
(1 row)
So far so good, but:
regression=# prepare rr as select rr('a','b','c',$1);
PREPARE
regression=# execute rr(12);
rr
------
text
(1 row)
So somebody trying to use the 4-parameter Oracle form from, say, JDBC
would get bit if they were sloppy about specifying parameter types.
The one saving grace is that digits aren't valid regexp flags,
so the outcome would be something like
regression=# select regexp_replace('a','b','c','12');
ERROR: invalid regular expression option: "1"
which'd be less difficult to debug than silent misbehavior.
Conversely, if you thought you were passing flags but it somehow
got interpreted as a start position, that would fail too:
regression=# prepare rri as select rr('a','b','c', $1::int);
PREPARE
regression=# execute rri('gi');
ERROR: invalid input syntax for type integer: "gi"
LINE 1: execute rri('gi');
^
Still, I bet a lot that we'd see periodic bug reports complaining
that it doesn't work.
Anyway, maybe we can just
rename the function even if I would prefer that regexp_replace() be
extended. For example:
regexp_replace(source, pattern, replacement [, flags ]);
regexp_substitute(source, pattern, replacement [, position ] [,
occurrence ] [, flags ]);
Hmm. Of course the entire selling point of this patch seems to be
bug-compatibility with Oracle, so using different names is largely
defeating the point :-(
Maybe we should just hold our noses and do it. The point that
you'd get a recognizable failure if the wrong function were chosen
reassures me a little bit. We've seen a lot of cases where this
sort of ambiguity results in the system just silently doing something
different from what you expected, and I was afraid that that could
happen here.
regards, tom lane
I've been working through this patch, and trying to verify
compatibility against Oracle and DB2, and I see some points that need
discussion or at least recording for the archives.
* In Oracle, while the documentation for regexp_instr says that
return_option should only be 0 or 1, experimentation with sqlfiddle
shows that any nonzero value is silently treated as 1. The patch
raises an error for other values, which I think is a good idea.
(IBM's docs say that DB2 raises an error too, though I can't test
that.) We don't need to be bug-compatible to that extent.
* What should happen when the subexpression/capture group number of
regexp_instr or regexp_substr exceeds the number of parenthesized
subexpressions of the regexp? Oracle silently returns a no-match
result (0 or NULL), as does this patch. However, IBM's docs say
that DB2 raises an error. I'm inclined to think that this is
likewise taking bug-compatibility too far, and that we should
raise an error like DB2. There are clearly cases where throwing
an error would help debug a faulty call, while I'm less clear on
a use-case where not throwing an error would be useful.
* IBM's docs say that both regexp_count and regexp_like have
arguments "string, pattern [, start] [, flags]" --- that is,
each of start and flags can be independently specified or omitted.
The patch follows Oracle, which has no start option for
regexp_like, and where you can't write flags for regexp_count
without writing start. This is fine by me, because doing these
like DB2 would introduce the same which-argument-is-this issues
as we're being forced to cope with for regexp_replace. I don't
think we need to accept ambiguity in these cases too. But it's
worth memorializing this decision in the thread.
* The patch has most of these functions silently ignoring the 'g'
flag, but I think they should raise errors instead. Oracle doesn't
accept a 'g' flag for these, so why should we? The only case where
that logic doesn't hold is regexp_replace, because depending on which
syntax you use the 'g' flag might or might not be meaningful. So
for regexp_replace, I'd vote for silently ignoring 'g' if the
occurrence-number parameter is given, while honoring it if not.
I've already made changes in my local copy per the last item,
but I've not done anything about throwing errors for out-of-range
subexpression numbers. Anybody have an opinion about that one?
regards, tom lane
Le 30/07/2021 à 23:38, Tom Lane a écrit :
Gilles Darold <gilles@darold.net> writes:
Le 26/07/2021 à 21:56, Tom Lane a écrit :
I'm inclined to just drop the regexp_replace additions. I don't think
that the extra parameters Oracle provides here are especially useful.
They're definitely not useful enough to justify creating compatibility
hazards for.I would not say that being able to replace the Nth occurrence of a
pattern matching is not useful but i agree that this is not a common
case with replacement. Both Oracle [1] and IBM DB2 [2] propose this form
and I have though that we can not have compatibility issues because of
the different data type at the 4th parameter.Well, here's an example of the potential issues:
[...]
Thanks for pointing me this case, I did not think that the prepared
statement could lead to this confusion.
Anyway, maybe we can just
rename the function even if I would prefer that regexp_replace() be
extended. For example:
regexp_replace(source, pattern, replacement [, flags ]);
regexp_substitute(source, pattern, replacement [, position ] [,
occurrence ] [, flags ]);Hmm. Of course the entire selling point of this patch seems to be
bug-compatibility with Oracle, so using different names is largely
defeating the point :-(Maybe we should just hold our noses and do it. The point that
you'd get a recognizable failure if the wrong function were chosen
reassures me a little bit. We've seen a lot of cases where this
sort of ambiguity results in the system just silently doing something
different from what you expected, and I was afraid that that could
happen here.
I join a new version of the patch that include a check of the option
parameter in the basic form of regexp_replace() and return an error in
ambiguous cases.
PREPARE rr AS SELECT regexp_replace('healthy, wealthy, and
wise','(\w+)thy', '\1ish', $1);
EXECUTE rr(1);
ERROR: ambiguous use of the option parameter in regex_replace(),
value: 1
HINT: you might set the occurrence parameter to force the use of
the extended form of regex_replace()
This is done by checking if the option parameter value is an integer and
throw the error in this case. I don't think of anything better.
Best regards,
--
Gilles Darold
Attachments:
v5-0001-regexp-foo-functions.patchtext/x-patch; charset=UTF-8; name=v5-0001-regexp-foo-functions.patchDownload
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index a5b6adc4bb..02d1f72e1e 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -3108,6 +3108,66 @@ repeat('Pg', 4) <returnvalue>PgPgPgPg</returnvalue>
</para></entry>
</row>
+ <row>
+ <entry role="func_table_entry"><para role="func_signature">
+ <indexterm>
+ <primary>regexp_count</primary>
+ </indexterm>
+ <function>regexp_count</function> ( <parameter>string</parameter> <type>text</type>, <parameter>pattern</parameter> <type>text</type> [, <parameter>position</parameter> <type>integer</type> ] [, <parameter>flags</parameter> <type>text</type> ] )
+ <returnvalue>integer</returnvalue>
+ </para>
+ <para>
+ Returns the number of times a pattern occurs for a match of a POSIX
+ regular expression to the <parameter>string</parameter>; see
+ <xref linkend="functions-posix-regexp"/>.
+ </para>
+ <para>
+ <literal>regexp_count('123456789012', '\d{3}', 3)</literal>
+ <returnvalue>3</returnvalue>
+ </para></entry>
+ </row>
+
+ <row>
+ <entry role="func_table_entry"><para role="func_signature">
+ <indexterm>
+ <primary>regexp_instr</primary>
+ </indexterm>
+ <function>regexp_instr</function> ( <parameter>string</parameter> <type>text</type>, <parameter>pattern</parameter> <type>text</type> [, <parameter>position</parameter> <type>integer</type> ] [, <parameter>occurence</parameter> <type>integer</type> ] [, <parameter>returnopt</parameter> <type>integer</type> ] [, <parameter>flags</parameter> <type>text</type> ] [, <parameter>group</parameter> <type>integer</type> ] )
+ <returnvalue>integer</returnvalue>
+ </para>
+ <para>
+ Returns the position within <parameter>string</parameter> where the
+ match of a POSIX regular expression occurs. It returns an integer
+ indicating the beginning or ending position of the matched substring,
+ depending on the value of the <parameter>returnopt</parameter> argument
+ (default beginning). If no match is found the function returns 0;
+ see <xref linkend="functions-posix-regexp"/>.
+ </para>
+ <para>
+ <literal>regexp_instr('1234567890', '(123)(4(56)(78))', 1, 1, 0, 'i', 4)</literal>
+ <returnvalue>7</returnvalue>
+ </para></entry>
+ </row>
+
+ <row>
+ <entry role="func_table_entry"><para role="func_signature">
+ <indexterm>
+ <primary>regexp_like</primary>
+ </indexterm>
+ <function>regexp_like</function> ( <parameter>string</parameter> <type>text</type>, <parameter>pattern</parameter> <type>text</type> [, <parameter>flags</parameter> <type>text</type> ] )
+ <returnvalue>boolean</returnvalue>
+ </para>
+ <para>
+ Evaluates the existence of a match to a POSIX regular expression
+ in <parameter>string</parameter>; see <xref linkend="functions-posix-regexp"/>.
+ </para>
+ <para>
+ <literal>regexp_like('Hello'||chr(10)||'world', '^world$', 'm')</literal>
+ <returnvalue>t</returnvalue>
+ </para></entry>
+ </row>
+
+
<row>
<entry role="func_table_entry"><para role="func_signature">
<indexterm>
@@ -3156,7 +3216,7 @@ repeat('Pg', 4) <returnvalue>PgPgPgPg</returnvalue>
<indexterm>
<primary>regexp_replace</primary>
</indexterm>
- <function>regexp_replace</function> ( <parameter>string</parameter> <type>text</type>, <parameter>pattern</parameter> <type>text</type>, <parameter>replacement</parameter> <type>text</type> [, <parameter>flags</parameter> <type>text</type> ] )
+ <function>regexp_replace</function> ( <parameter>string</parameter> <type>text</type>, <parameter>pattern</parameter> <type>text</type>, <parameter>replacement</parameter> <type>text</type> [, <parameter>position</parameter> <type>integer</type> ] [, <parameter>occurence</parameter> <type>integer</type> ] [, <parameter>flags</parameter> <type>text</type> ] )
<returnvalue>text</returnvalue>
</para>
<para>
@@ -3171,6 +3231,24 @@ repeat('Pg', 4) <returnvalue>PgPgPgPg</returnvalue>
</para></entry>
</row>
+ <row>
+ <entry role="func_table_entry"><para role="func_signature">
+ <indexterm>
+ <primary>regexp_substr</primary>
+ </indexterm>
+ <function>regexp_substr</function> ( <parameter>string</parameter> <type>text</type>, <parameter>pattern</parameter> <type>text</type> [, <parameter>position</parameter> <type>integer</type> ] [, <parameter>occurence</parameter> <type>integer</type> ] [, <parameter>flags</parameter> <type>text</type> ] [, <parameter>group</parameter> <type>integer</type> ] )
+ <returnvalue>text</returnvalue>
+ </para>
+ <para>
+ Return the substring within <parameter>string</parameter> corresponding to the
+ match of a POSIX regular expression; see <xref linkend="functions-posix-regexp"/>.
+ </para>
+ <para>
+ <literal>regexp_substr('1234567890 1234557890', '(123)(4(5[56])(78))', 1, 2, 'i', 3)</literal>
+ <returnvalue>55</returnvalue>
+ </para></entry>
+ </row>
+
<row>
<entry role="func_table_entry"><para role="func_signature">
<indexterm>
@@ -5392,6 +5470,18 @@ substring('foobar' similar '#"o_b#"%' escape '#') <lineannotation>NULL</linea
<indexterm>
<primary>regexp_split_to_array</primary>
</indexterm>
+ <indexterm>
+ <primary>regexp_like</primary>
+ </indexterm>
+ <indexterm>
+ <primary>regexp_count</primary>
+ </indexterm>
+ <indexterm>
+ <primary>regexp_instr</primary>
+ </indexterm>
+ <indexterm>
+ <primary>regexp_substr</primary>
+ </indexterm>
<para>
<xref linkend="functions-posix-table"/> lists the available
@@ -5650,6 +5740,8 @@ SELECT col1, (SELECT regexp_matches(col2, '(bar)(beque)')) FROM tab;
It has the syntax
<function>regexp_replace</function>(<replaceable>source</replaceable>,
<replaceable>pattern</replaceable>, <replaceable>replacement</replaceable>
+ <optional>, <replaceable>position</replaceable> </optional>
+ <optional>, <replaceable>occurrence</replaceable> </optional>
<optional>, <replaceable>flags</replaceable> </optional>).
The <replaceable>source</replaceable> string is returned unchanged if
there is no match to the <replaceable>pattern</replaceable>. If there is a
@@ -5663,12 +5755,19 @@ SELECT col1, (SELECT regexp_matches(col2, '(bar)(beque)')) FROM tab;
substring matching the entire pattern should be inserted. Write
<literal>\\</literal> if you need to put a literal backslash in the replacement
text.
- The <replaceable>flags</replaceable> parameter is an optional text
- string containing zero or more single-letter flags that change the
+ <replaceable>pattern</replaceable> is searched in <replaceable>string</replaceable> starting
+ from an optional <replaceable>position</replaceable> or from the beginning
+ of <replaceable>source</replaceable> by default. Optional <replaceable>occurrence</replaceable>
+ parameter indicates which occurrence of <replaceable>pattern</replaceable> in
+ <replaceable>source</replaceable> should be replaced. Default value for <replaceable>occurrence</replaceable>
+ is 1, replace only the first occurrence. The <replaceable>flags</replaceable> parameter
+ is an optional text string containing zero or more single-letter flags that change the
function's behavior. Flag <literal>i</literal> specifies case-insensitive
matching, while flag <literal>g</literal> specifies replacement of each matching
- substring rather than only the first one. Supported flags (though
- not <literal>g</literal>) are
+ substring rather than only the first one. When <replaceable>occurrence</replaceable>
+ is set flag <literal>g</literal> has no effect. If <replaceable>occurrence</replaceable>
+ is set to zero, all occurrences are replaced which is similar to flag <literal>g</literal>.
+ Supported flags (though not <literal>g</literal>) are
described in <xref linkend="posix-embedded-options-table"/>.
</para>
@@ -5681,6 +5780,10 @@ regexp_replace('foobarbaz', 'b..', 'X', 'g')
<lineannotation>fooXX</lineannotation>
regexp_replace('foobarbaz', 'b(..)', 'X\1Y', 'g')
<lineannotation>fooXarYXazY</lineannotation>
+regexp_replace('A PostgreSQL function', 'a|e|i|o|u', 'X', 1, 0, 'i')
+ <lineannotation>X PXstgrXSQL fXnctXXn</lineannotation>
+regexp_replace('A PostgreSQL function', 'a|e|i|o|u', 'X', 1, 3, 'i')
+ <lineannotation>A PostgrXSQL function</lineannotation>
</programlisting>
</para>
@@ -5766,6 +5869,163 @@ SELECT foo FROM regexp_split_to_table('the quick brown fox', '\s*') AS foo;
in practice. Other software systems such as Perl use similar definitions.
</para>
+ <para>
+ The <function>regexp_like</function> function evaluates the existence of a match
+ to a POSIX regular expression in <parameter>string</parameter>; returns a boolean
+ resulting from matching a POSIX regular expression pattern to a string. It has
+ the syntax <function>regexp_like</function>(<replaceable>string</replaceable>,
+ <replaceable>pattern</replaceable> <optional>, <replaceable>flags</replaceable> </optional>).
+ <replaceable>pattern</replaceable> is searched in <replaceable>string</replaceable> starting
+ from the beginning of <replaceable>string</replaceable>.
+ The <replaceable>flags</replaceable> parameter is an optional text string
+ containing zero or more single-letter flags that change the function's behavior.
+ <function>regexp_like</function> accepts all the flags
+ shown in <xref linkend="posix-embedded-options-table"/>.
+ This function is similar to regexp operator <literal>~</literal> when used without
+ <replaceable>flags</replaceable> and similar to operator <literal>~*</literal> when
+ used with <replaceable>flags</replaceable> <literal>'i'</literal> only.
+ </para>
+
+ <para>
+ Some examples:
+<programlisting>
+SELECT 'found' FROM (values('Hello'||chr(10)||'world') as f(col) WHERE regexp_like(col, '^world$', 'm');
+ ?column?
+----------
+ found
+(1 row)
+</programlisting>
+ </para>
+
+
+ <para>
+ The <function>regexp_count</function> function returns the number of
+ captured substring(s) resulting from matching a POSIX regular
+ expression pattern to a string. It has the syntax <function>regexp_count</function>(
+ <replaceable>string</replaceable>, <replaceable>pattern</replaceable> <optional>,
+ <replaceable>position</replaceable> </optional> <optional>, <replaceable>flags</replaceable> </optional>).
+ <replaceable>pattern</replaceable> is searched in <replaceable>string</replaceable> starting
+ from an optional <replaceable>position</replaceable> or from the beginning of <replaceable>string</replaceable>
+ by default. The <replaceable>flags</replaceable> parameter is an optional text string
+ containing zero or more single-letter flags that change the function's behavior.
+ <function>regexp_count</function> accepts all the flags
+ shown in <xref linkend="posix-embedded-options-table"/>.
+ The <literal>g</literal> flag is forced internally to count all matches.
+ This function returns 0 if there is no match or the number of matches as
+ an integer.
+ </para>
+
+ <para>
+ Some examples:
+<programlisting>
+SELECT regexp_count('123123123123', '\d{3}', 1);
+ regexp_count
+--------------
+ 4
+(1 row)
+
+SELECT regexp_count('Hello'||CHR(10)||'world!', '^world!$', 1, 'm');
+ regexp_count
+--------------
+ 1
+(1 row)
+</programlisting>
+ </para>
+
+ <para>
+ The <function>regexp_instr</function> function returns the beginning or ending
+ position of the matched substring resulting from matching a POSIX regular
+ expression pattern to a string. It has the syntax <function>regexp_instr</function>(
+ <replaceable>string</replaceable>, <replaceable>pattern</replaceable> <optional>,
+ <replaceable>position</replaceable> </optional> <optional>, <replaceable>occurrence</replaceable> </optional>
+ <optional>, <replaceable>returnopt</replaceable> </optional>
+ <optional>, <replaceable>flags</replaceable> </optional>
+ <optional>, <replaceable>group</replaceable> </optional>).
+ <replaceable>pattern</replaceable> is searched in <replaceable>string</replaceable> starting
+ from an optional <replaceable>position</replaceable> or from the beginning
+ of <replaceable>string</replaceable> by default. Optional <replaceable>occurrence</replaceable> parameter
+ indicates which occurrence of <replaceable>pattern</replaceable> in <replaceable>string</replaceable>
+ should be searched. When optional <replaceable>returnopt</replaceable> parameter is set to 0 (default)
+ the function returns the position of the first character of the occurrence. When set to 1 returns
+ the position of the character after the occurrence.
+ The <replaceable>flags</replaceable> parameter is an optional text string
+ containing zero or more single-letter flags that change the function's behavior.
+ <function>regexp_instr</function> accepts all the flags
+ shown in <xref linkend="posix-embedded-options-table"/>.
+ The <literal>g</literal> flag is forced internally to track all matches.
+ For a pattern with capture groups, <replaceable>group</replaceable> is an integer indicating
+ which capture in <replaceable>pattern</replaceable> is the target of the function.
+ A capture group is a part of the pattern enclosed in parentheses. Capture groups can be nested.
+ They are numbered in the order in which their left parentheses appear in <replaceable>pattern</replaceable>.
+ If <replaceable>group</replaceable> is zero, then the position
+ of the entire substring that matches the pattern is returned. If <replaceable>pattern</replaceable>
+ does not have at least <replaceable>group</replaceable> capture groups, the function returns zero.
+ This function returns 0 if there is no match or the starting or ending position
+ of a match as an integer.
+ </para>
+
+ <para>
+ Some examples:
+<programlisting>
+SELECT regexp_instr('number of your street, zipcode thetown, FR', '[^ ]+', 1, 6);
+ regexp_instr
+--------------
+ 32
+(1 row)
+
+SELECT regexp_instr('1234567890', '(123)(4(56)(78))', 1, 1, 0, 'i', 3);
+ regexp_instr
+--------------
+ 5
+(1 row)
+</programlisting>
+ </para>
+
+ <para>
+ The <function>regexp_substr</function> function returns the
+ matched substring resulting from matching a POSIX regular
+ expression pattern to a string. It has the syntax <function>regexp_substr</function>(
+ <replaceable>string</replaceable>, <replaceable>pattern</replaceable> <optional>,
+ <replaceable>position</replaceable> </optional> <optional>, <replaceable>occurrence</replaceable> </optional>
+ <optional>, <replaceable>flags</replaceable> </optional>
+ <optional>, <replaceable>group</replaceable> </optional>).
+ <replaceable>pattern</replaceable> is searched in <replaceable>string</replaceable> starting
+ from an optional <replaceable>position</replaceable> or from the beginning
+ of <replaceable>string</replaceable> by default. Optional parameter <replaceable>occurrence</replaceable>
+ indicates which occurrence of <replaceable>pattern</replaceable> in <replaceable>string</replaceable>
+ should be searched. The <replaceable>flags</replaceable> parameter is an optional text string
+ containing zero or more single-letter flags that change the function's behavior.
+ <function>regexp_substr</function> accepts all the flags
+ shown in <xref linkend="posix-embedded-options-table"/>.
+ The <literal>g</literal> flag is forced internally to track all matches.
+ For a pattern with capture groups, optional <replaceable>group</replaceable> is an integer indicating
+ which capture in <replaceable>pattern</replaceable> is the target of the function. A capture group
+ is a part of the pattern enclosed in parentheses. Capture groups can be nested. They are numbered in
+ the order in which their left parentheses appear in <replaceable>pattern</replaceable>.
+ If <replaceable>group</replaceable> is zero, then the position
+ of the entire substring that matches the pattern is returned. If <replaceable>pattern</replaceable>
+ does not have at least <replaceable>group</replaceable> capture groups, the function returns zero.
+ This function returns NULL if there is no match or the substring of the match.
+ </para>
+
+ <para>
+ Some examples:
+<programlisting>
+SELECT regexp_substr('number of your street, zipcode town, FR', ',[^,]+', 1, 2);
+ regexp_substr
+---------------
+ , FR
+(1 row)
+
+SELECT regexp_substr('1234567890 1234557890', '(123)(4(5[56])(78))', 1, 2, 'i', 3);
+ regexp_substr
+---------------
+ 55
+(1 row)
+</programlisting>
+ </para>
+
+
<!-- derived from the re_syntax.n man page -->
<sect3 id="posix-syntax-details">
diff --git a/src/backend/utils/adt/regexp.c b/src/backend/utils/adt/regexp.c
index a32c5c82ab..f34cce18ef 100644
--- a/src/backend/utils/adt/regexp.c
+++ b/src/backend/utils/adt/regexp.c
@@ -120,6 +120,7 @@ static regexp_matches_ctx *setup_regexp_matches(text *orig_str, text *pattern,
static ArrayType *build_regexp_match_result(regexp_matches_ctx *matchctx);
static Datum build_regexp_split_result(regexp_matches_ctx *splitctx);
+static text *enclose_with_parenthesis(text *str);
/*
* RE_compile_and_cache - compile a RE, caching if possible
@@ -555,7 +556,6 @@ texticregexne(PG_FUNCTION_ARGS)
0, NULL));
}
-
/*
* textregexsubstr()
* Return a substring matched by a regular expression.
@@ -646,6 +646,32 @@ textregexreplace(PG_FUNCTION_ARGS)
regex_t *re;
pg_re_flags flags;
+
+ /*
+ * When the function is called with four parameters in a prepared
+ * statement the basic regexp_replace() function can be called
+ * instead of the extended form. This is because the parameter type
+ * is unknown at prepare time and the basic form is chosen. Check
+ * that the fourth parameter is not an integer otherwise return an
+ * error that we call the extended form.
+ */
+ if (PG_NARGS() == 4)
+ {
+ char *badopt;
+
+ (void) strtol(TextDatumGetCString(opt), &badopt, 10);
+ if (strcmp(badopt, "") == 0)
+ {
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("ambiguous use of the option parameter"
+ " in regex_replace(), value: %s",
+ TextDatumGetCString(opt)),
+ errhint("you might set the occurrence parameter to force"
+ " the use of the extended form of regex_replace()")));
+ }
+ }
+
parse_re_flags(&flags, opt);
re = RE_compile_and_cache(p, flags.cflags, PG_GET_COLLATION());
@@ -1063,6 +1089,508 @@ regexp_matches_no_flags(PG_FUNCTION_ARGS)
return regexp_matches(fcinfo);
}
+/*
+ * regexp_like()
+ * Return the true if a pattern match within a string.
+ */
+Datum
+regexp_like(PG_FUNCTION_ARGS)
+{
+ text *str = PG_GETARG_TEXT_PP(0);
+ text *pattern = PG_GETARG_TEXT_PP(1);
+ text *flags = PG_GETARG_TEXT_PP_IF_EXISTS(2);
+
+ pg_re_flags re_flags;
+ regexp_matches_ctx *matchctx;
+
+ /* Determine options */
+ parse_re_flags(&re_flags, flags);
+
+ matchctx = setup_regexp_matches(str, pattern, &re_flags,
+ PG_GET_COLLATION(), true, false, false);
+
+ if (matchctx->nmatches > 0)
+ PG_RETURN_BOOL(true);
+
+ PG_RETURN_BOOL(false);
+}
+
+/* This is separate to keep the opr_sanity regression test from complaining */
+Datum
+regexp_like_no_flags(PG_FUNCTION_ARGS)
+{
+ return regexp_like(fcinfo);
+}
+
+/*
+ * regexp_count()
+ * Return the number of matches of a pattern within a string.
+ */
+Datum
+regexp_count(PG_FUNCTION_ARGS)
+{
+ text *orig_str = NULL;
+ text *pattern = PG_GETARG_TEXT_PP(1);
+ int start = 1;
+ text *flags = PG_GETARG_TEXT_PP_IF_EXISTS(3);
+
+ pg_re_flags re_flags;
+ regexp_matches_ctx *matchctx;
+
+ if (PG_NARGS() > 2)
+ start = PG_GETARG_INT32(2);
+
+ if (start < 0)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("invalid value for parameter \"%s\": %d", "start_position", start)));
+
+ /* regexp_count(string, pattern, start[, flags]) */
+ if (start > 1)
+ orig_str = DatumGetTextPP(DirectFunctionCall2(text_substr_no_len,
+ PG_GETARG_DATUM(0),
+ Int32GetDatum(start)
+ ));
+ else
+ orig_str = PG_GETARG_TEXT_PP(0);
+
+ /* Determine options */
+ parse_re_flags(&re_flags, flags);
+
+ /* this function require flag 'g' */
+ re_flags.glob = true;
+
+ matchctx = setup_regexp_matches(orig_str, pattern, &re_flags,
+ PG_GET_COLLATION(), true, false, false);
+
+ PG_RETURN_INT32(matchctx->nmatches);
+}
+
+/* This is separate to keep the opr_sanity regression test from complaining */
+Datum
+regexp_count_no_start(PG_FUNCTION_ARGS)
+{
+ return regexp_count(fcinfo);
+}
+
+
+/* This is separate to keep the opr_sanity regression test from complaining */
+Datum
+regexp_count_no_flags(PG_FUNCTION_ARGS)
+{
+ return regexp_count(fcinfo);
+}
+
+/*
+ * textregexreplace_extended()
+ * Return a string matched by a regular expression, with replacement.
+ * Extends textregexreplace by allowing a start position and the
+ * choice of the occurrence to replace (0 means all occurrences).
+ */
+Datum
+textregexreplace_extended(PG_FUNCTION_ARGS)
+{
+ text *s = NULL;
+ text *p = PG_GETARG_TEXT_PP(1);
+ text *pattern = enclose_with_parenthesis(PG_GETARG_TEXT_PP(1));
+ text *r = PG_GETARG_TEXT_PP(2);
+ int start = 1;
+ int occurrence = 1;
+ int pos = 0;
+ text *flags = PG_GETARG_TEXT_PP_IF_EXISTS(5);
+ StringInfoData str;
+ regex_t *re;
+ pg_re_flags re_flags;
+ regexp_matches_ctx *matchctx;
+ text *after = NULL;
+
+ /* start position */
+ if (PG_NARGS() > 3)
+ start = PG_GETARG_INT32(3);
+ if (start < 0)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("invalid value for parameter \"%s\": %d", "start_position", start)));
+ if (start > 1)
+ s = DatumGetTextPP(DirectFunctionCall2(text_substr_no_len,
+ PG_GETARG_DATUM(0),
+ Int32GetDatum(start)
+ ));
+ else
+ s = PG_GETARG_TEXT_PP(0);
+
+ /* occurrence to replace */
+ if (PG_NARGS() > 4)
+ occurrence = PG_GETARG_INT32(4);
+
+ if (occurrence < 0)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("invalid value for parameter \"%s\": %d", "occurrence", occurrence)));
+
+ /* Determine options */
+ parse_re_flags(&re_flags, flags);
+ re_flags.glob = true;
+
+ /* lookup for pattern */
+ matchctx = setup_regexp_matches(s, pattern, &re_flags,
+ PG_GET_COLLATION(), true,
+ false, re_flags.glob);
+
+ /* If no match is found, then the function returns the original string */
+ if (matchctx->nmatches == 0)
+ PG_RETURN_DATUM(PG_GETARG_DATUM(0));
+
+ /* Get the position of the occurence to replace */
+ if (PG_NARGS() > 4 && occurrence > 0)
+ {
+ /* When occurrence exceed matches return the original string */
+ if (occurrence > matchctx->nmatches)
+ PG_RETURN_DATUM(PG_GETARG_DATUM(0));
+
+ pos = (occurrence*matchctx->npatterns*2)-(matchctx->npatterns*2);
+ pos = matchctx->match_locs[pos]+start;
+ }
+ else
+ pos = start;
+
+ /* Normal case without explicit VARIADIC marker */
+ initStringInfo(&str);
+
+ /* Get the string before the occurrence starting */
+ if (pos > 1)
+ {
+ text *before = DatumGetTextPP(DirectFunctionCall3(text_substr,
+ PG_GETARG_DATUM(0),
+ Int32GetDatum(1),
+ Int32GetDatum(pos - 1)));
+ appendStringInfoString(&str, TextDatumGetCString(before));
+ }
+
+ /* all occurences must be replaced? */
+ if (occurrence == 0)
+ re_flags.glob = true;
+ else
+ re_flags.glob = false;
+
+ /* Compile the regular expression */
+ re = RE_compile_and_cache(p, re_flags.cflags, PG_GET_COLLATION());
+
+ /* Get the substring starting at the right occurrence position */
+ after = DatumGetTextPP(DirectFunctionCall2(text_substr_no_len,
+ PG_GETARG_DATUM(0),
+ Int32GetDatum(pos)
+ ));
+
+ appendStringInfoString(&str, TextDatumGetCString(replace_text_regexp(
+ after,
+ (void *) re,
+ r,
+ re_flags.glob)));
+
+ PG_RETURN_TEXT_P(CStringGetTextDatum(str.data));
+}
+
+/* This is separate to keep the opr_sanity regression test from complaining */
+Datum
+textregexreplace_extended_no_occurrence(PG_FUNCTION_ARGS)
+{
+ return textregexreplace_extended(fcinfo);
+}
+
+/* This is separate to keep the opr_sanity regression test from complaining */
+Datum
+textregexreplace_extended_no_flags(PG_FUNCTION_ARGS)
+{
+ return textregexreplace_extended(fcinfo);
+}
+
+/*
+ * Return text string between parenthesis
+ */
+static text *
+enclose_with_parenthesis(text *str)
+{
+ int len = VARSIZE_ANY_EXHDR(str);
+ text *result;
+ char *ptr;
+
+ result = palloc(len + VARHDRSZ + 2);
+ SET_VARSIZE(result, len + VARHDRSZ + 2);
+ ptr = VARDATA(result);
+ memcpy(ptr, "(", 1);
+ memcpy(ptr+1, VARDATA_ANY(str), len);
+ memcpy(ptr+len+1, ")", 1);
+
+ return result;
+}
+
+/*
+ * regexp_instr()
+ * Return the position within the string where the match was located
+ */
+Datum
+regexp_instr(PG_FUNCTION_ARGS)
+{
+ text *orig_str = NULL;
+ text *pattern = NULL;
+ text *flags = PG_GETARG_TEXT_PP_IF_EXISTS(5);
+ int start = 1;
+ int occurrence = 1;
+ int return_opt = 0;
+ int subexpr = 0;
+ int pos;
+
+ pg_re_flags re_flags;
+ regexp_matches_ctx *matchctx;
+
+ /* regexp_instr(string, pattern, start) */
+ if (PG_NARGS() > 2)
+ start = PG_GETARG_INT32(2);
+
+ if (start < 0)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("invalid value for parameter \"%s\": %d", "start_position", start)));
+
+ if (start > 1)
+ orig_str = DatumGetTextPP(DirectFunctionCall2(text_substr_no_len,
+ PG_GETARG_DATUM(0),
+ Int32GetDatum(start)
+ ));
+ else
+ /* regexp_instr(string, pattern) */
+ orig_str = PG_GETARG_TEXT_PP(0);
+
+ if (orig_str == NULL)
+ PG_RETURN_NULL();
+
+ /* regexp_instr(string, pattern, start, occurrence) */
+ if (PG_NARGS() > 3)
+ occurrence = PG_GETARG_INT32(3);
+
+ if (occurrence <= 0)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("invalid value for parameter \"%s\": %d", "occurrence", occurrence)));
+
+ /* regexp_instr(string, pattern, start, occurrence, return_opt) */
+ if (PG_NARGS() > 4)
+ return_opt = PG_GETARG_INT32(4);
+
+ if (return_opt != 0 && return_opt != 1)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("invalid value for parameter \"%s\": %d", "return_option", return_opt)));
+
+ /* regexp_instr(string, pattern, start, occurrence, return_opt, flags, subexpr) */
+ if (PG_NARGS() > 6)
+ subexpr = PG_GETARG_INT32(6);
+ if (subexpr < 0)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("invalid value for parameter \"%s\": %d", "group", subexpr)));
+
+ /*
+ * If subexpr is zero (default), then the position of the entire
+ * substring that matches the pattern is returned. Otherwise we
+ * will exactly register the subexpressions given in the pattern.
+ * Enclose pattern between parenthesis to register the position
+ * of the entire substring.
+ */
+ pattern = enclose_with_parenthesis(PG_GETARG_TEXT_PP(1));
+
+ /* Determine options */
+ parse_re_flags(&re_flags, flags);
+ /* this function require flag 'g' */
+ re_flags.glob = true;
+
+ matchctx = setup_regexp_matches(orig_str, pattern, &re_flags,
+ PG_GET_COLLATION(), true, false, false);
+
+ /* If no match is found, then the function returns 0 */
+ if (matchctx->nmatches == 0)
+ PG_RETURN_INT32(0);
+
+ /* When occurrence exceed matches return 0 */
+ if (occurrence > matchctx->nmatches)
+ PG_RETURN_INT32(0);
+
+ /* When subexpr exceed number of subexpression return 0 */
+ if (subexpr > matchctx->npatterns - 1)
+ PG_RETURN_INT32(0);
+
+ /*
+ * Returns the position of the first character of the occurrence
+ * or for subexpression in this occurrence.
+ */
+ pos = (occurrence*matchctx->npatterns*2)-((matchctx->npatterns-subexpr)*2);
+ if (return_opt == 1)
+ pos += 1;
+
+ PG_RETURN_INT32(matchctx->match_locs[pos]+start);
+}
+
+/* This is separate to keep the opr_sanity regression test from complaining */
+Datum
+regexp_instr_no_start(PG_FUNCTION_ARGS)
+{
+ return regexp_instr(fcinfo);
+}
+
+/* This is separate to keep the opr_sanity regression test from complaining */
+Datum
+regexp_instr_no_occurrence(PG_FUNCTION_ARGS)
+{
+ return regexp_instr(fcinfo);
+}
+
+/* This is separate to keep the opr_sanity regression test from complaining */
+Datum
+regexp_instr_no_return_opt(PG_FUNCTION_ARGS)
+{
+ return regexp_instr(fcinfo);
+}
+
+/* This is separate to keep the opr_sanity regression test from complaining */
+Datum
+regexp_instr_no_flags(PG_FUNCTION_ARGS)
+{
+ return regexp_instr(fcinfo);
+}
+
+/* This is separate to keep the opr_sanity regression test from complaining */
+Datum
+regexp_instr_no_subexpr(PG_FUNCTION_ARGS)
+{
+ return regexp_instr(fcinfo);
+}
+
+/*
+ * regexp_substr()
+ * Return the substring within the string that match a regular
+ * expression pattern
+ */
+Datum
+regexp_substr(PG_FUNCTION_ARGS)
+{
+ text *orig_str = NULL;
+ text *pattern = NULL;
+ text *flags = PG_GETARG_TEXT_PP_IF_EXISTS(4);
+ int start = 1;
+ int occurrence = 1;
+ int subexpr = 0;
+ int so, eo, pos;
+
+ pg_re_flags re_flags;
+ regexp_matches_ctx *matchctx;
+
+ if (PG_NARGS() > 2)
+ start = PG_GETARG_INT32(2);
+
+ if (start < 0)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("invalid value for parameter \"%s\": %d", "start_position", start)));
+
+ /* regexp_substr(string, pattern, start) */
+ if (start > 1)
+ orig_str = DatumGetTextPP(DirectFunctionCall2(text_substr_no_len,
+ PG_GETARG_DATUM(0),
+ Int32GetDatum(start)
+ ));
+ else
+ /* regexp_substr(string, pattern) */
+ orig_str = PG_GETARG_TEXT_PP(0);
+
+ if (orig_str == NULL)
+ PG_RETURN_NULL();
+
+ /* regexp_substr(string, pattern, start, occurrence) */
+ if (PG_NARGS() > 3)
+ occurrence = PG_GETARG_INT32(3);
+
+ if (occurrence <= 0)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("invalid value for parameter \"%s\": %d", "occurrence", occurrence)));
+
+ /* regexp_substr(string, pattern, start, occurrence, flags, subexpr) */
+ if (PG_NARGS() > 5)
+ subexpr = PG_GETARG_INT32(5);
+ if (subexpr < 0)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("invalid value for parameter \"%s\": %d", "group", subexpr)));
+
+ /*
+ * If subexpr is zero (default), then the position of the entire
+ * substring that matches the pattern is returned. Otherwise we
+ * will exactly register the subexpressions given in the pattern.
+ * Enclose pattern between parenthesis to register the position
+ * of the entire substring.
+ */
+ pattern = enclose_with_parenthesis(PG_GETARG_TEXT_PP(1));
+
+ /* Determine options */
+ parse_re_flags(&re_flags, flags);
+ /* this function require flag 'g' */
+ re_flags.glob = true;
+
+ matchctx = setup_regexp_matches(orig_str, pattern, &re_flags,
+ PG_GET_COLLATION(), true, false, false);
+
+ /* If no match is found, then the function returns NULL */
+ if (matchctx->nmatches == 0)
+ PG_RETURN_NULL();
+
+ /* When occurrence exceed matches return NULL */
+ if (occurrence > matchctx->nmatches)
+ PG_RETURN_NULL();
+
+ /* When subexpr exceed number of subexpression return NULL */
+ if (subexpr > matchctx->npatterns - 1)
+ PG_RETURN_NULL();
+
+ /* Returns the substring corresponding to the occurrence. */
+ pos = (occurrence*matchctx->npatterns*2)-((matchctx->npatterns-subexpr)*2);
+ so = matchctx->match_locs[pos]+1;
+ eo = matchctx->match_locs[pos+1]+1;
+
+ PG_RETURN_DATUM(DirectFunctionCall3(text_substr,
+ PointerGetDatum(matchctx->orig_str),
+ Int32GetDatum(so),
+ Int32GetDatum(eo - so)));
+}
+
+/* This is separate to keep the opr_sanity regression test from complaining */
+Datum
+regexp_substr_no_start(PG_FUNCTION_ARGS)
+{
+ return regexp_substr(fcinfo);
+}
+
+/* This is separate to keep the opr_sanity regression test from complaining */
+Datum
+regexp_substr_no_occurrence(PG_FUNCTION_ARGS)
+{
+ return regexp_substr(fcinfo);
+}
+
+/* This is separate to keep the opr_sanity regression test from complaining */
+Datum
+regexp_substr_no_flags(PG_FUNCTION_ARGS)
+{
+ return regexp_substr(fcinfo);
+}
+
+/* This is separate to keep the opr_sanity regression test from complaining */
+Datum
+regexp_substr_no_subexpr(PG_FUNCTION_ARGS)
+{
+ return regexp_substr(fcinfo);
+}
+
/*
* setup_regexp_matches --- do the initial matching for regexp_match
* and regexp_split functions
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 8cd0252082..f45c98c05f 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -3579,6 +3579,63 @@
proname => 'regexp_matches', prorows => '10', proretset => 't',
prorettype => '_text', proargtypes => 'text text text',
prosrc => 'regexp_matches' },
+{ oid => '9614', descr => 'count match(es) for regexp',
+ proname => 'regexp_count', prorettype => 'int4', proargtypes => 'text text',
+ prosrc => 'regexp_count_no_start' },
+{ oid => '9615', descr => 'count match(es) for regexp',
+ proname => 'regexp_count', prorettype => 'int4', proargtypes => 'text text int4',
+ prosrc => 'regexp_count_no_flags' },
+{ oid => '9616', descr => 'count match(es) for regexp',
+ proname => 'regexp_count', prorettype => 'int4',
+ proargtypes => 'text text int4 text', prosrc => 'regexp_count' },
+{ oid => '9617', descr => 'position where the match for regexp was located',
+ proname => 'regexp_instr', prorettype => 'int4', proargtypes => 'text text',
+ prosrc => 'regexp_instr_no_start' },
+{ oid => '9618', descr => 'position where the match for regexp was located',
+ proname => 'regexp_instr', prorettype => 'int4', proargtypes => 'text text int4',
+ prosrc => 'regexp_instr_no_occurrence' },
+{ oid => '9619', descr => 'position where the match for regexp was located',
+ proname => 'regexp_instr', prorettype => 'int4', proargtypes => 'text text int4 int4',
+ prosrc => 'regexp_instr_no_return_opt' },
+{ oid => '9620', descr => 'position where the match for regexp was located',
+ proname => 'regexp_instr', prorettype => 'int4', proargtypes => 'text text int4 int4 int4',
+ prosrc => 'regexp_instr_no_flags' },
+{ oid => '9621', descr => 'position where the match for regexp was located',
+ proname => 'regexp_instr', prorettype => 'int4', proargtypes => 'text text int4 int4 int4 text',
+ prosrc => 'regexp_instr_no_subexpr' },
+{ oid => '9622', descr => 'position where the match for regexp was located',
+ proname => 'regexp_instr', prorettype => 'int4', proargtypes => 'text text int4 int4 int4 text int4',
+ prosrc => 'regexp_instr' },
+{ oid => '9623', descr => 'substring that matches the regexp pattern',
+ proname => 'regexp_substr', prorettype => 'text', proargtypes => 'text text',
+ prosrc => 'regexp_substr_no_start' },
+{ oid => '9624', descr => 'substring that matchies the regexp pattern',
+ proname => 'regexp_substr', prorettype => 'text', proargtypes => 'text text int4',
+ prosrc => 'regexp_substr_no_occurrence' },
+{ oid => '9625', descr => 'substring that matches the regexp pattern',
+ proname => 'regexp_substr', prorettype => 'text', proargtypes => 'text text int4 int4',
+ prosrc => 'regexp_substr_no_flags' },
+{ oid => '9626', descr => 'substring that matches the regexp pattern',
+ proname => 'regexp_substr', prorettype => 'text', proargtypes => 'text text int4 int4 text',
+ prosrc => 'regexp_substr_no_subexpr' },
+{ oid => '9627', descr => 'substring that matches the regexp pattern',
+ proname => 'regexp_substr', prorettype => 'text', proargtypes => 'text text int4 int4 text int4',
+ prosrc => 'regexp_substr' },
+{ oid => '9628', descr => 'evaluate match(es) for regexp',
+ proname => 'regexp_like', prorettype => 'bool', proargtypes => 'text text',
+ prosrc => 'regexp_like_no_flags' },
+{ oid => '9629', descr => 'evaluate match(es) for regexp',
+ proname => 'regexp_like', prorettype => 'bool',
+ proargtypes => 'text text text', prosrc => 'regexp_like' },
+{ oid => '9606', descr => 'replace text using regexp',
+ proname => 'regexp_replace', prorettype => 'text',
+ proargtypes => 'text text text int4 int4 text', prosrc => 'textregexreplace_extended' },
+{ oid => '9607', descr => 'replace text using regexp',
+ proname => 'regexp_replace', prorettype => 'text',
+ proargtypes => 'text text text int4 int4', prosrc => 'textregexreplace_extended_no_flags' },
+{ oid => '9608', descr => 'replace text using regexp',
+ proname => 'regexp_replace', prorettype => 'text',
+ proargtypes => 'text text text int4', prosrc => 'textregexreplace_extended_no_occurrence' },
{ oid => '2088', descr => 'split string by field_sep and return field_num',
proname => 'split_part', prorettype => 'text',
proargtypes => 'text text int4', prosrc => 'split_part' },
diff --git a/src/test/regress/expected/strings.out b/src/test/regress/expected/strings.out
index 91aa819804..b4e3501556 100644
--- a/src/test/regress/expected/strings.out
+++ b/src/test/regress/expected/strings.out
@@ -905,6 +905,757 @@ SELECT foo, length(foo) FROM regexp_split_to_table('thE QUick bROWn FOx jUMPs ov
ERROR: regexp_split_to_table() does not support the "global" option
SELECT regexp_split_to_array('thE QUick bROWn FOx jUMPs ovEr The lazy dOG', 'e', 'g');
ERROR: regexp_split_to_array() does not support the "global" option
+-- regexp_like tests
+SELECT regexp_like('a'||CHR(10)||'d', 'a.d');
+ regexp_like
+-------------
+ t
+(1 row)
+
+SELECT regexp_like('a'||CHR(10)||'d', 'a.d', 'm');
+ regexp_like
+-------------
+ f
+(1 row)
+
+SELECT regexp_like('a'||CHR(10)||'d', 'a.d', 'n');
+ regexp_like
+-------------
+ f
+(1 row)
+
+SELECT regexp_like('Steven', '^Ste(v|ph)en$');
+ regexp_like
+-------------
+ t
+(1 row)
+
+SELECT regexp_like('foo' || chr(10) || 'bar' || chr(10) || 'bequq' || chr(10) || 'baz', '^bar');
+ regexp_like
+-------------
+ f
+(1 row)
+
+SELECT regexp_like('foo' || chr(10) || 'bar' || chr(10) || 'bequq' || chr(10) || 'baz', 'bar');
+ regexp_like
+-------------
+ t
+(1 row)
+
+SELECT regexp_like('foo' || chr(10) || 'bar' || chr(10) || 'bequq' || chr(10) || 'baz', '^bar', 'm');
+ regexp_like
+-------------
+ t
+(1 row)
+
+SELECT regexp_like('foo' || chr(10) || 'bar' || chr(10) || 'bequq' || chr(10) || 'baz', '^bar', 'n');
+ regexp_like
+-------------
+ t
+(1 row)
+
+SELECT regexp_like('GREEN', '([aeiou])\1');
+ regexp_like
+-------------
+ f
+(1 row)
+
+SELECT regexp_like('GREEN', '([aeiou])\1', 'i');
+ regexp_like
+-------------
+ t
+(1 row)
+
+SELECT regexp_like('ORANGE' || chr(10) || 'GREEN', '([aeiou])\1', 'i');
+ regexp_like
+-------------
+ t
+(1 row)
+
+SELECT regexp_like('ORANGE' || chr(10) || 'GREEN', '^..([aeiou])\1', 'i');
+ regexp_like
+-------------
+ f
+(1 row)
+
+SELECT regexp_like('ORANGE' || chr(10) || 'GREEN', '([aeiou])\1', 'in');
+ regexp_like
+-------------
+ t
+(1 row)
+
+SELECT regexp_like('ORANGE' || chr(10) || 'GREEN', '^..([aeiou])\1', 'in');
+ regexp_like
+-------------
+ t
+(1 row)
+
+SELECT regexp_like('ORANGE' || chr(10) || 'GREEN', '^..([aeiou])\1', 'im');
+ regexp_like
+-------------
+ t
+(1 row)
+
+SELECT REGEXP_LIKE('abc', 'a b c');
+ regexp_like
+-------------
+ f
+(1 row)
+
+SELECT REGEXP_LIKE('abc', 'a b c','x');
+ regexp_like
+-------------
+ t
+(1 row)
+
+-- count all matches for regexp
+SELECT regexp_count('123123123123123', '(12)3');
+ regexp_count
+--------------
+ 5
+(1 row)
+
+-- count all matches with start position
+SELECT regexp_count('123123123123', '123', 0);
+ regexp_count
+--------------
+ 4
+(1 row)
+
+SELECT regexp_count('123123123123', '123', 1);
+ regexp_count
+--------------
+ 4
+(1 row)
+
+SELECT regexp_count('123123123123', '123', 3);
+ regexp_count
+--------------
+ 3
+(1 row)
+
+SELECT regexp_count('123123123123', '123', -3);
+ERROR: invalid value for parameter "start_position": -3
+-- count all matches in NULL string with a start position
+SELECT regexp_count(NULL, '123', 3);
+ regexp_count
+--------------
+ \N
+(1 row)
+
+-- count all matches with a start position greater than string length
+SELECT regexp_count('123', '123', 10);
+ regexp_count
+--------------
+ 0
+(1 row)
+
+-- count all matches from different regexp
+SELECT regexp_count('ABC123', '[A-Z]'), regexp_count('A1B2C3', '[A-Z]');
+ regexp_count | regexp_count
+--------------+--------------
+ 3 | 3
+(1 row)
+
+SELECT regexp_count('ABC123', '[A-Z][0-9]'), regexp_count('A1B2C3', '[A-Z][0-9]');
+ regexp_count | regexp_count
+--------------+--------------
+ 1 | 3
+(1 row)
+
+SELECT regexp_count('ABC123', '[A-Z][0-9]{2}'), regexp_count('A1B2C3', '[A-Z][0-9]{2}');
+ regexp_count | regexp_count
+--------------+--------------
+ 1 | 0
+(1 row)
+
+SELECT regexp_count('ABC123', '([A-Z][0-9]){2}'), regexp_count('A1B2C3', '([A-Z][0-9]){2}');
+ regexp_count | regexp_count
+--------------+--------------
+ 0 | 1
+(1 row)
+
+SELECT regexp_count('ABC123A5', '^[A-Z][0-9]'), regexp_count('A1B2C3', '^[A-Z][0-9]');
+ regexp_count | regexp_count
+--------------+--------------
+ 0 | 1
+(1 row)
+
+SELECT regexp_count('ABC123', '[A-Z][0-9]{2}'), regexp_count('A1B2C34', '[A-Z][0-9]{2}');
+ regexp_count | regexp_count
+--------------+--------------
+ 1 | 1
+(1 row)
+
+-- count matches with newline case insensivity
+SELECT regexp_count('a'||CHR(10)||'d', 'a.d');
+ regexp_count
+--------------
+ 1
+(1 row)
+
+SELECT regexp_count('a'||CHR(10)||'d', 'a.d', 1, 's');
+ regexp_count
+--------------
+ 1
+(1 row)
+
+-- count matches with newline case sensivity
+SELECT regexp_count('a'||CHR(10)||'d', 'a.d', 1, 'n');
+ regexp_count
+--------------
+ 0
+(1 row)
+
+-- count not multiline matches
+SELECT regexp_count('a'||CHR(10)||'d', '^d$');
+ regexp_count
+--------------
+ 0
+(1 row)
+
+-- count multiline matches
+SELECT regexp_count('a'||CHR(10)||'d', '^d$', 1, 'm');
+ regexp_count
+--------------
+ 1
+(1 row)
+
+SELECT regexp_count('Hello'||CHR(10)||'world!', '^world!$', 1, 'm'); -- 1
+ regexp_count
+--------------
+ 1
+(1 row)
+
+-- Count the number of occurrences of the substring an in the string.
+SELECT regexp_count('a man, a plan, a canal: Panama', 'an');
+ regexp_count
+--------------
+ 4
+(1 row)
+
+-- Find the number of occurrences of the substring an in the string starting with the fifth character.
+SELECT regexp_count('a man, a plan, a canal: Panama', 'an',5);
+ regexp_count
+--------------
+ 3
+(1 row)
+
+-- Find the number of occurrences of a substring containing a lower-case character
+-- followed by an. In the first example, do not use a modifier. In the second example,
+-- use the i modifier to force the regular expression to ignore case.
+SELECT regexp_count('a man, a plan, a canal: Panama', '[a-z]an');
+ regexp_count
+--------------
+ 3
+(1 row)
+
+SELECT regexp_count('a man, a plan, a canal: Panama', '[a-z]an', 1, 'i');
+ regexp_count
+--------------
+ 4
+(1 row)
+
+DROP TABLE IF EXISTS regexp_temp;
+NOTICE: table "regexp_temp" does not exist, skipping
+CREATE TABLE regexp_temp(fullname varchar(20), email varchar(20));
+INSERT INTO regexp_temp (fullname, email) VALUES ('John Doe', 'johndoe@example.com');
+INSERT INTO regexp_temp (fullname, email) VALUES ('Jane Doe', 'janedoe');
+-- count matches case sensitive
+SELECT fullname, regexp_count(fullname, 'e', 1, 'c') FROM regexp_temp;
+ fullname | regexp_count
+----------+--------------
+ John Doe | 1
+ Jane Doe | 2
+(2 rows)
+
+SELECT fullname, regexp_count(fullname, 'D', 1, 'c') FROM regexp_temp;
+ fullname | regexp_count
+----------+--------------
+ John Doe | 1
+ Jane Doe | 1
+(2 rows)
+
+SELECT fullname, regexp_count(fullname, 'd', 1, 'c') FROM regexp_temp;
+ fullname | regexp_count
+----------+--------------
+ John Doe | 0
+ Jane Doe | 0
+(2 rows)
+
+-- count matches case insensitive
+SELECT fullname, regexp_count(fullname, 'E', 1, 'i') FROM regexp_temp;
+ fullname | regexp_count
+----------+--------------
+ John Doe | 1
+ Jane Doe | 2
+(2 rows)
+
+SELECT fullname, regexp_count(fullname, 'do', 1, 'i') FROM regexp_temp;
+ fullname | regexp_count
+----------+--------------
+ John Doe | 1
+ Jane Doe | 1
+(2 rows)
+
+-- return the start position of the 6th occurence starting at beginning of the string
+SELECT regexp_instr('number of your street, zipcode thetown, FR', '[^ ]+', 1, 6);
+ regexp_instr
+--------------
+ 32
+(1 row)
+
+-- return the start position of the 5th occurence starting after the first word
+SELECT regexp_instr('number of your street, zipcode thetown, FR', '[^ ]+', 7, 5, 0);
+ regexp_instr
+--------------
+ 32
+(1 row)
+
+-- return the ending position of the 5th occurence starting after the first word
+SELECT regexp_instr('number of your street, zipcode thetown, FR', '[^ ]+', 7, 5, 0);
+ regexp_instr
+--------------
+ 32
+(1 row)
+
+-- return the ending position of the 2nd occurence starting after the first word
+SELECT regexp_instr('number of your street, zipcode thetown, FR', '[T|Z|S][[:alpha:]]{5}', 7, 2, 1, 'i');
+ regexp_instr
+--------------
+ 30
+(1 row)
+
+-- return the starting position corresponding to the different capture group
+SELECT regexp_instr('1234567890', '(123)(4(56)(78))', 1, 1, 0, 'i', 0);
+ regexp_instr
+--------------
+ 1
+(1 row)
+
+SELECT regexp_instr('1234567890', '(123)(4(56)(78))', 1, 1, 0, 'i', 1);
+ regexp_instr
+--------------
+ 1
+(1 row)
+
+SELECT regexp_instr('1234567890', '(123)(4(56)(78))', 1, 1, 0, 'i', 2);
+ regexp_instr
+--------------
+ 4
+(1 row)
+
+SELECT regexp_instr('1234567890', '(123)(4(56)(78))', 1, 1, 0, 'i', 3);
+ regexp_instr
+--------------
+ 5
+(1 row)
+
+SELECT regexp_instr('1234567890', '(123)(4(56)(78))', 1, 1, 0, 'i', 4);
+ regexp_instr
+--------------
+ 7
+(1 row)
+
+-- return the starting position corresponding to a non existant capture group
+SELECT regexp_instr('1234567890', '(123)(4(56)(78))', 1, 1, 0, 'i', 5);
+ regexp_instr
+--------------
+ 0
+(1 row)
+
+-- Same but with the ending position
+SELECT regexp_instr('1234567890', '(123)(4(56)(78))', 1, 1, 1, 'i', 0);
+ regexp_instr
+--------------
+ 9
+(1 row)
+
+SELECT regexp_instr('1234567890', '(123)(4(56)(78))', 1, 1, 1, 'i', 1);
+ regexp_instr
+--------------
+ 4
+(1 row)
+
+SELECT regexp_instr('1234567890', '(123)(4(56)(78))', 1, 1, 1, 'i', 2);
+ regexp_instr
+--------------
+ 9
+(1 row)
+
+SELECT regexp_instr('1234567890', '(123)(4(56)(78))', 1, 1, 1, 'i', 3);
+ regexp_instr
+--------------
+ 7
+(1 row)
+
+SELECT regexp_instr('1234567890', '(123)(4(56)(78))', 1, 1, 1, 'i', 4);
+ regexp_instr
+--------------
+ 9
+(1 row)
+
+SELECT regexp_instr('1234567890', '(123)(4(56)(78))', 1, 1, 1, 'i', 5);
+ regexp_instr
+--------------
+ 0
+(1 row)
+
+-- start position of a valid email
+SELECT email, regexp_instr(email, '\w+@\w+(\.\w+)+') "valid_email" FROM regexp_temp;
+ email | valid_email
+---------------------+-------------
+ johndoe@example.com | 1
+ janedoe | 0
+(2 rows)
+
+-- ending position of a valid email
+SELECT email, regexp_instr(email, '\w+@\w+(\.\w+)+', 1, 1, 1) "valid_email" FROM regexp_temp;
+ email | valid_email
+---------------------+-------------
+ johndoe@example.com | 20
+ janedoe | 0
+(2 rows)
+
+-- start position of first capture group in the email (the dot part)
+SELECT email, regexp_instr(email, '\w+@\w+(\.\w+)+', 1, 1, 0, 'i', 1) FROM regexp_temp;
+ email | regexp_instr
+---------------------+--------------
+ johndoe@example.com | 16
+ janedoe | 0
+(2 rows)
+
+-- ending position of first capture group in the email (the dot part)
+SELECT email, regexp_instr(email, '\w+@\w+(\.\w+)+', 1, 1, 1, 'i', 1) FROM regexp_temp;
+ email | regexp_instr
+---------------------+--------------
+ johndoe@example.com | 20
+ janedoe | 0
+(2 rows)
+
+-- test negative values
+SELECT regexp_instr('1234567890', '(123)(4(56)(78))', -1, 1, 1, 'i', 1);
+ERROR: invalid value for parameter "start_position": -1
+SELECT regexp_instr('1234567890', '(123)(4(56)(78))', 1, -1, 1, 'i', 1);
+ERROR: invalid value for parameter "occurrence": -1
+SELECT regexp_instr('1234567890', '(123)(4(56)(78))', 1, 1, -1, 'i', 1);
+ERROR: invalid value for parameter "return_option": -1
+SELECT regexp_instr('1234567890', '(123)(4(56)(78))', 1, 1, 1, 'i', -1);
+ERROR: invalid value for parameter "group": -1
+-- Find the first occurrence of a sequence of letters starting with the letter e
+-- and ending with the letter y in the phrase "easy come, easy go."
+SELECT regexp_instr('easy come, easy go','e\w*y');
+ regexp_instr
+--------------
+ 1
+(1 row)
+
+-- Find the first sequence of letters starting with the letter e and ending with
+-- the letter y in the string "easy come, easy go" starting at the second character
+SELECT regexp_instr('easy come, easy go','e\w*y',2);
+ regexp_instr
+--------------
+ 12
+(1 row)
+
+-- Find the second sequence of letters starting with the letter e and ending with
+-- the letter y in the string "easy come, easy go" starting at the first character.
+SELECT regexp_instr('easy come, easy go','e\w*y',1,2);
+ regexp_instr
+--------------
+ 12
+(1 row)
+
+-- Find the position of the first character after the first whitespace in the string "easy come, easy go."
+SELECT regexp_instr('easy come, easy go','\s',1,1,1);
+ regexp_instr
+--------------
+ 6
+(1 row)
+
+-- Find the position of the start of the third word in a string by capturing each
+-- word as a subexpression, and returning the third subexpression's start position.
+SELECT regexp_instr('one two three','(\w+)\s+(\w+)\s+(\w+)', 1,1,0,'',3);
+ regexp_instr
+--------------
+ 9
+(1 row)
+
+-- return the substring matching the regexp
+SELECT regexp_substr('number of your street, zipcode town, FR', ',[^,]+');
+ regexp_substr
+----------------
+ , zipcode town
+(1 row)
+
+-- return the substring matching the regexp
+SELECT regexp_substr('number of your street, zipcode town, FR', ',[^,]+', 24);
+ regexp_substr
+---------------
+ , FR
+(1 row)
+
+-- return the substring matching the regexp at the first occurrence
+SELECT regexp_substr('number of your street, zipcode town, FR', ',[^,]+', 1, 1);
+ regexp_substr
+----------------
+ , zipcode town
+(1 row)
+
+-- return the substring matching the regexp at the second occurrence
+SELECT regexp_substr('number of your street, zipcode town, FR', ',[^,]+', 1, 2);
+ regexp_substr
+---------------
+ , FR
+(1 row)
+
+-- case sensitivity substring search
+SELECT regexp_substr('number of your street, zipcode town, FR', ',\s+[Zf][^,]+', 1, 1);
+ regexp_substr
+---------------
+ \N
+(1 row)
+
+SELECT regexp_substr('number of your street, zipcode town, FR', ',\s+[Zf][^,]+', 1, 1, 'i');
+ regexp_substr
+----------------
+ , zipcode town
+(1 row)
+
+-- case sensitivity substring search with no capture group
+SELECT regexp_substr('number of your street, zipcode town, FR', ',\s+[Zf][^,]+', 1, 1, 'i', 0);
+ regexp_substr
+----------------
+ , zipcode town
+(1 row)
+
+-- case sensitivity substring search with non existing capture group
+SELECT regexp_substr('number of your street, zipcode town, FR', ',\s+[Zf][^,]+', 1, 1, 'i', 0);
+ regexp_substr
+----------------
+ , zipcode town
+(1 row)
+
+-- return the substring matching the regexp at different occurrence and capture group
+SELECT regexp_substr('1234567890 1234567890', '(123)(4(56)(78))', 1, 1, 'i', 4);
+ regexp_substr
+---------------
+ 78
+(1 row)
+
+SELECT regexp_substr('1234567890 1234557890', '(123)(4(5[56])(78))', 1, 2, 'i', 3);
+ regexp_substr
+---------------
+ 55
+(1 row)
+
+SELECT regexp_substr('1234567890 1234567890', '(123)(4(56)(78))', 1, 1, 'i', 0);
+ regexp_substr
+---------------
+ 12345678
+(1 row)
+
+-- test negative values
+SELECT regexp_substr('1234567890 1234567890', '(123)(4(56)(78))', -1, 1, 'i', 4);
+ERROR: invalid value for parameter "start_position": -1
+SELECT regexp_substr('1234567890 1234567890', '(123)(4(56)(78))', 1, -1, 'i', 4);
+ERROR: invalid value for parameter "occurrence": -1
+SELECT regexp_substr('1234567890 1234567890', '(123)(4(56)(78))', 1, 1, 'i', -4);
+ERROR: invalid value for parameter "group": -4
+-- Select the first substring of letters that end with "thy."
+SELECT regexp_substr('healthy, wealthy, and wise','\w+thy');
+ regexp_substr
+---------------
+ healthy
+(1 row)
+
+-- Select the first substring of letters that ends with "thy" starting at the second character in the string.
+SELECT regexp_substr('healthy, wealthy, and wise','\w+thy',2);
+ regexp_substr
+---------------
+ ealthy
+(1 row)
+
+-- Return the contents of the third captured subexpression, which captures the third word in the string.
+SELECT regexp_substr('one two three', '(\w+)\s+(\w+)\s+(\w+)', 1, 1, '', 3);
+ regexp_substr
+---------------
+ three
+(1 row)
+
+DROP TABLE IF EXISTS regexp_temp;
+-- Regression tests for extended regexp_replace() function with start position and occurrence
+SELECT regexp_replace('512.123.4567', '([[:digit:]]{3})\.([[:digit:]]{3})\.([[:digit:]]{4})', '(\1) \2-\3', 1);
+ regexp_replace
+----------------
+ (512) 123-4567
+(1 row)
+
+SELECT regexp_replace('512.123.4567 612.123.4567', '([[:digit:]]{3})\.([[:digit:]]{3})\.([[:digit:]]{4})', '(\1) \2-\3', 1, 0);
+ regexp_replace
+-------------------------------
+ (512) 123-4567 (612) 123-4567
+(1 row)
+
+SELECT regexp_replace('number your street,'||CHR(10)||' zipcode town, FR', '( ){2,}', ' ', 1, 0);
+ regexp_replace
+---------------------
+ number your street,+
+ zipcode town, FR
+(1 row)
+
+SELECT regexp_replace('number your street, zipcode town, FR', '( ){2,}', ' ', 9);
+ regexp_replace
+--------------------------------------------
+ number your street, zipcode town, FR
+(1 row)
+
+SELECT regexp_replace('number your street, zipcode town, FR', '( ){2,}', ' ', 9, 0);
+ regexp_replace
+----------------------------------------
+ number your street, zipcode town, FR
+(1 row)
+
+SELECT regexp_replace('number your street, zipcode town, FR', '( ){2,}', ' ', 9, 2);
+ regexp_replace
+---------------------------------------------
+ number your street, zipcode town, FR
+(1 row)
+
+SELECT regexp_replace('number your street, zipcode town, FR', '( ){2,}', ' ', 9, 2, 'm');
+ regexp_replace
+---------------------------------------------
+ number your street, zipcode town, FR
+(1 row)
+
+SELECT regexp_replace('number your street, zipcode town, FR', '([EURT]){2,}', '[\1]', 9, 1, 'i');
+ regexp_replace
+-------------------------------------------------
+ number yo[r] street, zipcode town, FR
+(1 row)
+
+SELECT regexp_replace('number your street, zipcode town, FR', '([EURT]){2,}', '[\1]', 9, 2, 'i');
+ regexp_replace
+----------------------------------------------
+ number your s[t], zipcode town, FR
+(1 row)
+
+SELECT regexp_replace ('A PostgreSQL function', 'A|e|i|o|u', 'X', 1, 2);
+ regexp_replace
+-----------------------
+ A PXstgreSQL function
+(1 row)
+
+SELECT regexp_replace ('A PostgreSQL function', 'a|e|i|o|u', 'X', 1, 0, 'i');
+ regexp_replace
+-----------------------
+ X PXstgrXSQL fXnctXXn
+(1 row)
+
+SELECT regexp_replace ('A PostgreSQL function', 'a|e|i|o|u', 'X', 1, 1, 'i');
+ regexp_replace
+-----------------------
+ X PostgreSQL function
+(1 row)
+
+SELECT regexp_replace ('A PostgreSQL function', 'a|e|i|o|u', 'X', 1, 2, 'i');
+ regexp_replace
+-----------------------
+ A PXstgreSQL function
+(1 row)
+
+SELECT regexp_replace ('A PostgreSQL function', 'a|e|i|o|u', 'X', 1, 3, 'i');
+ regexp_replace
+-----------------------
+ A PostgrXSQL function
+(1 row)
+
+SELECT regexp_replace ('A PostgreSQL function', 'a|e|i|o|u', 'X', 1, 9, 'i');
+ regexp_replace
+-----------------------
+ A PostgreSQL function
+(1 row)
+
+SELECT regexp_replace ('A PostgreSQL function', 'A|e|i|o|u', 'X', 1, 9);
+ regexp_replace
+-----------------------
+ A PostgreSQL function
+(1 row)
+
+-- Invalid parameter values
+SELECT regexp_replace ('A PostgreSQL function', 'a|e|i|o|u', 'X', -1, 0, 'i');
+ERROR: invalid value for parameter "start_position": -1
+SELECT regexp_replace ('A PostgreSQL function', 'a|e|i|o|u', 'X', 1, -1, 'i');
+ERROR: invalid value for parameter "occurrence": -1
+-- Modifier 'g' should not be taken in account, we have an occurrence to replace
+SELECT regexp_replace ('A PostgreSQL function', 'a|e|i|o|u', 'X', 1, 1, 'g');
+ regexp_replace
+-----------------------
+ A PXstgreSQL function
+(1 row)
+
+-- Find groups of "word characters" (letters, numbers and underscore) ending with
+-- "thy" in the string "healthy, wealthy, and wise" and replace them with nothing.
+SELECT regexp_replace('healthy, wealthy, and wise','\w+thy', '', 'g');
+ regexp_replace
+----------------
+ , , and wise
+(1 row)
+
+SELECT regexp_replace('healthy, wealthy, and wise','\w+thy', '', 1, 0);
+ regexp_replace
+----------------
+ , , and wise
+(1 row)
+
+SELECT regexp_replace('healthy, wealthy, and wise','\w+thy', '', 1, 0, 'g');
+ regexp_replace
+----------------
+ , , and wise
+(1 row)
+
+-- Find groups of word characters ending with "thy" and replace with "something."
+SELECT regexp_replace('healthy, wealthy, and wise','\w+thy', 'something', 'g');
+ regexp_replace
+--------------------------------
+ something, something, and wise
+(1 row)
+
+-- Find groups of word characters ending with "thy" and replace with the string
+-- "something" starting at the third character in the string.
+SELECT regexp_replace('healthy, wealthy, and wise','\w+thy', 'something', 3, 0);
+ regexp_replace
+----------------------------------
+ hesomething, something, and wise
+(1 row)
+
+-- Replace the second group of word characters ending with "thy" with "something."
+SELECT regexp_replace('healthy, wealthy, and wise','\w+thy', 'something', 1, 2);
+ regexp_replace
+------------------------------
+ healthy, something, and wise
+(1 row)
+
+-- Find groups of word characters ending with "thy" capturing the letters before
+-- the "thy", and replace with the captured letters plus the letters "ish."
+SELECT regexp_replace('healthy, wealthy, and wise','(\w+)thy', '\1ish', 'g');
+ regexp_replace
+----------------------------
+ healish, wealish, and wise
+(1 row)
+
+SELECT regexp_replace('healthy, wealthy, and wise','(\w+)thy', '\1ish', 1, 0);
+ regexp_replace
+----------------------------
+ healish, wealish, and wise
+(1 row)
+
+-- Ambiguous use of option parameter with regexp_replace()
+PREPARE rr AS SELECT regexp_replace('healthy, wealthy, and wise','(\w+)thy', '\1ish', $1);
+EXECUTE rr(1);
+ERROR: ambiguous use of the option parameter in regex_replace(), value: 1
+HINT: you might set the occurrence parameter to force the use of the extended form of regex_replace()
+DEALLOCATE rr;
-- change NULL-display back
\pset null ''
-- E021-11 position expression
diff --git a/src/test/regress/sql/strings.sql b/src/test/regress/sql/strings.sql
index 2c502534c2..f50131896f 100644
--- a/src/test/regress/sql/strings.sql
+++ b/src/test/regress/sql/strings.sql
@@ -255,6 +255,202 @@ SELECT regexp_split_to_array('thE QUick bROWn FOx jUMPs ovEr The lazy dOG', 'e',
SELECT foo, length(foo) FROM regexp_split_to_table('thE QUick bROWn FOx jUMPs ovEr The lazy dOG', 'e', 'g') AS foo;
SELECT regexp_split_to_array('thE QUick bROWn FOx jUMPs ovEr The lazy dOG', 'e', 'g');
+-- regexp_like tests
+SELECT regexp_like('a'||CHR(10)||'d', 'a.d');
+SELECT regexp_like('a'||CHR(10)||'d', 'a.d', 'm');
+SELECT regexp_like('a'||CHR(10)||'d', 'a.d', 'n');
+SELECT regexp_like('Steven', '^Ste(v|ph)en$');
+SELECT regexp_like('foo' || chr(10) || 'bar' || chr(10) || 'bequq' || chr(10) || 'baz', '^bar');
+SELECT regexp_like('foo' || chr(10) || 'bar' || chr(10) || 'bequq' || chr(10) || 'baz', 'bar');
+SELECT regexp_like('foo' || chr(10) || 'bar' || chr(10) || 'bequq' || chr(10) || 'baz', '^bar', 'm');
+SELECT regexp_like('foo' || chr(10) || 'bar' || chr(10) || 'bequq' || chr(10) || 'baz', '^bar', 'n');
+SELECT regexp_like('GREEN', '([aeiou])\1');
+SELECT regexp_like('GREEN', '([aeiou])\1', 'i');
+SELECT regexp_like('ORANGE' || chr(10) || 'GREEN', '([aeiou])\1', 'i');
+SELECT regexp_like('ORANGE' || chr(10) || 'GREEN', '^..([aeiou])\1', 'i');
+SELECT regexp_like('ORANGE' || chr(10) || 'GREEN', '([aeiou])\1', 'in');
+SELECT regexp_like('ORANGE' || chr(10) || 'GREEN', '^..([aeiou])\1', 'in');
+SELECT regexp_like('ORANGE' || chr(10) || 'GREEN', '^..([aeiou])\1', 'im');
+SELECT REGEXP_LIKE('abc', 'a b c');
+SELECT REGEXP_LIKE('abc', 'a b c','x');
+
+-- count all matches for regexp
+SELECT regexp_count('123123123123123', '(12)3');
+-- count all matches with start position
+SELECT regexp_count('123123123123', '123', 0);
+SELECT regexp_count('123123123123', '123', 1);
+SELECT regexp_count('123123123123', '123', 3);
+SELECT regexp_count('123123123123', '123', -3);
+-- count all matches in NULL string with a start position
+SELECT regexp_count(NULL, '123', 3);
+-- count all matches with a start position greater than string length
+SELECT regexp_count('123', '123', 10);
+-- count all matches from different regexp
+SELECT regexp_count('ABC123', '[A-Z]'), regexp_count('A1B2C3', '[A-Z]');
+SELECT regexp_count('ABC123', '[A-Z][0-9]'), regexp_count('A1B2C3', '[A-Z][0-9]');
+SELECT regexp_count('ABC123', '[A-Z][0-9]{2}'), regexp_count('A1B2C3', '[A-Z][0-9]{2}');
+SELECT regexp_count('ABC123', '([A-Z][0-9]){2}'), regexp_count('A1B2C3', '([A-Z][0-9]){2}');
+SELECT regexp_count('ABC123A5', '^[A-Z][0-9]'), regexp_count('A1B2C3', '^[A-Z][0-9]');
+SELECT regexp_count('ABC123', '[A-Z][0-9]{2}'), regexp_count('A1B2C34', '[A-Z][0-9]{2}');
+-- count matches with newline case insensivity
+SELECT regexp_count('a'||CHR(10)||'d', 'a.d');
+SELECT regexp_count('a'||CHR(10)||'d', 'a.d', 1, 's');
+-- count matches with newline case sensivity
+SELECT regexp_count('a'||CHR(10)||'d', 'a.d', 1, 'n');
+-- count not multiline matches
+SELECT regexp_count('a'||CHR(10)||'d', '^d$');
+-- count multiline matches
+SELECT regexp_count('a'||CHR(10)||'d', '^d$', 1, 'm');
+SELECT regexp_count('Hello'||CHR(10)||'world!', '^world!$', 1, 'm'); -- 1
+-- Count the number of occurrences of the substring an in the string.
+SELECT regexp_count('a man, a plan, a canal: Panama', 'an');
+-- Find the number of occurrences of the substring an in the string starting with the fifth character.
+SELECT regexp_count('a man, a plan, a canal: Panama', 'an',5);
+-- Find the number of occurrences of a substring containing a lower-case character
+-- followed by an. In the first example, do not use a modifier. In the second example,
+-- use the i modifier to force the regular expression to ignore case.
+SELECT regexp_count('a man, a plan, a canal: Panama', '[a-z]an');
+SELECT regexp_count('a man, a plan, a canal: Panama', '[a-z]an', 1, 'i');
+
+DROP TABLE IF EXISTS regexp_temp;
+CREATE TABLE regexp_temp(fullname varchar(20), email varchar(20));
+INSERT INTO regexp_temp (fullname, email) VALUES ('John Doe', 'johndoe@example.com');
+INSERT INTO regexp_temp (fullname, email) VALUES ('Jane Doe', 'janedoe');
+-- count matches case sensitive
+SELECT fullname, regexp_count(fullname, 'e', 1, 'c') FROM regexp_temp;
+SELECT fullname, regexp_count(fullname, 'D', 1, 'c') FROM regexp_temp;
+SELECT fullname, regexp_count(fullname, 'd', 1, 'c') FROM regexp_temp;
+-- count matches case insensitive
+SELECT fullname, regexp_count(fullname, 'E', 1, 'i') FROM regexp_temp;
+SELECT fullname, regexp_count(fullname, 'do', 1, 'i') FROM regexp_temp;
+
+-- return the start position of the 6th occurence starting at beginning of the string
+SELECT regexp_instr('number of your street, zipcode thetown, FR', '[^ ]+', 1, 6);
+-- return the start position of the 5th occurence starting after the first word
+SELECT regexp_instr('number of your street, zipcode thetown, FR', '[^ ]+', 7, 5, 0);
+-- return the ending position of the 5th occurence starting after the first word
+SELECT regexp_instr('number of your street, zipcode thetown, FR', '[^ ]+', 7, 5, 0);
+-- return the ending position of the 2nd occurence starting after the first word
+SELECT regexp_instr('number of your street, zipcode thetown, FR', '[T|Z|S][[:alpha:]]{5}', 7, 2, 1, 'i');
+-- return the starting position corresponding to the different capture group
+SELECT regexp_instr('1234567890', '(123)(4(56)(78))', 1, 1, 0, 'i', 0);
+SELECT regexp_instr('1234567890', '(123)(4(56)(78))', 1, 1, 0, 'i', 1);
+SELECT regexp_instr('1234567890', '(123)(4(56)(78))', 1, 1, 0, 'i', 2);
+SELECT regexp_instr('1234567890', '(123)(4(56)(78))', 1, 1, 0, 'i', 3);
+SELECT regexp_instr('1234567890', '(123)(4(56)(78))', 1, 1, 0, 'i', 4);
+-- return the starting position corresponding to a non existant capture group
+SELECT regexp_instr('1234567890', '(123)(4(56)(78))', 1, 1, 0, 'i', 5);
+-- Same but with the ending position
+SELECT regexp_instr('1234567890', '(123)(4(56)(78))', 1, 1, 1, 'i', 0);
+SELECT regexp_instr('1234567890', '(123)(4(56)(78))', 1, 1, 1, 'i', 1);
+SELECT regexp_instr('1234567890', '(123)(4(56)(78))', 1, 1, 1, 'i', 2);
+SELECT regexp_instr('1234567890', '(123)(4(56)(78))', 1, 1, 1, 'i', 3);
+SELECT regexp_instr('1234567890', '(123)(4(56)(78))', 1, 1, 1, 'i', 4);
+SELECT regexp_instr('1234567890', '(123)(4(56)(78))', 1, 1, 1, 'i', 5);
+-- start position of a valid email
+SELECT email, regexp_instr(email, '\w+@\w+(\.\w+)+') "valid_email" FROM regexp_temp;
+-- ending position of a valid email
+SELECT email, regexp_instr(email, '\w+@\w+(\.\w+)+', 1, 1, 1) "valid_email" FROM regexp_temp;
+-- start position of first capture group in the email (the dot part)
+SELECT email, regexp_instr(email, '\w+@\w+(\.\w+)+', 1, 1, 0, 'i', 1) FROM regexp_temp;
+-- ending position of first capture group in the email (the dot part)
+SELECT email, regexp_instr(email, '\w+@\w+(\.\w+)+', 1, 1, 1, 'i', 1) FROM regexp_temp;
+-- test negative values
+SELECT regexp_instr('1234567890', '(123)(4(56)(78))', -1, 1, 1, 'i', 1);
+SELECT regexp_instr('1234567890', '(123)(4(56)(78))', 1, -1, 1, 'i', 1);
+SELECT regexp_instr('1234567890', '(123)(4(56)(78))', 1, 1, -1, 'i', 1);
+SELECT regexp_instr('1234567890', '(123)(4(56)(78))', 1, 1, 1, 'i', -1);
+
+-- Find the first occurrence of a sequence of letters starting with the letter e
+-- and ending with the letter y in the phrase "easy come, easy go."
+SELECT regexp_instr('easy come, easy go','e\w*y');
+-- Find the first sequence of letters starting with the letter e and ending with
+-- the letter y in the string "easy come, easy go" starting at the second character
+SELECT regexp_instr('easy come, easy go','e\w*y',2);
+-- Find the second sequence of letters starting with the letter e and ending with
+-- the letter y in the string "easy come, easy go" starting at the first character.
+SELECT regexp_instr('easy come, easy go','e\w*y',1,2);
+-- Find the position of the first character after the first whitespace in the string "easy come, easy go."
+SELECT regexp_instr('easy come, easy go','\s',1,1,1);
+-- Find the position of the start of the third word in a string by capturing each
+-- word as a subexpression, and returning the third subexpression's start position.
+SELECT regexp_instr('one two three','(\w+)\s+(\w+)\s+(\w+)', 1,1,0,'',3);
+
+-- return the substring matching the regexp
+SELECT regexp_substr('number of your street, zipcode town, FR', ',[^,]+');
+-- return the substring matching the regexp
+SELECT regexp_substr('number of your street, zipcode town, FR', ',[^,]+', 24);
+-- return the substring matching the regexp at the first occurrence
+SELECT regexp_substr('number of your street, zipcode town, FR', ',[^,]+', 1, 1);
+-- return the substring matching the regexp at the second occurrence
+SELECT regexp_substr('number of your street, zipcode town, FR', ',[^,]+', 1, 2);
+-- case sensitivity substring search
+SELECT regexp_substr('number of your street, zipcode town, FR', ',\s+[Zf][^,]+', 1, 1);
+SELECT regexp_substr('number of your street, zipcode town, FR', ',\s+[Zf][^,]+', 1, 1, 'i');
+-- case sensitivity substring search with no capture group
+SELECT regexp_substr('number of your street, zipcode town, FR', ',\s+[Zf][^,]+', 1, 1, 'i', 0);
+-- case sensitivity substring search with non existing capture group
+SELECT regexp_substr('number of your street, zipcode town, FR', ',\s+[Zf][^,]+', 1, 1, 'i', 0);
+-- return the substring matching the regexp at different occurrence and capture group
+SELECT regexp_substr('1234567890 1234567890', '(123)(4(56)(78))', 1, 1, 'i', 4);
+SELECT regexp_substr('1234567890 1234557890', '(123)(4(5[56])(78))', 1, 2, 'i', 3);
+SELECT regexp_substr('1234567890 1234567890', '(123)(4(56)(78))', 1, 1, 'i', 0);
+-- test negative values
+SELECT regexp_substr('1234567890 1234567890', '(123)(4(56)(78))', -1, 1, 'i', 4);
+SELECT regexp_substr('1234567890 1234567890', '(123)(4(56)(78))', 1, -1, 'i', 4);
+SELECT regexp_substr('1234567890 1234567890', '(123)(4(56)(78))', 1, 1, 'i', -4);
+-- Select the first substring of letters that end with "thy."
+SELECT regexp_substr('healthy, wealthy, and wise','\w+thy');
+-- Select the first substring of letters that ends with "thy" starting at the second character in the string.
+SELECT regexp_substr('healthy, wealthy, and wise','\w+thy',2);
+-- Return the contents of the third captured subexpression, which captures the third word in the string.
+SELECT regexp_substr('one two three', '(\w+)\s+(\w+)\s+(\w+)', 1, 1, '', 3);
+
+DROP TABLE IF EXISTS regexp_temp;
+
+-- Regression tests for extended regexp_replace() function with start position and occurrence
+SELECT regexp_replace('512.123.4567', '([[:digit:]]{3})\.([[:digit:]]{3})\.([[:digit:]]{4})', '(\1) \2-\3', 1);
+SELECT regexp_replace('512.123.4567 612.123.4567', '([[:digit:]]{3})\.([[:digit:]]{3})\.([[:digit:]]{4})', '(\1) \2-\3', 1, 0);
+SELECT regexp_replace('number your street,'||CHR(10)||' zipcode town, FR', '( ){2,}', ' ', 1, 0);
+SELECT regexp_replace('number your street, zipcode town, FR', '( ){2,}', ' ', 9);
+SELECT regexp_replace('number your street, zipcode town, FR', '( ){2,}', ' ', 9, 0);
+SELECT regexp_replace('number your street, zipcode town, FR', '( ){2,}', ' ', 9, 2);
+SELECT regexp_replace('number your street, zipcode town, FR', '( ){2,}', ' ', 9, 2, 'm');
+SELECT regexp_replace('number your street, zipcode town, FR', '([EURT]){2,}', '[\1]', 9, 1, 'i');
+SELECT regexp_replace('number your street, zipcode town, FR', '([EURT]){2,}', '[\1]', 9, 2, 'i');
+SELECT regexp_replace ('A PostgreSQL function', 'A|e|i|o|u', 'X', 1, 2);
+SELECT regexp_replace ('A PostgreSQL function', 'a|e|i|o|u', 'X', 1, 0, 'i');
+SELECT regexp_replace ('A PostgreSQL function', 'a|e|i|o|u', 'X', 1, 1, 'i');
+SELECT regexp_replace ('A PostgreSQL function', 'a|e|i|o|u', 'X', 1, 2, 'i');
+SELECT regexp_replace ('A PostgreSQL function', 'a|e|i|o|u', 'X', 1, 3, 'i');
+SELECT regexp_replace ('A PostgreSQL function', 'a|e|i|o|u', 'X', 1, 9, 'i');
+SELECT regexp_replace ('A PostgreSQL function', 'A|e|i|o|u', 'X', 1, 9);
+-- Invalid parameter values
+SELECT regexp_replace ('A PostgreSQL function', 'a|e|i|o|u', 'X', -1, 0, 'i');
+SELECT regexp_replace ('A PostgreSQL function', 'a|e|i|o|u', 'X', 1, -1, 'i');
+-- Modifier 'g' should not be taken in account, we have an occurrence to replace
+SELECT regexp_replace ('A PostgreSQL function', 'a|e|i|o|u', 'X', 1, 1, 'g');
+-- Find groups of "word characters" (letters, numbers and underscore) ending with
+-- "thy" in the string "healthy, wealthy, and wise" and replace them with nothing.
+SELECT regexp_replace('healthy, wealthy, and wise','\w+thy', '', 'g');
+SELECT regexp_replace('healthy, wealthy, and wise','\w+thy', '', 1, 0);
+SELECT regexp_replace('healthy, wealthy, and wise','\w+thy', '', 1, 0, 'g');
+-- Find groups of word characters ending with "thy" and replace with "something."
+SELECT regexp_replace('healthy, wealthy, and wise','\w+thy', 'something', 'g');
+-- Find groups of word characters ending with "thy" and replace with the string
+-- "something" starting at the third character in the string.
+SELECT regexp_replace('healthy, wealthy, and wise','\w+thy', 'something', 3, 0);
+-- Replace the second group of word characters ending with "thy" with "something."
+SELECT regexp_replace('healthy, wealthy, and wise','\w+thy', 'something', 1, 2);
+-- Find groups of word characters ending with "thy" capturing the letters before
+-- the "thy", and replace with the captured letters plus the letters "ish."
+SELECT regexp_replace('healthy, wealthy, and wise','(\w+)thy', '\1ish', 'g');
+SELECT regexp_replace('healthy, wealthy, and wise','(\w+)thy', '\1ish', 1, 0);
+-- Ambiguous use of option parameter with regexp_replace()
+PREPARE rr AS SELECT regexp_replace('healthy, wealthy, and wise','(\w+)thy', '\1ish', $1);
+EXECUTE rr(1);
+DEALLOCATE rr;
+
-- change NULL-display back
\pset null ''
Le 01/08/2021 à 19:23, Tom Lane a écrit :
I've been working through this patch, and trying to verify
compatibility against Oracle and DB2, and I see some points that need
discussion or at least recording for the archives.* In Oracle, while the documentation for regexp_instr says that
return_option should only be 0 or 1, experimentation with sqlfiddle
shows that any nonzero value is silently treated as 1. The patch
raises an error for other values, which I think is a good idea.
(IBM's docs say that DB2 raises an error too, though I can't test
that.) We don't need to be bug-compatible to that extent.* What should happen when the subexpression/capture group number of
regexp_instr or regexp_substr exceeds the number of parenthesized
subexpressions of the regexp? Oracle silently returns a no-match
result (0 or NULL), as does this patch. However, IBM's docs say
that DB2 raises an error. I'm inclined to think that this is
likewise taking bug-compatibility too far, and that we should
raise an error like DB2. There are clearly cases where throwing
an error would help debug a faulty call, while I'm less clear on
a use-case where not throwing an error would be useful.* IBM's docs say that both regexp_count and regexp_like have
arguments "string, pattern [, start] [, flags]" --- that is,
each of start and flags can be independently specified or omitted.
The patch follows Oracle, which has no start option for
regexp_like, and where you can't write flags for regexp_count
without writing start. This is fine by me, because doing these
like DB2 would introduce the same which-argument-is-this issues
as we're being forced to cope with for regexp_replace. I don't
think we need to accept ambiguity in these cases too. But it's
worth memorializing this decision in the thread.* The patch has most of these functions silently ignoring the 'g'
flag, but I think they should raise errors instead. Oracle doesn't
accept a 'g' flag for these, so why should we? The only case where
that logic doesn't hold is regexp_replace, because depending on which
syntax you use the 'g' flag might or might not be meaningful. So
for regexp_replace, I'd vote for silently ignoring 'g' if the
occurrence-number parameter is given, while honoring it if not.I've already made changes in my local copy per the last item,
but I've not done anything about throwing errors for out-of-range
subexpression numbers. Anybody have an opinion about that one?
I thought about this while I was implementing the functions and chose to
not throw an error because of the Oracle behavior and also with others
regular expression implementation. For example in Perl there is no error:
$ perl -e '$str="hello world"; $str =~ s/(l)/$20/; print "$str\n";'
helo world
Usually a regular expression is always tested by its creator to be sure
that this the right one and that it does what is expected. But I agree
that it could help the writer to debug its RE.
Also if I recall well Oracle and DB2 limit the number of capture groups
back references from \1 to \9 for Oracle and \0 to \9 for DB2. I have
chosen to not apply this limit, I don't see the interest of such a
limitation.
--
Gilles Darold
http://www.darold.net/
Gilles Darold <gilles@darold.net> writes:
[ v5-0001-regexp-foo-functions.patch ]
I've gone through this whole patch now, and found quite a lot that I did
not like. In no particular order:
* Wrapping parentheses around the user's regexp doesn't work. It can
turn an invalid regexp into a valid one: for example 'a)(b' should draw
a syntax error. With this patch, no error would be thrown, but the
"outer" parens wouldn't do what you expected. Worse, it can turn a
valid regexp into an invalid one: the metasyntax options described in
9.7.3.4 only work at the start of the regexp. So we have to handle
whole-regexp cases honestly rather than trying to turn them into an
instance of the parenthesized-subexpression case.
* You did a lot of things quite inefficiently, apparently to avoid
touching any existing code. I think it's better to extend
setup_regexp_matches() and replace_text_regexp() a little bit so that
they can support the behaviors these new functions need. In both of
them, it's absolutely trivial to allow a search start position to be
passed in; and it doesn't take much to teach replace_text_regexp()
to replace only the N'th match.
* Speaking of N'th, there is not much of anything that I like
about Oracle's terminology for the function arguments, and I don't
think we ought to adopt it. If we're documenting the functions as
processing the "N'th match", it seems to me to be natural to call
the parameter "N" not "occurrence". Speaking of the "occurrence'th
occurrence" is just silly, not to mention long and easy to misspell.
Likewise, "position" is a horribly vague term for the search start
position; it could be interpreted to mean several other things.
"start" seems much better. "return_opt" is likewise awfully unclear.
I went with "endoption" below, though I could be talked into something
else. The only one of Oracle's choices that I like is "subexpr" for
subexpression number ... but you went with DB2's rather vague "group"
instead. I don't want to use their "capture group" terminology,
because that appears nowhere else in our documentation. Our existing
terminology is "parenthesized subexpression", which seems fine to me
(and also agrees with Oracle's docs).
* I spent a lot of time on the docs too. A lot of the syntax specs
were wrong (where you put the brackets matters), many of the examples
seemed confusingly overcomplicated, and the text explanations needed
copy-editing.
* Also, the regression tests seemed misguided. This patch is not
responsible for testing the regexp engine as such; we have tests
elsewhere that do that. So I don't think we need complex regexps
here. We just need to verify that the parameters of these functions
act properly, and check their error cases. That can be done much
more quickly and straightforwardly than what you had.
So here's a revised version that I like better. I think this
is pretty nearly committable, aside from the question of whether
a too-large subexpression number should be an error or not.
regards, tom lane
Attachments:
v6-0001-regexp-foo-functions.patchtext/x-diff; charset=us-ascii; name=v6-0001-regexp-foo-functions.patchDownload
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index a5b6adc4bb..80aac4965e 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -3108,6 +3108,78 @@ repeat('Pg', 4) <returnvalue>PgPgPgPg</returnvalue>
</para></entry>
</row>
+ <row>
+ <entry role="func_table_entry"><para role="func_signature">
+ <indexterm>
+ <primary>regexp_count</primary>
+ </indexterm>
+ <function>regexp_count</function> ( <parameter>string</parameter> <type>text</type>, <parameter>pattern</parameter> <type>text</type>
+ [, <parameter>start</parameter> <type>integer</type>
+ [, <parameter>flags</parameter> <type>text</type> ] ] )
+ <returnvalue>integer</returnvalue>
+ </para>
+ <para>
+ Returns the number of times the POSIX regular
+ expression <parameter>pattern</parameter> matches in
+ the <parameter>string</parameter>; see
+ <xref linkend="functions-posix-regexp"/>.
+ </para>
+ <para>
+ <literal>regexp_count('123456789012', '\d\d\d', 2)</literal>
+ <returnvalue>3</returnvalue>
+ </para></entry>
+ </row>
+
+ <row>
+ <entry role="func_table_entry"><para role="func_signature">
+ <indexterm>
+ <primary>regexp_instr</primary>
+ </indexterm>
+ <function>regexp_instr</function> ( <parameter>string</parameter> <type>text</type>, <parameter>pattern</parameter> <type>text</type>
+ [, <parameter>start</parameter> <type>integer</type>
+ [, <parameter>N</parameter> <type>integer</type>
+ [, <parameter>endoption</parameter> <type>integer</type>
+ [, <parameter>flags</parameter> <type>text</type>
+ [, <parameter>subexpr</parameter> <type>integer</type> ] ] ] ] ] )
+ <returnvalue>integer</returnvalue>
+ </para>
+ <para>
+ Returns the position within <parameter>string</parameter> where
+ the <parameter>N</parameter>'th match of the POSIX regular
+ expression <parameter>pattern</parameter> occurs, or zero if there is
+ no such match; see <xref linkend="functions-posix-regexp"/>.
+ </para>
+ <para>
+ <literal>regexp_instr('ABCDEF', 'c(.)(..)', 1, 1, 0, 'i')</literal>
+ <returnvalue>3</returnvalue>
+ </para>
+ <para>
+ <literal>regexp_instr('ABCDEF', 'c(.)(..)', 1, 1, 0, 'i', 2)</literal>
+ <returnvalue>5</returnvalue>
+ </para></entry>
+ </row>
+
+ <row>
+ <entry role="func_table_entry"><para role="func_signature">
+ <indexterm>
+ <primary>regexp_like</primary>
+ </indexterm>
+ <function>regexp_like</function> ( <parameter>string</parameter> <type>text</type>, <parameter>pattern</parameter> <type>text</type>
+ [, <parameter>flags</parameter> <type>text</type> ] )
+ <returnvalue>boolean</returnvalue>
+ </para>
+ <para>
+ Checks whether a match of the POSIX regular
+ expression <parameter>pattern</parameter> occurs
+ within <parameter>string</parameter>; see
+ <xref linkend="functions-posix-regexp"/>.
+ </para>
+ <para>
+ <literal>regexp_like('Hello World', 'world$', 'i')</literal>
+ <returnvalue>t</returnvalue>
+ </para></entry>
+ </row>
+
<row>
<entry role="func_table_entry"><para role="func_signature">
<indexterm>
@@ -3117,8 +3189,9 @@ repeat('Pg', 4) <returnvalue>PgPgPgPg</returnvalue>
<returnvalue>text[]</returnvalue>
</para>
<para>
- Returns captured substrings resulting from the first match of a POSIX
- regular expression to the <parameter>string</parameter>; see
+ Returns captured substrings resulting from the first match of the
+ POSIX regular expression <parameter>pattern</parameter> to
+ the <parameter>string</parameter>; see
<xref linkend="functions-posix-regexp"/>.
</para>
<para>
@@ -3136,10 +3209,11 @@ repeat('Pg', 4) <returnvalue>PgPgPgPg</returnvalue>
<returnvalue>setof text[]</returnvalue>
</para>
<para>
- Returns captured substrings resulting from the first match of a
- POSIX regular expression to the <parameter>string</parameter>,
- or multiple matches if the <literal>g</literal> flag is used;
- see <xref linkend="functions-posix-regexp"/>.
+ Returns captured substrings resulting from the first match of the
+ POSIX regular expression <parameter>pattern</parameter> to
+ the <parameter>string</parameter>, or all matches if
+ the <literal>g</literal> flag is used; see
+ <xref linkend="functions-posix-regexp"/>.
</para>
<para>
<literal>regexp_matches('foobarbequebaz', 'ba.', 'g')</literal>
@@ -3156,14 +3230,16 @@ repeat('Pg', 4) <returnvalue>PgPgPgPg</returnvalue>
<indexterm>
<primary>regexp_replace</primary>
</indexterm>
- <function>regexp_replace</function> ( <parameter>string</parameter> <type>text</type>, <parameter>pattern</parameter> <type>text</type>, <parameter>replacement</parameter> <type>text</type> [, <parameter>flags</parameter> <type>text</type> ] )
+ <function>regexp_replace</function> ( <parameter>string</parameter> <type>text</type>, <parameter>pattern</parameter> <type>text</type>, <parameter>replacement</parameter> <type>text</type>
+ [, <parameter>start</parameter> <type>integer</type> ]
+ [, <parameter>flags</parameter> <type>text</type> ] )
<returnvalue>text</returnvalue>
</para>
<para>
- Replaces substrings resulting from the first match of a
- POSIX regular expression, or multiple substring matches
- if the <literal>g</literal> flag is used; see <xref
- linkend="functions-posix-regexp"/>.
+ Replaces the substring that is the first match to the POSIX
+ regular expression <parameter>pattern</parameter>, or all matches
+ if the <literal>g</literal> flag is used; see
+ <xref linkend="functions-posix-regexp"/>.
</para>
<para>
<literal>regexp_replace('Thomas', '.[mN]a.', 'M')</literal>
@@ -3171,6 +3247,26 @@ repeat('Pg', 4) <returnvalue>PgPgPgPg</returnvalue>
</para></entry>
</row>
+ <row>
+ <entry role="func_table_entry"><para role="func_signature">
+ <function>regexp_replace</function> ( <parameter>string</parameter> <type>text</type>, <parameter>pattern</parameter> <type>text</type>, <parameter>replacement</parameter> <type>text</type>,
+ <parameter>start</parameter> <type>integer</type>,
+ <parameter>N</parameter> <type>integer</type>
+ [, <parameter>flags</parameter> <type>text</type> ] )
+ <returnvalue>text</returnvalue>
+ </para>
+ <para>
+ Replaces the substring that is the <parameter>N</parameter>'th
+ match to the POSIX regular expression <parameter>pattern</parameter>,
+ or all matches if <parameter>N</parameter> is zero; see
+ <xref linkend="functions-posix-regexp"/>.
+ </para>
+ <para>
+ <literal>regexp_replace('Thomas', '.', 'X', 3, 2)</literal>
+ <returnvalue>ThoXas</returnvalue>
+ </para></entry>
+ </row>
+
<row>
<entry role="func_table_entry"><para role="func_signature">
<indexterm>
@@ -3213,6 +3309,35 @@ repeat('Pg', 4) <returnvalue>PgPgPgPg</returnvalue>
</para></entry>
</row>
+ <row>
+ <entry role="func_table_entry"><para role="func_signature">
+ <indexterm>
+ <primary>regexp_substr</primary>
+ </indexterm>
+ <function>regexp_substr</function> ( <parameter>string</parameter> <type>text</type>, <parameter>pattern</parameter> <type>text</type>
+ [, <parameter>start</parameter> <type>integer</type>
+ [, <parameter>N</parameter> <type>integer</type>
+ [, <parameter>flags</parameter> <type>text</type>
+ [, <parameter>subexpr</parameter> <type>integer</type> ] ] ] ] )
+ <returnvalue>text</returnvalue>
+ </para>
+ <para>
+ Returns the substring within <parameter>string</parameter> that
+ matches the <parameter>N</parameter>'th occurrence of the POSIX
+ regular expression <parameter>pattern</parameter>,
+ or <literal>NULL</literal> if there is no such match; see
+ <xref linkend="functions-posix-regexp"/>.
+ </para>
+ <para>
+ <literal>regexp_substr('ABCDEF', 'c(.)(..)', 1, 1, 'i')</literal>
+ <returnvalue>CDEF</returnvalue>
+ </para>
+ <para>
+ <literal>regexp_substr('ABCDEF', 'c(.)(..)', 1, 1, 'i', 2)</literal>
+ <returnvalue>EF</returnvalue>
+ </para></entry>
+ </row>
+
<row>
<entry role="func_table_entry"><para role="func_signature">
<indexterm>
@@ -5377,6 +5502,15 @@ substring('foobar' similar '#"o_b#"%' escape '#') <lineannotation>NULL</linea
<indexterm>
<primary>substring</primary>
</indexterm>
+ <indexterm>
+ <primary>regexp_count</primary>
+ </indexterm>
+ <indexterm>
+ <primary>regexp_instr</primary>
+ </indexterm>
+ <indexterm>
+ <primary>regexp_like</primary>
+ </indexterm>
<indexterm>
<primary>regexp_match</primary>
</indexterm>
@@ -5392,6 +5526,9 @@ substring('foobar' similar '#"o_b#"%' escape '#') <lineannotation>NULL</linea
<indexterm>
<primary>regexp_split_to_array</primary>
</indexterm>
+ <indexterm>
+ <primary>regexp_substr</primary>
+ </indexterm>
<para>
<xref linkend="functions-posix-table"/> lists the available
@@ -5542,6 +5679,109 @@ substring('foobar' from 'o(.)b') <lineannotation>o</lineannotation>
</programlisting>
</para>
+ <para>
+ The <function>regexp_count</function> function counts the number of
+ places where a POSIX regular expression pattern matches a string.
+ It has the syntax
+ <function>regexp_count</function>(<replaceable>string</replaceable>,
+ <replaceable>pattern</replaceable>
+ <optional>, <replaceable>start</replaceable>
+ <optional>, <replaceable>flags</replaceable>
+ </optional></optional>).
+ <replaceable>pattern</replaceable> is searched for
+ in <replaceable>string</replaceable>, normally from the beginning of
+ the string, but if the <replaceable>start</replaceable> parameter is
+ provided then beginning from that character index.
+ The <replaceable>flags</replaceable> parameter is an optional text
+ string containing zero or more single-letter flags that change the
+ function's behavior. For example, including <literal>i</literal> in
+ <replaceable>flags</replaceable> specifies case-insensitive matching.
+ Supported flags are described in
+ <xref linkend="posix-embedded-options-table"/>.
+ </para>
+
+ <para>
+ Some examples:
+<programlisting>
+regexp_count('ABCABCAXYaxy', 'A.') <lineannotation>3</lineannotation>
+regexp_count('ABCABCAXYaxy', 'A.', 1, 'i') <lineannotation>4</lineannotation>
+</programlisting>
+ </para>
+
+ <para>
+ The <function>regexp_instr</function> function returns the starting or
+ ending position of the <replaceable>N</replaceable>'th match of a
+ POSIX regular expression pattern to a string, or zero if there is no
+ such match. It has the syntax
+ <function>regexp_instr</function>(<replaceable>string</replaceable>,
+ <replaceable>pattern</replaceable>
+ <optional>, <replaceable>start</replaceable>
+ <optional>, <replaceable>N</replaceable>
+ <optional>, <replaceable>endoption</replaceable>
+ <optional>, <replaceable>flags</replaceable>
+ <optional>, <replaceable>subexpr</replaceable>
+ </optional></optional></optional></optional></optional>).
+ <replaceable>pattern</replaceable> is searched for
+ in <replaceable>string</replaceable>, normally from the beginning of
+ the string, but if the <replaceable>start</replaceable> parameter is
+ provided then beginning from that character index.
+ If <replaceable>N</replaceable> is specified
+ then the <replaceable>N</replaceable>'th match of the pattern
+ is located, otherwise the first match is located.
+ If the <replaceable>endoption</replaceable> parameter is omitted or
+ specified as zero, the function returns the position of the first
+ character of the match. Otherwise, <replaceable>endoption</replaceable>
+ must be one, and the function returns the position of the character
+ following the match.
+ The <replaceable>flags</replaceable> parameter is an optional text
+ string containing zero or more single-letter flags that change the
+ function's behavior. Supported flags are described
+ in <xref linkend="posix-embedded-options-table"/>.
+ For a pattern containing parenthesized
+ subexpressions, <replaceable>subexpr</replaceable> is an integer
+ indicating which subexpression is of interest: the result identifies
+ the position of the substring matching that subexpression.
+ Subexpressions are numbered in the order of their leading parentheses.
+ When <replaceable>subexpr</replaceable> is omitted or zero, the result
+ identifies the position of the whole match regardless of
+ parenthesized subexpressions.
+ </para>
+
+ <para>
+ Some examples:
+<programlisting>
+regexp_instr('number of your street, town zip, FR', '[^,]+', 1, 2)
+ <lineannotation>23</lineannotation>
+regexp_instr('ABCDEFGHI', '(c..)(...)', 1, 1, 0, 'i', 2)
+ <lineannotation>6</lineannotation>
+</programlisting>
+ </para>
+
+ <para>
+ The <function>regexp_like</function> function checks whether a match
+ of a POSIX regular expression pattern occurs within a string,
+ returning boolean true or false. It has the syntax
+ <function>regexp_like</function>(<replaceable>string</replaceable>,
+ <replaceable>pattern</replaceable>
+ <optional>, <replaceable>flags</replaceable> </optional>).
+ The <replaceable>flags</replaceable> parameter is an optional text
+ string containing zero or more single-letter flags that change the
+ function's behavior. Supported flags are described
+ in <xref linkend="posix-embedded-options-table"/>.
+ This function has the same results as the <literal>~</literal>
+ operator if no flags are specified. If only the <literal>i</literal>
+ flag is specified, it has the same results as
+ the <literal>~*</literal> operator.
+ </para>
+
+ <para>
+ Some examples:
+<programlisting>
+regexp_like('Hello World', 'world') <lineannotation>false</lineannotation>
+regexp_like('Hello World', 'world', 'i') <lineannotation>true</lineannotation>
+</programlisting>
+ </para>
+
<para>
The <function>regexp_match</function> function returns a text array of
captured substring(s) resulting from the first match of a POSIX
@@ -5579,8 +5819,17 @@ SELECT regexp_match('foobarbequebaz', '(bar)(beque)');
{bar,beque}
(1 row)
</programlisting>
- In the common case where you just want the whole matching substring
- or <literal>NULL</literal> for no match, write something like
+ </para>
+
+ <tip>
+ <para>
+ In the common case where you just want the whole matching substring
+ or <literal>NULL</literal> for no match, the best solution is to
+ use <function>regexp_substr()</function>.
+ However, <function>regexp_substr()</function> only exists
+ in <productname>PostgreSQL</productname> version 15 and up. When
+ working in older versions, you can extract the first element
+ of <function>regexp_match()</function>'s result, for example:
<programlisting>
SELECT (regexp_match('foobarbequebaz', 'bar.*que'))[1];
regexp_match
@@ -5588,7 +5837,8 @@ SELECT (regexp_match('foobarbequebaz', 'bar.*que'))[1];
barbeque
(1 row)
</programlisting>
- </para>
+ </para>
+ </tip>
<para>
The <function>regexp_matches</function> function returns a set of text arrays
@@ -5650,7 +5900,13 @@ SELECT col1, (SELECT regexp_matches(col2, '(bar)(beque)')) FROM tab;
It has the syntax
<function>regexp_replace</function>(<replaceable>source</replaceable>,
<replaceable>pattern</replaceable>, <replaceable>replacement</replaceable>
+ <optional>, <replaceable>start</replaceable>
+ <optional>, <replaceable>N</replaceable>
+ </optional></optional>
<optional>, <replaceable>flags</replaceable> </optional>).
+ (Notice that <replaceable>N</replaceable> cannot be specified
+ unless <replaceable>start</replaceable> is,
+ but <replaceable>flags</replaceable> can be given in any case.)
The <replaceable>source</replaceable> string is returned unchanged if
there is no match to the <replaceable>pattern</replaceable>. If there is a
match, the <replaceable>source</replaceable> string is returned with the
@@ -5663,11 +5919,22 @@ SELECT col1, (SELECT regexp_matches(col2, '(bar)(beque)')) FROM tab;
substring matching the entire pattern should be inserted. Write
<literal>\\</literal> if you need to put a literal backslash in the replacement
text.
+ <replaceable>pattern</replaceable> is searched for
+ in <replaceable>string</replaceable>, normally from the beginning of
+ the string, but if the <replaceable>start</replaceable> parameter is
+ provided then beginning from that character index.
+ By default, only the first match of the pattern is replaced.
+ If <replaceable>N</replaceable> is specified and is greater than zero,
+ then the <replaceable>N</replaceable>'th match of the pattern
+ is replaced.
+ If the <literal>g</literal> flag is given, or
+ if <replaceable>N</replaceable> is specified and is zero, then all
+ matches at or after the <replaceable>start</replaceable> position are
+ replaced. (The <literal>g</literal> flag is ignored
+ when <replaceable>N</replaceable> is specified.)
The <replaceable>flags</replaceable> parameter is an optional text
string containing zero or more single-letter flags that change the
- function's behavior. Flag <literal>i</literal> specifies case-insensitive
- matching, while flag <literal>g</literal> specifies replacement of each matching
- substring rather than only the first one. Supported flags (though
+ function's behavior. Supported flags (though
not <literal>g</literal>) are
described in <xref linkend="posix-embedded-options-table"/>.
</para>
@@ -5681,6 +5948,10 @@ regexp_replace('foobarbaz', 'b..', 'X', 'g')
<lineannotation>fooXX</lineannotation>
regexp_replace('foobarbaz', 'b(..)', 'X\1Y', 'g')
<lineannotation>fooXarYXazY</lineannotation>
+regexp_replace('A PostgreSQL function', 'a|e|i|o|u', 'X', 1, 0, 'i')
+ <lineannotation>X PXstgrXSQL fXnctXXn</lineannotation>
+regexp_replace('A PostgreSQL function', 'a|e|i|o|u', 'X', 1, 3, 'i')
+ <lineannotation>A PostgrXSQL function</lineannotation>
</programlisting>
</para>
@@ -5712,7 +5983,6 @@ regexp_replace('foobarbaz', 'b(..)', 'X\1Y', 'g')
<para>
Some examples:
<programlisting>
-
SELECT foo FROM regexp_split_to_table('the quick brown fox jumps over the lazy dog', '\s+') AS foo;
foo
-------
@@ -5761,11 +6031,51 @@ SELECT foo FROM regexp_split_to_table('the quick brown fox', '\s*') AS foo;
zero-length matches that occur at the start or end of the string
or immediately after a previous match. This is contrary to the strict
definition of regexp matching that is implemented by
- <function>regexp_match</function> and
- <function>regexp_matches</function>, but is usually the most convenient behavior
+ the other regexp functions, but is usually the most convenient behavior
in practice. Other software systems such as Perl use similar definitions.
</para>
+ <para>
+ The <function>regexp_substr</function> function returns the substring
+ that matches a POSIX regular expression pattern,
+ or <literal>NULL</literal> if there is no match. It has the syntax
+ <function>regexp_substr</function>(<replaceable>string</replaceable>,
+ <replaceable>pattern</replaceable>
+ <optional>, <replaceable>start</replaceable>
+ <optional>, <replaceable>N</replaceable>
+ <optional>, <replaceable>flags</replaceable>
+ <optional>, <replaceable>subexpr</replaceable>
+ </optional></optional></optional></optional>).
+ <replaceable>pattern</replaceable> is searched for
+ in <replaceable>string</replaceable>, normally from the beginning of
+ the string, but if the <replaceable>start</replaceable> parameter is
+ provided then beginning from that character index.
+ If <replaceable>N</replaceable> is specified
+ then the <replaceable>N</replaceable>'th match of the pattern
+ is returned, otherwise the first match is returned.
+ The <replaceable>flags</replaceable> parameter is an optional text
+ string containing zero or more single-letter flags that change the
+ function's behavior. Supported flags are described
+ in <xref linkend="posix-embedded-options-table"/>.
+ For a pattern containing parenthesized
+ subexpressions, <replaceable>subexpr</replaceable> is an integer
+ indicating which subexpression is of interest: the result is the
+ substring matching that subexpression.
+ Subexpressions are numbered in the order of their leading parentheses.
+ When <replaceable>subexpr</replaceable> is omitted or zero, the result
+ is the whole match regardless of parenthesized subexpressions.
+ </para>
+
+ <para>
+ Some examples:
+<programlisting>
+regexp_substr('number of your street, town zip, FR', '[^,]+', 1, 2)
+ <lineannotation> town zip</lineannotation>
+regexp_substr('ABCDEFGHI', '(c..)(...)', 1, 1, 'i', 2)
+ <lineannotation>FGH</lineannotation>
+</programlisting>
+ </para>
+
<!-- derived from the re_syntax.n man page -->
<sect3 id="posix-syntax-details">
diff --git a/src/backend/utils/adt/regexp.c b/src/backend/utils/adt/regexp.c
index a32c5c82ab..484d4265fd 100644
--- a/src/backend/utils/adt/regexp.c
+++ b/src/backend/utils/adt/regexp.c
@@ -113,6 +113,7 @@ static cached_re_str re_array[MAX_CACHED_RES]; /* cached re's */
/* Local functions */
static regexp_matches_ctx *setup_regexp_matches(text *orig_str, text *pattern,
pg_re_flags *flags,
+ int start_search,
Oid collation,
bool use_subpatterns,
bool ignore_degenerate,
@@ -629,7 +630,7 @@ textregexreplace_noopt(PG_FUNCTION_ARGS)
re = RE_compile_and_cache(p, REG_ADVANCED, PG_GET_COLLATION());
- PG_RETURN_TEXT_P(replace_text_regexp(s, (void *) re, r, false));
+ PG_RETURN_TEXT_P(replace_text_regexp(s, (void *) re, r, 0, 1));
}
/*
@@ -646,11 +647,97 @@ textregexreplace(PG_FUNCTION_ARGS)
regex_t *re;
pg_re_flags flags;
+ /*
+ * regexp_replace() with four arguments will be preferentially resolved as
+ * this form when the fourth argument is of type UNKNOWN. However, the
+ * user might have intended to call textregexreplace_extended_no_n. If we
+ * see flags that look like an integer, emit the same error that
+ * parse_re_flags would, but add a HINT about how to fix it.
+ */
+ if (VARSIZE_ANY_EXHDR(opt) > 0)
+ {
+ char *opt_p = VARDATA_ANY(opt);
+
+ if (*opt_p >= '0' && *opt_p <= '9')
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("invalid regular expression option: \"%.*s\"",
+ pg_mblen(opt_p), opt_p),
+ errhint("If you meant to use regexp_replace() with a start parameter, cast the fourth argument to integer explicitly.")));
+ }
+
parse_re_flags(&flags, opt);
re = RE_compile_and_cache(p, flags.cflags, PG_GET_COLLATION());
- PG_RETURN_TEXT_P(replace_text_regexp(s, (void *) re, r, flags.glob));
+ PG_RETURN_TEXT_P(replace_text_regexp(s, (void *) re, r, 0,
+ flags.glob ? 0 : 1));
+}
+
+/*
+ * textregexreplace_extended()
+ * Return a string matched by a regular expression, with replacement.
+ * Extends textregexreplace by allowing a start position and the
+ * choice of the occurrence to replace (0 means all occurrences).
+ */
+Datum
+textregexreplace_extended(PG_FUNCTION_ARGS)
+{
+ text *s = PG_GETARG_TEXT_PP(0);
+ text *p = PG_GETARG_TEXT_PP(1);
+ text *r = PG_GETARG_TEXT_PP(2);
+ int start = 1;
+ int n = 1;
+ text *flags = PG_GETARG_TEXT_PP_IF_EXISTS(5);
+ pg_re_flags re_flags;
+ regex_t *re;
+
+ /* Collect optional parameters */
+ if (PG_NARGS() > 3)
+ {
+ start = PG_GETARG_INT32(3);
+ if (start <= 0)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("invalid value for parameter \"%s\": %d",
+ "start", start)));
+ }
+ if (PG_NARGS() > 4)
+ {
+ n = PG_GETARG_INT32(4);
+ if (n < 0)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("invalid value for parameter \"%s\": %d",
+ "n", n)));
+ }
+
+ /* Determine options */
+ parse_re_flags(&re_flags, flags);
+
+ /* If N was not specified, deduce it from the 'g' flag */
+ if (PG_NARGS() <= 4)
+ n = re_flags.glob ? 0 : 1;
+
+ /* Compile the regular expression */
+ re = RE_compile_and_cache(p, re_flags.cflags, PG_GET_COLLATION());
+
+ /* Do the replacement(s) */
+ PG_RETURN_TEXT_P(replace_text_regexp(s, (void *) re, r, start - 1, n));
+}
+
+/* This is separate to keep the opr_sanity regression test from complaining */
+Datum
+textregexreplace_extended_no_n(PG_FUNCTION_ARGS)
+{
+ return textregexreplace_extended(fcinfo);
+}
+
+/* This is separate to keep the opr_sanity regression test from complaining */
+Datum
+textregexreplace_extended_no_flags(PG_FUNCTION_ARGS)
+{
+ return textregexreplace_extended(fcinfo);
}
/*
@@ -958,6 +1045,235 @@ similar_escape(PG_FUNCTION_ARGS)
PG_RETURN_TEXT_P(result);
}
+/*
+ * regexp_count()
+ * Return the number of matches of a pattern within a string.
+ */
+Datum
+regexp_count(PG_FUNCTION_ARGS)
+{
+ text *str = PG_GETARG_TEXT_PP(0);
+ text *pattern = PG_GETARG_TEXT_PP(1);
+ int start = 1;
+ text *flags = PG_GETARG_TEXT_PP_IF_EXISTS(3);
+ pg_re_flags re_flags;
+ regexp_matches_ctx *matchctx;
+
+ /* Collect optional parameters */
+ if (PG_NARGS() > 2)
+ {
+ start = PG_GETARG_INT32(2);
+ if (start <= 0)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("invalid value for parameter \"%s\": %d",
+ "start", start)));
+ }
+
+ /* Determine options */
+ parse_re_flags(&re_flags, flags);
+ /* User mustn't specify 'g' */
+ if (re_flags.glob)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ /* translator: %s is a SQL function name */
+ errmsg("%s does not support the \"global\" option",
+ "regexp_count()")));
+ /* But we find all the matches anyway */
+ re_flags.glob = true;
+
+ /* Do the matching */
+ matchctx = setup_regexp_matches(str, pattern, &re_flags, start - 1,
+ PG_GET_COLLATION(),
+ false, /* can ignore subexprs */
+ false, false);
+
+ PG_RETURN_INT32(matchctx->nmatches);
+}
+
+/* This is separate to keep the opr_sanity regression test from complaining */
+Datum
+regexp_count_no_start(PG_FUNCTION_ARGS)
+{
+ return regexp_count(fcinfo);
+}
+
+/* This is separate to keep the opr_sanity regression test from complaining */
+Datum
+regexp_count_no_flags(PG_FUNCTION_ARGS)
+{
+ return regexp_count(fcinfo);
+}
+
+/*
+ * regexp_instr()
+ * Return the match's position within the string
+ */
+Datum
+regexp_instr(PG_FUNCTION_ARGS)
+{
+ text *str = PG_GETARG_TEXT_PP(0);
+ text *pattern = PG_GETARG_TEXT_PP(1);
+ int start = 1;
+ int n = 1;
+ int endoption = 0;
+ text *flags = PG_GETARG_TEXT_PP_IF_EXISTS(5);
+ int subexpr = 0;
+ int pos;
+ pg_re_flags re_flags;
+ regexp_matches_ctx *matchctx;
+
+ /* Collect optional parameters */
+ if (PG_NARGS() > 2)
+ {
+ start = PG_GETARG_INT32(2);
+ if (start <= 0)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("invalid value for parameter \"%s\": %d",
+ "start", start)));
+ }
+ if (PG_NARGS() > 3)
+ {
+ n = PG_GETARG_INT32(3);
+ if (n <= 0)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("invalid value for parameter \"%s\": %d",
+ "n", n)));
+ }
+ if (PG_NARGS() > 4)
+ {
+ endoption = PG_GETARG_INT32(4);
+ if (endoption != 0 && endoption != 1)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("invalid value for parameter \"%s\": %d",
+ "endoption", endoption)));
+ }
+ if (PG_NARGS() > 6)
+ {
+ subexpr = PG_GETARG_INT32(6);
+ if (subexpr < 0)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("invalid value for parameter \"%s\": %d",
+ "subexpr", subexpr)));
+ }
+
+ /* Determine options */
+ parse_re_flags(&re_flags, flags);
+ /* User mustn't specify 'g' */
+ if (re_flags.glob)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ /* translator: %s is a SQL function name */
+ errmsg("%s does not support the \"global\" option",
+ "regexp_instr()")));
+ /* But we find all the matches anyway */
+ re_flags.glob = true;
+
+ /* Do the matching */
+ matchctx = setup_regexp_matches(str, pattern, &re_flags, start - 1,
+ PG_GET_COLLATION(),
+ (subexpr > 0), /* need submatches? */
+ false, false);
+
+ /* When n exceeds matches return 0 (includes case of no matches) */
+ if (n > matchctx->nmatches)
+ PG_RETURN_INT32(0);
+
+ /* When subexpr exceeds number of subexpressions return 0 */
+ if (subexpr > matchctx->npatterns)
+ PG_RETURN_INT32(0);
+
+ /* Select the appropriate match position to return */
+ pos = (n - 1) * matchctx->npatterns;
+ if (subexpr > 0)
+ pos += subexpr - 1;
+ pos *= 2;
+ if (endoption == 1)
+ pos += 1;
+
+ if (matchctx->match_locs[pos] >= 0)
+ PG_RETURN_INT32(matchctx->match_locs[pos] + 1);
+ else
+ PG_RETURN_INT32(0); /* position not identifiable */
+}
+
+/* This is separate to keep the opr_sanity regression test from complaining */
+Datum
+regexp_instr_no_start(PG_FUNCTION_ARGS)
+{
+ return regexp_instr(fcinfo);
+}
+
+/* This is separate to keep the opr_sanity regression test from complaining */
+Datum
+regexp_instr_no_n(PG_FUNCTION_ARGS)
+{
+ return regexp_instr(fcinfo);
+}
+
+/* This is separate to keep the opr_sanity regression test from complaining */
+Datum
+regexp_instr_no_endoption(PG_FUNCTION_ARGS)
+{
+ return regexp_instr(fcinfo);
+}
+
+/* This is separate to keep the opr_sanity regression test from complaining */
+Datum
+regexp_instr_no_flags(PG_FUNCTION_ARGS)
+{
+ return regexp_instr(fcinfo);
+}
+
+/* This is separate to keep the opr_sanity regression test from complaining */
+Datum
+regexp_instr_no_subexpr(PG_FUNCTION_ARGS)
+{
+ return regexp_instr(fcinfo);
+}
+
+/*
+ * regexp_like()
+ * Test for a pattern match within a string.
+ */
+Datum
+regexp_like(PG_FUNCTION_ARGS)
+{
+ text *str = PG_GETARG_TEXT_PP(0);
+ text *pattern = PG_GETARG_TEXT_PP(1);
+ text *flags = PG_GETARG_TEXT_PP_IF_EXISTS(2);
+ pg_re_flags re_flags;
+
+ /* Determine options */
+ parse_re_flags(&re_flags, flags);
+ /* User mustn't specify 'g' */
+ if (re_flags.glob)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ /* translator: %s is a SQL function name */
+ errmsg("%s does not support the \"global\" option",
+ "regexp_like()")));
+
+ /* Otherwise it's like textregexeq/texticregexeq */
+ PG_RETURN_BOOL(RE_compile_and_execute(pattern,
+ VARDATA_ANY(str),
+ VARSIZE_ANY_EXHDR(str),
+ re_flags.cflags,
+ PG_GET_COLLATION(),
+ 0, NULL));
+}
+
+/* This is separate to keep the opr_sanity regression test from complaining */
+Datum
+regexp_like_no_flags(PG_FUNCTION_ARGS)
+{
+ return regexp_like(fcinfo);
+}
+
/*
* regexp_match()
* Return the first substring(s) matching a pattern within a string.
@@ -982,7 +1298,7 @@ regexp_match(PG_FUNCTION_ARGS)
"regexp_match()"),
errhint("Use the regexp_matches function instead.")));
- matchctx = setup_regexp_matches(orig_str, pattern, &re_flags,
+ matchctx = setup_regexp_matches(orig_str, pattern, &re_flags, 0,
PG_GET_COLLATION(), true, false, false);
if (matchctx->nmatches == 0)
@@ -1029,7 +1345,7 @@ regexp_matches(PG_FUNCTION_ARGS)
/* be sure to copy the input string into the multi-call ctx */
matchctx = setup_regexp_matches(PG_GETARG_TEXT_P_COPY(0), pattern,
- &re_flags,
+ &re_flags, 0,
PG_GET_COLLATION(),
true, false, false);
@@ -1064,24 +1380,28 @@ regexp_matches_no_flags(PG_FUNCTION_ARGS)
}
/*
- * setup_regexp_matches --- do the initial matching for regexp_match
- * and regexp_split functions
+ * setup_regexp_matches --- do the initial matching for regexp_match,
+ * regexp_split, and related functions
*
* To avoid having to re-find the compiled pattern on each call, we do
* all the matching in one swoop. The returned regexp_matches_ctx contains
* the locations of all the substrings matching the pattern.
*
- * The three bool parameters have only two patterns (one for matching, one for
- * splitting) but it seems clearer to distinguish the functionality this way
- * than to key it all off one "is_split" flag. We don't currently assume that
- * fetching_unmatched is exclusive of fetching the matched text too; if it's
- * set, the conversion buffer is large enough to fetch any single matched or
- * unmatched string, but not any larger substring. (In practice, when splitting
- * the matches are usually small anyway, and it didn't seem worth complicating
- * the code further.)
+ * start_search: the character (not byte) offset in orig_str at which to
+ * begin the search. Returned positions are relative to orig_str anyway.
+ * use_subpatterns: collect data about matches to parenthesized subexpressions.
+ * ignore_degenerate: ignore zero-length matches.
+ * fetching_unmatched: caller wants to fetch unmatched substrings.
+ *
+ * We don't currently assume that fetching_unmatched is exclusive of fetching
+ * the matched text too; if it's set, the conversion buffer is large enough to
+ * fetch any single matched or unmatched string, but not any larger
+ * substring. (In practice, when splitting the matches are usually small
+ * anyway, and it didn't seem worth complicating the code further.)
*/
static regexp_matches_ctx *
setup_regexp_matches(text *orig_str, text *pattern, pg_re_flags *re_flags,
+ int start_search,
Oid collation,
bool use_subpatterns,
bool ignore_degenerate,
@@ -1099,7 +1419,6 @@ setup_regexp_matches(text *orig_str, text *pattern, pg_re_flags *re_flags,
int array_idx;
int prev_match_end;
int prev_valid_match_end;
- int start_search;
int maxlen = 0; /* largest fetch length in characters */
/* save original string --- we'll extract result substrings from it */
@@ -1142,7 +1461,6 @@ setup_regexp_matches(text *orig_str, text *pattern, pg_re_flags *re_flags,
/* search for the pattern, perhaps repeatedly */
prev_match_end = 0;
prev_valid_match_end = 0;
- start_search = 0;
while (RE_wchar_execute(cpattern, wide_str, wide_len, start_search,
pmatch_len, pmatch))
{
@@ -1367,7 +1685,7 @@ regexp_split_to_table(PG_FUNCTION_ARGS)
/* be sure to copy the input string into the multi-call ctx */
splitctx = setup_regexp_matches(PG_GETARG_TEXT_P_COPY(0), pattern,
- &re_flags,
+ &re_flags, 0,
PG_GET_COLLATION(),
false, true, true);
@@ -1422,7 +1740,7 @@ regexp_split_to_array(PG_FUNCTION_ARGS)
splitctx = setup_regexp_matches(PG_GETARG_TEXT_PP(0),
PG_GETARG_TEXT_PP(1),
- &re_flags,
+ &re_flags, 0,
PG_GET_COLLATION(),
false, true, true);
@@ -1489,6 +1807,125 @@ build_regexp_split_result(regexp_matches_ctx *splitctx)
}
}
+/*
+ * regexp_substr()
+ * Return the substring that matches a regular expression pattern
+ */
+Datum
+regexp_substr(PG_FUNCTION_ARGS)
+{
+ text *str = PG_GETARG_TEXT_PP(0);
+ text *pattern = PG_GETARG_TEXT_PP(1);
+ int start = 1;
+ int n = 1;
+ text *flags = PG_GETARG_TEXT_PP_IF_EXISTS(4);
+ int subexpr = 0;
+ int so,
+ eo,
+ pos;
+ pg_re_flags re_flags;
+ regexp_matches_ctx *matchctx;
+
+ /* Collect optional parameters */
+ if (PG_NARGS() > 2)
+ {
+ start = PG_GETARG_INT32(2);
+ if (start <= 0)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("invalid value for parameter \"%s\": %d",
+ "start", start)));
+ }
+ if (PG_NARGS() > 3)
+ {
+ n = PG_GETARG_INT32(3);
+ if (n <= 0)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("invalid value for parameter \"%s\": %d",
+ "n", n)));
+ }
+ if (PG_NARGS() > 5)
+ {
+ subexpr = PG_GETARG_INT32(5);
+ if (subexpr < 0)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("invalid value for parameter \"%s\": %d",
+ "subexpr", subexpr)));
+ }
+
+ /* Determine options */
+ parse_re_flags(&re_flags, flags);
+ /* User mustn't specify 'g' */
+ if (re_flags.glob)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ /* translator: %s is a SQL function name */
+ errmsg("%s does not support the \"global\" option",
+ "regexp_substr()")));
+ /* But we find all the matches anyway */
+ re_flags.glob = true;
+
+ /* Do the matching */
+ matchctx = setup_regexp_matches(str, pattern, &re_flags, start - 1,
+ PG_GET_COLLATION(),
+ (subexpr > 0), /* need submatches? */
+ false, false);
+
+ /* When n exceeds matches return NULL (includes case of no matches) */
+ if (n > matchctx->nmatches)
+ PG_RETURN_NULL();
+
+ /* When subexpr exceeds number of subexpressions return NULL */
+ if (subexpr > matchctx->npatterns)
+ PG_RETURN_NULL();
+
+ /* Select the appropriate match position to return */
+ pos = (n - 1) * matchctx->npatterns;
+ if (subexpr > 0)
+ pos += subexpr - 1;
+ pos *= 2;
+ so = matchctx->match_locs[pos];
+ eo = matchctx->match_locs[pos + 1];
+
+ if (so < 0 || eo < 0)
+ PG_RETURN_NULL(); /* unidentifiable location */
+
+ PG_RETURN_DATUM(DirectFunctionCall3(text_substr,
+ PointerGetDatum(matchctx->orig_str),
+ Int32GetDatum(so + 1),
+ Int32GetDatum(eo - so)));
+}
+
+/* This is separate to keep the opr_sanity regression test from complaining */
+Datum
+regexp_substr_no_start(PG_FUNCTION_ARGS)
+{
+ return regexp_substr(fcinfo);
+}
+
+/* This is separate to keep the opr_sanity regression test from complaining */
+Datum
+regexp_substr_no_n(PG_FUNCTION_ARGS)
+{
+ return regexp_substr(fcinfo);
+}
+
+/* This is separate to keep the opr_sanity regression test from complaining */
+Datum
+regexp_substr_no_flags(PG_FUNCTION_ARGS)
+{
+ return regexp_substr(fcinfo);
+}
+
+/* This is separate to keep the opr_sanity regression test from complaining */
+Datum
+regexp_substr_no_subexpr(PG_FUNCTION_ARGS)
+{
+ return regexp_substr(fcinfo);
+}
+
/*
* regexp_fixed_prefix - extract fixed prefix, if any, for a regexp
*
diff --git a/src/backend/utils/adt/varlena.c b/src/backend/utils/adt/varlena.c
index d2a11b1b5d..a0bde4e352 100644
--- a/src/backend/utils/adt/varlena.c
+++ b/src/backend/utils/adt/varlena.c
@@ -4496,23 +4496,28 @@ appendStringInfoRegexpSubstr(StringInfo str, text *replace_text,
/*
* replace_text_regexp
*
- * replace text that matches to regexp in src_text to replace_text.
+ * replace text that matches to regexp in src_text with replace_text.
+ *
+ * search_start: the character (not byte) offset in src_text at which to
+ * begin searching.
+ * n: if 0, replace all matches; if > 0, replace only the N'th match.
*
* Note: to avoid having to include regex.h in builtins.h, we declare
* the regexp argument as void *, but really it's regex_t *.
*/
text *
replace_text_regexp(text *src_text, void *regexp,
- text *replace_text, bool glob)
+ text *replace_text,
+ int search_start, int n)
{
text *ret_text;
regex_t *re = (regex_t *) regexp;
int src_text_len = VARSIZE_ANY_EXHDR(src_text);
+ int nmatches = 0;
StringInfoData buf;
regmatch_t pmatch[REGEXP_REPLACE_BACKREF_CNT];
pg_wchar *data;
size_t data_len;
- int search_start;
int data_pos;
char *start_ptr;
bool have_escape;
@@ -4530,7 +4535,6 @@ replace_text_regexp(text *src_text, void *regexp,
start_ptr = (char *) VARDATA_ANY(src_text);
data_pos = 0;
- search_start = 0;
while (search_start <= data_len)
{
int regexec_result;
@@ -4560,6 +4564,23 @@ replace_text_regexp(text *src_text, void *regexp,
errmsg("regular expression failed: %s", errMsg)));
}
+ /*
+ * Count matches, and decide whether to replace this match.
+ */
+ nmatches++;
+ if (n > 0 && nmatches != n)
+ {
+ /*
+ * No, so advance search_start, but not start_ptr/data_pos. (Thus,
+ * we treat the matched text as if it weren't matched, and copy it
+ * to the output later.)
+ */
+ search_start = pmatch[0].rm_eo;
+ if (pmatch[0].rm_so == pmatch[0].rm_eo)
+ search_start++;
+ continue;
+ }
+
/*
* Copy the text to the left of the match position. Note we are given
* character not byte indexes.
@@ -4596,9 +4617,9 @@ replace_text_regexp(text *src_text, void *regexp,
data_pos = pmatch[0].rm_eo;
/*
- * When global option is off, replace the first instance only.
+ * If we only want to replace one occurrence, we're done.
*/
- if (!glob)
+ if (n > 0)
break;
/*
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 8cd0252082..b603700ed9 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -3565,6 +3565,18 @@
{ oid => '2285', descr => 'replace text using regexp',
proname => 'regexp_replace', prorettype => 'text',
proargtypes => 'text text text text', prosrc => 'textregexreplace' },
+{ oid => '9611', descr => 'replace text using regexp',
+ proname => 'regexp_replace', prorettype => 'text',
+ proargtypes => 'text text text int4 int4 text',
+ prosrc => 'textregexreplace_extended' },
+{ oid => '9612', descr => 'replace text using regexp',
+ proname => 'regexp_replace', prorettype => 'text',
+ proargtypes => 'text text text int4 int4',
+ prosrc => 'textregexreplace_extended_no_flags' },
+{ oid => '9613', descr => 'replace text using regexp',
+ proname => 'regexp_replace', prorettype => 'text',
+ proargtypes => 'text text text int4',
+ prosrc => 'textregexreplace_extended_no_n' },
{ oid => '3396', descr => 'find first match for regexp',
proname => 'regexp_match', prorettype => '_text', proargtypes => 'text text',
prosrc => 'regexp_match_no_flags' },
@@ -3579,6 +3591,58 @@
proname => 'regexp_matches', prorows => '10', proretset => 't',
prorettype => '_text', proargtypes => 'text text text',
prosrc => 'regexp_matches' },
+{ oid => '9614', descr => 'count regexp matches',
+ proname => 'regexp_count', prorettype => 'int4', proargtypes => 'text text',
+ prosrc => 'regexp_count_no_start' },
+{ oid => '9615', descr => 'count regexp matches',
+ proname => 'regexp_count', prorettype => 'int4',
+ proargtypes => 'text text int4', prosrc => 'regexp_count_no_flags' },
+{ oid => '9616', descr => 'count regexp matches',
+ proname => 'regexp_count', prorettype => 'int4',
+ proargtypes => 'text text int4 text', prosrc => 'regexp_count' },
+{ oid => '9617', descr => 'position of regexp match',
+ proname => 'regexp_instr', prorettype => 'int4', proargtypes => 'text text',
+ prosrc => 'regexp_instr_no_start' },
+{ oid => '9618', descr => 'position of regexp match',
+ proname => 'regexp_instr', prorettype => 'int4',
+ proargtypes => 'text text int4', prosrc => 'regexp_instr_no_n' },
+{ oid => '9619', descr => 'position of regexp match',
+ proname => 'regexp_instr', prorettype => 'int4',
+ proargtypes => 'text text int4 int4', prosrc => 'regexp_instr_no_endoption' },
+{ oid => '9620', descr => 'position of regexp match',
+ proname => 'regexp_instr', prorettype => 'int4',
+ proargtypes => 'text text int4 int4 int4',
+ prosrc => 'regexp_instr_no_flags' },
+{ oid => '9621', descr => 'position of regexp match',
+ proname => 'regexp_instr', prorettype => 'int4',
+ proargtypes => 'text text int4 int4 int4 text',
+ prosrc => 'regexp_instr_no_subexpr' },
+{ oid => '9622', descr => 'position of regexp match',
+ proname => 'regexp_instr', prorettype => 'int4',
+ proargtypes => 'text text int4 int4 int4 text int4',
+ prosrc => 'regexp_instr' },
+{ oid => '9623', descr => 'test for regexp match',
+ proname => 'regexp_like', prorettype => 'bool', proargtypes => 'text text',
+ prosrc => 'regexp_like_no_flags' },
+{ oid => '9624', descr => 'test for regexp match',
+ proname => 'regexp_like', prorettype => 'bool',
+ proargtypes => 'text text text', prosrc => 'regexp_like' },
+{ oid => '9625', descr => 'extract substring that matches regexp',
+ proname => 'regexp_substr', prorettype => 'text', proargtypes => 'text text',
+ prosrc => 'regexp_substr_no_start' },
+{ oid => '9626', descr => 'extract substring that matches regexp',
+ proname => 'regexp_substr', prorettype => 'text',
+ proargtypes => 'text text int4', prosrc => 'regexp_substr_no_n' },
+{ oid => '9627', descr => 'extract substring that matches regexp',
+ proname => 'regexp_substr', prorettype => 'text',
+ proargtypes => 'text text int4 int4', prosrc => 'regexp_substr_no_flags' },
+{ oid => '9628', descr => 'extract substring that matches regexp',
+ proname => 'regexp_substr', prorettype => 'text',
+ proargtypes => 'text text int4 int4 text',
+ prosrc => 'regexp_substr_no_subexpr' },
+{ oid => '9629', descr => 'extract substring that matches regexp',
+ proname => 'regexp_substr', prorettype => 'text',
+ proargtypes => 'text text int4 int4 text int4', prosrc => 'regexp_substr' },
{ oid => '2088', descr => 'split string by field_sep and return field_num',
proname => 'split_part', prorettype => 'text',
proargtypes => 'text text int4', prosrc => 'split_part' },
diff --git a/src/include/utils/varlena.h b/src/include/utils/varlena.h
index 5c39723332..6645e2af13 100644
--- a/src/include/utils/varlena.h
+++ b/src/include/utils/varlena.h
@@ -34,6 +34,7 @@ extern bool SplitDirectoriesString(char *rawstring, char separator,
extern bool SplitGUCList(char *rawstring, char separator,
List **namelist);
extern text *replace_text_regexp(text *src_text, void *regexp,
- text *replace_text, bool glob);
+ text *replace_text,
+ int search_start, int n);
#endif
diff --git a/src/test/regress/expected/strings.out b/src/test/regress/expected/strings.out
index 91aa819804..a9efd74c7b 100644
--- a/src/test/regress/expected/strings.out
+++ b/src/test/regress/expected/strings.out
@@ -515,6 +515,13 @@ SELECT SUBSTRING('abcdefg' FROM 'b(.*)f') AS "cde";
cde
(1 row)
+-- Check case where we have a match, but not a subexpression match
+SELECT SUBSTRING('foo' FROM 'foo(bar)?') IS NULL AS t;
+ t
+---
+ t
+(1 row)
+
-- Check behavior of SIMILAR TO, which uses largely the same regexp variant
SELECT 'abcdefg' SIMILAR TO '_bcd%' AS true;
true
@@ -592,6 +599,370 @@ SELECT regexp_replace('AAA aaa', 'A+', 'Z', 'gi');
-- invalid regexp option
SELECT regexp_replace('AAA aaa', 'A+', 'Z', 'z');
ERROR: invalid regular expression option: "z"
+-- extended regexp_replace tests
+SELECT regexp_replace('A PostgreSQL function', 'A|e|i|o|u', 'X', 1);
+ regexp_replace
+-----------------------
+ X PostgreSQL function
+(1 row)
+
+SELECT regexp_replace('A PostgreSQL function', 'A|e|i|o|u', 'X', 1, 2);
+ regexp_replace
+-----------------------
+ A PXstgreSQL function
+(1 row)
+
+SELECT regexp_replace('A PostgreSQL function', 'a|e|i|o|u', 'X', 1, 0, 'i');
+ regexp_replace
+-----------------------
+ X PXstgrXSQL fXnctXXn
+(1 row)
+
+SELECT regexp_replace('A PostgreSQL function', 'a|e|i|o|u', 'X', 1, 1, 'i');
+ regexp_replace
+-----------------------
+ X PostgreSQL function
+(1 row)
+
+SELECT regexp_replace('A PostgreSQL function', 'a|e|i|o|u', 'X', 1, 2, 'i');
+ regexp_replace
+-----------------------
+ A PXstgreSQL function
+(1 row)
+
+SELECT regexp_replace('A PostgreSQL function', 'a|e|i|o|u', 'X', 1, 3, 'i');
+ regexp_replace
+-----------------------
+ A PostgrXSQL function
+(1 row)
+
+SELECT regexp_replace('A PostgreSQL function', 'a|e|i|o|u', 'X', 1, 9, 'i');
+ regexp_replace
+-----------------------
+ A PostgreSQL function
+(1 row)
+
+SELECT regexp_replace('A PostgreSQL function', 'A|e|i|o|u', 'X', 7, 0, 'i');
+ regexp_replace
+-----------------------
+ A PostgrXSQL fXnctXXn
+(1 row)
+
+-- 'g' flag should be ignored when N is specified
+SELECT regexp_replace('A PostgreSQL function', 'a|e|i|o|u', 'X', 1, 1, 'g');
+ regexp_replace
+-----------------------
+ A PXstgreSQL function
+(1 row)
+
+-- errors
+SELECT regexp_replace('A PostgreSQL function', 'a|e|i|o|u', 'X', -1, 0, 'i');
+ERROR: invalid value for parameter "start": -1
+SELECT regexp_replace('A PostgreSQL function', 'a|e|i|o|u', 'X', 1, -1, 'i');
+ERROR: invalid value for parameter "n": -1
+-- erroneous invocation of non-extended form
+SELECT regexp_replace('A PostgreSQL function', 'a|e|i|o|u', 'X', '1');
+ERROR: invalid regular expression option: "1"
+HINT: If you meant to use regexp_replace() with a start parameter, cast the fourth argument to integer explicitly.
+-- regexp_count tests
+SELECT regexp_count('123123123123123', '(12)3');
+ regexp_count
+--------------
+ 5
+(1 row)
+
+SELECT regexp_count('123123123123', '123', 1);
+ regexp_count
+--------------
+ 4
+(1 row)
+
+SELECT regexp_count('123123123123', '123', 3);
+ regexp_count
+--------------
+ 3
+(1 row)
+
+SELECT regexp_count('123123123123', '123', 33);
+ regexp_count
+--------------
+ 0
+(1 row)
+
+SELECT regexp_count('ABCABCABCABC', 'Abc', 1, '');
+ regexp_count
+--------------
+ 0
+(1 row)
+
+SELECT regexp_count('ABCABCABCABC', 'Abc', 1, 'i');
+ regexp_count
+--------------
+ 4
+(1 row)
+
+-- errors
+SELECT regexp_count('123123123123', '123', 0);
+ERROR: invalid value for parameter "start": 0
+SELECT regexp_count('123123123123', '123', -3);
+ERROR: invalid value for parameter "start": -3
+-- regexp_like tests
+SELECT regexp_like('Steven', '^Ste(v|ph)en$');
+ regexp_like
+-------------
+ t
+(1 row)
+
+SELECT regexp_like('a'||CHR(10)||'d', 'a.d', 'n');
+ regexp_like
+-------------
+ f
+(1 row)
+
+SELECT regexp_like('a'||CHR(10)||'d', 'a.d', 's');
+ regexp_like
+-------------
+ t
+(1 row)
+
+SELECT regexp_like('abc', ' a . c ', 'x');
+ regexp_like
+-------------
+ t
+(1 row)
+
+SELECT regexp_like('abc', 'a.c', 'g'); -- error
+ERROR: regexp_like() does not support the "global" option
+-- regexp_instr tests
+SELECT regexp_instr('abcdefghi', 'd.f');
+ regexp_instr
+--------------
+ 4
+(1 row)
+
+SELECT regexp_instr('abcdefghi', 'd.q');
+ regexp_instr
+--------------
+ 0
+(1 row)
+
+SELECT regexp_instr('abcabcabc', 'a.c');
+ regexp_instr
+--------------
+ 1
+(1 row)
+
+SELECT regexp_instr('abcabcabc', 'a.c', 2);
+ regexp_instr
+--------------
+ 4
+(1 row)
+
+SELECT regexp_instr('abcabcabc', 'a.c', 1, 3);
+ regexp_instr
+--------------
+ 7
+(1 row)
+
+SELECT regexp_instr('abcabcabc', 'a.c', 1, 4);
+ regexp_instr
+--------------
+ 0
+(1 row)
+
+SELECT regexp_instr('abcabcabc', 'A.C', 1, 2, 0, 'i');
+ regexp_instr
+--------------
+ 4
+(1 row)
+
+SELECT regexp_instr('1234567890', '(123)(4(56)(78))', 1, 1, 0, 'i', 0);
+ regexp_instr
+--------------
+ 1
+(1 row)
+
+SELECT regexp_instr('1234567890', '(123)(4(56)(78))', 1, 1, 0, 'i', 1);
+ regexp_instr
+--------------
+ 1
+(1 row)
+
+SELECT regexp_instr('1234567890', '(123)(4(56)(78))', 1, 1, 0, 'i', 2);
+ regexp_instr
+--------------
+ 4
+(1 row)
+
+SELECT regexp_instr('1234567890', '(123)(4(56)(78))', 1, 1, 0, 'i', 3);
+ regexp_instr
+--------------
+ 5
+(1 row)
+
+SELECT regexp_instr('1234567890', '(123)(4(56)(78))', 1, 1, 0, 'i', 4);
+ regexp_instr
+--------------
+ 7
+(1 row)
+
+SELECT regexp_instr('1234567890', '(123)(4(56)(78))', 1, 1, 0, 'i', 5);
+ regexp_instr
+--------------
+ 0
+(1 row)
+
+SELECT regexp_instr('1234567890', '(123)(4(56)(78))', 1, 1, 1, 'i', 0);
+ regexp_instr
+--------------
+ 9
+(1 row)
+
+SELECT regexp_instr('1234567890', '(123)(4(56)(78))', 1, 1, 1, 'i', 1);
+ regexp_instr
+--------------
+ 4
+(1 row)
+
+SELECT regexp_instr('1234567890', '(123)(4(56)(78))', 1, 1, 1, 'i', 2);
+ regexp_instr
+--------------
+ 9
+(1 row)
+
+SELECT regexp_instr('1234567890', '(123)(4(56)(78))', 1, 1, 1, 'i', 3);
+ regexp_instr
+--------------
+ 7
+(1 row)
+
+SELECT regexp_instr('1234567890', '(123)(4(56)(78))', 1, 1, 1, 'i', 4);
+ regexp_instr
+--------------
+ 9
+(1 row)
+
+SELECT regexp_instr('1234567890', '(123)(4(56)(78))', 1, 1, 1, 'i', 5);
+ regexp_instr
+--------------
+ 0
+(1 row)
+
+-- Check case where we have a match, but not a subexpression match
+SELECT regexp_instr('foo', 'foo(bar)?', 1, 1, 0, '', 1);
+ regexp_instr
+--------------
+ 0
+(1 row)
+
+-- errors
+SELECT regexp_instr('abcabcabc', 'a.c', 0, 1);
+ERROR: invalid value for parameter "start": 0
+SELECT regexp_instr('abcabcabc', 'a.c', 1, 0);
+ERROR: invalid value for parameter "n": 0
+SELECT regexp_instr('abcabcabc', 'a.c', 1, 1, -1);
+ERROR: invalid value for parameter "endoption": -1
+SELECT regexp_instr('abcabcabc', 'a.c', 1, 1, 2);
+ERROR: invalid value for parameter "endoption": 2
+SELECT regexp_instr('abcabcabc', 'a.c', 1, 1, 0, 'g');
+ERROR: regexp_instr() does not support the "global" option
+SELECT regexp_instr('abcabcabc', 'a.c', 1, 1, 0, '', -1);
+ERROR: invalid value for parameter "subexpr": -1
+-- regexp_substr tests
+SELECT regexp_substr('abcdefghi', 'd.f');
+ regexp_substr
+---------------
+ def
+(1 row)
+
+SELECT regexp_substr('abcdefghi', 'd.q') IS NULL AS t;
+ t
+---
+ t
+(1 row)
+
+SELECT regexp_substr('abcabcabc', 'a.c');
+ regexp_substr
+---------------
+ abc
+(1 row)
+
+SELECT regexp_substr('abcabcabc', 'a.c', 2);
+ regexp_substr
+---------------
+ abc
+(1 row)
+
+SELECT regexp_substr('abcabcabc', 'a.c', 1, 3);
+ regexp_substr
+---------------
+ abc
+(1 row)
+
+SELECT regexp_substr('abcabcabc', 'a.c', 1, 4) IS NULL AS t;
+ t
+---
+ t
+(1 row)
+
+SELECT regexp_substr('abcabcabc', 'A.C', 1, 2, 'i');
+ regexp_substr
+---------------
+ abc
+(1 row)
+
+SELECT regexp_substr('1234567890', '(123)(4(56)(78))', 1, 1, 'i', 0);
+ regexp_substr
+---------------
+ 12345678
+(1 row)
+
+SELECT regexp_substr('1234567890', '(123)(4(56)(78))', 1, 1, 'i', 1);
+ regexp_substr
+---------------
+ 123
+(1 row)
+
+SELECT regexp_substr('1234567890', '(123)(4(56)(78))', 1, 1, 'i', 2);
+ regexp_substr
+---------------
+ 45678
+(1 row)
+
+SELECT regexp_substr('1234567890', '(123)(4(56)(78))', 1, 1, 'i', 3);
+ regexp_substr
+---------------
+ 56
+(1 row)
+
+SELECT regexp_substr('1234567890', '(123)(4(56)(78))', 1, 1, 'i', 4);
+ regexp_substr
+---------------
+ 78
+(1 row)
+
+SELECT regexp_substr('1234567890', '(123)(4(56)(78))', 1, 1, 'i', 5) IS NULL AS t;
+ t
+---
+ t
+(1 row)
+
+-- Check case where we have a match, but not a subexpression match
+SELECT regexp_substr('foo', 'foo(bar)?', 1, 1, '', 1) IS NULL AS t;
+ t
+---
+ t
+(1 row)
+
+-- errors
+SELECT regexp_substr('abcabcabc', 'a.c', 0, 1);
+ERROR: invalid value for parameter "start": 0
+SELECT regexp_substr('abcabcabc', 'a.c', 1, 0);
+ERROR: invalid value for parameter "n": 0
+SELECT regexp_substr('abcabcabc', 'a.c', 1, 1, 'g');
+ERROR: regexp_substr() does not support the "global" option
+SELECT regexp_substr('abcabcabc', 'a.c', 1, 1, '', -1);
+ERROR: invalid value for parameter "subexpr": -1
-- set so we can tell NULL from empty string
\pset null '\\N'
-- return all matches from regexp
diff --git a/src/test/regress/sql/strings.sql b/src/test/regress/sql/strings.sql
index 2c502534c2..6a029cc369 100644
--- a/src/test/regress/sql/strings.sql
+++ b/src/test/regress/sql/strings.sql
@@ -171,6 +171,8 @@ SELECT SUBSTRING('abcdefg' FROM 'c.e') AS "cde";
-- With a parenthesized subexpression, return only what matches the subexpr
SELECT SUBSTRING('abcdefg' FROM 'b(.*)f') AS "cde";
+-- Check case where we have a match, but not a subexpression match
+SELECT SUBSTRING('foo' FROM 'foo(bar)?') IS NULL AS t;
-- Check behavior of SIMILAR TO, which uses largely the same regexp variant
SELECT 'abcdefg' SIMILAR TO '_bcd%' AS true;
@@ -193,6 +195,93 @@ SELECT regexp_replace('AAA aaa', 'A+', 'Z', 'gi');
-- invalid regexp option
SELECT regexp_replace('AAA aaa', 'A+', 'Z', 'z');
+-- extended regexp_replace tests
+SELECT regexp_replace('A PostgreSQL function', 'A|e|i|o|u', 'X', 1);
+SELECT regexp_replace('A PostgreSQL function', 'A|e|i|o|u', 'X', 1, 2);
+SELECT regexp_replace('A PostgreSQL function', 'a|e|i|o|u', 'X', 1, 0, 'i');
+SELECT regexp_replace('A PostgreSQL function', 'a|e|i|o|u', 'X', 1, 1, 'i');
+SELECT regexp_replace('A PostgreSQL function', 'a|e|i|o|u', 'X', 1, 2, 'i');
+SELECT regexp_replace('A PostgreSQL function', 'a|e|i|o|u', 'X', 1, 3, 'i');
+SELECT regexp_replace('A PostgreSQL function', 'a|e|i|o|u', 'X', 1, 9, 'i');
+SELECT regexp_replace('A PostgreSQL function', 'A|e|i|o|u', 'X', 7, 0, 'i');
+-- 'g' flag should be ignored when N is specified
+SELECT regexp_replace('A PostgreSQL function', 'a|e|i|o|u', 'X', 1, 1, 'g');
+-- errors
+SELECT regexp_replace('A PostgreSQL function', 'a|e|i|o|u', 'X', -1, 0, 'i');
+SELECT regexp_replace('A PostgreSQL function', 'a|e|i|o|u', 'X', 1, -1, 'i');
+-- erroneous invocation of non-extended form
+SELECT regexp_replace('A PostgreSQL function', 'a|e|i|o|u', 'X', '1');
+
+-- regexp_count tests
+SELECT regexp_count('123123123123123', '(12)3');
+SELECT regexp_count('123123123123', '123', 1);
+SELECT regexp_count('123123123123', '123', 3);
+SELECT regexp_count('123123123123', '123', 33);
+SELECT regexp_count('ABCABCABCABC', 'Abc', 1, '');
+SELECT regexp_count('ABCABCABCABC', 'Abc', 1, 'i');
+-- errors
+SELECT regexp_count('123123123123', '123', 0);
+SELECT regexp_count('123123123123', '123', -3);
+
+-- regexp_like tests
+SELECT regexp_like('Steven', '^Ste(v|ph)en$');
+SELECT regexp_like('a'||CHR(10)||'d', 'a.d', 'n');
+SELECT regexp_like('a'||CHR(10)||'d', 'a.d', 's');
+SELECT regexp_like('abc', ' a . c ', 'x');
+SELECT regexp_like('abc', 'a.c', 'g'); -- error
+
+-- regexp_instr tests
+SELECT regexp_instr('abcdefghi', 'd.f');
+SELECT regexp_instr('abcdefghi', 'd.q');
+SELECT regexp_instr('abcabcabc', 'a.c');
+SELECT regexp_instr('abcabcabc', 'a.c', 2);
+SELECT regexp_instr('abcabcabc', 'a.c', 1, 3);
+SELECT regexp_instr('abcabcabc', 'a.c', 1, 4);
+SELECT regexp_instr('abcabcabc', 'A.C', 1, 2, 0, 'i');
+SELECT regexp_instr('1234567890', '(123)(4(56)(78))', 1, 1, 0, 'i', 0);
+SELECT regexp_instr('1234567890', '(123)(4(56)(78))', 1, 1, 0, 'i', 1);
+SELECT regexp_instr('1234567890', '(123)(4(56)(78))', 1, 1, 0, 'i', 2);
+SELECT regexp_instr('1234567890', '(123)(4(56)(78))', 1, 1, 0, 'i', 3);
+SELECT regexp_instr('1234567890', '(123)(4(56)(78))', 1, 1, 0, 'i', 4);
+SELECT regexp_instr('1234567890', '(123)(4(56)(78))', 1, 1, 0, 'i', 5);
+SELECT regexp_instr('1234567890', '(123)(4(56)(78))', 1, 1, 1, 'i', 0);
+SELECT regexp_instr('1234567890', '(123)(4(56)(78))', 1, 1, 1, 'i', 1);
+SELECT regexp_instr('1234567890', '(123)(4(56)(78))', 1, 1, 1, 'i', 2);
+SELECT regexp_instr('1234567890', '(123)(4(56)(78))', 1, 1, 1, 'i', 3);
+SELECT regexp_instr('1234567890', '(123)(4(56)(78))', 1, 1, 1, 'i', 4);
+SELECT regexp_instr('1234567890', '(123)(4(56)(78))', 1, 1, 1, 'i', 5);
+-- Check case where we have a match, but not a subexpression match
+SELECT regexp_instr('foo', 'foo(bar)?', 1, 1, 0, '', 1);
+-- errors
+SELECT regexp_instr('abcabcabc', 'a.c', 0, 1);
+SELECT regexp_instr('abcabcabc', 'a.c', 1, 0);
+SELECT regexp_instr('abcabcabc', 'a.c', 1, 1, -1);
+SELECT regexp_instr('abcabcabc', 'a.c', 1, 1, 2);
+SELECT regexp_instr('abcabcabc', 'a.c', 1, 1, 0, 'g');
+SELECT regexp_instr('abcabcabc', 'a.c', 1, 1, 0, '', -1);
+
+-- regexp_substr tests
+SELECT regexp_substr('abcdefghi', 'd.f');
+SELECT regexp_substr('abcdefghi', 'd.q') IS NULL AS t;
+SELECT regexp_substr('abcabcabc', 'a.c');
+SELECT regexp_substr('abcabcabc', 'a.c', 2);
+SELECT regexp_substr('abcabcabc', 'a.c', 1, 3);
+SELECT regexp_substr('abcabcabc', 'a.c', 1, 4) IS NULL AS t;
+SELECT regexp_substr('abcabcabc', 'A.C', 1, 2, 'i');
+SELECT regexp_substr('1234567890', '(123)(4(56)(78))', 1, 1, 'i', 0);
+SELECT regexp_substr('1234567890', '(123)(4(56)(78))', 1, 1, 'i', 1);
+SELECT regexp_substr('1234567890', '(123)(4(56)(78))', 1, 1, 'i', 2);
+SELECT regexp_substr('1234567890', '(123)(4(56)(78))', 1, 1, 'i', 3);
+SELECT regexp_substr('1234567890', '(123)(4(56)(78))', 1, 1, 'i', 4);
+SELECT regexp_substr('1234567890', '(123)(4(56)(78))', 1, 1, 'i', 5) IS NULL AS t;
+-- Check case where we have a match, but not a subexpression match
+SELECT regexp_substr('foo', 'foo(bar)?', 1, 1, '', 1) IS NULL AS t;
+-- errors
+SELECT regexp_substr('abcabcabc', 'a.c', 0, 1);
+SELECT regexp_substr('abcabcabc', 'a.c', 1, 0);
+SELECT regexp_substr('abcabcabc', 'a.c', 1, 1, 'g');
+SELECT regexp_substr('abcabcabc', 'a.c', 1, 1, '', -1);
+
-- set so we can tell NULL from empty string
\pset null '\\N'
I wrote:
... aside from the question of whether
a too-large subexpression number should be an error or not.
Oh ... poking around some more, I noticed a very nearby precedent.
regexp_replace's replacement string can include \1 to \9 to insert
the substring matching the N'th parenthesized subexpression. But
if there is no such subexpression, you don't get an error, just
an empty insertion. So that seems like an argument for not
throwing an error for an out-of-range subexpr parameter.
regards, tom lane
Le 02/08/2021 à 01:21, Tom Lane a écrit :
Gilles Darold <gilles@darold.net> writes:
[ v5-0001-regexp-foo-functions.patch ]
I've gone through this whole patch now, and found quite a lot that I did
not like. In no particular order:* Wrapping parentheses around the user's regexp doesn't work. It can
turn an invalid regexp into a valid one: for example 'a)(b' should draw
a syntax error. With this patch, no error would be thrown, but the
"outer" parens wouldn't do what you expected. Worse, it can turn a
valid regexp into an invalid one: the metasyntax options described in
9.7.3.4 only work at the start of the regexp. So we have to handle
whole-regexp cases honestly rather than trying to turn them into an
instance of the parenthesized-subexpression case.* You did a lot of things quite inefficiently, apparently to avoid
touching any existing code. I think it's better to extend
setup_regexp_matches() and replace_text_regexp() a little bit so that
they can support the behaviors these new functions need. In both of
them, it's absolutely trivial to allow a search start position to be
passed in; and it doesn't take much to teach replace_text_regexp()
to replace only the N'th match.* Speaking of N'th, there is not much of anything that I like
about Oracle's terminology for the function arguments, and I don't
think we ought to adopt it. If we're documenting the functions as
processing the "N'th match", it seems to me to be natural to call
the parameter "N" not "occurrence". Speaking of the "occurrence'th
occurrence" is just silly, not to mention long and easy to misspell.
Likewise, "position" is a horribly vague term for the search start
position; it could be interpreted to mean several other things.
"start" seems much better. "return_opt" is likewise awfully unclear.
I went with "endoption" below, though I could be talked into something
else. The only one of Oracle's choices that I like is "subexpr" for
subexpression number ... but you went with DB2's rather vague "group"
instead. I don't want to use their "capture group" terminology,
because that appears nowhere else in our documentation. Our existing
terminology is "parenthesized subexpression", which seems fine to me
(and also agrees with Oracle's docs).* I spent a lot of time on the docs too. A lot of the syntax specs
were wrong (where you put the brackets matters), many of the examples
seemed confusingly overcomplicated, and the text explanations needed
copy-editing.* Also, the regression tests seemed misguided. This patch is not
responsible for testing the regexp engine as such; we have tests
elsewhere that do that. So I don't think we need complex regexps
here. We just need to verify that the parameters of these functions
act properly, and check their error cases. That can be done much
more quickly and straightforwardly than what you had.So here's a revised version that I like better. I think this
is pretty nearly committable, aside from the question of whether
a too-large subexpression number should be an error or not.
Thanks a lot for the patch improvement and the guidance. I have read the
patch and I agree with your choices I think I was too much trying to
mimic the oraclisms. I don't think we should take care of the too-large
subexpression number, the regexp writer should always test its regular
expression and also this will not prevent him to chose the wrong capture
group number but just a non existing one.
Best regards,
--
Gilles Darold
Le 02/08/2021 � 23:22, Gilles Darold a �crit�:
Le 02/08/2021 � 01:21, Tom Lane a �crit�:
Gilles Darold <gilles@darold.net> writes:
[ v5-0001-regexp-foo-functions.patch ]
I've gone through this whole patch now, and found quite a lot that I did
not like. In no particular order:* Wrapping parentheses around the user's regexp doesn't work. It can
turn an invalid regexp into a valid one: for example 'a)(b' should draw
a syntax error. With this patch, no error would be thrown, but the
"outer" parens wouldn't do what you expected. Worse, it can turn a
valid regexp into an invalid one: the metasyntax options described in
9.7.3.4 only work at the start of the regexp. So we have to handle
whole-regexp cases honestly rather than trying to turn them into an
instance of the parenthesized-subexpression case.* You did a lot of things quite inefficiently, apparently to avoid
touching any existing code. I think it's better to extend
setup_regexp_matches() and replace_text_regexp() a little bit so that
they can support the behaviors these new functions need. In both of
them, it's absolutely trivial to allow a search start position to be
passed in; and it doesn't take much to teach replace_text_regexp()
to replace only the N'th match.* Speaking of N'th, there is not much of anything that I like
about Oracle's terminology for the function arguments, and I don't
think we ought to adopt it. If we're documenting the functions as
processing the "N'th match", it seems to me to be natural to call
the parameter "N" not "occurrence". Speaking of the "occurrence'th
occurrence" is just silly, not to mention long and easy to misspell.
Likewise, "position" is a horribly vague term for the search start
position; it could be interpreted to mean several other things.
"start" seems much better. "return_opt" is likewise awfully unclear.
I went with "endoption" below, though I could be talked into something
else. The only one of Oracle's choices that I like is "subexpr" for
subexpression number ... but you went with DB2's rather vague "group"
instead. I don't want to use their "capture group" terminology,
because that appears nowhere else in our documentation. Our existing
terminology is "parenthesized subexpression", which seems fine to me
(and also agrees with Oracle's docs).* I spent a lot of time on the docs too. A lot of the syntax specs
were wrong (where you put the brackets matters), many of the examples
seemed confusingly overcomplicated, and the text explanations needed
copy-editing.* Also, the regression tests seemed misguided. This patch is not
responsible for testing the regexp engine as such; we have tests
elsewhere that do that. So I don't think we need complex regexps
here. We just need to verify that the parameters of these functions
act properly, and check their error cases. That can be done much
more quickly and straightforwardly than what you had.So here's a revised version that I like better. I think this
is pretty nearly committable, aside from the question of whether
a too-large subexpression number should be an error or not.Thanks a lot for the patch improvement and the guidance. I have read the
patch and I agree with your choices I think I was too much trying to
mimic the oraclisms. I don't think we should take care of the too-large
subexpression number, the regexp writer should always test its regular
expression and also this will not prevent him to chose the wrong capture
group number but just a non existing one.
Actually I just found that the regexp_like() function doesn't support
the start parameter which is something we should support. I saw that
Oracle do not support it but DB2 does and I think we should also support
it. I will post a new version of the patch once it is done.
Best regards,
--
Gilles Darold
Le 03/08/2021 à 11:45, Gilles Darold a écrit :
Actually I just found that the regexp_like() function doesn't support
the start parameter which is something we should support. I saw that
Oracle do not support it but DB2 does and I think we should also
support it. I will post a new version of the patch once it is done.
Here is a new version of the patch that adds the start parameter to
regexp_like() function but while I'm adding support to this parameter it
become less obvious for me that we should implement it. However feel
free to not use this version if you think that adding the start
parameter has no real interest.
Best regards,
--
Gilles Darold
Attachments:
v7-0001-regexp-foo-functions.patchtext/x-patch; charset=UTF-8; name=v7-0001-regexp-foo-functions.patchDownload
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index a5b6adc4bb..2bc9060e47 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -3108,6 +3108,80 @@ repeat('Pg', 4) <returnvalue>PgPgPgPg</returnvalue>
</para></entry>
</row>
+ <row>
+ <entry role="func_table_entry"><para role="func_signature">
+ <indexterm>
+ <primary>regexp_count</primary>
+ </indexterm>
+ <function>regexp_count</function> ( <parameter>string</parameter> <type>text</type>, <parameter>pattern</parameter> <type>text</type>
+ [, <parameter>start</parameter> <type>integer</type>
+ [, <parameter>flags</parameter> <type>text</type> ] ] )
+ <returnvalue>integer</returnvalue>
+ </para>
+ <para>
+ Returns the number of times the POSIX regular
+ expression <parameter>pattern</parameter> matches in
+ the <parameter>string</parameter>; see
+ <xref linkend="functions-posix-regexp"/>.
+ </para>
+ <para>
+ <literal>regexp_count('123456789012', '\d\d\d', 2)</literal>
+ <returnvalue>3</returnvalue>
+ </para></entry>
+ </row>
+
+ <row>
+ <entry role="func_table_entry"><para role="func_signature">
+ <indexterm>
+ <primary>regexp_instr</primary>
+ </indexterm>
+ <function>regexp_instr</function> ( <parameter>string</parameter> <type>text</type>, <parameter>pattern</parameter> <type>text</type>
+ [, <parameter>start</parameter> <type>integer</type>
+ [, <parameter>N</parameter> <type>integer</type>
+ [, <parameter>endoption</parameter> <type>integer</type>
+ [, <parameter>flags</parameter> <type>text</type>
+ [, <parameter>subexpr</parameter> <type>integer</type> ] ] ] ] ] )
+ <returnvalue>integer</returnvalue>
+ </para>
+ <para>
+ Returns the position within <parameter>string</parameter> where
+ the <parameter>N</parameter>'th match of the POSIX regular
+ expression <parameter>pattern</parameter> occurs, or zero if there is
+ no such match; see <xref linkend="functions-posix-regexp"/>.
+ </para>
+ <para>
+ <literal>regexp_instr('ABCDEF', 'c(.)(..)', 1, 1, 0, 'i')</literal>
+ <returnvalue>3</returnvalue>
+ </para>
+ <para>
+ <literal>regexp_instr('ABCDEF', 'c(.)(..)', 1, 1, 0, 'i', 2)</literal>
+ <returnvalue>5</returnvalue>
+ </para></entry>
+ </row>
+
+ <row>
+ <entry role="func_table_entry"><para role="func_signature">
+ <indexterm>
+ <primary>regexp_like</primary>
+ </indexterm>
+ <function>regexp_like</function> ( <parameter>string</parameter> <type>text</type>, <parameter>pattern</parameter> <type>text</type>
+ [, <parameter>start</parameter> <type>integer</type>
+ [, <parameter>flags</parameter> <type>text</type> ] ] )
+ <returnvalue>boolean</returnvalue>
+ </para>
+ <para>
+ Checks whether a match of the POSIX regular
+ expression <parameter>pattern</parameter> occurs
+ within <parameter>string</parameter> starting at
+ the <parameter>start</parameter>'th character; see
+ <xref linkend="functions-posix-regexp"/>.
+ </para>
+ <para>
+ <literal>regexp_like('Hello World', 'world$', 1, 'i')</literal>
+ <returnvalue>t</returnvalue>
+ </para></entry>
+ </row>
+
<row>
<entry role="func_table_entry"><para role="func_signature">
<indexterm>
@@ -3117,8 +3191,9 @@ repeat('Pg', 4) <returnvalue>PgPgPgPg</returnvalue>
<returnvalue>text[]</returnvalue>
</para>
<para>
- Returns captured substrings resulting from the first match of a POSIX
- regular expression to the <parameter>string</parameter>; see
+ Returns captured substrings resulting from the first match of the
+ POSIX regular expression <parameter>pattern</parameter> to
+ the <parameter>string</parameter>; see
<xref linkend="functions-posix-regexp"/>.
</para>
<para>
@@ -3136,10 +3211,11 @@ repeat('Pg', 4) <returnvalue>PgPgPgPg</returnvalue>
<returnvalue>setof text[]</returnvalue>
</para>
<para>
- Returns captured substrings resulting from the first match of a
- POSIX regular expression to the <parameter>string</parameter>,
- or multiple matches if the <literal>g</literal> flag is used;
- see <xref linkend="functions-posix-regexp"/>.
+ Returns captured substrings resulting from the first match of the
+ POSIX regular expression <parameter>pattern</parameter> to
+ the <parameter>string</parameter>, or all matches if
+ the <literal>g</literal> flag is used; see
+ <xref linkend="functions-posix-regexp"/>.
</para>
<para>
<literal>regexp_matches('foobarbequebaz', 'ba.', 'g')</literal>
@@ -3156,14 +3232,16 @@ repeat('Pg', 4) <returnvalue>PgPgPgPg</returnvalue>
<indexterm>
<primary>regexp_replace</primary>
</indexterm>
- <function>regexp_replace</function> ( <parameter>string</parameter> <type>text</type>, <parameter>pattern</parameter> <type>text</type>, <parameter>replacement</parameter> <type>text</type> [, <parameter>flags</parameter> <type>text</type> ] )
+ <function>regexp_replace</function> ( <parameter>string</parameter> <type>text</type>, <parameter>pattern</parameter> <type>text</type>, <parameter>replacement</parameter> <type>text</type>
+ [, <parameter>start</parameter> <type>integer</type> ]
+ [, <parameter>flags</parameter> <type>text</type> ] )
<returnvalue>text</returnvalue>
</para>
<para>
- Replaces substrings resulting from the first match of a
- POSIX regular expression, or multiple substring matches
- if the <literal>g</literal> flag is used; see <xref
- linkend="functions-posix-regexp"/>.
+ Replaces the substring that is the first match to the POSIX
+ regular expression <parameter>pattern</parameter>, or all matches
+ if the <literal>g</literal> flag is used; see
+ <xref linkend="functions-posix-regexp"/>.
</para>
<para>
<literal>regexp_replace('Thomas', '.[mN]a.', 'M')</literal>
@@ -3171,6 +3249,26 @@ repeat('Pg', 4) <returnvalue>PgPgPgPg</returnvalue>
</para></entry>
</row>
+ <row>
+ <entry role="func_table_entry"><para role="func_signature">
+ <function>regexp_replace</function> ( <parameter>string</parameter> <type>text</type>, <parameter>pattern</parameter> <type>text</type>, <parameter>replacement</parameter> <type>text</type>,
+ <parameter>start</parameter> <type>integer</type>,
+ <parameter>N</parameter> <type>integer</type>
+ [, <parameter>flags</parameter> <type>text</type> ] )
+ <returnvalue>text</returnvalue>
+ </para>
+ <para>
+ Replaces the substring that is the <parameter>N</parameter>'th
+ match to the POSIX regular expression <parameter>pattern</parameter>,
+ or all matches if <parameter>N</parameter> is zero; see
+ <xref linkend="functions-posix-regexp"/>.
+ </para>
+ <para>
+ <literal>regexp_replace('Thomas', '.', 'X', 3, 2)</literal>
+ <returnvalue>ThoXas</returnvalue>
+ </para></entry>
+ </row>
+
<row>
<entry role="func_table_entry"><para role="func_signature">
<indexterm>
@@ -3213,6 +3311,35 @@ repeat('Pg', 4) <returnvalue>PgPgPgPg</returnvalue>
</para></entry>
</row>
+ <row>
+ <entry role="func_table_entry"><para role="func_signature">
+ <indexterm>
+ <primary>regexp_substr</primary>
+ </indexterm>
+ <function>regexp_substr</function> ( <parameter>string</parameter> <type>text</type>, <parameter>pattern</parameter> <type>text</type>
+ [, <parameter>start</parameter> <type>integer</type>
+ [, <parameter>N</parameter> <type>integer</type>
+ [, <parameter>flags</parameter> <type>text</type>
+ [, <parameter>subexpr</parameter> <type>integer</type> ] ] ] ] )
+ <returnvalue>text</returnvalue>
+ </para>
+ <para>
+ Returns the substring within <parameter>string</parameter> that
+ matches the <parameter>N</parameter>'th occurrence of the POSIX
+ regular expression <parameter>pattern</parameter>,
+ or <literal>NULL</literal> if there is no such match; see
+ <xref linkend="functions-posix-regexp"/>.
+ </para>
+ <para>
+ <literal>regexp_substr('ABCDEF', 'c(.)(..)', 1, 1, 'i')</literal>
+ <returnvalue>CDEF</returnvalue>
+ </para>
+ <para>
+ <literal>regexp_substr('ABCDEF', 'c(.)(..)', 1, 1, 'i', 2)</literal>
+ <returnvalue>EF</returnvalue>
+ </para></entry>
+ </row>
+
<row>
<entry role="func_table_entry"><para role="func_signature">
<indexterm>
@@ -5377,6 +5504,15 @@ substring('foobar' similar '#"o_b#"%' escape '#') <lineannotation>NULL</linea
<indexterm>
<primary>substring</primary>
</indexterm>
+ <indexterm>
+ <primary>regexp_count</primary>
+ </indexterm>
+ <indexterm>
+ <primary>regexp_instr</primary>
+ </indexterm>
+ <indexterm>
+ <primary>regexp_like</primary>
+ </indexterm>
<indexterm>
<primary>regexp_match</primary>
</indexterm>
@@ -5392,6 +5528,9 @@ substring('foobar' similar '#"o_b#"%' escape '#') <lineannotation>NULL</linea
<indexterm>
<primary>regexp_split_to_array</primary>
</indexterm>
+ <indexterm>
+ <primary>regexp_substr</primary>
+ </indexterm>
<para>
<xref linkend="functions-posix-table"/> lists the available
@@ -5542,6 +5681,114 @@ substring('foobar' from 'o(.)b') <lineannotation>o</lineannotation>
</programlisting>
</para>
+ <para>
+ The <function>regexp_count</function> function counts the number of
+ places where a POSIX regular expression pattern matches a string.
+ It has the syntax
+ <function>regexp_count</function>(<replaceable>string</replaceable>,
+ <replaceable>pattern</replaceable>
+ <optional>, <replaceable>start</replaceable>
+ <optional>, <replaceable>flags</replaceable>
+ </optional></optional>).
+ <replaceable>pattern</replaceable> is searched for
+ in <replaceable>string</replaceable>, normally from the beginning of
+ the string, but if the <replaceable>start</replaceable> parameter is
+ provided then beginning from that character index.
+ The <replaceable>flags</replaceable> parameter is an optional text
+ string containing zero or more single-letter flags that change the
+ function's behavior. For example, including <literal>i</literal> in
+ <replaceable>flags</replaceable> specifies case-insensitive matching.
+ Supported flags are described in
+ <xref linkend="posix-embedded-options-table"/>.
+ </para>
+
+ <para>
+ Some examples:
+<programlisting>
+regexp_count('ABCABCAXYaxy', 'A.') <lineannotation>3</lineannotation>
+regexp_count('ABCABCAXYaxy', 'A.', 1, 'i') <lineannotation>4</lineannotation>
+</programlisting>
+ </para>
+
+ <para>
+ The <function>regexp_instr</function> function returns the starting or
+ ending position of the <replaceable>N</replaceable>'th match of a
+ POSIX regular expression pattern to a string, or zero if there is no
+ such match. It has the syntax
+ <function>regexp_instr</function>(<replaceable>string</replaceable>,
+ <replaceable>pattern</replaceable>
+ <optional>, <replaceable>start</replaceable>
+ <optional>, <replaceable>N</replaceable>
+ <optional>, <replaceable>endoption</replaceable>
+ <optional>, <replaceable>flags</replaceable>
+ <optional>, <replaceable>subexpr</replaceable>
+ </optional></optional></optional></optional></optional>).
+ <replaceable>pattern</replaceable> is searched for
+ in <replaceable>string</replaceable>, normally from the beginning of
+ the string, but if the <replaceable>start</replaceable> parameter is
+ provided then beginning from that character index.
+ If <replaceable>N</replaceable> is specified
+ then the <replaceable>N</replaceable>'th match of the pattern
+ is located, otherwise the first match is located.
+ If the <replaceable>endoption</replaceable> parameter is omitted or
+ specified as zero, the function returns the position of the first
+ character of the match. Otherwise, <replaceable>endoption</replaceable>
+ must be one, and the function returns the position of the character
+ following the match.
+ The <replaceable>flags</replaceable> parameter is an optional text
+ string containing zero or more single-letter flags that change the
+ function's behavior. Supported flags are described
+ in <xref linkend="posix-embedded-options-table"/>.
+ For a pattern containing parenthesized
+ subexpressions, <replaceable>subexpr</replaceable> is an integer
+ indicating which subexpression is of interest: the result identifies
+ the position of the substring matching that subexpression.
+ Subexpressions are numbered in the order of their leading parentheses.
+ When <replaceable>subexpr</replaceable> is omitted or zero, the result
+ identifies the position of the whole match regardless of
+ parenthesized subexpressions.
+ </para>
+
+ <para>
+ Some examples:
+<programlisting>
+regexp_instr('number of your street, town zip, FR', '[^,]+', 1, 2)
+ <lineannotation>23</lineannotation>
+regexp_instr('ABCDEFGHI', '(c..)(...)', 1, 1, 0, 'i', 2)
+ <lineannotation>6</lineannotation>
+</programlisting>
+ </para>
+
+ <para>
+ The <function>regexp_like</function> function checks whether a match
+ of a POSIX regular expression pattern occurs within a string,
+ returning boolean true or false. It has the syntax
+ <function>regexp_like</function>(<replaceable>string</replaceable>,
+ <replaceable>pattern</replaceable>
+ <optional>, <replaceable>start</replaceable> </optional>
+ <optional>, <replaceable>flags</replaceable> </optional>).
+ <replaceable>pattern</replaceable> is searched for
+ in <replaceable>string</replaceable>, normally from the beginning of
+ the string, but if the <replaceable>start</replaceable> parameter is
+ provided then beginning from that character index.
+ The <replaceable>flags</replaceable> parameter is an optional text
+ string containing zero or more single-letter flags that change the
+ function's behavior. Supported flags are described
+ in <xref linkend="posix-embedded-options-table"/>.
+ This function has the same results as the <literal>~</literal>
+ operator if no flags are specified. If only the <literal>i</literal>
+ flag is specified, it has the same results as
+ the <literal>~*</literal> operator.
+ </para>
+
+ <para>
+ Some examples:
+<programlisting>
+regexp_like('Hello World', 'world') <lineannotation>false</lineannotation>
+regexp_like('Hello World', 'world', 1, 'i') <lineannotation>true</lineannotation>
+</programlisting>
+ </para>
+
<para>
The <function>regexp_match</function> function returns a text array of
captured substring(s) resulting from the first match of a POSIX
@@ -5579,8 +5826,17 @@ SELECT regexp_match('foobarbequebaz', '(bar)(beque)');
{bar,beque}
(1 row)
</programlisting>
- In the common case where you just want the whole matching substring
- or <literal>NULL</literal> for no match, write something like
+ </para>
+
+ <tip>
+ <para>
+ In the common case where you just want the whole matching substring
+ or <literal>NULL</literal> for no match, the best solution is to
+ use <function>regexp_substr()</function>.
+ However, <function>regexp_substr()</function> only exists
+ in <productname>PostgreSQL</productname> version 15 and up. When
+ working in older versions, you can extract the first element
+ of <function>regexp_match()</function>'s result, for example:
<programlisting>
SELECT (regexp_match('foobarbequebaz', 'bar.*que'))[1];
regexp_match
@@ -5588,7 +5844,8 @@ SELECT (regexp_match('foobarbequebaz', 'bar.*que'))[1];
barbeque
(1 row)
</programlisting>
- </para>
+ </para>
+ </tip>
<para>
The <function>regexp_matches</function> function returns a set of text arrays
@@ -5650,7 +5907,13 @@ SELECT col1, (SELECT regexp_matches(col2, '(bar)(beque)')) FROM tab;
It has the syntax
<function>regexp_replace</function>(<replaceable>source</replaceable>,
<replaceable>pattern</replaceable>, <replaceable>replacement</replaceable>
+ <optional>, <replaceable>start</replaceable>
+ <optional>, <replaceable>N</replaceable>
+ </optional></optional>
<optional>, <replaceable>flags</replaceable> </optional>).
+ (Notice that <replaceable>N</replaceable> cannot be specified
+ unless <replaceable>start</replaceable> is,
+ but <replaceable>flags</replaceable> can be given in any case.)
The <replaceable>source</replaceable> string is returned unchanged if
there is no match to the <replaceable>pattern</replaceable>. If there is a
match, the <replaceable>source</replaceable> string is returned with the
@@ -5663,11 +5926,22 @@ SELECT col1, (SELECT regexp_matches(col2, '(bar)(beque)')) FROM tab;
substring matching the entire pattern should be inserted. Write
<literal>\\</literal> if you need to put a literal backslash in the replacement
text.
+ <replaceable>pattern</replaceable> is searched for
+ in <replaceable>string</replaceable>, normally from the beginning of
+ the string, but if the <replaceable>start</replaceable> parameter is
+ provided then beginning from that character index.
+ By default, only the first match of the pattern is replaced.
+ If <replaceable>N</replaceable> is specified and is greater than zero,
+ then the <replaceable>N</replaceable>'th match of the pattern
+ is replaced.
+ If the <literal>g</literal> flag is given, or
+ if <replaceable>N</replaceable> is specified and is zero, then all
+ matches at or after the <replaceable>start</replaceable> position are
+ replaced. (The <literal>g</literal> flag is ignored
+ when <replaceable>N</replaceable> is specified.)
The <replaceable>flags</replaceable> parameter is an optional text
string containing zero or more single-letter flags that change the
- function's behavior. Flag <literal>i</literal> specifies case-insensitive
- matching, while flag <literal>g</literal> specifies replacement of each matching
- substring rather than only the first one. Supported flags (though
+ function's behavior. Supported flags (though
not <literal>g</literal>) are
described in <xref linkend="posix-embedded-options-table"/>.
</para>
@@ -5681,6 +5955,10 @@ regexp_replace('foobarbaz', 'b..', 'X', 'g')
<lineannotation>fooXX</lineannotation>
regexp_replace('foobarbaz', 'b(..)', 'X\1Y', 'g')
<lineannotation>fooXarYXazY</lineannotation>
+regexp_replace('A PostgreSQL function', 'a|e|i|o|u', 'X', 1, 0, 'i')
+ <lineannotation>X PXstgrXSQL fXnctXXn</lineannotation>
+regexp_replace('A PostgreSQL function', 'a|e|i|o|u', 'X', 1, 3, 'i')
+ <lineannotation>A PostgrXSQL function</lineannotation>
</programlisting>
</para>
@@ -5712,7 +5990,6 @@ regexp_replace('foobarbaz', 'b(..)', 'X\1Y', 'g')
<para>
Some examples:
<programlisting>
-
SELECT foo FROM regexp_split_to_table('the quick brown fox jumps over the lazy dog', '\s+') AS foo;
foo
-------
@@ -5761,11 +6038,51 @@ SELECT foo FROM regexp_split_to_table('the quick brown fox', '\s*') AS foo;
zero-length matches that occur at the start or end of the string
or immediately after a previous match. This is contrary to the strict
definition of regexp matching that is implemented by
- <function>regexp_match</function> and
- <function>regexp_matches</function>, but is usually the most convenient behavior
+ the other regexp functions, but is usually the most convenient behavior
in practice. Other software systems such as Perl use similar definitions.
</para>
+ <para>
+ The <function>regexp_substr</function> function returns the substring
+ that matches a POSIX regular expression pattern,
+ or <literal>NULL</literal> if there is no match. It has the syntax
+ <function>regexp_substr</function>(<replaceable>string</replaceable>,
+ <replaceable>pattern</replaceable>
+ <optional>, <replaceable>start</replaceable>
+ <optional>, <replaceable>N</replaceable>
+ <optional>, <replaceable>flags</replaceable>
+ <optional>, <replaceable>subexpr</replaceable>
+ </optional></optional></optional></optional>).
+ <replaceable>pattern</replaceable> is searched for
+ in <replaceable>string</replaceable>, normally from the beginning of
+ the string, but if the <replaceable>start</replaceable> parameter is
+ provided then beginning from that character index.
+ If <replaceable>N</replaceable> is specified
+ then the <replaceable>N</replaceable>'th match of the pattern
+ is returned, otherwise the first match is returned.
+ The <replaceable>flags</replaceable> parameter is an optional text
+ string containing zero or more single-letter flags that change the
+ function's behavior. Supported flags are described
+ in <xref linkend="posix-embedded-options-table"/>.
+ For a pattern containing parenthesized
+ subexpressions, <replaceable>subexpr</replaceable> is an integer
+ indicating which subexpression is of interest: the result is the
+ substring matching that subexpression.
+ Subexpressions are numbered in the order of their leading parentheses.
+ When <replaceable>subexpr</replaceable> is omitted or zero, the result
+ is the whole match regardless of parenthesized subexpressions.
+ </para>
+
+ <para>
+ Some examples:
+<programlisting>
+regexp_substr('number of your street, town zip, FR', '[^,]+', 1, 2)
+ <lineannotation> town zip</lineannotation>
+regexp_substr('ABCDEFGHI', '(c..)(...)', 1, 1, 'i', 2)
+ <lineannotation>FGH</lineannotation>
+</programlisting>
+ </para>
+
<!-- derived from the re_syntax.n man page -->
<sect3 id="posix-syntax-details">
diff --git a/src/backend/utils/adt/regexp.c b/src/backend/utils/adt/regexp.c
index a32c5c82ab..ede20a4d14 100644
--- a/src/backend/utils/adt/regexp.c
+++ b/src/backend/utils/adt/regexp.c
@@ -113,6 +113,7 @@ static cached_re_str re_array[MAX_CACHED_RES]; /* cached re's */
/* Local functions */
static regexp_matches_ctx *setup_regexp_matches(text *orig_str, text *pattern,
pg_re_flags *flags,
+ int start_search,
Oid collation,
bool use_subpatterns,
bool ignore_degenerate,
@@ -629,7 +630,7 @@ textregexreplace_noopt(PG_FUNCTION_ARGS)
re = RE_compile_and_cache(p, REG_ADVANCED, PG_GET_COLLATION());
- PG_RETURN_TEXT_P(replace_text_regexp(s, (void *) re, r, false));
+ PG_RETURN_TEXT_P(replace_text_regexp(s, (void *) re, r, 0, 1));
}
/*
@@ -646,11 +647,97 @@ textregexreplace(PG_FUNCTION_ARGS)
regex_t *re;
pg_re_flags flags;
+ /*
+ * regexp_replace() with four arguments will be preferentially resolved as
+ * this form when the fourth argument is of type UNKNOWN. However, the
+ * user might have intended to call textregexreplace_extended_no_n. If we
+ * see flags that look like an integer, emit the same error that
+ * parse_re_flags would, but add a HINT about how to fix it.
+ */
+ if (VARSIZE_ANY_EXHDR(opt) > 0)
+ {
+ char *opt_p = VARDATA_ANY(opt);
+
+ if (*opt_p >= '0' && *opt_p <= '9')
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("invalid regular expression option: \"%.*s\"",
+ pg_mblen(opt_p), opt_p),
+ errhint("If you meant to use regexp_replace() with a start parameter, cast the fourth argument to integer explicitly.")));
+ }
+
parse_re_flags(&flags, opt);
re = RE_compile_and_cache(p, flags.cflags, PG_GET_COLLATION());
- PG_RETURN_TEXT_P(replace_text_regexp(s, (void *) re, r, flags.glob));
+ PG_RETURN_TEXT_P(replace_text_regexp(s, (void *) re, r, 0,
+ flags.glob ? 0 : 1));
+}
+
+/*
+ * textregexreplace_extended()
+ * Return a string matched by a regular expression, with replacement.
+ * Extends textregexreplace by allowing a start position and the
+ * choice of the occurrence to replace (0 means all occurrences).
+ */
+Datum
+textregexreplace_extended(PG_FUNCTION_ARGS)
+{
+ text *s = PG_GETARG_TEXT_PP(0);
+ text *p = PG_GETARG_TEXT_PP(1);
+ text *r = PG_GETARG_TEXT_PP(2);
+ int start = 1;
+ int n = 1;
+ text *flags = PG_GETARG_TEXT_PP_IF_EXISTS(5);
+ pg_re_flags re_flags;
+ regex_t *re;
+
+ /* Collect optional parameters */
+ if (PG_NARGS() > 3)
+ {
+ start = PG_GETARG_INT32(3);
+ if (start <= 0)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("invalid value for parameter \"%s\": %d",
+ "start", start)));
+ }
+ if (PG_NARGS() > 4)
+ {
+ n = PG_GETARG_INT32(4);
+ if (n < 0)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("invalid value for parameter \"%s\": %d",
+ "n", n)));
+ }
+
+ /* Determine options */
+ parse_re_flags(&re_flags, flags);
+
+ /* If N was not specified, deduce it from the 'g' flag */
+ if (PG_NARGS() <= 4)
+ n = re_flags.glob ? 0 : 1;
+
+ /* Compile the regular expression */
+ re = RE_compile_and_cache(p, re_flags.cflags, PG_GET_COLLATION());
+
+ /* Do the replacement(s) */
+ PG_RETURN_TEXT_P(replace_text_regexp(s, (void *) re, r, start - 1, n));
+}
+
+/* This is separate to keep the opr_sanity regression test from complaining */
+Datum
+textregexreplace_extended_no_n(PG_FUNCTION_ARGS)
+{
+ return textregexreplace_extended(fcinfo);
+}
+
+/* This is separate to keep the opr_sanity regression test from complaining */
+Datum
+textregexreplace_extended_no_flags(PG_FUNCTION_ARGS)
+{
+ return textregexreplace_extended(fcinfo);
}
/*
@@ -958,6 +1045,255 @@ similar_escape(PG_FUNCTION_ARGS)
PG_RETURN_TEXT_P(result);
}
+/*
+ * regexp_count()
+ * Return the number of matches of a pattern within a string.
+ */
+Datum
+regexp_count(PG_FUNCTION_ARGS)
+{
+ text *str = PG_GETARG_TEXT_PP(0);
+ text *pattern = PG_GETARG_TEXT_PP(1);
+ int start = 1;
+ text *flags = PG_GETARG_TEXT_PP_IF_EXISTS(3);
+ pg_re_flags re_flags;
+ regexp_matches_ctx *matchctx;
+
+ /* Collect optional parameters */
+ if (PG_NARGS() > 2)
+ {
+ start = PG_GETARG_INT32(2);
+ if (start <= 0)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("invalid value for parameter \"%s\": %d",
+ "start", start)));
+ }
+
+ /* Determine options */
+ parse_re_flags(&re_flags, flags);
+ /* User mustn't specify 'g' */
+ if (re_flags.glob)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ /* translator: %s is a SQL function name */
+ errmsg("%s does not support the \"global\" option",
+ "regexp_count()")));
+ /* But we find all the matches anyway */
+ re_flags.glob = true;
+
+ /* Do the matching */
+ matchctx = setup_regexp_matches(str, pattern, &re_flags, start - 1,
+ PG_GET_COLLATION(),
+ false, /* can ignore subexprs */
+ false, false);
+
+ PG_RETURN_INT32(matchctx->nmatches);
+}
+
+/* This is separate to keep the opr_sanity regression test from complaining */
+Datum
+regexp_count_no_start(PG_FUNCTION_ARGS)
+{
+ return regexp_count(fcinfo);
+}
+
+/* This is separate to keep the opr_sanity regression test from complaining */
+Datum
+regexp_count_no_flags(PG_FUNCTION_ARGS)
+{
+ return regexp_count(fcinfo);
+}
+
+/*
+ * regexp_instr()
+ * Return the match's position within the string
+ */
+Datum
+regexp_instr(PG_FUNCTION_ARGS)
+{
+ text *str = PG_GETARG_TEXT_PP(0);
+ text *pattern = PG_GETARG_TEXT_PP(1);
+ int start = 1;
+ int n = 1;
+ int endoption = 0;
+ text *flags = PG_GETARG_TEXT_PP_IF_EXISTS(5);
+ int subexpr = 0;
+ int pos;
+ pg_re_flags re_flags;
+ regexp_matches_ctx *matchctx;
+
+ /* Collect optional parameters */
+ if (PG_NARGS() > 2)
+ {
+ start = PG_GETARG_INT32(2);
+ if (start <= 0)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("invalid value for parameter \"%s\": %d",
+ "start", start)));
+ }
+ if (PG_NARGS() > 3)
+ {
+ n = PG_GETARG_INT32(3);
+ if (n <= 0)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("invalid value for parameter \"%s\": %d",
+ "n", n)));
+ }
+ if (PG_NARGS() > 4)
+ {
+ endoption = PG_GETARG_INT32(4);
+ if (endoption != 0 && endoption != 1)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("invalid value for parameter \"%s\": %d",
+ "endoption", endoption)));
+ }
+ if (PG_NARGS() > 6)
+ {
+ subexpr = PG_GETARG_INT32(6);
+ if (subexpr < 0)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("invalid value for parameter \"%s\": %d",
+ "subexpr", subexpr)));
+ }
+
+ /* Determine options */
+ parse_re_flags(&re_flags, flags);
+ /* User mustn't specify 'g' */
+ if (re_flags.glob)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ /* translator: %s is a SQL function name */
+ errmsg("%s does not support the \"global\" option",
+ "regexp_instr()")));
+ /* But we find all the matches anyway */
+ re_flags.glob = true;
+
+ /* Do the matching */
+ matchctx = setup_regexp_matches(str, pattern, &re_flags, start - 1,
+ PG_GET_COLLATION(),
+ (subexpr > 0), /* need submatches? */
+ false, false);
+
+ /* When n exceeds matches return 0 (includes case of no matches) */
+ if (n > matchctx->nmatches)
+ PG_RETURN_INT32(0);
+
+ /* When subexpr exceeds number of subexpressions return 0 */
+ if (subexpr > matchctx->npatterns)
+ PG_RETURN_INT32(0);
+
+ /* Select the appropriate match position to return */
+ pos = (n - 1) * matchctx->npatterns;
+ if (subexpr > 0)
+ pos += subexpr - 1;
+ pos *= 2;
+ if (endoption == 1)
+ pos += 1;
+
+ if (matchctx->match_locs[pos] >= 0)
+ PG_RETURN_INT32(matchctx->match_locs[pos] + 1);
+ else
+ PG_RETURN_INT32(0); /* position not identifiable */
+}
+
+/* This is separate to keep the opr_sanity regression test from complaining */
+Datum
+regexp_instr_no_start(PG_FUNCTION_ARGS)
+{
+ return regexp_instr(fcinfo);
+}
+
+/* This is separate to keep the opr_sanity regression test from complaining */
+Datum
+regexp_instr_no_n(PG_FUNCTION_ARGS)
+{
+ return regexp_instr(fcinfo);
+}
+
+/* This is separate to keep the opr_sanity regression test from complaining */
+Datum
+regexp_instr_no_endoption(PG_FUNCTION_ARGS)
+{
+ return regexp_instr(fcinfo);
+}
+
+/* This is separate to keep the opr_sanity regression test from complaining */
+Datum
+regexp_instr_no_flags(PG_FUNCTION_ARGS)
+{
+ return regexp_instr(fcinfo);
+}
+
+/* This is separate to keep the opr_sanity regression test from complaining */
+Datum
+regexp_instr_no_subexpr(PG_FUNCTION_ARGS)
+{
+ return regexp_instr(fcinfo);
+}
+
+/*
+ * regexp_like()
+ * Test for a pattern match within a string.
+ */
+Datum
+regexp_like(PG_FUNCTION_ARGS)
+{
+ text *str = PG_GETARG_TEXT_PP(0);
+ text *pattern = PG_GETARG_TEXT_PP(1);
+ int start = 1;
+ text *flags = PG_GETARG_TEXT_PP_IF_EXISTS(3);
+ pg_re_flags re_flags;
+ regexp_matches_ctx *matchctx;
+
+ /* Collect optional parameters */
+ if (PG_NARGS() > 2)
+ {
+ start = PG_GETARG_INT32(2);
+ if (start <= 0)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("invalid value for parameter \"%s\": %d",
+ "start", start)));
+ }
+
+ /* Determine options */
+ parse_re_flags(&re_flags, flags);
+ /* User mustn't specify 'g' */
+ if (re_flags.glob)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ /* translator: %s is a SQL function name */
+ errmsg("%s does not support the \"global\" option",
+ "regexp_like()")));
+
+ /* Do the matching */
+ matchctx = setup_regexp_matches(str, pattern, &re_flags, start - 1,
+ PG_GET_COLLATION(),
+ false, /* can ignore subexprs */
+ false, false);
+
+ PG_RETURN_BOOL((matchctx->nmatches > 0) ? true : false);
+}
+
+/* This is separate to keep the opr_sanity regression test from complaining */
+Datum
+regexp_like_no_start(PG_FUNCTION_ARGS)
+{
+ return regexp_like(fcinfo);
+}
+
+/* This is separate to keep the opr_sanity regression test from complaining */
+Datum
+regexp_like_no_flags(PG_FUNCTION_ARGS)
+{
+ return regexp_like(fcinfo);
+}
+
/*
* regexp_match()
* Return the first substring(s) matching a pattern within a string.
@@ -982,7 +1318,7 @@ regexp_match(PG_FUNCTION_ARGS)
"regexp_match()"),
errhint("Use the regexp_matches function instead.")));
- matchctx = setup_regexp_matches(orig_str, pattern, &re_flags,
+ matchctx = setup_regexp_matches(orig_str, pattern, &re_flags, 0,
PG_GET_COLLATION(), true, false, false);
if (matchctx->nmatches == 0)
@@ -1029,7 +1365,7 @@ regexp_matches(PG_FUNCTION_ARGS)
/* be sure to copy the input string into the multi-call ctx */
matchctx = setup_regexp_matches(PG_GETARG_TEXT_P_COPY(0), pattern,
- &re_flags,
+ &re_flags, 0,
PG_GET_COLLATION(),
true, false, false);
@@ -1064,24 +1400,28 @@ regexp_matches_no_flags(PG_FUNCTION_ARGS)
}
/*
- * setup_regexp_matches --- do the initial matching for regexp_match
- * and regexp_split functions
+ * setup_regexp_matches --- do the initial matching for regexp_match,
+ * regexp_split, and related functions
*
* To avoid having to re-find the compiled pattern on each call, we do
* all the matching in one swoop. The returned regexp_matches_ctx contains
* the locations of all the substrings matching the pattern.
*
- * The three bool parameters have only two patterns (one for matching, one for
- * splitting) but it seems clearer to distinguish the functionality this way
- * than to key it all off one "is_split" flag. We don't currently assume that
- * fetching_unmatched is exclusive of fetching the matched text too; if it's
- * set, the conversion buffer is large enough to fetch any single matched or
- * unmatched string, but not any larger substring. (In practice, when splitting
- * the matches are usually small anyway, and it didn't seem worth complicating
- * the code further.)
+ * start_search: the character (not byte) offset in orig_str at which to
+ * begin the search. Returned positions are relative to orig_str anyway.
+ * use_subpatterns: collect data about matches to parenthesized subexpressions.
+ * ignore_degenerate: ignore zero-length matches.
+ * fetching_unmatched: caller wants to fetch unmatched substrings.
+ *
+ * We don't currently assume that fetching_unmatched is exclusive of fetching
+ * the matched text too; if it's set, the conversion buffer is large enough to
+ * fetch any single matched or unmatched string, but not any larger
+ * substring. (In practice, when splitting the matches are usually small
+ * anyway, and it didn't seem worth complicating the code further.)
*/
static regexp_matches_ctx *
setup_regexp_matches(text *orig_str, text *pattern, pg_re_flags *re_flags,
+ int start_search,
Oid collation,
bool use_subpatterns,
bool ignore_degenerate,
@@ -1099,7 +1439,6 @@ setup_regexp_matches(text *orig_str, text *pattern, pg_re_flags *re_flags,
int array_idx;
int prev_match_end;
int prev_valid_match_end;
- int start_search;
int maxlen = 0; /* largest fetch length in characters */
/* save original string --- we'll extract result substrings from it */
@@ -1142,7 +1481,6 @@ setup_regexp_matches(text *orig_str, text *pattern, pg_re_flags *re_flags,
/* search for the pattern, perhaps repeatedly */
prev_match_end = 0;
prev_valid_match_end = 0;
- start_search = 0;
while (RE_wchar_execute(cpattern, wide_str, wide_len, start_search,
pmatch_len, pmatch))
{
@@ -1367,7 +1705,7 @@ regexp_split_to_table(PG_FUNCTION_ARGS)
/* be sure to copy the input string into the multi-call ctx */
splitctx = setup_regexp_matches(PG_GETARG_TEXT_P_COPY(0), pattern,
- &re_flags,
+ &re_flags, 0,
PG_GET_COLLATION(),
false, true, true);
@@ -1422,7 +1760,7 @@ regexp_split_to_array(PG_FUNCTION_ARGS)
splitctx = setup_regexp_matches(PG_GETARG_TEXT_PP(0),
PG_GETARG_TEXT_PP(1),
- &re_flags,
+ &re_flags, 0,
PG_GET_COLLATION(),
false, true, true);
@@ -1489,6 +1827,125 @@ build_regexp_split_result(regexp_matches_ctx *splitctx)
}
}
+/*
+ * regexp_substr()
+ * Return the substring that matches a regular expression pattern
+ */
+Datum
+regexp_substr(PG_FUNCTION_ARGS)
+{
+ text *str = PG_GETARG_TEXT_PP(0);
+ text *pattern = PG_GETARG_TEXT_PP(1);
+ int start = 1;
+ int n = 1;
+ text *flags = PG_GETARG_TEXT_PP_IF_EXISTS(4);
+ int subexpr = 0;
+ int so,
+ eo,
+ pos;
+ pg_re_flags re_flags;
+ regexp_matches_ctx *matchctx;
+
+ /* Collect optional parameters */
+ if (PG_NARGS() > 2)
+ {
+ start = PG_GETARG_INT32(2);
+ if (start <= 0)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("invalid value for parameter \"%s\": %d",
+ "start", start)));
+ }
+ if (PG_NARGS() > 3)
+ {
+ n = PG_GETARG_INT32(3);
+ if (n <= 0)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("invalid value for parameter \"%s\": %d",
+ "n", n)));
+ }
+ if (PG_NARGS() > 5)
+ {
+ subexpr = PG_GETARG_INT32(5);
+ if (subexpr < 0)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("invalid value for parameter \"%s\": %d",
+ "subexpr", subexpr)));
+ }
+
+ /* Determine options */
+ parse_re_flags(&re_flags, flags);
+ /* User mustn't specify 'g' */
+ if (re_flags.glob)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ /* translator: %s is a SQL function name */
+ errmsg("%s does not support the \"global\" option",
+ "regexp_substr()")));
+ /* But we find all the matches anyway */
+ re_flags.glob = true;
+
+ /* Do the matching */
+ matchctx = setup_regexp_matches(str, pattern, &re_flags, start - 1,
+ PG_GET_COLLATION(),
+ (subexpr > 0), /* need submatches? */
+ false, false);
+
+ /* When n exceeds matches return NULL (includes case of no matches) */
+ if (n > matchctx->nmatches)
+ PG_RETURN_NULL();
+
+ /* When subexpr exceeds number of subexpressions return NULL */
+ if (subexpr > matchctx->npatterns)
+ PG_RETURN_NULL();
+
+ /* Select the appropriate match position to return */
+ pos = (n - 1) * matchctx->npatterns;
+ if (subexpr > 0)
+ pos += subexpr - 1;
+ pos *= 2;
+ so = matchctx->match_locs[pos];
+ eo = matchctx->match_locs[pos + 1];
+
+ if (so < 0 || eo < 0)
+ PG_RETURN_NULL(); /* unidentifiable location */
+
+ PG_RETURN_DATUM(DirectFunctionCall3(text_substr,
+ PointerGetDatum(matchctx->orig_str),
+ Int32GetDatum(so + 1),
+ Int32GetDatum(eo - so)));
+}
+
+/* This is separate to keep the opr_sanity regression test from complaining */
+Datum
+regexp_substr_no_start(PG_FUNCTION_ARGS)
+{
+ return regexp_substr(fcinfo);
+}
+
+/* This is separate to keep the opr_sanity regression test from complaining */
+Datum
+regexp_substr_no_n(PG_FUNCTION_ARGS)
+{
+ return regexp_substr(fcinfo);
+}
+
+/* This is separate to keep the opr_sanity regression test from complaining */
+Datum
+regexp_substr_no_flags(PG_FUNCTION_ARGS)
+{
+ return regexp_substr(fcinfo);
+}
+
+/* This is separate to keep the opr_sanity regression test from complaining */
+Datum
+regexp_substr_no_subexpr(PG_FUNCTION_ARGS)
+{
+ return regexp_substr(fcinfo);
+}
+
/*
* regexp_fixed_prefix - extract fixed prefix, if any, for a regexp
*
diff --git a/src/backend/utils/adt/varlena.c b/src/backend/utils/adt/varlena.c
index d2a11b1b5d..a0bde4e352 100644
--- a/src/backend/utils/adt/varlena.c
+++ b/src/backend/utils/adt/varlena.c
@@ -4496,23 +4496,28 @@ appendStringInfoRegexpSubstr(StringInfo str, text *replace_text,
/*
* replace_text_regexp
*
- * replace text that matches to regexp in src_text to replace_text.
+ * replace text that matches to regexp in src_text with replace_text.
+ *
+ * search_start: the character (not byte) offset in src_text at which to
+ * begin searching.
+ * n: if 0, replace all matches; if > 0, replace only the N'th match.
*
* Note: to avoid having to include regex.h in builtins.h, we declare
* the regexp argument as void *, but really it's regex_t *.
*/
text *
replace_text_regexp(text *src_text, void *regexp,
- text *replace_text, bool glob)
+ text *replace_text,
+ int search_start, int n)
{
text *ret_text;
regex_t *re = (regex_t *) regexp;
int src_text_len = VARSIZE_ANY_EXHDR(src_text);
+ int nmatches = 0;
StringInfoData buf;
regmatch_t pmatch[REGEXP_REPLACE_BACKREF_CNT];
pg_wchar *data;
size_t data_len;
- int search_start;
int data_pos;
char *start_ptr;
bool have_escape;
@@ -4530,7 +4535,6 @@ replace_text_regexp(text *src_text, void *regexp,
start_ptr = (char *) VARDATA_ANY(src_text);
data_pos = 0;
- search_start = 0;
while (search_start <= data_len)
{
int regexec_result;
@@ -4560,6 +4564,23 @@ replace_text_regexp(text *src_text, void *regexp,
errmsg("regular expression failed: %s", errMsg)));
}
+ /*
+ * Count matches, and decide whether to replace this match.
+ */
+ nmatches++;
+ if (n > 0 && nmatches != n)
+ {
+ /*
+ * No, so advance search_start, but not start_ptr/data_pos. (Thus,
+ * we treat the matched text as if it weren't matched, and copy it
+ * to the output later.)
+ */
+ search_start = pmatch[0].rm_eo;
+ if (pmatch[0].rm_so == pmatch[0].rm_eo)
+ search_start++;
+ continue;
+ }
+
/*
* Copy the text to the left of the match position. Note we are given
* character not byte indexes.
@@ -4596,9 +4617,9 @@ replace_text_regexp(text *src_text, void *regexp,
data_pos = pmatch[0].rm_eo;
/*
- * When global option is off, replace the first instance only.
+ * If we only want to replace one occurrence, we're done.
*/
- if (!glob)
+ if (n > 0)
break;
/*
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 8cd0252082..32e5d25714 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -3565,6 +3565,18 @@
{ oid => '2285', descr => 'replace text using regexp',
proname => 'regexp_replace', prorettype => 'text',
proargtypes => 'text text text text', prosrc => 'textregexreplace' },
+{ oid => '9611', descr => 'replace text using regexp',
+ proname => 'regexp_replace', prorettype => 'text',
+ proargtypes => 'text text text int4 int4 text',
+ prosrc => 'textregexreplace_extended' },
+{ oid => '9612', descr => 'replace text using regexp',
+ proname => 'regexp_replace', prorettype => 'text',
+ proargtypes => 'text text text int4 int4',
+ prosrc => 'textregexreplace_extended_no_flags' },
+{ oid => '9613', descr => 'replace text using regexp',
+ proname => 'regexp_replace', prorettype => 'text',
+ proargtypes => 'text text text int4',
+ prosrc => 'textregexreplace_extended_no_n' },
{ oid => '3396', descr => 'find first match for regexp',
proname => 'regexp_match', prorettype => '_text', proargtypes => 'text text',
prosrc => 'regexp_match_no_flags' },
@@ -3579,6 +3591,61 @@
proname => 'regexp_matches', prorows => '10', proretset => 't',
prorettype => '_text', proargtypes => 'text text text',
prosrc => 'regexp_matches' },
+{ oid => '9614', descr => 'count regexp matches',
+ proname => 'regexp_count', prorettype => 'int4', proargtypes => 'text text',
+ prosrc => 'regexp_count_no_start' },
+{ oid => '9615', descr => 'count regexp matches',
+ proname => 'regexp_count', prorettype => 'int4',
+ proargtypes => 'text text int4', prosrc => 'regexp_count_no_flags' },
+{ oid => '9616', descr => 'count regexp matches',
+ proname => 'regexp_count', prorettype => 'int4',
+ proargtypes => 'text text int4 text', prosrc => 'regexp_count' },
+{ oid => '9617', descr => 'position of regexp match',
+ proname => 'regexp_instr', prorettype => 'int4', proargtypes => 'text text',
+ prosrc => 'regexp_instr_no_start' },
+{ oid => '9618', descr => 'position of regexp match',
+ proname => 'regexp_instr', prorettype => 'int4',
+ proargtypes => 'text text int4', prosrc => 'regexp_instr_no_n' },
+{ oid => '9619', descr => 'position of regexp match',
+ proname => 'regexp_instr', prorettype => 'int4',
+ proargtypes => 'text text int4 int4', prosrc => 'regexp_instr_no_endoption' },
+{ oid => '9620', descr => 'position of regexp match',
+ proname => 'regexp_instr', prorettype => 'int4',
+ proargtypes => 'text text int4 int4 int4',
+ prosrc => 'regexp_instr_no_flags' },
+{ oid => '9621', descr => 'position of regexp match',
+ proname => 'regexp_instr', prorettype => 'int4',
+ proargtypes => 'text text int4 int4 int4 text',
+ prosrc => 'regexp_instr_no_subexpr' },
+{ oid => '9622', descr => 'position of regexp match',
+ proname => 'regexp_instr', prorettype => 'int4',
+ proargtypes => 'text text int4 int4 int4 text int4',
+ prosrc => 'regexp_instr' },
+{ oid => '9623', descr => 'test for regexp match',
+ proname => 'regexp_like', prorettype => 'bool', proargtypes => 'text text',
+ prosrc => 'regexp_like_no_start' },
+{ oid => '9624', descr => 'test for regexp match',
+ proname => 'regexp_like', prorettype => 'bool',
+ proargtypes => 'text text int4', prosrc => 'regexp_like_no_flags' },
+{ oid => '9630', descr => 'test for regexp match',
+ proname => 'regexp_like', prorettype => 'bool',
+ proargtypes => 'text text int4 text', prosrc => 'regexp_like' },
+{ oid => '9625', descr => 'extract substring that matches regexp',
+ proname => 'regexp_substr', prorettype => 'text', proargtypes => 'text text',
+ prosrc => 'regexp_substr_no_start' },
+{ oid => '9626', descr => 'extract substring that matches regexp',
+ proname => 'regexp_substr', prorettype => 'text',
+ proargtypes => 'text text int4', prosrc => 'regexp_substr_no_n' },
+{ oid => '9627', descr => 'extract substring that matches regexp',
+ proname => 'regexp_substr', prorettype => 'text',
+ proargtypes => 'text text int4 int4', prosrc => 'regexp_substr_no_flags' },
+{ oid => '9628', descr => 'extract substring that matches regexp',
+ proname => 'regexp_substr', prorettype => 'text',
+ proargtypes => 'text text int4 int4 text',
+ prosrc => 'regexp_substr_no_subexpr' },
+{ oid => '9629', descr => 'extract substring that matches regexp',
+ proname => 'regexp_substr', prorettype => 'text',
+ proargtypes => 'text text int4 int4 text int4', prosrc => 'regexp_substr' },
{ oid => '2088', descr => 'split string by field_sep and return field_num',
proname => 'split_part', prorettype => 'text',
proargtypes => 'text text int4', prosrc => 'split_part' },
diff --git a/src/include/utils/varlena.h b/src/include/utils/varlena.h
index 5c39723332..6645e2af13 100644
--- a/src/include/utils/varlena.h
+++ b/src/include/utils/varlena.h
@@ -34,6 +34,7 @@ extern bool SplitDirectoriesString(char *rawstring, char separator,
extern bool SplitGUCList(char *rawstring, char separator,
List **namelist);
extern text *replace_text_regexp(text *src_text, void *regexp,
- text *replace_text, bool glob);
+ text *replace_text,
+ int search_start, int n);
#endif
diff --git a/src/test/regress/expected/strings.out b/src/test/regress/expected/strings.out
index 91aa819804..2359105348 100644
--- a/src/test/regress/expected/strings.out
+++ b/src/test/regress/expected/strings.out
@@ -515,6 +515,13 @@ SELECT SUBSTRING('abcdefg' FROM 'b(.*)f') AS "cde";
cde
(1 row)
+-- Check case where we have a match, but not a subexpression match
+SELECT SUBSTRING('foo' FROM 'foo(bar)?') IS NULL AS t;
+ t
+---
+ t
+(1 row)
+
-- Check behavior of SIMILAR TO, which uses largely the same regexp variant
SELECT 'abcdefg' SIMILAR TO '_bcd%' AS true;
true
@@ -592,6 +599,378 @@ SELECT regexp_replace('AAA aaa', 'A+', 'Z', 'gi');
-- invalid regexp option
SELECT regexp_replace('AAA aaa', 'A+', 'Z', 'z');
ERROR: invalid regular expression option: "z"
+-- extended regexp_replace tests
+SELECT regexp_replace('A PostgreSQL function', 'A|e|i|o|u', 'X', 1);
+ regexp_replace
+-----------------------
+ X PostgreSQL function
+(1 row)
+
+SELECT regexp_replace('A PostgreSQL function', 'A|e|i|o|u', 'X', 1, 2);
+ regexp_replace
+-----------------------
+ A PXstgreSQL function
+(1 row)
+
+SELECT regexp_replace('A PostgreSQL function', 'a|e|i|o|u', 'X', 1, 0, 'i');
+ regexp_replace
+-----------------------
+ X PXstgrXSQL fXnctXXn
+(1 row)
+
+SELECT regexp_replace('A PostgreSQL function', 'a|e|i|o|u', 'X', 1, 1, 'i');
+ regexp_replace
+-----------------------
+ X PostgreSQL function
+(1 row)
+
+SELECT regexp_replace('A PostgreSQL function', 'a|e|i|o|u', 'X', 1, 2, 'i');
+ regexp_replace
+-----------------------
+ A PXstgreSQL function
+(1 row)
+
+SELECT regexp_replace('A PostgreSQL function', 'a|e|i|o|u', 'X', 1, 3, 'i');
+ regexp_replace
+-----------------------
+ A PostgrXSQL function
+(1 row)
+
+SELECT regexp_replace('A PostgreSQL function', 'a|e|i|o|u', 'X', 1, 9, 'i');
+ regexp_replace
+-----------------------
+ A PostgreSQL function
+(1 row)
+
+SELECT regexp_replace('A PostgreSQL function', 'A|e|i|o|u', 'X', 7, 0, 'i');
+ regexp_replace
+-----------------------
+ A PostgrXSQL fXnctXXn
+(1 row)
+
+-- 'g' flag should be ignored when N is specified
+SELECT regexp_replace('A PostgreSQL function', 'a|e|i|o|u', 'X', 1, 1, 'g');
+ regexp_replace
+-----------------------
+ A PXstgreSQL function
+(1 row)
+
+-- errors
+SELECT regexp_replace('A PostgreSQL function', 'a|e|i|o|u', 'X', -1, 0, 'i');
+ERROR: invalid value for parameter "start": -1
+SELECT regexp_replace('A PostgreSQL function', 'a|e|i|o|u', 'X', 1, -1, 'i');
+ERROR: invalid value for parameter "n": -1
+-- erroneous invocation of non-extended form
+SELECT regexp_replace('A PostgreSQL function', 'a|e|i|o|u', 'X', '1');
+ERROR: invalid regular expression option: "1"
+HINT: If you meant to use regexp_replace() with a start parameter, cast the fourth argument to integer explicitly.
+-- regexp_count tests
+SELECT regexp_count('123123123123123', '(12)3');
+ regexp_count
+--------------
+ 5
+(1 row)
+
+SELECT regexp_count('123123123123', '123', 1);
+ regexp_count
+--------------
+ 4
+(1 row)
+
+SELECT regexp_count('123123123123', '123', 3);
+ regexp_count
+--------------
+ 3
+(1 row)
+
+SELECT regexp_count('123123123123', '123', 33);
+ regexp_count
+--------------
+ 0
+(1 row)
+
+SELECT regexp_count('ABCABCABCABC', 'Abc', 1, '');
+ regexp_count
+--------------
+ 0
+(1 row)
+
+SELECT regexp_count('ABCABCABCABC', 'Abc', 1, 'i');
+ regexp_count
+--------------
+ 4
+(1 row)
+
+-- errors
+SELECT regexp_count('123123123123', '123', 0);
+ERROR: invalid value for parameter "start": 0
+SELECT regexp_count('123123123123', '123', -3);
+ERROR: invalid value for parameter "start": -3
+-- regexp_like tests
+SELECT regexp_like('Steven', '^Ste(v|ph)en$');
+ regexp_like
+-------------
+ t
+(1 row)
+
+SELECT regexp_like('a'||CHR(10)||'d', 'a.d', 1, 'n');
+ regexp_like
+-------------
+ f
+(1 row)
+
+SELECT regexp_like('a'||CHR(10)||'d', 'a.d', 1, 's');
+ regexp_like
+-------------
+ t
+(1 row)
+
+SELECT regexp_like('abc', ' a . c ', 1, 'x');
+ regexp_like
+-------------
+ t
+(1 row)
+
+SELECT regexp_like('abc', 'a.c', 2);
+ regexp_like
+-------------
+ f
+(1 row)
+
+SELECT regexp_like('abc', 'a.c', 0); -- error
+ERROR: invalid value for parameter "start": 0
+SELECT regexp_like('abc', 'a.c', 1, 'g'); -- error
+ERROR: regexp_like() does not support the "global" option
+-- regexp_instr tests
+SELECT regexp_instr('abcdefghi', 'd.f');
+ regexp_instr
+--------------
+ 4
+(1 row)
+
+SELECT regexp_instr('abcdefghi', 'd.q');
+ regexp_instr
+--------------
+ 0
+(1 row)
+
+SELECT regexp_instr('abcabcabc', 'a.c');
+ regexp_instr
+--------------
+ 1
+(1 row)
+
+SELECT regexp_instr('abcabcabc', 'a.c', 2);
+ regexp_instr
+--------------
+ 4
+(1 row)
+
+SELECT regexp_instr('abcabcabc', 'a.c', 1, 3);
+ regexp_instr
+--------------
+ 7
+(1 row)
+
+SELECT regexp_instr('abcabcabc', 'a.c', 1, 4);
+ regexp_instr
+--------------
+ 0
+(1 row)
+
+SELECT regexp_instr('abcabcabc', 'A.C', 1, 2, 0, 'i');
+ regexp_instr
+--------------
+ 4
+(1 row)
+
+SELECT regexp_instr('1234567890', '(123)(4(56)(78))', 1, 1, 0, 'i', 0);
+ regexp_instr
+--------------
+ 1
+(1 row)
+
+SELECT regexp_instr('1234567890', '(123)(4(56)(78))', 1, 1, 0, 'i', 1);
+ regexp_instr
+--------------
+ 1
+(1 row)
+
+SELECT regexp_instr('1234567890', '(123)(4(56)(78))', 1, 1, 0, 'i', 2);
+ regexp_instr
+--------------
+ 4
+(1 row)
+
+SELECT regexp_instr('1234567890', '(123)(4(56)(78))', 1, 1, 0, 'i', 3);
+ regexp_instr
+--------------
+ 5
+(1 row)
+
+SELECT regexp_instr('1234567890', '(123)(4(56)(78))', 1, 1, 0, 'i', 4);
+ regexp_instr
+--------------
+ 7
+(1 row)
+
+SELECT regexp_instr('1234567890', '(123)(4(56)(78))', 1, 1, 0, 'i', 5);
+ regexp_instr
+--------------
+ 0
+(1 row)
+
+SELECT regexp_instr('1234567890', '(123)(4(56)(78))', 1, 1, 1, 'i', 0);
+ regexp_instr
+--------------
+ 9
+(1 row)
+
+SELECT regexp_instr('1234567890', '(123)(4(56)(78))', 1, 1, 1, 'i', 1);
+ regexp_instr
+--------------
+ 4
+(1 row)
+
+SELECT regexp_instr('1234567890', '(123)(4(56)(78))', 1, 1, 1, 'i', 2);
+ regexp_instr
+--------------
+ 9
+(1 row)
+
+SELECT regexp_instr('1234567890', '(123)(4(56)(78))', 1, 1, 1, 'i', 3);
+ regexp_instr
+--------------
+ 7
+(1 row)
+
+SELECT regexp_instr('1234567890', '(123)(4(56)(78))', 1, 1, 1, 'i', 4);
+ regexp_instr
+--------------
+ 9
+(1 row)
+
+SELECT regexp_instr('1234567890', '(123)(4(56)(78))', 1, 1, 1, 'i', 5);
+ regexp_instr
+--------------
+ 0
+(1 row)
+
+-- Check case where we have a match, but not a subexpression match
+SELECT regexp_instr('foo', 'foo(bar)?', 1, 1, 0, '', 1);
+ regexp_instr
+--------------
+ 0
+(1 row)
+
+-- errors
+SELECT regexp_instr('abcabcabc', 'a.c', 0, 1);
+ERROR: invalid value for parameter "start": 0
+SELECT regexp_instr('abcabcabc', 'a.c', 1, 0);
+ERROR: invalid value for parameter "n": 0
+SELECT regexp_instr('abcabcabc', 'a.c', 1, 1, -1);
+ERROR: invalid value for parameter "endoption": -1
+SELECT regexp_instr('abcabcabc', 'a.c', 1, 1, 2);
+ERROR: invalid value for parameter "endoption": 2
+SELECT regexp_instr('abcabcabc', 'a.c', 1, 1, 0, 'g');
+ERROR: regexp_instr() does not support the "global" option
+SELECT regexp_instr('abcabcabc', 'a.c', 1, 1, 0, '', -1);
+ERROR: invalid value for parameter "subexpr": -1
+-- regexp_substr tests
+SELECT regexp_substr('abcdefghi', 'd.f');
+ regexp_substr
+---------------
+ def
+(1 row)
+
+SELECT regexp_substr('abcdefghi', 'd.q') IS NULL AS t;
+ t
+---
+ t
+(1 row)
+
+SELECT regexp_substr('abcabcabc', 'a.c');
+ regexp_substr
+---------------
+ abc
+(1 row)
+
+SELECT regexp_substr('abcabcabc', 'a.c', 2);
+ regexp_substr
+---------------
+ abc
+(1 row)
+
+SELECT regexp_substr('abcabcabc', 'a.c', 1, 3);
+ regexp_substr
+---------------
+ abc
+(1 row)
+
+SELECT regexp_substr('abcabcabc', 'a.c', 1, 4) IS NULL AS t;
+ t
+---
+ t
+(1 row)
+
+SELECT regexp_substr('abcabcabc', 'A.C', 1, 2, 'i');
+ regexp_substr
+---------------
+ abc
+(1 row)
+
+SELECT regexp_substr('1234567890', '(123)(4(56)(78))', 1, 1, 'i', 0);
+ regexp_substr
+---------------
+ 12345678
+(1 row)
+
+SELECT regexp_substr('1234567890', '(123)(4(56)(78))', 1, 1, 'i', 1);
+ regexp_substr
+---------------
+ 123
+(1 row)
+
+SELECT regexp_substr('1234567890', '(123)(4(56)(78))', 1, 1, 'i', 2);
+ regexp_substr
+---------------
+ 45678
+(1 row)
+
+SELECT regexp_substr('1234567890', '(123)(4(56)(78))', 1, 1, 'i', 3);
+ regexp_substr
+---------------
+ 56
+(1 row)
+
+SELECT regexp_substr('1234567890', '(123)(4(56)(78))', 1, 1, 'i', 4);
+ regexp_substr
+---------------
+ 78
+(1 row)
+
+SELECT regexp_substr('1234567890', '(123)(4(56)(78))', 1, 1, 'i', 5) IS NULL AS t;
+ t
+---
+ t
+(1 row)
+
+-- Check case where we have a match, but not a subexpression match
+SELECT regexp_substr('foo', 'foo(bar)?', 1, 1, '', 1) IS NULL AS t;
+ t
+---
+ t
+(1 row)
+
+-- errors
+SELECT regexp_substr('abcabcabc', 'a.c', 0, 1);
+ERROR: invalid value for parameter "start": 0
+SELECT regexp_substr('abcabcabc', 'a.c', 1, 0);
+ERROR: invalid value for parameter "n": 0
+SELECT regexp_substr('abcabcabc', 'a.c', 1, 1, 'g');
+ERROR: regexp_substr() does not support the "global" option
+SELECT regexp_substr('abcabcabc', 'a.c', 1, 1, '', -1);
+ERROR: invalid value for parameter "subexpr": -1
-- set so we can tell NULL from empty string
\pset null '\\N'
-- return all matches from regexp
diff --git a/src/test/regress/sql/strings.sql b/src/test/regress/sql/strings.sql
index 2c502534c2..92837fdd14 100644
--- a/src/test/regress/sql/strings.sql
+++ b/src/test/regress/sql/strings.sql
@@ -171,6 +171,8 @@ SELECT SUBSTRING('abcdefg' FROM 'c.e') AS "cde";
-- With a parenthesized subexpression, return only what matches the subexpr
SELECT SUBSTRING('abcdefg' FROM 'b(.*)f') AS "cde";
+-- Check case where we have a match, but not a subexpression match
+SELECT SUBSTRING('foo' FROM 'foo(bar)?') IS NULL AS t;
-- Check behavior of SIMILAR TO, which uses largely the same regexp variant
SELECT 'abcdefg' SIMILAR TO '_bcd%' AS true;
@@ -193,6 +195,95 @@ SELECT regexp_replace('AAA aaa', 'A+', 'Z', 'gi');
-- invalid regexp option
SELECT regexp_replace('AAA aaa', 'A+', 'Z', 'z');
+-- extended regexp_replace tests
+SELECT regexp_replace('A PostgreSQL function', 'A|e|i|o|u', 'X', 1);
+SELECT regexp_replace('A PostgreSQL function', 'A|e|i|o|u', 'X', 1, 2);
+SELECT regexp_replace('A PostgreSQL function', 'a|e|i|o|u', 'X', 1, 0, 'i');
+SELECT regexp_replace('A PostgreSQL function', 'a|e|i|o|u', 'X', 1, 1, 'i');
+SELECT regexp_replace('A PostgreSQL function', 'a|e|i|o|u', 'X', 1, 2, 'i');
+SELECT regexp_replace('A PostgreSQL function', 'a|e|i|o|u', 'X', 1, 3, 'i');
+SELECT regexp_replace('A PostgreSQL function', 'a|e|i|o|u', 'X', 1, 9, 'i');
+SELECT regexp_replace('A PostgreSQL function', 'A|e|i|o|u', 'X', 7, 0, 'i');
+-- 'g' flag should be ignored when N is specified
+SELECT regexp_replace('A PostgreSQL function', 'a|e|i|o|u', 'X', 1, 1, 'g');
+-- errors
+SELECT regexp_replace('A PostgreSQL function', 'a|e|i|o|u', 'X', -1, 0, 'i');
+SELECT regexp_replace('A PostgreSQL function', 'a|e|i|o|u', 'X', 1, -1, 'i');
+-- erroneous invocation of non-extended form
+SELECT regexp_replace('A PostgreSQL function', 'a|e|i|o|u', 'X', '1');
+
+-- regexp_count tests
+SELECT regexp_count('123123123123123', '(12)3');
+SELECT regexp_count('123123123123', '123', 1);
+SELECT regexp_count('123123123123', '123', 3);
+SELECT regexp_count('123123123123', '123', 33);
+SELECT regexp_count('ABCABCABCABC', 'Abc', 1, '');
+SELECT regexp_count('ABCABCABCABC', 'Abc', 1, 'i');
+-- errors
+SELECT regexp_count('123123123123', '123', 0);
+SELECT regexp_count('123123123123', '123', -3);
+
+-- regexp_like tests
+SELECT regexp_like('Steven', '^Ste(v|ph)en$');
+SELECT regexp_like('a'||CHR(10)||'d', 'a.d', 1, 'n');
+SELECT regexp_like('a'||CHR(10)||'d', 'a.d', 1, 's');
+SELECT regexp_like('abc', ' a . c ', 1, 'x');
+SELECT regexp_like('abc', 'a.c', 2);
+SELECT regexp_like('abc', 'a.c', 0); -- error
+SELECT regexp_like('abc', 'a.c', 1, 'g'); -- error
+
+-- regexp_instr tests
+SELECT regexp_instr('abcdefghi', 'd.f');
+SELECT regexp_instr('abcdefghi', 'd.q');
+SELECT regexp_instr('abcabcabc', 'a.c');
+SELECT regexp_instr('abcabcabc', 'a.c', 2);
+SELECT regexp_instr('abcabcabc', 'a.c', 1, 3);
+SELECT regexp_instr('abcabcabc', 'a.c', 1, 4);
+SELECT regexp_instr('abcabcabc', 'A.C', 1, 2, 0, 'i');
+SELECT regexp_instr('1234567890', '(123)(4(56)(78))', 1, 1, 0, 'i', 0);
+SELECT regexp_instr('1234567890', '(123)(4(56)(78))', 1, 1, 0, 'i', 1);
+SELECT regexp_instr('1234567890', '(123)(4(56)(78))', 1, 1, 0, 'i', 2);
+SELECT regexp_instr('1234567890', '(123)(4(56)(78))', 1, 1, 0, 'i', 3);
+SELECT regexp_instr('1234567890', '(123)(4(56)(78))', 1, 1, 0, 'i', 4);
+SELECT regexp_instr('1234567890', '(123)(4(56)(78))', 1, 1, 0, 'i', 5);
+SELECT regexp_instr('1234567890', '(123)(4(56)(78))', 1, 1, 1, 'i', 0);
+SELECT regexp_instr('1234567890', '(123)(4(56)(78))', 1, 1, 1, 'i', 1);
+SELECT regexp_instr('1234567890', '(123)(4(56)(78))', 1, 1, 1, 'i', 2);
+SELECT regexp_instr('1234567890', '(123)(4(56)(78))', 1, 1, 1, 'i', 3);
+SELECT regexp_instr('1234567890', '(123)(4(56)(78))', 1, 1, 1, 'i', 4);
+SELECT regexp_instr('1234567890', '(123)(4(56)(78))', 1, 1, 1, 'i', 5);
+-- Check case where we have a match, but not a subexpression match
+SELECT regexp_instr('foo', 'foo(bar)?', 1, 1, 0, '', 1);
+-- errors
+SELECT regexp_instr('abcabcabc', 'a.c', 0, 1);
+SELECT regexp_instr('abcabcabc', 'a.c', 1, 0);
+SELECT regexp_instr('abcabcabc', 'a.c', 1, 1, -1);
+SELECT regexp_instr('abcabcabc', 'a.c', 1, 1, 2);
+SELECT regexp_instr('abcabcabc', 'a.c', 1, 1, 0, 'g');
+SELECT regexp_instr('abcabcabc', 'a.c', 1, 1, 0, '', -1);
+
+-- regexp_substr tests
+SELECT regexp_substr('abcdefghi', 'd.f');
+SELECT regexp_substr('abcdefghi', 'd.q') IS NULL AS t;
+SELECT regexp_substr('abcabcabc', 'a.c');
+SELECT regexp_substr('abcabcabc', 'a.c', 2);
+SELECT regexp_substr('abcabcabc', 'a.c', 1, 3);
+SELECT regexp_substr('abcabcabc', 'a.c', 1, 4) IS NULL AS t;
+SELECT regexp_substr('abcabcabc', 'A.C', 1, 2, 'i');
+SELECT regexp_substr('1234567890', '(123)(4(56)(78))', 1, 1, 'i', 0);
+SELECT regexp_substr('1234567890', '(123)(4(56)(78))', 1, 1, 'i', 1);
+SELECT regexp_substr('1234567890', '(123)(4(56)(78))', 1, 1, 'i', 2);
+SELECT regexp_substr('1234567890', '(123)(4(56)(78))', 1, 1, 'i', 3);
+SELECT regexp_substr('1234567890', '(123)(4(56)(78))', 1, 1, 'i', 4);
+SELECT regexp_substr('1234567890', '(123)(4(56)(78))', 1, 1, 'i', 5) IS NULL AS t;
+-- Check case where we have a match, but not a subexpression match
+SELECT regexp_substr('foo', 'foo(bar)?', 1, 1, '', 1) IS NULL AS t;
+-- errors
+SELECT regexp_substr('abcabcabc', 'a.c', 0, 1);
+SELECT regexp_substr('abcabcabc', 'a.c', 1, 0);
+SELECT regexp_substr('abcabcabc', 'a.c', 1, 1, 'g');
+SELECT regexp_substr('abcabcabc', 'a.c', 1, 1, '', -1);
+
-- set so we can tell NULL from empty string
\pset null '\\N'
On 8/3/21 1:26 PM, Gilles Darold wrote:
Le 03/08/2021 � 11:45, Gilles Darold a �crit�:
Actually I just found that the regexp_like() function doesn't support
the start parameter which is something we should support. I saw that
Oracle do not support it but DB2 does and I think we should also
support it. I will post a new version of the patch once it is done.
+1
I for one am in favor of this 'start'-argument addition. Slightly
harder usage, but more precise manipulation.
Erik Rijkers
Show quoted text
Here is a new version of the patch that adds the start parameter to
regexp_like() function but while I'm adding support to this parameter it
become less obvious for me that we should implement it. However feel
free to not use this version if you think that adding the start
parameter has no real interest.Best regards,
Erik Rijkers <er@xs4all.nl> writes:
On 8/3/21 1:26 PM, Gilles Darold wrote:
Le 03/08/2021 à 11:45, Gilles Darold a écrit :
Actually I just found that the regexp_like() function doesn't support
the start parameter which is something we should support. I saw that
Oracle do not support it but DB2 does and I think we should also
support it. I will post a new version of the patch once it is done.
+1
I for one am in favor of this 'start'-argument addition. Slightly
harder usage, but more precise manipulation.
As I said upthread, I am *not* in favor of making those DB2 additions.
We do not need to create ambiguities around those functions like the
one we have for regexp_replace. If Oracle doesn't have those options,
why do we need them?
regards, tom lane
Le 03/08/2021 à 15:39, Tom Lane a écrit :
Erik Rijkers <er@xs4all.nl> writes:
On 8/3/21 1:26 PM, Gilles Darold wrote:
Le 03/08/2021 à 11:45, Gilles Darold a écrit :
Actually I just found that the regexp_like() function doesn't support
the start parameter which is something we should support. I saw that
Oracle do not support it but DB2 does and I think we should also
support it. I will post a new version of the patch once it is done.+1
I for one am in favor of this 'start'-argument addition. Slightly
harder usage, but more precise manipulation.As I said upthread, I am *not* in favor of making those DB2 additions.
We do not need to create ambiguities around those functions like the
one we have for regexp_replace. If Oracle doesn't have those options,
why do we need them?
Sorry I have missed that, but I'm fine with this implemenation so let's
keep the v6 version of the patch and drop this one.
--
Gilles Darold
Gilles Darold <gilles@darold.net> writes:
Sorry I have missed that, but I'm fine with this implemenation so let's
keep the v6 version of the patch and drop this one.
Pushed, then. There's still lots of time to tweak the behavior of course.
regards, tom lane
On 03.08.21 19:10, Tom Lane wrote:
Gilles Darold <gilles@darold.net> writes:
Sorry I have missed that, but I'm fine with this implemenation so let's
keep the v6 version of the patch and drop this one.Pushed, then. There's still lots of time to tweak the behavior of course.
I have a documentation follow-up to this. It seems that these new
functions are almost a de facto standard, whereas the SQL-standard
functions are not implemented anywhere. I propose the attached patch to
update the subsection in the pattern-matching section to give more
detail on this and suggest equivalent functions among these newly added
ones. What do you think?
Attachments:
0001-doc-More-documentation-on-regular-expressions-and-SQ.patchtext/plain; charset=UTF-8; name=0001-doc-More-documentation-on-regular-expressions-and-SQ.patchDownload
From a2dbd0e24a30b945a5d641ed773dc44f5e6b50c1 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut <peter@eisentraut.org>
Date: Wed, 15 Dec 2021 11:02:59 +0100
Subject: [PATCH] doc: More documentation on regular expressions and SQL
standard
---
doc/src/sgml/func.sgml | 91 +++++++++++++++++++++++++---
src/backend/catalog/sql_features.txt | 10 +--
2 files changed, 88 insertions(+), 13 deletions(-)
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 5801299b27..e58efce586 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -7353,10 +7353,26 @@ <title>Basic Regular Expressions</title>
<!-- end re_syntax.n man page -->
<sect3 id="posix-vs-xquery">
- <title>Differences from XQuery (<literal>LIKE_REGEX</literal>)</title>
+ <title>Differences from SQL Standard and XQuery</title>
<indexterm zone="posix-vs-xquery">
- <primary><literal>LIKE_REGEX</literal></primary>
+ <primary>LIKE_REGEX</primary>
+ </indexterm>
+
+ <indexterm zone="posix-vs-xquery">
+ <primary>OCCURRENCES_REGEX</primary>
+ </indexterm>
+
+ <indexterm zone="posix-vs-xquery">
+ <primary>POSITION_REGEX</primary>
+ </indexterm>
+
+ <indexterm zone="posix-vs-xquery">
+ <primary>SUBSTRING_REGEX</primary>
+ </indexterm>
+
+ <indexterm zone="posix-vs-xquery">
+ <primary>TRANSLATE_REGEX</primary>
</indexterm>
<indexterm zone="posix-vs-xquery">
@@ -7364,16 +7380,75 @@ <title>Differences from XQuery (<literal>LIKE_REGEX</literal>)</title>
</indexterm>
<para>
- Since SQL:2008, the SQL standard includes
- a <literal>LIKE_REGEX</literal> operator that performs pattern
+ Since SQL:2008, the SQL standard includes regular expression operators
+ and functions that performs pattern
matching according to the XQuery regular expression
- standard. <productname>PostgreSQL</productname> does not yet
- implement this operator, but you can get very similar behavior using
- the <function>regexp_match()</function> function, since XQuery
- regular expressions are quite close to the ARE syntax described above.
+ standard:
+ <itemizedlist>
+ <listitem><para><literal>LIKE_REGEX</literal></para></listitem>
+ <listitem><para><literal>OCCURRENCES_REGEX</literal></para></listitem>
+ <listitem><para><literal>POSITION_REGEX</literal></para></listitem>
+ <listitem><para><literal>SUBSTRING_REGEX</literal></para></listitem>
+ <listitem><para><literal>TRANSLATE_REGEX</literal></para></listitem>
+ </itemizedlist>
+ <productname>PostgreSQL</productname> does not currently implement these
+ operators and functions. You can get approximately equivalent
+ functionality in each case as shown in <xref
+ linkend="functions-regexp-sql-table"/>. (Various optional clauses on
+ both sides have been omitted in this table.)
+ </para>
+
+ <table id="functions-regexp-sql-table">
+ <title>Regular Expression Functions Equivalencies</title>
+
+ <tgroup cols="2">
+ <thead>
+ <row>
+ <entry>SQL standard</entry>
+ <entry>PostgreSQL</entry>
+ </row>
+ </thead>
+
+ <tbody>
+ <row>
+ <entry><literal><replaceable>string</replaceable> LIKE_REGEX <replaceable>pattern</replaceable></literal></entry>
+ <entry><literal>regexp_like(<replaceable>string</replaceable>, <replaceable>pattern</replaceable>)</literal> or <literal><replaceable>string</replaceable> ~ <replaceable>pattern</replaceable></literal></entry>
+ </row>
+
+ <row>
+ <entry><literal>OCCURRENCES_REGEX(<replaceable>pattern</replaceable> IN <replaceable>string</replaceable></literal></entry>
+ <entry><literal>regexp_count(<replaceable>string</replaceable>, <replaceable>pattern</replaceable>)</literal></entry>
+ </row>
+
+ <row>
+ <entry><literal>POSITION_REGEX(<replaceable>pattern</replaceable> IN <replaceable>string</replaceable></literal></entry>
+ <entry><literal>regexp_instr(<replaceable>string</replaceable>, <replaceable>pattern</replaceable>)</literal></entry>
+ </row>
+
+ <row>
+ <entry><literal>SUBSTRING_REGEX(<replaceable>pattern</replaceable> IN <replaceable>string</replaceable></literal></entry>
+ <entry><literal>regexp_substr(<replaceable>string</replaceable>, <replaceable>pattern</replaceable>)</literal></entry>
+ </row>
+
+ <row>
+ <entry><literal>TRANSLATE_REGEX(<replaceable>pattern</replaceable> IN <replaceable>string</replaceable> WITH <replaceable>replacement</replaceable></literal></entry>
+ <entry><literal>regexp_replace(<replaceable>string</replaceable>, <replaceable>pattern</replaceable>, <replaceable>replacement</replaceable>)</literal></entry>
+ </row>
+ </tbody>
+ </tgroup>
+ </table>
+
+ <para>
+ Regular expression functions similar to those provided by PostgreSQL are
+ also available in a number of other SQL implementations, whereas the
+ SQL-standard functions are not as widely implemented. Some of the
+ details of the regular expression syntax will likely differ in each
+ implementation.
</para>
<para>
+ The SQL-standard operators and functions use XQuery regular expressions,
+ which are quite close to the ARE syntax described above.
Notable differences between the existing POSIX-based
regular-expression feature and XQuery regular expressions include:
diff --git a/src/backend/catalog/sql_features.txt b/src/backend/catalog/sql_features.txt
index 9f424216e2..b8a78f4d41 100644
--- a/src/backend/catalog/sql_features.txt
+++ b/src/backend/catalog/sql_features.txt
@@ -323,11 +323,11 @@ F821 Local table references NO
F831 Full cursor update NO
F831 Full cursor update 01 Updatable scrollable cursors NO
F831 Full cursor update 02 Updatable ordered cursors NO
-F841 LIKE_REGEX predicate NO
-F842 OCCURRENCES_REGEX function NO
-F843 POSITION_REGEX function NO
-F844 SUBSTRING_REGEX function NO
-F845 TRANSLATE_REGEX function NO
+F841 LIKE_REGEX predicate NO consider regexp_like()
+F842 OCCURRENCES_REGEX function NO consider regexp_matches()
+F843 POSITION_REGEX function NO consider regexp_instr()
+F844 SUBSTRING_REGEX function NO consider regexp_substr()
+F845 TRANSLATE_REGEX function NO consider regexp_replace()
F846 Octet support in regular expression operators NO
F847 Nonconstant regular expressions NO
F850 Top-level <order by clause> in <query expression> YES
--
2.34.1
Le 15/12/2021 à 13:41, Peter Eisentraut a écrit :
On 03.08.21 19:10, Tom Lane wrote:
Gilles Darold <gilles@darold.net> writes:
Sorry I have missed that, but I'm fine with this implemenation so let's
keep the v6 version of the patch and drop this one.Pushed, then. There's still lots of time to tweak the behavior of
course.I have a documentation follow-up to this. It seems that these new
functions are almost a de facto standard, whereas the SQL-standard
functions are not implemented anywhere. I propose the attached patch
to update the subsection in the pattern-matching section to give more
detail on this and suggest equivalent functions among these newly
added ones. What do you think?
I'm in favor to apply your changes to documentation. It is a good thing
to precise the relation between this implementation of the regex_*
functions and the SQL stardard.
--
Gilles Darold
On 15.12.21 14:15, Gilles Darold wrote:
Le 15/12/2021 à 13:41, Peter Eisentraut a écrit :
On 03.08.21 19:10, Tom Lane wrote:
Gilles Darold <gilles@darold.net> writes:
Sorry I have missed that, but I'm fine with this implemenation so let's
keep the v6 version of the patch and drop this one.Pushed, then. There's still lots of time to tweak the behavior of
course.I have a documentation follow-up to this. It seems that these new
functions are almost a de facto standard, whereas the SQL-standard
functions are not implemented anywhere. I propose the attached patch
to update the subsection in the pattern-matching section to give more
detail on this and suggest equivalent functions among these newly
added ones. What do you think?I'm in favor to apply your changes to documentation. It is a good thing
to precise the relation between this implementation of the regex_*
functions and the SQL stardard.
ok, done