regexp_replace not respecting greediness
With the following regular expression, the second .* seems to match non-greedily although (if I am correct) it should match greedily:
postgres=# SELECT REGEXP_REPLACE('jane.smith@example.com', '.*?@.*', 'ab');
regexp_replace
----------------
abexample.com
(1 row)
Other database systems (e.g., DuckDB, Umbra) match the whole input: https://analytics.db.in.tum.de/?q=SELECT+REGEXP_REPLACE%28%27jane.smith%40example.com%27%2C+%27.*%3F%40.*%27%2C+%27ab%27%29%3B
Affected version: PostgreSQL 17.5 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 15.2.1 20250813, 64-bit
Cheers,
Simon
--
Research associate
Chair for database systems
Department of Informatics
TU München Tel: +49 89 289 17276
Boltzmannstr. 3 E-Mail: simon.ellmann@tum.de
D-85748 Garching bei München, Germany
On Friday, September 19, 2025, Simon Ellmann <simon.ellmann@tum.de> wrote:
With the following regular expression, the second .* seems to match
non-greedily although (if I am correct) it should match greedily:
Working as documented in rule 6:
https://www.postgresql.org/docs/current/functions-matching.html#POSIX-MATCHING-RULES
David J.
Simon Ellmann <simon.ellmann@tum.de> writes:
With the following regular expression, the second .* seems to match non-greedily although (if I am correct) it should match greedily:
postgres=# SELECT REGEXP_REPLACE('jane.smith@example.com', '.*?@.*', 'ab');
This is correct according to the rules given at
https://www.postgresql.org/docs/current/functions-matching.html#POSIX-MATCHING-RULES
specifically that "A branch — that is, an RE that has no top-level |
operator — has the same greediness as the first quantified atom in it
that has a greediness attribute." Because of that, the RE as a whole
is non-greedy and will match the shortest not longest amount of text
overall. The discussion in that manual section shows what to do
when you don't like the results.
Other database systems (e.g., DuckDB, Umbra) match the whole input:
If your complaint is "but it's not like Perl!", I suggest using
a plperl function to do your regexp work.
regards, tom lane
Thanks for the hint to the documentation, I missed that part. This is really surprising behavior!
Cheers,
Simon
--
Research associate
Chair for database systems
Department of Informatics
TU München Tel: +49 89 289 17276
Boltzmannstr. 3 E-Mail: simon.ellmann@tum.de
D-85748 Garching bei München, Germany
On 19. Sep 2025, at 16:09, David G. Johnston <david.g.johnston@gmail.com> wrote:
On Friday, September 19, 2025, Simon Ellmann <simon.ellmann@tum.de<mailto:simon.ellmann@tum.de>> wrote:
With the following regular expression, the second .* seems to match non-greedily although (if I am correct) it should match greedily:
Working as documented in rule 6:
https://www.postgresql.org/docs/current/functions-matching.html#POSIX-MATCHING-RULES
David J.