regexp_replace not respecting greediness

Started by Simon Ellmann7 months ago4 messagesbugs
Jump to latest
#1Simon Ellmann
simon.ellmann@tum.de

With the following regular expression, the second .* seems to match non-greedily although (if I am correct) it should match greedily:

postgres=# SELECT REGEXP_REPLACE('jane.smith@example.com', '.*?@.*', 'ab');
regexp_replace
----------------
abexample.com
(1 row)

Other database systems (e.g., DuckDB, Umbra) match the whole input: https://analytics.db.in.tum.de/?q=SELECT+REGEXP_REPLACE%28%27jane.smith%40example.com%27%2C+%27.*%3F%40.*%27%2C+%27ab%27%29%3B

Affected version: PostgreSQL 17.5 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 15.2.1 20250813, 64-bit

Cheers,
Simon

--

Research associate
Chair for database systems
Department of Informatics
TU München Tel: +49 89 289 17276
Boltzmannstr. 3 E-Mail: simon.ellmann@tum.de
D-85748 Garching bei München, Germany

#2David G. Johnston
david.g.johnston@gmail.com
In reply to: Simon Ellmann (#1)
Re: regexp_replace not respecting greediness

On Friday, September 19, 2025, Simon Ellmann <simon.ellmann@tum.de> wrote:

With the following regular expression, the second .* seems to match
non-greedily although (if I am correct) it should match greedily:

Working as documented in rule 6:

https://www.postgresql.org/docs/current/functions-matching.html#POSIX-MATCHING-RULES

David J.

#3Tom Lane
tgl@sss.pgh.pa.us
In reply to: Simon Ellmann (#1)
Re: regexp_replace not respecting greediness

Simon Ellmann <simon.ellmann@tum.de> writes:

With the following regular expression, the second .* seems to match non-greedily although (if I am correct) it should match greedily:
postgres=# SELECT REGEXP_REPLACE('jane.smith@example.com', '.*?@.*', 'ab');

This is correct according to the rules given at

https://www.postgresql.org/docs/current/functions-matching.html#POSIX-MATCHING-RULES

specifically that "A branch — that is, an RE that has no top-level |
operator — has the same greediness as the first quantified atom in it
that has a greediness attribute." Because of that, the RE as a whole
is non-greedy and will match the shortest not longest amount of text
overall. The discussion in that manual section shows what to do
when you don't like the results.

Other database systems (e.g., DuckDB, Umbra) match the whole input:

If your complaint is "but it's not like Perl!", I suggest using
a plperl function to do your regexp work.

regards, tom lane

#4Simon Ellmann
simon.ellmann@tum.de
In reply to: David G. Johnston (#2)
Re: regexp_replace not respecting greediness

Thanks for the hint to the documentation, I missed that part. This is really surprising behavior!

Cheers,
Simon

--

Research associate
Chair for database systems
Department of Informatics
TU München Tel: +49 89 289 17276
Boltzmannstr. 3 E-Mail: simon.ellmann@tum.de
D-85748 Garching bei München, Germany

On 19. Sep 2025, at 16:09, David G. Johnston <david.g.johnston@gmail.com> wrote:

On Friday, September 19, 2025, Simon Ellmann <simon.ellmann@tum.de<mailto:simon.ellmann@tum.de>> wrote:
With the following regular expression, the second .* seems to match non-greedily although (if I am correct) it should match greedily:

Working as documented in rule 6:

https://www.postgresql.org/docs/current/functions-matching.html#POSIX-MATCHING-RULES

David J.