BUG #18956: Observing an issue in regexp_count()

Started by PG Bug reporting form10 months ago4 messagesbugs
Jump to latest
#1PG Bug reporting form
noreply@postgresql.org

The following bug has been logged on the website:

Bug reference: 18956
Logged by: Anudeep Vattikonda
Email address: anudeepvattikonda0404@gmail.com
PostgreSQL version: 17.5
Operating system: Mac
Description:

Hi
I am trying to run the below query
select REGEXP_COUNT('cat at the flat', '\Bat\b') ;
I was expecting it to return 2 but I see Postgres is returning 0. I see that
there are two matches, cat and flat. All it should do is to look for the
word at whose left side shoudn't be a word boundary while the right side
should be a word boundary
Thank you

In reply to: PG Bug reporting form (#1)
Re: BUG #18956: Observing an issue in regexp_count()

On Thu, Jun 12, 2025 at 08:03:25AM +0000, PG Bug reporting form wrote:

The following bug has been logged on the website:

Bug reference: 18956
Logged by: Anudeep Vattikonda
Email address: anudeepvattikonda0404@gmail.com
PostgreSQL version: 17.5
Operating system: Mac
Description:

Hi
I am trying to run the below query
select REGEXP_COUNT('cat at the flat', '\Bat\b') ;
I was expecting it to return 2 but I see Postgres is returning 0. I see that
there are two matches, cat and flat. All it should do is to look for the
word at whose left side shoudn't be a word boundary while the right side
should be a word boundary

What makes you think that \B/\b has anything to do with word boundary?

Docs
(https://www.postgresql.org/docs/17/functions-matching.html#FUNCTIONS-POSIX-REGEXP)
show:

\b - backspace, as in C
\B - synonym for backslash (\) to help reduce the need for backslash doubling

As far as I can tell pg regexps have nothing related to word boundaries.

You could get 2 by changing the regexp to something that actually works:

$ select REGEXP_COUNT('cat at the flat', '[a-z]at(?![a-z])');
regexp_count
──────────────
2
(1 row)

Best regards,

depesz

#3Tom Lane
tgl@sss.pgh.pa.us
In reply to: hubert depesz lubaczewski (#2)
Re: BUG #18956: Observing an issue in regexp_count()

hubert depesz lubaczewski <depesz@depesz.com> writes:

On Thu, Jun 12, 2025 at 08:03:25AM +0000, PG Bug reporting form wrote:

I am trying to run the below query
select REGEXP_COUNT('cat at the flat', '\Bat\b') ;
I was expecting it to return 2 but I see Postgres is returning 0. I see that
there are two matches, cat and flat. All it should do is to look for the
word at whose left side shoudn't be a word boundary while the right side
should be a word boundary

What makes you think that \B/\b has anything to do with word boundary?

Indeed, they do not.

As far as I can tell pg regexps have nothing related to word boundaries.

Sure we do, see "Regular Expression Constraint Escapes":

https://www.postgresql.org/docs/current/functions-matching.html#POSIX-CONSTRAINT-ESCAPES-TABLE

Unfortunately, since these are all way outside the POSIX regexp
standard, different systems have implemented these extensions
differently. I don't doubt that \B/\b mean word boundaries
in some other system.

regards, tom lane

In reply to: Tom Lane (#3)
Re: BUG #18956: Observing an issue in regexp_count()

On Thu, Jun 12, 2025 at 09:54:46AM -0400, Tom Lane wrote:

hubert depesz lubaczewski <depesz@depesz.com> writes:

On Thu, Jun 12, 2025 at 08:03:25AM +0000, PG Bug reporting form wrote:

I am trying to run the below query
select REGEXP_COUNT('cat at the flat', '\Bat\b') ;
I was expecting it to return 2 but I see Postgres is returning 0. I see that
there are two matches, cat and flat. All it should do is to look for the
word at whose left side shoudn't be a word boundary while the right side
should be a word boundary

What makes you think that \B/\b has anything to do with word boundary?

Indeed, they do not.

As far as I can tell pg regexps have nothing related to word boundaries.

Sure we do, see "Regular Expression Constraint Escapes":

https://www.postgresql.org/docs/current/functions-matching.html#POSIX-CONSTRAINT-ESCAPES-TABLE

Unfortunately, since these are all way outside the POSIX regexp
standard, different systems have implemented these extensions
differently. I don't doubt that \B/\b mean word boundaries
in some other system.

Oh, Missed that. Thanks.

So the regexp can be rewritten to:

=$ select REGEXP_COUNT('cat at the flat', '\Yat\M');
regexp_count
──────────────
2
(1 row)

Best regards,

depesz