regexp_matches bug in 9.3.4 and 9.4.1

Started by Jeff Certainalmost 11 years ago6 messagesbugs
Jump to latest
#1Jeff Certain
gcertain@dynamicsignal.com

Repro:
drop table if exists regex_test;
create table regex_test(filter citext);
insert into regex_test(filter) values ('Filter Filter Filter');
select filter, regexp_matches(filter, 'Filter', 'g') from regex_test;
select filter, regexp_matches('Filter Filter Filter', 'Filter', 'g') from regex_test;

Expected result:
Both select statements should return the same number of rows. Specifically, in this case, I expect to get back 3 rows.

Actual result:
When referencing the filter column, only one row is ever returned. When supplying the string that is the same as the data in the filter column, the correct number of rows are returned.

Result 1 (incorrect):
select filter, regexp_matches(filter, 'Filter', 'g') from regex_test;
Filter Filter Filter {Filter}

Result 2 (correct):
select filter, regexp_matches('Filter Filter Filter', 'Filter', 'g') from regex_test;
Filter Filter Filter {Filter}
Filter Filter Filter {Filter}
Filter Filter Filter {Filter}

Versions
PostgreSQL 9.3.4, compiled by Visual C++ build 1600, 64-bit
PostgreSQL 9.4.1 on x86_64-unknown-linux-gnu, compiled by gcc (GCC) 4.4.7 20120313 (Red Hat 4.4.7-11), 64-bit

#2David G. Johnston
david.g.johnston@gmail.com
In reply to: Jeff Certain (#1)
Re: regexp_matches bug in 9.3.4 and 9.4.1

Please note...

On Mon, May 4, 2015 at 9:56 AM, Jeff Certain <gcertain@dynamicsignal.com>
wrote:

Repro:

drop table if exists regex_test;

create table regex_test(filter citext);

insert into regex_test(filter) values ('Filter Filter Filter');

select filter, regexp_matches(filter, 'Filter', 'g') from regex_test;

​"filter" is of type "citext"​

select filter, regexp_matches('Filter Filter Filter', 'Filter', 'g') from

regex_test;

​'Filter Filter Filter' is of type "text"​

Expected result:

Both select statements should return the same number of rows.
Specifically, in this case, I expect to get back 3 rows.

​The following non-citext version of your first query works just fine on 9.3

​select filter, regexp_matches(filter, 'Filter', 'g') from (values ('Filter
Filter Filter'::text)) regex_test (filter);

​regexp_matches is not mentioned on the following page in the documentation:

http://www.postgresql.org/docs/9.4/interactive/citext.html

Actual result:

When referencing the filter column, only one row is ever returned. When
supplying the string that is the same as the data in the filter column, the
correct number of rows are returned.

Result 1 (incorrect):

select filter, regexp_matches(filter, 'Filter', 'g') from regex_test;

Filter Filter Filter {Filter}

Result 2 (correct):

select filter, regexp_matches('Filter Filter Filter', 'Filter', 'g') from
regex_test;

Filter Filter Filter {Filter}

Filter Filter Filter {Filter}

Filter Filter Filter {Filter}

​This fails on 9.3 if the citext type is used instead of text.

Seems the however the citext version of regexp_matches is invoked it
ignores the global flag. Haven't explored deeper.

David J.

#3Jeff Certain
gcertain@dynamicsignal.com
In reply to: David G. Johnston (#2)
Re: regexp_matches bug in 9.3.4 and 9.4.1

Thanks for the quick response.

From my end of things, I would expect either a) an exception; b) 3 matches; or c) 0 matches. The current behavior (returning one match and ignoring the global flag) strikes me as being misleading and challenging to debug.

Jeff

From: David G. Johnston [mailto:david.g.johnston@gmail.com]
Sent: Monday, May 4, 2015 3:37 PM
To: Jeff Certain
Cc: pgsql-bugs@postgresql.org
Subject: Re: [BUGS] regexp_matches bug in 9.3.4 and 9.4.1

Please note...

On Mon, May 4, 2015 at 9:56 AM, Jeff Certain <gcertain@dynamicsignal.com<mailto:gcertain@dynamicsignal.com>> wrote:
Repro:
drop table if exists regex_test;
create table regex_test(filter citext);
insert into regex_test(filter) values ('Filter Filter Filter');
select filter, regexp_matches(filter, 'Filter', 'g') from regex_test;

​"filter" is of type "citext"​

select filter, regexp_matches('Filter Filter Filter', 'Filter', 'g') from regex_test;

​'Filter Filter Filter' is of type "text"​

Expected result:
Both select statements should return the same number of rows. Specifically, in this case, I expect to get back 3 rows.

​The following non-citext version of your first query works just fine on 9.3

​select filter, regexp_matches(filter, 'Filter', 'g') from (values ('Filter Filter Filter'::text)) regex_test (filter);

​regexp_matches is not mentioned on the following page in the documentation:

http://www.postgresql.org/docs/9.4/interactive/citext.html

Actual result:
When referencing the filter column, only one row is ever returned. When supplying the string that is the same as the data in the filter column, the correct number of rows are returned.

Result 1 (incorrect):
select filter, regexp_matches(filter, 'Filter', 'g') from regex_test;
Filter Filter Filter {Filter}

Result 2 (correct):
select filter, regexp_matches('Filter Filter Filter', 'Filter', 'g') from regex_test;
Filter Filter Filter {Filter}
Filter Filter Filter {Filter}
Filter Filter Filter {Filter}

​This fails on 9.3 if the citext type is used instead of text.

Seems the however the citext version of regexp_matches is invoked it ignores the global flag. Haven't explored deeper.

David J.

#4David G. Johnston
david.g.johnston@gmail.com
In reply to: Jeff Certain (#3)
Re: regexp_matches bug in 9.3.4 and 9.4.1

Please don't top-post...

On Mon, May 4, 2015 at 3:41 PM, Jeff Certain <gcertain@dynamicsignal.com>
wrote:

Thanks for the quick response.

From my end of things, I would expect either a) an exception; b) 3
matches; or c) 0 matches. The current behavior (returning one match and
ignoring the global flag) strikes me as being misleading and challenging to
debug.

​Unless someone tries to argue that this is not a bug the solution will be
"b) 3 matches". I stopped where I did because I'm not a -hacker.

David J.

#5Tom Lane
tgl@sss.pgh.pa.us
In reply to: David G. Johnston (#2)
Re: regexp_matches bug in 9.3.4 and 9.4.1

"David G. Johnston" <david.g.johnston@gmail.com> writes:

Seems the however the citext version of regexp_matches is invoked it
ignores the global flag. Haven't explored deeper.

Hm ...

regression=# create extension citext;
CREATE EXTENSION
regression=# \df regexp_matches
List of functions
Schema | Name | Result data type | Argument data types | Type
------------+----------------+------------------+----------------------+--------
pg_catalog | regexp_matches | SETOF text[] | text, text | normal
pg_catalog | regexp_matches | SETOF text[] | text, text, text | normal
public | regexp_matches | text[] | citext, citext | normal
public | regexp_matches | text[] | citext, citext, text | normal
(4 rows)

Looks like somebody just fat-fingered the declaration of citext's
regexp_matches() functions: they should be marked proretset and
they ain't. Sigh. I wonder if any of the other ones are wrong.

regards, tom lane

--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

#6Tom Lane
tgl@sss.pgh.pa.us
In reply to: Jeff Certain (#1)
Re: regexp_matches bug in 9.3.4 and 9.4.1

Jeff Certain <gcertain@dynamicsignal.com> writes:

create table regex_test(filter citext);
insert into regex_test(filter) values ('Filter Filter Filter');
select filter, regexp_matches(filter, 'Filter', 'g') from regex_test;
select filter, regexp_matches('Filter Filter Filter', 'Filter', 'g') from regex_test;

Expected result:
Both select statements should return the same number of rows. Specifically, in this case, I expect to get back 3 rows.

Actual result:
When referencing the filter column, only one row is ever returned. When supplying the string that is the same as the data in the filter column, the correct number of rows are returned.

Fixed, although after discussion we concluded that we should not change
the default behavior in the back branches. See commit log message:

http://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=b1ec45994e5108d734c45876c25593823fcf8644

Thanks for the report!

regards, tom lane

--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs