Trouble with regexp_matches

Started by Edson Richterover 9 years ago6 messagesgeneral
Jump to latest
#1Edson Richter
edsonrichter@hotmail.com

Dear list,

Version string PostgreSQL 9.4.10 on x86_64-unknown-linux-gnu, compiled by gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-4), 64-bit

I’m running the query below, and it is limiting results as if “regexp_matches” being in where clause.
IMHO, it is wrong: in case there is no match, shall return null or empty array – not remove the result from the set!!!

Is this a collateral effect of using regexp_matches in columns?
If yes, shall not this information be BOLD RED FLASHING in documentation (or it is already, and some kind sould would point me where)?

-- First query (that is limiting results) ---------------------------------------------------------------------
select codigoocorrencia, datahoraocorrencia, datahoraimportacao, observacao, regexp_matches(observacao, '\d\d/\d\d/\d\d\d\d')
from batchloteocorrencia
where codigoocorrencia = '091'
and observacao is not null
order by datahoraimportacao DESC

Total results = 59

--Second query (that is not limiting results, as I did expect)-------------------------------------------------------------------
select codigoocorrencia, datahoraocorrencia, datahoraimportacao, observacao
from batchloteocorrencia
where codigoocorrencia = '091'
and observacao is not null
order by datahoraimportacao DESC

Total results = 3826

Why is that?

Regards,

Edson Richter

#2Adrian Klaver
adrian.klaver@aklaver.com
In reply to: Edson Richter (#1)
Re: Trouble with regexp_matches

On 11/05/2016 10:01 AM, Edson Richter wrote:

Dear list,

Version string PostgreSQL 9.4.10 on x86_64-unknown-linux-gnu,
compiled by gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-4), 64-bit

I’m running the query below, and it is limiting results as if
“regexp_matches” being in where clause.

IMHO, it is wrong: in case there is no match, shall return null or empty
array – not remove the result from the set!!!

Is this a collateral effect of using regexp_matches in columns?

If yes, shall not this information be BOLD RED FLASHING in documentation
(or it is already, and some kind sould would point me where)?

https://www.postgresql.org/docs/9.4/static/functions-matching.html

"It is possible to force regexp_matches() to always return one row by
using a sub-select; this is particularly useful in a SELECT target list
when you want all rows returned, even non-matching ones:

SELECT col1, (SELECT regexp_matches(col2, '(bar)(beque)')) FROM tab;
"

-- First query (that is limiting results)
---------------------------------------------------------------------

select codigoocorrencia, datahoraocorrencia, datahoraimportacao,
observacao, regexp_matches(observacao, '\d\d/\d\d/\d\d\d\d')

from batchloteocorrencia

where codigoocorrencia = '091'

and observacao is not null

order by datahoraimportacao DESC

Total results = 59

--Second query (that is not limiting results, as I did
expect)-------------------------------------------------------------------

select codigoocorrencia, datahoraocorrencia, datahoraimportacao, observacao

from batchloteocorrencia

where codigoocorrencia = '091'

and observacao is not null

order by datahoraimportacao DESC

Total results = 3826

Why is that?

Regards,

Edson Richter

--
Adrian Klaver
adrian.klaver@aklaver.com

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#3Tom Lane
tgl@sss.pgh.pa.us
In reply to: Edson Richter (#1)
Re: Trouble with regexp_matches

Edson Richter <edsonrichter@hotmail.com> writes:

I’m running the query below, and it is limiting results as if “regexp_matches” being in where clause.
IMHO, it is wrong: in case there is no match, shall return null or empty array – not remove the result from the set!!!

Well, no, because regexp_matches() returns a set. If there's no match,
there's zero rows in the set.

The standard workaround is to use a scalar sub-select, which has the
effect of converting a zero-row result into a NULL:

select codigoocorrencia, datahoraocorrencia, datahoraimportacao,
observacao, (select regexp_matches(observacao, '\d\d/\d\d/\d\d\d\d'))
from ...

As of v10 there will be a less confusing solution: use regexp_match()
instead.

regards, tom lane

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#4Edson Richter
edsonrichter@hotmail.com
In reply to: Tom Lane (#3)
RES: Trouble with regexp_matches

-----Mensagem original-----
De: Tom Lane [mailto:tgl@sss.pgh.pa.us]
Enviada em: sábado, 5 de novembro de 2016 15:21
Para: Edson Richter <edsonrichter@hotmail.com>
Cc: pgsql-general@postgresql.org
Assunto: Re: [GENERAL] Trouble with regexp_matches

Edson Richter <edsonrichter@hotmail.com> writes:

I’m running the query below, and it is limiting results as if

“regexp_matches” being in where clause.

IMHO, it is wrong: in case there is no match, shall return null or empty array

– not remove the result from the set!!!

Well, no, because regexp_matches() returns a set. If there's no match,
there's zero rows in the set.

For me, it is a strange behavior - putting something in select clause will restrict results as if it lies in join or where clauses.

The standard workaround is to use a scalar sub-select, which has the effect
of converting a zero-row result into a NULL:

select codigoocorrencia, datahoraocorrencia, datahoraimportacao,
observacao, (select regexp_matches(observacao, '\d\d/\d\d/\d\d\d\d'))
from ...

As of v10 there will be a less confusing solution: use regexp_match() instead.

regards, tom lane

Ok, for now, I've changed regexp_matches() to "... substring(observacao from '\d\d/\d\d/\d\d\d\d')" which does the job gracefully.

I still believe that an alert shall be made in the docs page (String functions), because seems confusing and error prone.

Thanks,

Edson Richter

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#5Edson Richter
edsonrichter@hotmail.com
In reply to: Adrian Klaver (#2)
RES: Trouble with regexp_matches

-----Mensagem original-----
De: Adrian Klaver [mailto:adrian.klaver@aklaver.com]
Enviada em: sábado, 5 de novembro de 2016 15:13
Para: Edson Richter <edsonrichter@hotmail.com>; pgsql-
general@postgresql.org
Assunto: Re: [GENERAL] Trouble with regexp_matches

On 11/05/2016 10:01 AM, Edson Richter wrote:

Dear list,

Version string PostgreSQL 9.4.10 on x86_64-unknown-linux-gnu,
compiled by gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-4), 64-bit

I’m running the query below, and it is limiting results as if
“regexp_matches” being in where clause.

IMHO, it is wrong: in case there is no match, shall return null or
empty array – not remove the result from the set!!!

Is this a collateral effect of using regexp_matches in columns?

If yes, shall not this information be BOLD RED FLASHING in
documentation (or it is already, and some kind sould would point me

where)?

https://www.postgresql.org/docs/9.4/static/functions-matching.html

I see - I always believed that this page was related to WHERE clause or using functions in the PostgreSQL way (which is, in your turn, a alternative to "from" syntax) - not for the select clause.
But now that you mention it, and re-reading all the information, I can understand the implications.

Nevertheless, would be nice to put a huge warning at the "String functions" page about this behavior...

"It is possible to force regexp_matches() to always return one row by using a
sub-select; this is particularly useful in a SELECT target list when you want all
rows returned, even non-matching ones:

SELECT col1, (SELECT regexp_matches(col2, '(bar)(beque)')) FROM tab; "

Thanks, this worked - as well removing the "regexp_matches" and using "SUBSTRING( text FROM pattern)".

I really appreciate your help.

Kind regards,

Edson Richter

-- First query (that is limiting results)
---------------------------------------------------------------------

select codigoocorrencia, datahoraocorrencia, datahoraimportacao,
observacao, regexp_matches(observacao, '\d\d/\d\d/\d\d\d\d')

from batchloteocorrencia

where codigoocorrencia = '091'

and observacao is not null

order by datahoraimportacao DESC

Total results = 59

--Second query (that is not limiting results, as I did
expect)---------------------------------------------------------------
----

select codigoocorrencia, datahoraocorrencia, datahoraimportacao,
observacao

from batchloteocorrencia

where codigoocorrencia = '091'

and observacao is not null

order by datahoraimportacao DESC

Total results = 3826

Why is that?

Regards,

Edson Richter

--
Adrian Klaver
adrian.klaver@aklaver.com

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#6Adrian Klaver
adrian.klaver@aklaver.com
In reply to: Edson Richter (#5)
Re: RES: Trouble with regexp_matches

On 11/05/2016 11:56 AM, Edson Richter wrote:

-----Mensagem original-----
De: Adrian Klaver [mailto:adrian.klaver@aklaver.com]
Enviada em: sábado, 5 de novembro de 2016 15:13
Para: Edson Richter <edsonrichter@hotmail.com>; pgsql-
general@postgresql.org
Assunto: Re: [GENERAL] Trouble with regexp_matches

On 11/05/2016 10:01 AM, Edson Richter wrote:

Dear list,

Version string PostgreSQL 9.4.10 on x86_64-unknown-linux-gnu,
compiled by gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-4), 64-bit

I’m running the query below, and it is limiting results as if
“regexp_matches” being in where clause.

IMHO, it is wrong: in case there is no match, shall return null or
empty array – not remove the result from the set!!!

Is this a collateral effect of using regexp_matches in columns?

If yes, shall not this information be BOLD RED FLASHING in
documentation (or it is already, and some kind sould would point me

where)?

https://www.postgresql.org/docs/9.4/static/functions-matching.html

I see - I always believed that this page was related to WHERE clause or using functions in the PostgreSQL way (which is, in your turn, a alternative to "from" syntax) - not for the select clause.
But now that you mention it, and re-reading all the information, I can understand the implications.

Nevertheless, would be nice to put a huge warning at the "String functions" page about this behavior...

Well the above page is pointed to in the string functions section:

https://www.postgresql.org/docs/9.4/static/functions-string.html

"regexp_matches(string text, pattern text [, flags text])
setof text[] Return all captured substrings resulting from matching a
POSIX regular expression against the string. See Section 9.7.3 for more
information."

Where 'Section 9.7.3' is a link to it. Not sure if that counts as a
warning though:)

"It is possible to force regexp_matches() to always return one row by using a
sub-select; this is particularly useful in a SELECT target list when you want all
rows returned, even non-matching ones:

SELECT col1, (SELECT regexp_matches(col2, '(bar)(beque)')) FROM tab; "

Thanks, this worked - as well removing the "regexp_matches" and using "SUBSTRING( text FROM pattern)".

I really appreciate your help.

Kind regards,

Edson Richter

-- First query (that is limiting results)
---------------------------------------------------------------------

select codigoocorrencia, datahoraocorrencia, datahoraimportacao,
observacao, regexp_matches(observacao, '\d\d/\d\d/\d\d\d\d')

from batchloteocorrencia

where codigoocorrencia = '091'

and observacao is not null

order by datahoraimportacao DESC

Total results = 59

--Second query (that is not limiting results, as I did
expect)---------------------------------------------------------------
----

select codigoocorrencia, datahoraocorrencia, datahoraimportacao,
observacao

from batchloteocorrencia

where codigoocorrencia = '091'

and observacao is not null

order by datahoraimportacao DESC

Total results = 3826

Why is that?

Regards,

Edson Richter

--
Adrian Klaver
adrian.klaver@aklaver.com

--
Adrian Klaver
adrian.klaver@aklaver.com

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general