strpos behavior change around empty substring in PG12

Started by Shay Rojanskyabout 6 years ago5 messages
#1Shay Rojansky
roji@roji.org

Greetings hackers,

Before PG12, select strpos('test', '') returns 1 (empty substring found at
first position of the string), whereas starting with PG12 it returns 0
(empty substring not found).

Is this behavior change intentional? If so, it doesn't seem to be
documented in the release notes...

First raised by Austin Drenski in
https://github.com/npgsql/efcore.pg/pull/1068#issuecomment-546795826

Thanks,

Shay

#2Robert Haas
robertmhaas@gmail.com
In reply to: Shay Rojansky (#1)
Re: strpos behavior change around empty substring in PG12

On Mon, Oct 28, 2019 at 11:02 AM Shay Rojansky <roji@roji.org> wrote:

Before PG12, select strpos('test', '') returns 1 (empty substring found at first position of the string), whereas starting with PG12 it returns 0 (empty substring not found).

Is this behavior change intentional? If so, it doesn't seem to be documented in the release notes...

First raised by Austin Drenski in https://github.com/npgsql/efcore.pg/pull/1068#issuecomment-546795826

It looks to me like this got broken here:

commit 9556aa01c69a26ca726d8dda8e395acc7c1e30fc
Author: Heikki Linnakangas <heikki.linnakangas@iki.fi>
Date: Fri Jan 25 16:25:05 2019 +0200

Use single-byte Boyer-Moore-Horspool search even with multibyte encodings.

Not sure what happened exactly.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#3Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Haas (#2)
Re: strpos behavior change around empty substring in PG12

Robert Haas <robertmhaas@gmail.com> writes:

On Mon, Oct 28, 2019 at 11:02 AM Shay Rojansky <roji@roji.org> wrote:

Before PG12, select strpos('test', '') returns 1 (empty substring found at first position of the string), whereas starting with PG12 it returns 0 (empty substring not found).

It looks to me like this got broken here:

commit 9556aa01c69a26ca726d8dda8e395acc7c1e30fc
Author: Heikki Linnakangas <heikki.linnakangas@iki.fi>
Date: Fri Jan 25 16:25:05 2019 +0200
Use single-byte Boyer-Moore-Horspool search even with multibyte encodings.

Not sure what happened exactly.

I think the problem is lack of clarity about the edge cases.
The patch added this short-circuit right at the top of text_position():

+   if (VARSIZE_ANY_EXHDR(t1) < 1 || VARSIZE_ANY_EXHDR(t2) < 1)
+       return 0;

and as this example shows, that's the Wrong Thing. Fortunately,
it also seems easily fixed.

regards, tom lane

#4Heikki Linnakangas
hlinnaka@iki.fi
In reply to: Tom Lane (#3)
Re: strpos behavior change around empty substring in PG12

On 28/10/2019 17:57, Tom Lane wrote:

Robert Haas <robertmhaas@gmail.com> writes:

On Mon, Oct 28, 2019 at 11:02 AM Shay Rojansky <roji@roji.org> wrote:

Before PG12, select strpos('test', '') returns 1 (empty substring found at first position of the string), whereas starting with PG12 it returns 0 (empty substring not found).

It looks to me like this got broken here:

commit 9556aa01c69a26ca726d8dda8e395acc7c1e30fc
Author: Heikki Linnakangas <heikki.linnakangas@iki.fi>
Date: Fri Jan 25 16:25:05 2019 +0200
Use single-byte Boyer-Moore-Horspool search even with multibyte encodings.

Not sure what happened exactly.

I think the problem is lack of clarity about the edge cases.
The patch added this short-circuit right at the top of text_position():

+   if (VARSIZE_ANY_EXHDR(t1) < 1 || VARSIZE_ANY_EXHDR(t2) < 1)
+       return 0;

and as this example shows, that's the Wrong Thing. Fortunately,
it also seems easily fixed.

Tom fixed this in commit bd1ef5799b; thanks!

To be sure, I also checked the SQL standard for what POSITION('' IN
'test') is supposed to return. It agrees that 1 is correct:

If CHAR_LENGTH(CVE1) is 0 (zero), then the result is 1 (one).

- Heikki

#5Shay Rojansky
roji@roji.org
In reply to: Tom Lane (#3)
Re: strpos behavior change around empty substring in PG12

Thanks for the quick turnaround!

Tom Lane <tgl@sss.pgh.pa.us> schrieb am Mo., 28. Okt. 2019, 16:57:

Show quoted text

Robert Haas <robertmhaas@gmail.com> writes:

On Mon, Oct 28, 2019 at 11:02 AM Shay Rojansky <roji@roji.org> wrote:

Before PG12, select strpos('test', '') returns 1 (empty substring found

at first position of the string), whereas starting with PG12 it returns 0
(empty substring not found).

It looks to me like this got broken here:

commit 9556aa01c69a26ca726d8dda8e395acc7c1e30fc
Author: Heikki Linnakangas <heikki.linnakangas@iki.fi>
Date: Fri Jan 25 16:25:05 2019 +0200
Use single-byte Boyer-Moore-Horspool search even with multibyte

encodings.

Not sure what happened exactly.

I think the problem is lack of clarity about the edge cases.
The patch added this short-circuit right at the top of text_position():

+   if (VARSIZE_ANY_EXHDR(t1) < 1 || VARSIZE_ANY_EXHDR(t2) < 1)
+       return 0;

and as this example shows, that's the Wrong Thing. Fortunately,
it also seems easily fixed.

regards, tom lane