textlike under the LIKE operator for char(n)

Started by Kohei KaiGaiover 9 years ago3 messages
#1Kohei KaiGai
kaigai@kaigai.gr.jp

Hi,

I found a mysterious behavior when we use LIKE operator on char(n) data type.

postgres=# select 'abcd'::char(20) LIKE 'ab%cd';
?column?
----------
f
(1 row)

postgres=# select 'abcd'::char(4) LIKE 'ab%cd';
?column?
----------
t
(1 row)

LIKE operator (that is eventually processed by textlike) considers the
padding space of char(n) data type as a part of string.

On the other hands, equal operator ignores the padding space when it
compares two strings.

postgres=# select 'abcd'::char(20) = 'abcd';
?column?
----------
t
(1 row)

postgres=# select 'abcd'::char(4) = 'abcd';
?column?
----------
t
(1 row)

The LIKE operator on char(n) data type is implemented by textlike().

at pg_proc.h:
DATA(insert OID = 1631 ( bpcharlike PGNSP PGUID 12 1 0 0 0 f f
f f t f i s 2 0 16 "1042 25" _null_ _null_ _null_ _null_ _null_
textlike _null_ _null_ _null_ ));

It calls GenericMatchText() with length of the target string,
calculated by VARSIZE_ANY_EXHDR, however, it includes the padding
space.
It seems to me bcTruelen() gives the correct length for char(n) data
types, instead of this macro.

Is this behavior as expected? or, bug?

Thanks,
--
KaiGai Kohei <kaigai@kaigai.gr.jp>

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#2Kevin Grittner
kgrittn@gmail.com
In reply to: Kohei KaiGai (#1)
Re: textlike under the LIKE operator for char(n)

On Fri, May 6, 2016 at 8:58 AM, Kohei KaiGai <kaigai@kaigai.gr.jp> wrote:

postgres=# select 'abcd'::char(20) LIKE 'ab%cd';
?column?
----------
f
(1 row)

postgres=# select 'abcd'::char(4) LIKE 'ab%cd';
?column?
----------
t
(1 row)

LIKE operator (that is eventually processed by textlike) considers the
padding space of char(n) data type as a part of string.

The SQL standard generally requires this for CHAR(n) columns.

On the other hands, equal operator ignores the padding space when it
compares two strings.

postgres=# select 'abcd'::char(20) = 'abcd';
?column?
----------
t
(1 row)

postgres=# select 'abcd'::char(4) = 'abcd';
?column?
----------
t
(1 row)

The SQL standard specifically requires this exception to the
general rule.

Is this behavior as expected? or, bug?

This has been discussed on community lists multiple times in the
past; you might want to search the archives. I'm not inclined to
dig through the standard for details on this point again right now,
but in general the behaviors we provide for CHAR(n) are mandated by
standard. It would not entirely shock me if there are some corner
cases where different behavior could be allowed or even more
correct, but my recollection is that what you have shown is all
required to work that way.

Generally, I recommend avoiding CHAR(n) columns like the plague.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#3Kohei KaiGai
kaigai@kaigai.gr.jp
In reply to: Kevin Grittner (#2)
Re: textlike under the LIKE operator for char(n)

2016-05-06 23:17 GMT+09:00 Kevin Grittner <kgrittn@gmail.com>:

On Fri, May 6, 2016 at 8:58 AM, Kohei KaiGai <kaigai@kaigai.gr.jp> wrote:

postgres=# select 'abcd'::char(20) LIKE 'ab%cd';
?column?
----------
f
(1 row)

postgres=# select 'abcd'::char(4) LIKE 'ab%cd';
?column?
----------
t
(1 row)

LIKE operator (that is eventually processed by textlike) considers the
padding space of char(n) data type as a part of string.

The SQL standard generally requires this for CHAR(n) columns.

On the other hands, equal operator ignores the padding space when it
compares two strings.

postgres=# select 'abcd'::char(20) = 'abcd';
?column?
----------
t
(1 row)

postgres=# select 'abcd'::char(4) = 'abcd';
?column?
----------
t
(1 row)

The SQL standard specifically requires this exception to the
general rule.

Is this behavior as expected? or, bug?

This has been discussed on community lists multiple times in the
past; you might want to search the archives. I'm not inclined to
dig through the standard for details on this point again right now,
but in general the behaviors we provide for CHAR(n) are mandated by
standard. It would not entirely shock me if there are some corner
cases where different behavior could be allowed or even more
correct, but my recollection is that what you have shown is all
required to work that way.

Thanks, I couldn't find out the reason of the behavior shortly.
Requirement by SQL standard is a clear guidance even if it looks
a bit mysterious.

Generally, I recommend avoiding CHAR(n) columns like the plague.

Yep, I agree. I found this matter when I port LIKE operator on GPU,
not a time when some real-life query tried to use char(n).

Best regards,
--
KaiGai Kohei <kaigai@kaigai.gr.jp>

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers