length coerce for bpchar is broken since 7.0

Started by Tatsuo Ishiiabout 25 years ago15 messages
#1Tatsuo Ishii
t-ishii@sra.co.jp

It seems the length coerce for bpchar is broken since 7.0.
In 6.5 when a string is inserted, bpchar() is called to properly clip
the string. However in 7.0 (and probably current) bpchar() is not
called anymore.

coerce_type_typmod() calls exprTypmod(). exprTypmod() returns VARSIZE
of the bpchar data only if the data type is bpchar (if the data type
is varchar, exprTypmod just returns -1 and the parser add a function
node to call varchar(). so there is no problem for varchar). If
VARSIZE returned from exprTypmod() and atttypmod passed to
coerce_type_typmod() is equal, the function node to call bpchar()
would not be added.

I'm not sure if this was an intended efect of the change. Anyway we
have to do the length coerce for bpchar somewhere and I'm thinking now
is doing in bpcharin(). This would also solve the problem in copy in a
mail I have posted.

Comments?
--
Tatsuo Ishii

#2Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tatsuo Ishii (#1)
Re: length coerce for bpchar is broken since 7.0

Tatsuo Ishii <t-ishii@sra.co.jp> writes:

If VARSIZE returned from exprTypmod() and atttypmod passed to
coerce_type_typmod() is equal, the function node to call bpchar()
would not be added.

Um, what's wrong with that? Seems to me that parse_coerce is doing
exactly what it's supposed to, ie, adding only length coercions
that are needed.

regards, tom lane

#3Tatsuo Ishii
t-ishii@sra.co.jp
In reply to: Tom Lane (#2)
Re: length coerce for bpchar is broken since 7.0

If VARSIZE returned from exprTypmod() and atttypmod passed to
coerce_type_typmod() is equal, the function node to call bpchar()
would not be added.

Um, what's wrong with that? Seems to me that parse_coerce is doing
exactly what it's supposed to, ie, adding only length coercions
that are needed.

Simply clipping multibyte strings by atttypmode might produce
incorrect multibyte strings. Consider a case inserting 3 multibyte
letters (each consisting of 2 bytes) into a char(5) column.

Or this kind of consideration should be in bpcharin() as I said in the
earilier mail?
--
Tatsuo Ishii

#4Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tatsuo Ishii (#3)
Re: length coerce for bpchar is broken since 7.0

Tatsuo Ishii <t-ishii@sra.co.jp> writes:

If VARSIZE returned from exprTypmod() and atttypmod passed to
coerce_type_typmod() is equal, the function node to call bpchar()
would not be added.

Um, what's wrong with that? Seems to me that parse_coerce is doing
exactly what it's supposed to, ie, adding only length coercions
that are needed.

Simply clipping multibyte strings by atttypmode might produce
incorrect multibyte strings. Consider a case inserting 3 multibyte
letters (each consisting of 2 bytes) into a char(5) column.

It seems to me that this means that atttypmod or exprTypmod() isn't
correctly defined for MULTIBYTE char(n) values. We should define
typmod in such a way that they agree iff the string is correctly
clipped. This might be easier said than done, perhaps, but I don't
like the idea of having to apply length-coercion functions all the
time because we can't figure out whether they're needed or not.

regards, tom lane

#5Tatsuo Ishii
t-ishii@sra.co.jp
In reply to: Tom Lane (#4)
Re: length coerce for bpchar is broken since 7.0

Simply clipping multibyte strings by atttypmode might produce
incorrect multibyte strings. Consider a case inserting 3 multibyte
letters (each consisting of 2 bytes) into a char(5) column.

It seems to me that this means that atttypmod or exprTypmod() isn't
correctly defined for MULTIBYTE char(n) values. We should define
typmod in such a way that they agree iff the string is correctly
clipped. This might be easier said than done, perhaps, but I don't
like the idea of having to apply length-coercion functions all the
time because we can't figure out whether they're needed or not.

Before going further, may I ask you a question. Why in exprTypmod() is
bpchar() treated differently from other data types such as varchar?

switch (con->consttype)
{
case BPCHAROID:
if (!con->constisnull)
return VARSIZE(DatumGetPointer(con->constvalue));
break;
default:
break;
}

--
Tatsuo Ishii

#6Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tatsuo Ishii (#5)
Re: length coerce for bpchar is broken since 7.0

Tatsuo Ishii <t-ishii@sra.co.jp> writes:

Before going further, may I ask you a question. Why in exprTypmod() is
bpchar() treated differently from other data types such as varchar?

It's just hardwired knowledge about that particular datatype. In the
light of your comments, it seems clear that the code here is wrong
for the MULTIBYTE case: instead of plain VARSIZE(), it should be
returning the number of multibyte characters + 4 (or whatever
atttypmod is defined to mean for MULTIBYTE bpchar). I think I wrote
this code to start with, so you can blame me for the fact that it
neglects the MULTIBYTE case :-(

regards, tom lane

#7Tatsuo Ishii
t-ishii@sra.co.jp
In reply to: Tom Lane (#6)
Re: length coerce for bpchar is broken since 7.0

It's just hardwired knowledge about that particular datatype. In the
light of your comments, it seems clear that the code here is wrong
for the MULTIBYTE case: instead of plain VARSIZE(), it should be
returning the number of multibyte characters + 4 (or whatever
atttypmod is defined to mean for MULTIBYTE bpchar). I think I wrote
this code to start with, so you can blame me for the fact that it
neglects the MULTIBYTE case :-(

I'm going to fix the problem by changing bpcharin() rather than
changing exprTypmod(). Surely we could fix the problem by changing
exprTypmod() for INSERT, however, we could not fix the similar problem
for COPY FROM in the same way. Changing bpcharin() would solve
problems of both INSERT and COPY FROM. So bpcharin() seems more
appropreate place to fix both problems.
--
Tatsuo Ishii

#8Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tatsuo Ishii (#7)
Re: length coerce for bpchar is broken since 7.0

Tatsuo Ishii <t-ishii@sra.co.jp> writes:

I'm going to fix the problem by changing bpcharin() rather than
changing exprTypmod(). Surely we could fix the problem by changing
exprTypmod() for INSERT, however, we could not fix the similar problem
for COPY FROM in the same way. Changing bpcharin() would solve
problems of both INSERT and COPY FROM. So bpcharin() seems more
appropreate place to fix both problems.

bpcharin() will most definitely NOT fix the problem, because it often
will not know the target column's typmod, if indeed there is an
identifiable target column at all. I agree that it's a good solution
for COPY FROM, but you need to fix exprTypmod() too.

regards, tom lane

#9Tatsuo Ishii
t-ishii@sra.co.jp
In reply to: Tom Lane (#8)
Re: length coerce for bpchar is broken since 7.0

Tatsuo Ishii <t-ishii@sra.co.jp> writes:

I'm going to fix the problem by changing bpcharin() rather than
changing exprTypmod(). Surely we could fix the problem by changing
exprTypmod() for INSERT, however, we could not fix the similar problem
for COPY FROM in the same way. Changing bpcharin() would solve
problems of both INSERT and COPY FROM. So bpcharin() seems more
appropreate place to fix both problems.

bpcharin() will most definitely NOT fix the problem, because it often
will not know the target column's typmod, if indeed there is an
identifiable target column at all.

Can you give me any example for this case?

I agree that it's a good solution
for COPY FROM, but you need to fix exprTypmod() too.

Anyway, I'm gotoing to fix exprTypmod() also.
--
Tatsuo Ishii

#10Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tatsuo Ishii (#9)
Re: length coerce for bpchar is broken since 7.0

Tatsuo Ishii <t-ishii@sra.co.jp> writes:

bpcharin() will most definitely NOT fix the problem, because it often
will not know the target column's typmod, if indeed there is an
identifiable target column at all.

Can you give me any example for this case?

UPDATE foo SET bpcharcol = 'a'::char || 'b'::char;

UPDATE foo SET bpcharcol = upper('abc');

In the first case bpcharin() will be invoked, but not in the context
of direct assignment to a table column, so it won't receive a valid
typmod. In the second case bpcharin() will never be invoked at all,
because upper takes and returns text --- so 'abc' is not a bpchar
constant but a text constant. You have to be sure that the parser
handles type length coercion correctly, and I think the cleanest way to
do that is to fix exprTypmod so that it knows how typmod is defined in
the MULTIBYTE case.

regards, tom lane

#11Tatsuo Ishii
t-ishii@sra.co.jp
In reply to: Tom Lane (#10)
Re: length coerce for bpchar is broken since 7.0

Tatsuo Ishii <t-ishii@sra.co.jp> writes:

bpcharin() will most definitely NOT fix the problem, because it often
will not know the target column's typmod, if indeed there is an
identifiable target column at all.

Can you give me any example for this case?

UPDATE foo SET bpcharcol = 'a'::char || 'b'::char;

UPDATE foo SET bpcharcol = upper('abc');

In the first case bpcharin() will be invoked, but not in the context
of direct assignment to a table column, so it won't receive a valid
typmod. In the second case bpcharin() will never be invoked at all,
because upper takes and returns text --- so 'abc' is not a bpchar
constant but a text constant. You have to be sure that the parser
handles type length coercion correctly, and I think the cleanest way to
do that is to fix exprTypmod so that it knows how typmod is defined in
the MULTIBYTE case.

In those cases above bpchar() will be called anyway, so I don't see
MULTIBYTE length coerce problems there.
--
Tatsuo Ishii

#12Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tatsuo Ishii (#11)
Re: length coerce for bpchar is broken since 7.0

Tatsuo Ishii <t-ishii@sra.co.jp> writes:

Can you give me any example for this case?

UPDATE foo SET bpcharcol = 'a'::char || 'b'::char;

UPDATE foo SET bpcharcol = upper('abc');

In those cases above bpchar() will be called anyway, so I don't see
MULTIBYTE length coerce problems there.

So it will, but *only* because the parser realizes that it needs to
add a call to bpchar(). If exprTypmod returns incorrect values then
it's possible that the parser would wrongly decide it didn't need to
call bpchar().

regards, tom lane

#13Bruce Momjian
pgman@candle.pha.pa.us
In reply to: Tatsuo Ishii (#1)
Re: length coerce for bpchar is broken since 7.0

Can someone comment on the status of this?

It seems the length coerce for bpchar is broken since 7.0.
In 6.5 when a string is inserted, bpchar() is called to properly clip
the string. However in 7.0 (and probably current) bpchar() is not
called anymore.

coerce_type_typmod() calls exprTypmod(). exprTypmod() returns VARSIZE
of the bpchar data only if the data type is bpchar (if the data type
is varchar, exprTypmod just returns -1 and the parser add a function
node to call varchar(). so there is no problem for varchar). If
VARSIZE returned from exprTypmod() and atttypmod passed to
coerce_type_typmod() is equal, the function node to call bpchar()
would not be added.

I'm not sure if this was an intended efect of the change. Anyway we
have to do the length coerce for bpchar somewhere and I'm thinking now
is doing in bpcharin(). This would also solve the problem in copy in a
mail I have posted.

Comments?
--
Tatsuo Ishii

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
#14Tom Lane
tgl@sss.pgh.pa.us
In reply to: Bruce Momjian (#13)
Re: length coerce for bpchar is broken since 7.0

Bruce Momjian <pgman@candle.pha.pa.us> writes:

Can someone comment on the status of this?

regression=# create table foo (f1 char(7));
CREATE
regression=# insert into foo values ('123456789');
INSERT 145180 1
regression=# select * from foo;
f1
---------
1234567
(1 row)

Where's the problem?

regards, tom lane

#15Tatsuo Ishii
t-ishii@sra.co.jp
In reply to: Bruce Momjian (#13)
Re: length coerce for bpchar is broken since 7.0

I believe this has been fixed.

Show quoted text

Subject: [COMMITTERS] pgsql/src/backend/utils/adt (varchar.c)
From: ishii@postgresql.org
To: pgsql-committers@postgresql.org
Date: Sun, 26 Nov 2000 06:35:23 -0500 (EST)

Can someone comment on the status of this?

It seems the length coerce for bpchar is broken since 7.0.
In 6.5 when a string is inserted, bpchar() is called to properly clip
the string. However in 7.0 (and probably current) bpchar() is not
called anymore.

coerce_type_typmod() calls exprTypmod(). exprTypmod() returns VARSIZE
of the bpchar data only if the data type is bpchar (if the data type
is varchar, exprTypmod just returns -1 and the parser add a function
node to call varchar(). so there is no problem for varchar). If
VARSIZE returned from exprTypmod() and atttypmod passed to
coerce_type_typmod() is equal, the function node to call bpchar()
would not be added.

I'm not sure if this was an intended efect of the change. Anyway we
have to do the length coerce for bpchar somewhere and I'm thinking now
is doing in bpcharin(). This would also solve the problem in copy in a
mail I have posted.

Comments?
--
Tatsuo Ishii

-- 
Bruce Momjian                        |  http://candle.pha.pa.us
pgman@candle.pha.pa.us               |  (610) 853-3000
+  If your life is a hard drive,     |  830 Blythe Avenue
+  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026