Simplifying unknown-literal handling

Started by Tom Lane over 20 years ago, 9 messages
#1 Tom Lane
tgl@sss.pgh.pa.us

For the past couple of releases we've had support for cstring
(null-terminated string) as a full-fledged datatype: you set
typlen = -2 to indicate that strlen() must be used to calculate
the actual size of a Datum.

It occurs to me that we should change type UNKNOWN's internal
representation to be like cstring rather than like text. The
advantage of this is that the places in the parser that currently
call unknownin and unknownout could be replaced by just
CStringGetDatum and DatumGetCString, respectively, thus saving
two palloc's and two memcpy's per string literal. It's not much,
but considering that this happens every time we parse a string
literal, I think it'll add up to a savings worth the small amount
of effort needed.

Anyone see a reason not to change this?

regards, tom lane
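
To make the cost difference in the message above concrete, here is a minimal sketch in C. It is not PostgreSQL source: ToyVarlena, toy_datum_size, toy_text_in, toy_text_out and toy_cstring_passthrough are hypothetical stand-ins, and malloc stands in for palloc. It assumes only what the message states (typlen = -2 means strlen() gives the size) plus the usual convention that typlen = -1 marks a length-prefixed value.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Toy stand-in for a length-prefixed ("text"-like) value: a 4-byte total
 * length followed by the payload, with no terminating NUL. */
typedef struct ToyVarlena {
    int32_t total_len;   /* header size + payload size, in bytes */
    char    data[];      /* payload, not NUL-terminated */
} ToyVarlena;

#define TOY_HDRSZ ((int32_t) sizeof(int32_t))

/* Size of a stored value given its typlen:
 * > 0 fixed width, -1 length-prefixed, -2 NUL-terminated (use strlen). */
static size_t toy_datum_size(int typlen, const void *value)
{
    assert(value != NULL);
    if (typlen > 0)
        return (size_t) typlen;
    if (typlen == -1)
        return (size_t) ((const ToyVarlena *) value)->total_len;
    assert(typlen == -2);
    return strlen((const char *) value) + 1;   /* count the terminator */
}

/* Text-like path, inbound: one allocation and one copy. */
static ToyVarlena *toy_text_in(const char *str)
{
    size_t len = strlen(str);
    ToyVarlena *v = malloc(sizeof(ToyVarlena) + len);
    if (v == NULL)
        return NULL;
    v->total_len = TOY_HDRSZ + (int32_t) len;
    memcpy(v->data, str, len);
    return v;
}

/* Text-like path, outbound: a second allocation and a second copy. */
static char *toy_text_out(const ToyVarlena *v)
{
    size_t len = (size_t) (v->total_len - TOY_HDRSZ);
    char *s = malloc(len + 1);
    if (s == NULL)
        return NULL;
    memcpy(s, v->data, len);
    s[len] = '\0';
    return s;
}

/* cstring-like path: the literal is already NUL-terminated, so the
 * pointer can be handed along with no allocation and no copy. */
static const char *toy_cstring_passthrough(const char *str)
{
    return str;
}
```

A round trip through toy_text_in and toy_text_out costs two allocations and two copies per literal, which is the overhead the proposal removes; the pass-through variant returns the same pointer it was given.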

#2 Alvaro Herrera
alvherre@surnet.cl
In reply to: Tom Lane (#1)
Re: Simplifying unknown-literal handling

On Sun, May 29, 2005 at 11:47:18AM -0400, Tom Lane wrote:

> For the past couple of releases we've had support for cstring
> (null-terminated string) as a full-fledged datatype: you set
> typlen = -2 to indicate that strlen() must be used to calculate
> the actual size of a Datum.
>
> It occurs to me that we should change type UNKNOWN's internal
> representation to be like cstring rather than like text. The
> advantage of this is that the places in the parser that currently
> call unknownin and unknownout could be replaced by just
> CStringGetDatum and DatumGetCString, respectively, thus saving
> two palloc's and two memcpy's per string literal. It's not much,
> but considering that this happens every time we parse a string
> literal, I think it'll add up to a savings worth the small amount
> of effort needed.
>
> Anyone see a reason not to change this?

Is there any way we use UNKNOWN to represent bytea literals?
Say, comparing an untyped literal to a bytea column?

--
Alvaro Herrera (<alvherre[a]surnet.cl>)
"Sallah, I said NO camels! That's FIVE camels; can't you count?"
(Indiana Jones)

#3 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Alvaro Herrera (#2)
Re: Simplifying unknown-literal handling

Alvaro Herrera <alvherre@surnet.cl> writes:

> On Sun, May 29, 2005 at 11:47:18AM -0400, Tom Lane wrote:
>> Anyone see a reason not to change this?
>
> Is there any way we use UNKNOWN to represent bytea literals?
> Say, comparing an untyped literal to a bytea column?

We use UNKNOWN to represent the raw string literal before we've
figured out that we need to feed it to byteain. There aren't
going to be any embedded nulls at that point, if that's what
you are wondering.

If we ever decide to try to support embedded nulls in datatype
external representations, there are going to be way more changes
needed than just changing UNKNOWN again ... for starters, changing
the I/O functions of every single built-in and user-defined data type.
I don't think that's ever going to happen, so I'm not particularly
worried about propagating the assumption into one more place.

regards, tom lane

#4 Andrew - Supernews
andrew+nonews@supernews.com
In reply to: Tom Lane (#1)
Re: Simplifying unknown-literal handling

On 2005-05-29, Tom Lane <tgl@sss.pgh.pa.us> wrote:

> Alvaro Herrera <alvherre@surnet.cl> writes:
>> On Sun, May 29, 2005 at 11:47:18AM -0400, Tom Lane wrote:
>>> Anyone see a reason not to change this?
>>
>> Is there any way we use UNKNOWN to represent bytea literals?
>> Say, comparing an untyped literal to a bytea column?
>
> We use UNKNOWN to represent the raw string literal before we've
> figured out that we need to feed it to byteain. There aren't
> going to be any embedded nulls at that point, if that's what
> you are wondering.

Are there any cases where UNKNOWN can be received from the frontend as
a binary value? I suspect there are.

--
Andrew, Supernews
http://www.supernews.com - individual and corporate NNTP services

#5 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andrew - Supernews (#4)
Re: Simplifying unknown-literal handling

Andrew - Supernews <andrew+nonews@supernews.com> writes:

> Are there any cases where UNKNOWN can be received from the frontend as
> a binary value? I suspect there are.

Sure, but that's transparent because we have binary I/O converters.
You will have trouble if you try to inject an embedded zero that way,
but the end result will look about the same as when you try to inject
an embedded zero now: the data after the zero will be dropped on readout.

regards, tom lane
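
The truncation described above follows directly from a NUL-terminated representation, and a small C sketch shows it. This is an illustration under stated assumptions, not the backend's receive code: toy_cstring_recv and toy_cstring_visible_len are hypothetical names, and malloc stands in for the backend allocator.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical binary receive for a cstring-like type: copy the wire
 * bytes and append a terminator. The wire length itself is not kept. */
static char *toy_cstring_recv(const unsigned char *wire, size_t wire_len)
{
    assert(wire != NULL || wire_len == 0);
    char *s = malloc(wire_len + 1);
    if (s == NULL)
        return NULL;
    if (wire_len > 0)
        memcpy(s, wire, wire_len);
    s[wire_len] = '\0';
    return s;
}

/* Readout can only rely on strlen(), so it stops at the first zero byte;
 * anything the client sent after that byte is invisible. */
static size_t toy_cstring_visible_len(const char *s)
{
    return strlen(s);
}
```

Feeding the five bytes 'a', 'b', 0, 'c', 'd' through toy_cstring_recv stores all of them, but readout sees only "ab", which matches the "dropped on readout" behavior described in the message.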

#6 Andrew - Supernews
andrew+nonews@supernews.com
In reply to: Tom Lane (#1)
Re: Simplifying unknown-literal handling

On 2005-05-29, Tom Lane <tgl@sss.pgh.pa.us> wrote:

> Andrew - Supernews <andrew+nonews@supernews.com> writes:
>> Are there any cases where UNKNOWN can be received from the frontend as
>> a binary value? I suspect there are.
>
> Sure, but that's transparent because we have binary I/O converters.
> You will have trouble if you try to inject an embedded zero that way,
> but the end result will look about the same as when you try to inject
> an embedded zero now: the data after the zero will be dropped on readout.

What happens if you send an UNKNOWN from the frontend as binary, and then
when the desired type is figured out, it turns out to be a bytea? It's
obviously not acceptable then to truncate after a zero byte.

--
Andrew, Supernews
http://www.supernews.com - individual and corporate NNTP services

#7 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andrew - Supernews (#6)
Re: Simplifying unknown-literal handling

Andrew - Supernews <andrew+nonews@supernews.com> writes:

> What happens if you send an UNKNOWN from the frontend as binary, and then
> when the desired type is figured out, it turns out to be a bytea? It's
> obviously not acceptable then to truncate after a zero byte.

This isn't an issue, because if the desired type is something other than
UNKNOWN, we won't be using UNKNOWN's binary input converter. The actual
flow of information in the case you're thinking of is:

1. Client sends Parse message with, say, query
       INSERT INTO tab(byteacol) VALUES($1);
   and the type of param 1 either not specified or given as UNKNOWN.

2. Backend infers actual type of param 1 from context as BYTEA.

3. Client may or may not bother issuing a Describe to find out actual
   type of parameter(s).

4. Client sends BIND with a binary value; backend applies BYTEA's input
   converter (which is essentially memcpy).

Offhand I think the only way you could actually invoke UNKNOWN's binary
input converter is by executing a PREPARE with a parameter position
specifically declared as UNKNOWN, viz
    PREPARE foo(unknown) AS ...
and then using foo as the target of a binary BIND message. I don't
think we are under contract to promise that such a thing will have any
particular behavior; and certainly not to promise that it will behave
more like bytea than like text. In any case there is no runtime
coercion from UNKNOWN to BYTEA, so you'd really have to work at it
to cons up a case where you got behavior you didn't like.

regards, tom lane
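
The point of the flow above is that the binary converter is picked from the resolved parameter type, not from what the client declared. A toy model in C, with every name (ToyType, ToyValue, toy_resolve_param, toy_bind_binary) hypothetical and the inference reduced to "take the target column's type", might look like this. It is a sketch of the idea only and does not mirror the backend's actual code paths.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef enum { TOY_UNKNOWN, TOY_BYTEA } ToyType;

typedef struct ToyValue {
    ToyType        type;
    size_t         len;     /* number of bytes kept */
    unsigned char *bytes;   /* malloc'd copy, caller frees */
} ToyValue;

/* Step 2 (simplified): a parameter left unspecified or given as UNKNOWN
 * takes the type of the column it is being assigned to. */
static ToyType toy_resolve_param(ToyType declared, ToyType column_type)
{
    return declared == TOY_UNKNOWN ? column_type : declared;
}

/* Step 4: apply the converter of the resolved type to the wire bytes.
 * BYTEA keeps every byte; a cstring-like UNKNOWN keeps only the bytes
 * before the first zero. */
static ToyValue toy_bind_binary(ToyType resolved,
                                const unsigned char *wire, size_t wire_len)
{
    ToyValue v = { resolved, 0, NULL };
    size_t keep = wire_len;

    assert(wire != NULL);
    if (resolved == TOY_UNKNOWN) {
        const unsigned char *zero = memchr(wire, 0, wire_len);
        if (zero != NULL)
            keep = (size_t) (zero - wire);
    }
    v.bytes = malloc(keep + 1);
    if (v.bytes == NULL)
        return v;
    memcpy(v.bytes, wire, keep);
    v.bytes[keep] = 0;
    v.len = keep;
    return v;
}
```

With a parameter resolved to TOY_BYTEA, the three wire bytes 1, 0, 2 all survive. Only a parameter that stays TOY_UNKNOWN, the analogue of the explicit PREPARE foo(unknown) case, loses the bytes from the zero onward.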

#8 Andrew - Supernews
andrew+nonews@supernews.com
In reply to: Tom Lane (#1)
Re: Simplifying unknown-literal handling

On 2005-05-29, Tom Lane <tgl@sss.pgh.pa.us> wrote:

> Andrew - Supernews <andrew+nonews@supernews.com> writes:
>> What happens if you send an UNKNOWN from the frontend as binary, and then
>> when the desired type is figured out, it turns out to be a bytea? It's
>> obviously not acceptable then to truncate after a zero byte.
>
> This isn't an issue, because if the desired type is something other than
> UNKNOWN, we won't be using UNKNOWN's binary input converter. The actual
> flow of information in the case you're thinking of is:
>
> 1. Client sends Parse message with, say, query
>        INSERT INTO tab(byteacol) VALUES($1);
>    and the type of param 1 either not specified or given as UNKNOWN.
>
> 2. Backend infers actual type of param 1 from context as BYTEA.

Hrm. I was thinking of the case where the backend can't necessarily do
this, but in fact in that case the Parse seems to fail.

> Offhand I think the only way you could actually invoke UNKNOWN's binary
> input converter is by executing a PREPARE with a parameter position
> specifically declared as UNKNOWN, viz

Which of course leads to the question of why UNKNOWN has a binary input
converter at all...

--
Andrew, Supernews
http://www.supernews.com - individual and corporate NNTP services

#9 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andrew - Supernews (#8)
Re: Simplifying unknown-literal handling

Andrew - Supernews <andrew+nonews@supernews.com> writes:

> On 2005-05-29, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> 2. Backend infers actual type of param 1 from context as BYTEA.
>
> Hrm. I was thinking of the case where the backend can't necessarily do
> this, but in fact in that case the Parse seems to fail.

Right, deliberately so, for precisely the reason that we need to know
the correct input converters to use.

>> Offhand I think the only way you could actually invoke UNKNOWN's binary
>> input converter is by executing a PREPARE with a parameter position
>> specifically declared as UNKNOWN, viz
>
> Which of course leads to the question of why UNKNOWN has a binary input
> converter at all...

Maybe it shouldn't. It does need a binary output converter, to avoid
gratuitous failures in cases like
    SELECT 'foo';
so I figure it's probably best to leave the input converter there ...

regards, tom lane