Verifying embedded oids in *recv is a bad idea

Started by Andres Freundover 9 years ago3 messages

andres@anarazel.de

over 9 years ago

Hi,

for performance reasons it's a good idea to use the binary protocol. But
doing so between two postgres installations is made unnecessarily hard
by the choice of embedding and verifying oids in composite and array
types. When using extensions, even commonly used ones like hstore, the
datatype oids will often not be the same between systems, even when
using exactly the same version on the same OS.

Specifically I'm talking about

Datum
array_recv(PG_FUNCTION_ARGS)
{
...
element_type = pq_getmsgint(buf, sizeof(Oid));
if (element_type != spec_element_type)
{
/* XXX Can we allow taking the input element type in any cases? */
ereport(ERROR,
(errcode(ERRCODE_DATATYPE_MISMATCH),
errmsg("wrong element type")));
}
...
}

and

Datum
record_recv(PG_FUNCTION_ARGS)
{
...
/* Process each column */
for (i = 0; i < ncolumns; i++)
...
/* Verify column datatype */
coltypoid = pq_getmsgint(buf, sizeof(Oid));
if (coltypoid != column_type)
ereport(ERROR,
(errcode(ERRCODE_DATATYPE_MISMATCH),
errmsg("wrong data type: %u, expected %u",
coltypoid, column_type)));
...
}

given that we're giving up quite some speed and adding complexity to
make send/recv functions portable, this seems like a shame.

I'm failing to see what these checks are buying us? I mean the text
representation of a composite doesn't include type information about
contained columns either, I don't see why the binary version has to?

I'm basically thinking that we should remove the checks on the receiving
side, but leave the sending of oids in place for backward compat.

Greetings,

Andres Freund

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Kyotaro HORIGUCHI

horiguchi.kyotaro@lab.ntt.co.jp

over 9 years ago

In reply to: Andres Freund (#1)

Re: Verifying embedded oids in *recv is a bad idea

Hello,

At Mon, 25 Apr 2016 17:17:13 -0700, Andres Freund <andres@anarazel.de> wrote in <20160426001713.hbqdiwvf4mkzkg55@alap3.anarazel.de>

Hi,

for performance reasons it's a good idea to use the binary protocol. But
doing so between two postgres installations is made unnecessarily hard
by the choice of embedding and verifying oids in composite and array
types. When using extensions, even commonly used ones like hstore, the
datatype oids will often not be the same between systems, even when
using exactly the same version on the same OS.

Specifically I'm talking about

Datum
array_recv(PG_FUNCTION_ARGS)
{
...
element_type = pq_getmsgint(buf, sizeof(Oid));
if (element_type != spec_element_type)
{
/* XXX Can we allow taking the input element type in any cases? */
ereport(ERROR,
(errcode(ERRCODE_DATATYPE_MISMATCH),
errmsg("wrong element type")));
}
...
}

and

Datum
record_recv(PG_FUNCTION_ARGS)
{
...
/* Process each column */
for (i = 0; i < ncolumns; i++)
...
/* Verify column datatype */
coltypoid = pq_getmsgint(buf, sizeof(Oid));
if (coltypoid != column_type)
ereport(ERROR,
(errcode(ERRCODE_DATATYPE_MISMATCH),
errmsg("wrong data type: %u, expected %u",
coltypoid, column_type)));
...
}

given that we're giving up quite some speed and adding complexity to
make send/recv functions portable, this seems like a shame.

I'm failing to see what these checks are buying us? I mean the text
representation of a composite doesn't include type information about
contained columns either, I don't see why the binary version has to?

I'm basically thinking that we should remove the checks on the receiving
side, but leave the sending of oids in place for backward compat.

FWIW, I think that typsend/typreceive are intended for the use by
COPY FROM/TO and the doc says that the following.

http://www.postgresql.org/docs/devel/static/sql-copy.html

Binary Format

The binary format option causes all data to be stored/read as
binary format rather than as text. It is somewhat faster than the
text and CSV formats, but a binary-format file is less portable
across machine architectures and PostgreSQL versions. Also, the
binary format is very data type specific; for example it will not
work to output binary data from a smallint column and read it
into an integer column, even though that would work fine in text
format.

I haven't considered much about the usefulness of this
restriction, but receiving into an array of a different type
easily leads to a crash.

# However, false matching of type oids also would do so.

I suppose that we should reconsider here if removing the checks.

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Jelte Fennema

postgres@jeltef.nl

over 5 years ago

In reply to: Kyotaro HORIGUCHI (#2)

Re: Verifying embedded oids in *recv is a bad idea

So I ran into this as well and I think this should actually be considered
a bug, since COPY ... BINARY does not work between nodes now.
Which is why I have posted it to the pgsql-bugs list with some reproducible
steps to trigger the bug:
/messages/by-id/16485-f7b2dddca52ef2ae@postgresql.org

Like Andres said, the oids are not checked for the text protocol versions
either. And like Kyotaro says, if the type that has a different OID would
actually have a different binary encoding, parsing will simply fail at that
point.
I don't think there's any practical reason to keep these checks. They only
get in the way of using the more efficient binary encoding for anything
other than simple types.
The only case where these *_recv functions would currently work is if you
COPY ... BINARY to the exact same postgres instance (when not dropping and
recreating types). But in that case there's little reason to use COPY at
all.

On Tue, 9 Jun 2020 at 18:33, Kyotaro HORIGUCHI <
horiguchi.kyotaro@lab.ntt.co.jp> wrote:

Show quoted text

Hello,

At Mon, 25 Apr 2016 17:17:13 -0700, Andres Freund <andres@anarazel.de>
wrote in <20160426001713.hbqdiwvf4mkzkg55@alap3.anarazel.de>

Hi,

for performance reasons it's a good idea to use the binary protocol. But
doing so between two postgres installations is made unnecessarily hard
by the choice of embedding and verifying oids in composite and array
types. When using extensions, even commonly used ones like hstore, the
datatype oids will often not be the same between systems, even when
using exactly the same version on the same OS.

Specifically I'm talking about

Datum
array_recv(PG_FUNCTION_ARGS)
{
...
element_type = pq_getmsgint(buf, sizeof(Oid));
if (element_type != spec_element_type)
{
/* XXX Can we allow taking the input element type in any

cases? */

ereport(ERROR,
(errcode(ERRCODE_DATATYPE_MISMATCH),
errmsg("wrong element type")));
}
...
}

and

Datum
record_recv(PG_FUNCTION_ARGS)
{
...
/* Process each column */
for (i = 0; i < ncolumns; i++)
...
/* Verify column datatype */
coltypoid = pq_getmsgint(buf, sizeof(Oid));
if (coltypoid != column_type)
ereport(ERROR,

(errcode(ERRCODE_DATATYPE_MISMATCH),

errmsg("wrong data type: %u,

expected %u",

coltypoid,

column_type)));

...
}

given that we're giving up quite some speed and adding complexity to
make send/recv functions portable, this seems like a shame.

I'm failing to see what these checks are buying us? I mean the text
representation of a composite doesn't include type information about
contained columns either, I don't see why the binary version has to?

I'm basically thinking that we should remove the checks on the receiving
side, but leave the sending of oids in place for backward compat.

FWIW, I think that typsend/typreceive are intended for the use by
COPY FROM/TO and the doc says that the following.

http://www.postgresql.org/docs/devel/static/sql-copy.html

Binary Format

The binary format option causes all data to be stored/read as
binary format rather than as text. It is somewhat faster than the
text and CSV formats, but a binary-format file is less portable
across machine architectures and PostgreSQL versions. Also, the
binary format is very data type specific; for example it will not
work to output binary data from a smallint column and read it
into an integer column, even though that would work fine in text
format.

I haven't considered much about the usefulness of this
restriction, but receiving into an array of a different type
easily leads to a crash.

# However, false matching of type oids also would do so.

I suppose that we should reconsider here if removing the checks.

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers