Proper query implementation for Postgresql driver

Started by Shay Rojansky over 11 years ago · 15 messages
#1 Shay Rojansky
roji@roji.org

Hi.

I'm a core developer on npgsql (the Postgresql .NET provider), and we're hard
at work on a major 3.0 version. I have a fundamental question that's been
bugging me for a while and would like to get your opinion on it.

Npgsql currently supports three basic query modes: simple, client-side
parameters and prepared. The first two use the Postgresql simple query flow
(client-side parameters means the user specifies parameters
programmatically, just like with prepared queries, but the actual
substitution work is done client-side). Prepared uses the Postgresql
extended query flow.

According to the Postgresql docs (49.2.2), in the simple query flow "the
format of the retrieved values is always text". This creates a burden where
npgsql needs to parse textual (and locale-specific!) info (e.g. dates,
money). The situation is even worse when doing client-side parameters,
since npgsql has to *create* textual representations that match what
Postgresql is expecting. The workaround for this issue up to now has been
to switch to culture-invariant formatting (e.g. lc_monetary=C), but this
approach imposes the setting on users and affects functions in ways they
don't necessarily want.

I would, in theory, love to switch the entire thing to binary and thereby
avoid all textual parsing once and for all. If I understand correctly, this
means all queries must be implemented as extended queries, with numerous
extra client-server roundtrips - which are a bit hard to stomach. Section
49.1.2 of the manual also states that the unnamed prepared statement and
portal are optimized for the case of executing a query only once, hinting
that this is the proper way to do things - but this optimization still
cannot eliminate the extra roundtrips mentioned above (PREPARE, BIND,
EXECUTE).

Can someone please let me know what the recommended/best practice here
would be?

Thanks,

Shay

#2 Marko Tiikkaja
marko@joh.to
In reply to: Shay Rojansky (#1)
Re: Proper query implementation for Postgresql driver

On 9/28/14, 11:53 AM, Shay Rojansky wrote:

I would, in theory, love to switch the entire thing to binary and thereby
avoid all textual parsing once and for all. If I understand correctly, this
means all queries must be implemented as extended queries, with numerous
extra client-server roundtrips - which are a bit hard to stomach. Section
49.1.2 of the manual also states that the unnamed prepared statement and
portal are optimized for the case of executing a query only once, hinting
that this is the proper way to do things - but this optimization still
cannot eliminate the extra roundtrips mentioned above (PREPARE, BIND,
EXECUTE).

You don't have to do multiple round-trips for that; you can just send
all the messages in one go. See how e.g. libpq does it in PQexecParams().
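In protocol v3 terms, "one go" means concatenating the messages into a single network write. A minimal Python sketch of the framing (the query, parameter value, and helper names here are invented for illustration):

```python
import struct

def msg(msg_type: bytes, payload: bytes) -> bytes:
    # Every v3 message: 1-byte type, int32 length (includes itself), payload.
    return msg_type + struct.pack("!I", len(payload) + 4) + payload

def cstr(s: str) -> bytes:
    # NUL-terminated string, as the protocol expects.
    return s.encode() + b"\x00"

# Parse: unnamed statement, one parameter with unspecified type (OID 0).
parse = msg(b"P", cstr("") + cstr("SELECT $1::int4")
            + struct.pack("!H", 1) + struct.pack("!I", 0))

# Bind: unnamed portal and statement, one text-format parameter,
# zero result-format codes (= all result columns as text).
param = b"42"
bind = msg(b"B", cstr("") + cstr("")
           + struct.pack("!H", 0)
           + struct.pack("!H", 1) + struct.pack("!I", len(param)) + param
           + struct.pack("!H", 0))

describe = msg(b"D", b"P" + cstr(""))                 # describe the unnamed portal
execute = msg(b"E", cstr("") + struct.pack("!I", 0))  # 0 = no row limit
sync = msg(b"S", b"")

# One write; the client then reads responses until ReadyForQuery after Sync.
packet = parse + bind + describe + execute + sync
```

The server still processes each message individually, but no intermediate round-trip is needed.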

.marko

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#3 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Marko Tiikkaja (#2)
Re: Proper query implementation for Postgresql driver

Marko Tiikkaja <marko@joh.to> writes:

On 9/28/14, 11:53 AM, Shay Rojansky wrote:

[ complaint about multiple round trips in extended protocol ]

You don't have to do multiple round-trips for that; you can just send
all the messages in one go. See how e.g. libpq does it in PQexecParams().

Right. The key thing to understand is that after an error, the server
skips messages until it sees a Sync. So you can send out Parse, Bind,
Execute, Sync in one packet and not have to worry that the server will
attempt to execute a query that failed parsing or whatever.

regards, tom lane


#4 Shay Rojansky
roji@roji.org
In reply to: Tom Lane (#3)
Re: Proper query implementation for Postgresql driver

Thanks guys, that makes perfect sense to me...

On Sunday, 28 September 2014, Tom Lane wrote:

Marko Tiikkaja <marko@joh.to> writes:

On 9/28/14, 11:53 AM, Shay Rojansky wrote:

[ complaint about multiple round trips in extended protocol ]

You don't have to do multiple round-trips for that; you can just send
all the messages in one go. See how e.g. libpq does it in

PQexecParams().

Right. The key thing to understand is that after an error, the server
skips messages until it sees a Sync. So you can send out Parse, Bind,
Execute, Sync in one packet and not have to worry that the server will
attempt to execute a query that failed parsing or whatever.

regards, tom lane

#5 Shay Rojansky
roji@roji.org
In reply to: Shay Rojansky (#1)
Fwd: Proper query implementation for Postgresql driver

Hi again, a few days ago I sent a question (below) about using the extended
query protocol efficiently (especially for non-prepared statements). Your
responses spawned a good discussion between the npgsql developers, it's
here if you wish to look/weigh in:
https://github.com/npgsql/npgsql/issues/370

The idea of using extended query protocol for non-prepared queries raised
another "best practices" question, which I'm hoping you can help with. To
get rid of text encoding (which is locale-dependent, inefficient etc.) for
certain fields, it seems that we have to get rid of it for *all* fields.
This is because we send queries without knowing their result columns in
advance, and would like to pack all messages into a single packet to avoid
roundtrips. In other words, when we send a Bind message we don't yet have a
RowDescription, and so we can't specify on an individual columnar basis
which fields should be returned as binary and which as text. We're left
with the option of either going "full binary" or having to support text
encoding for *all* fields - the abolition of which was the whole point.
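Concretely, the constraint is in the trailing result-format field of Bind: zero codes means all text, a single code applies to every column, and only the per-column form requires knowing the column count from a RowDescription first. A small Python sketch of just that field (helper name invented):

```python
import struct

def result_formats(codes):
    # Tail of a Bind message: int16 count, then int16 codes (0 = text, 1 = binary).
    return struct.pack("!H", len(codes)) + b"".join(struct.pack("!H", c) for c in codes)

all_text = result_formats([])           # zero codes: every column comes back as text
all_binary = result_formats([1])        # one code: applies to all columns
per_column = result_formats([0, 1, 0])  # needs the RowDescription (column count) first
```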

We can split the query process into two roundtrips - wait for the
RowDescription and only then send Bind - in order to pick-and-choose
text/binary on a columnar basis, but this is a non-trivial performance hit
which we want to avoid.

Regarding the option of going "full binary"; selecting all types from
pg_type shows hundreds of types. We'd have to provide binary encode/decode
implementation for most (?) of them. In addition, say a new type is added
(via an extension for example); text encoding at least had the advantage of
not forcing us to support everything: the unknown type would be transferred
as text and we'd provide that text to the user as an unprocessed string.
Going full binary seems to eliminate this possibility.

I'd like to understand what we're supposed to do, as a Postgresql driver.
Are we supposed to:
1) Go full binary and implement all types (but what to do about unknown
ones)?
2) Do two roundtrips for queries, thereby hurting performance?
3) Support text encoding for all fields, and manage somehow with issues
such as locale variations (by forcing the locale to be culture invariant,
as we do now)?

Thanks again for your help!

Shay

---------- Forwarded message ----------
From: Shay Rojansky <roji@roji.org>
Date: Sun, Sep 28, 2014 at 11:53 AM
Subject: Proper query implementation for Postgresql driver
To: pgsql-hackers@postgresql.org

Hi.

I'm a core developer on npgsql (the Postgresql .NET provider), and we're hard
at work on a major 3.0 version. I have a fundamental question that's been
bugging me for a while and would like to get your opinion on it.

Npgsql currently supports three basic query modes: simple, client-side
parameters and prepared. The first two use the Postgresql simple query flow
(client-side parameters means the user specifies parameters
programmatically, just like with prepared queries, but the actual
substitution work is done client-side). Prepared uses the Postgresql
extended query flow.

According to the Postgresql docs (49.2.2), in the simple query flow "the
format of the retrieved values is always text". This creates a burden where
npgsql needs to parse textual (and locale-specific!) info (e.g. dates,
money). The situation is even worse when doing client-side parameters,
since npgsql has to *create* textual representations that match what
Postgresql is expecting. The workaround for this issue up to now has been
to switch to culture-invariant formatting (e.g. lc_monetary=C), but this
approach imposes the setting on users and affects functions in ways they
don't necessarily want.

I would, in theory, love to switch the entire thing to binary and thereby
avoid all textual parsing once and for all. If I understand correctly, this
means all queries must be implemented as extended queries, with numerous
extra client-server roundtrips - which are a bit hard to stomach. Section
49.1.2 of the manual also states that the unnamed prepared statement and
portal are optimized for the case of executing a query only once, hinting
that this is the proper way to do things - but this optimization still
cannot eliminate the extra roundtrips mentioned above (PREPARE, BIND,
EXECUTE).

Can someone please let me know what the recommended/best practice here
would be?

Thanks,

Shay

#6 Craig Ringer
craig@2ndquadrant.com
In reply to: Shay Rojansky (#1)
Re: Proper query implementation for Postgresql driver

On 09/28/2014 05:53 PM, Shay Rojansky wrote:

Hi.

I'm a core developer on npgsql (the Postgresql .NET provider), we're
hard at work on a major 3.0 version. I have a fundamental question
that's been bugging me for a while and would like to get your opinion on it.

Npgsql currently supports three basic query modes: simple, client-side
parameters and prepared. The first two use the Postgresql simple query
flow (client-side parameters means the user specifies parameters
programmatically, just like with prepared queries, but the actual
substitution work is done client-side). Prepared uses the Postgresql
extended query flow.

Frankly, I suggest dropping "simple" entirely and using only the
parse/bind/describe/execute flow in the v3 protocol.

You can use this for server-side parameter binding *without* storing a
prepared statement by using unnamed statements.

Client-side parameter binding remains useful if you want to support
parameterisation where the PostgreSQL server itself does not, e.g. in
DDL. If you don't care about that, you could reasonably just drop client
side parameter support entirely.

I would, in theory, love to switch the entire thing to binary and
thereby avoid all textual parsing once and for all. If I understand
correctly, this means all queries must be implemented as extended
queries, with numerous extra client-server roundtrips - which are a bit
hard to stomach.

What round-trips?

You can and should send parse/bind/describe/execute messages
back-to-back without waiting for a server response. Just Sync and wait
for server response at the end.

You can even send a parse then a stream of bind/describe/execute
messages for batch execution of a prepared statement against a list of
params, then a single Sync at the end.
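Sketched in Python (wire framing only; the table and values are invented), such a batch reuses one parsed statement for several executions:

```python
import struct

def msg(msg_type: bytes, payload: bytes) -> bytes:
    # 1-byte type, int32 length (includes itself), payload.
    return msg_type + struct.pack("!I", len(payload) + 4) + payload

def cstr(s: str) -> bytes:
    return s.encode() + b"\x00"

def bind_execute(value: bytes) -> bytes:
    # Bind the unnamed statement with one text parameter, then Execute the portal.
    bind = msg(b"B", cstr("") + cstr("")
               + struct.pack("!H", 0)
               + struct.pack("!H", 1) + struct.pack("!I", len(value)) + value
               + struct.pack("!H", 0))
    return bind + msg(b"E", cstr("") + struct.pack("!I", 0))

# One Parse, three Bind/Execute pairs, a single Sync at the end.
parse = msg(b"P", cstr("") + cstr("INSERT INTO t VALUES ($1)") + struct.pack("!H", 0))
batch = parse + b"".join(bind_execute(v) for v in [b"1", b"2", b"3"]) + msg(b"S", b"")
```

If anything in the batch fails, the server skips the remaining messages until the Sync, so error handling stays well-defined.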

Can someone please let me know what the recommended/best practice here
would be?

You might want to check out what PgJDBC does; it's fairly sane in this area.

--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


#7 Shay Rojansky
roji@roji.org
In reply to: Craig Ringer (#6)
Re: Proper query implementation for Postgresql driver

Thanks for the answer Craig.

The remaining point of confusion is not really about simple vs. extended,
it's about text vs. binary. Sending parse/bind/describe/execute messages
back-to-back means the results have to be all text or all binary.

So the question is: are we supposed to transfer all types to and from the
backend in binary? If so, that raises some difficulties (see my previous
message) which I wanted to get your opinion on.

Thanks for the suggestion to look at PgJDBC, I'll do that.

Shay

On Tue, Sep 30, 2014 at 7:20 AM, Craig Ringer <craig@2ndquadrant.com> wrote:

On 09/28/2014 05:53 PM, Shay Rojansky wrote:

Hi.

I'm a core developer on npgsql (the Postgresql .NET provider), we're
hard at work on a major 3.0 version. I have a fundamental question
that's been bugging me for a while and would like to get your opinion on

it.

Npgsql currently supports three basic query modes: simple, client-side
parameters and prepared. The first two use the Postgresql simple query
flow (client-side parameters means the user specifies parameters
programmatically, just like with prepared queries, but the actual
substitution work is done client-side). Prepared uses the Postgresql
extended query flow.

Frankly, I suggest dropping "simple" entirely and using only the
parse/bind/describe/execute flow in the v3 protocol.

You can use this for server-side parameter binding *without* storing a
prepared statement by using unnamed statements.

Client-side parameter binding remains useful if you want to support
parameterisation where the PostgreSQL server itself does not, e.g. in
DDL. If you don't care about that, you could reasonably just drop client
side parameter support entirely.

I would, in theory, love to switch the entire thing to binary and
thereby avoid all textual parsing once and for all. If I understand
correctly, this means all queries must be implemented as extended
queries, with numerous extra client-server roundtrips - which are a bit
hard to stomach.

What round-trips?

You can and should send parse/bind/describe/execute messages
back-to-back without waiting for a server response. Just Sync and wait
for server response at the end.

You can even send a parse then a stream of bind/describe/execute
messages for batch execution of a prepared statement against a list of
params, then a single Sync at the end.

Can someone please let me know what the recommended/best practice here
would be?

You might want to check out what PgJDBC does; it's fairly sane in this
area.

--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

#8 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Shay Rojansky (#5)
Re: Fwd: Proper query implementation for Postgresql driver

Shay Rojansky <roji@roji.org> writes:

The idea of using extended query protocol for non-prepared queries raised
another "best practices" question, which I'm hoping you can help with. To
get rid of text encoding (which is locale-dependent, inefficient etc.) for
certain fields, it seems that we have to get rid of it for *all* fields.
This is because we send queries without knowing their result columns in
advance, and would like to pack all messages into a single packet to avoid
roundtrips.

FWIW, I'd go with text results, especially if you already have code to
deal with that. PG's on-the-wire binary formats are more efficient to
process in some absolute sense, but they're hardly free: you need to
consider byte endianness for integers and floats, integer vs float
encoding for timestamps, the difference between PG's timestamp
representation and whatever native timestamps are on your platform,
etc etc. It's not a trivial amount of code to deal with. And in the
end I think the efficiency gain is pretty marginal compared to the raw
costs of data transfer, especially if you're not on the same physical
machine as the server.
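To give a flavour of what that decoding entails, a Python sketch of three decoders (assuming a server with integer_datetimes enabled, the default on modern builds; older servers send timestamps as float8 seconds instead):

```python
import struct
from datetime import datetime, timedelta

def decode_int4(buf: bytes) -> int:
    # Integers arrive in network (big-endian) byte order.
    return struct.unpack("!i", buf)[0]

def decode_float8(buf: bytes) -> float:
    # float8 is an IEEE-754 double, also big-endian on the wire.
    return struct.unpack("!d", buf)[0]

# With integer datetimes, a binary timestamp is int64 microseconds
# since the PostgreSQL epoch, 2000-01-01 00:00:00.
PG_EPOCH = datetime(2000, 1, 1)

def decode_timestamp(buf: bytes) -> datetime:
    return PG_EPOCH + timedelta(microseconds=struct.unpack("!q", buf)[0])
```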

Binary formats are a good tradeoff for individual applications that know
exactly which data types they need to deal with. It's harder to make the
case that they're worth the trouble in general-purpose client libraries.

Having said that, there has been some talk of letting client libraries
supply a "whitelist" of data types that they'd like to receive in binary.
We could do that (modulo questions of whether it's worth incurring a
protocol version change for), but I'm unclear on what the use case really
is for that. Wouldn't you then have to provide an inconsistent API to
users of your driver, that is some things are presented in text and others
not? Is that really a great thing?

regards, tom lane


#9 Abhijit Menon-Sen
ams@2ndQuadrant.com
In reply to: Shay Rojansky (#5)
Re: Fwd: Proper query implementation for Postgresql driver

At 2014-09-30 07:09:38 +0200, roji@roji.org wrote:

the unknown type would be transferred as text and we'd provide that
text to the user as an unprocessed string. Going full binary seems to
eliminate this possibility.

If you're willing to hand the user an unprocessed string, why can't that
be the binary encoding just as well as text?

-- Abhijit


#10 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Shay Rojansky (#7)
Re: Proper query implementation for Postgresql driver

Shay Rojansky <roji@roji.org> writes:

Thanks for the suggestion to look at PgJDBC, I'll do that.

BTW, libpqtypes (http://libpqtypes.esilo.com) might be worth
studying as well. I've not used it myself, but it claims to
offer datatype-extensible processing of binary formats.

regards, tom lane


#11 Atri Sharma
atri.jiit@gmail.com
In reply to: Tom Lane (#10)
Re: Proper query implementation for Postgresql driver

On Tue, Sep 30, 2014 at 11:06 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Shay Rojansky <roji@roji.org> writes:

Thanks for the suggestion to look at PgJDBC, I'll do that.

BTW, libpqtypes (http://libpqtypes.esilo.com) might be worth
studying as well. I've not used it myself, but it claims to
offer datatype-extensible processing of binary formats.

regards, tom lane

--

It does offer that, and is indeed a good idea in the current context.

#12 Craig Ringer
craig@2ndquadrant.com
In reply to: Shay Rojansky (#5)
Re: Fwd: Proper query implementation for Postgresql driver

On 09/30/2014 01:09 PM, Shay Rojansky wrote:

The idea of using extended query protocol for non-prepared queries
raised another "best practices" question, which I'm hoping you can help
with. To get rid of text encoding (which is locale-dependent,
inefficient etc.) for certain fields, it seems that we have to get rid
of it for *all* fields. This is because we send queries without knowing
their result columns in advance, and would like to pack all messages
into a single packet to avoid roundtrips. In other words, when we send a
Bind message we don't yet have a RowDescription, and so we can't specify
on an individual columnar basis which fields should be returned as
binary and which as text. We're left with the option of either going
"full binary" or having to support text encoding for *all* fields - the
abolition of which was the whole point.

Even if you can't get rid of text support, dropping simple query
protocol support and the need to support client-side parameter binding
may well be a pleasant improvement.

We can split the query process into two roundtrips - wait for the
RowDescription and only then send Bind - in order to pick-and-choose
text/binary on a columnar basis, but this is a non-trivial performance
hit which we want to avoid.

It may be worth doing exactly this if you're doing large batches where
it might be a real win to use as many binary parameters as possible, but
I agree that you wouldn't want to do it for one-shot queries.

Regarding the option of going "full binary"; selecting all types from
pg_type shows hundreds of types. We'd have to provide binary
encode/decode implementation for most (?) of them.

All of them except 'internal' and 'unknown', really. There are some you
may not see in common queries, but you'll still run into them when
you're looking at the system catalogs.

Quite a few are binary compatible with each other, though, so you'll
need fewer individual implementations than you might expect. Take a look
at the castmethod in pg_cast to identify groups of binary compatible types.

In addition, say a
new type is added (via an extension for example); text encoding at least
had the advantage of not forcing us to support everything: the unknown
type would be transferred as text and we'd provide that text to the user
as an unprocessed string. Going full binary seems to eliminate this
possibility.

It does.

I'd like to understand what we're supposed to do, as a Postgresql
driver. Are we supposed to:
1) Go full binary and implement all types (but what to do about unknown
ones)?

It's also possible for a type not to have send/recv functions, i.e. to
support text-only use.

From the docs
(http://www.postgresql.org/docs/9.3/static/sql-createtype.html):

"The support functions input_function and output_function are required,
while the functions receive_function, send_function, ... are optional."

However, no built-in types lack binary I/O functions.

You could reasonably require that all user defined extension types must
support binary I/O. This will probably be fine in practice. As you said,
though, users would then have to install a plugin for nPgSQL for each
custom type they wished to use because nPgSQL won't otherwise know what
to do with the binary data.

2) Do two roundtrips for queries, thereby hurting performance?
3) Support text encoding for all fields, and manage somehow with issues
such as locale variations (by forcing the locale to be culture
invariant, as we do now)?

That's pretty much what PgJDBC does, playing with extra_float_digits,
client_encoding, TimeZone, etc.

It's not lovely.

I would like to be able to specify a top-level option at Bind/Execute
time that asks the server to send binary for built-in types only, or for
a list of oids that we register ourselves as understanding binary for at
a session level. That would require a protocol change, though.

It might be worth taking some time to think about how we can help
clients get best use out of the binary protocol and noting it on the
TODO page for when we do protocol revision 4.

--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


#13 Shay Rojansky
roji@roji.org
In reply to: Craig Ringer (#12)
Re: Fwd: Proper query implementation for Postgresql driver

Thanks for all the answers.

Tom:

FWIW, I'd go with text results, especially if you already have code to
deal with that. PG's on-the-wire binary formats are more efficient to
process in some absolute sense, but they're hardly free: you need to
consider byte endianness for integers and floats, integer vs float
encoding for timestamps, the difference between PG's timestamp
representation and whatever native timestamps are on your platform,
etc etc. It's not a trivial amount of code to deal with. And in the
end I think the efficiency gain is pretty marginal compared to the raw
costs of data transfer, especially if you're not on the same physical
machine as the server.

In my mind data transfer was actually also a good reason to switch to
binary. I don't know what PG's binary timestamp looks like, but isn't it a
safe assumption that it's much more economical (and easy to parse) than any
textual representation? By the way, we already encode numbers and floats as
binary (and handle endianness) in certain contexts (parameters of prepared
statements). I don't underestimate the effort of binary implementation, but
if it's a one-time systematic effort it seems to be worth it?

Having said that, there has been some talk of letting client libraries
supply a "whitelist" of data types that they'd like to receive in binary.
We could do that (modulo questions of whether it's worth incurring a
protocol version change for), but I'm unclear on what the use case really
is for that. Wouldn't you then have to provide an inconsistent API to
users of your driver, that is some things are presented in text and others
not? Is that really a great thing?

Whitelisting binary types would solve the issue entirely for us here. As a
driver, I wouldn't be exposing the whitelist to users in any way; I would
simply use it to tell Postgresql (on a session level ideally) which types
we want to receive as binary. The user would access the data in the usual
way, no perceivable API change as far as I can see.

Abhijit:

If you're willing to hand the user an unprocessed string, why can't that
be the binary encoding just as well as text?

I might be mistaken, but in the case of textual encoding I can just hand
over the text, as I got it from PG, to the user and let them deal with it.
With binary, I get a blob of bytes that has no meaning to anyone. I can
hand it over to the user, but they can't be expected to be able to do
anything with it...

Tom and Atri:

BTW, libpqtypes (http://libpqtypes.esilo.com) might be worth
studying as well. I've not used it myself, but it claims to
offer datatype-extensible processing of binary formats.

Thanks for the suggestion, I'll take a look. Since we're a pure .NET
implementation actual use of libpqtypes won't be possible, but it's
definitely possible to learn here. Although given how the protocol
currently looks, I can't really see what could be done to support
"magical" support of binary encoding of arbitrary, unknown types...?

Craig:

Even if you can't get rid of text support, dropping simple query
protocol support and the need to support client-side parameter binding
may well be a pleasant improvement.

I definitely agree when it comes to dropping client-side parameter binding
(and there seems to be an agreement on that between the devs). But for the
case non-parameterized queries, there doesn't seem to be any benefit of
using the extended protocol over the simple one (if you're still doing
text), is there?

It's also possible for a type not to have send/recv functions, i.e. to
support text-only use.

In that case, what would be the behavior of selecting such a type with an
extended query that specifies "all results in binary"? A PG error?

You could reasonably require that all user defined extension types must
support binary I/O. This will probably be fine in practice. As you said,
though, users would then have to install plugin for nPgSQL for each
custom type they wished to use because nPgSQL won't otherwise know what
to do with the binary data.

This is the only true remaining point of difficulty for me. We could bite
the bullet, sit down and implement binary for everything built-in; but
eliminating the free use of extensions seems like a no-go.

That's pretty much what PgJDBC does, playing with extra_float_digits,
client_encoding, TimeZone, etc.
It's not lovely.

It definitely isn't... And we have user complaints on several counts as
well. The problem is that messing around with extra_float_digits,
lc_monetary and the rest also affect some Postgresql functions which do
text conversion... With extra_float_digits, "SELECT format('%s',
0.28::double precision)" returns 0.280000000000000027 rather than 0.28. The
point is that the hacks we're doing to support textual *wire* encoding also
impact non-wire functionality and affect users in unwanted ways.

I would like to be able to specify a top-level option at Bind/Execute
time that asks the server to send binary for built-in types only, or for
a list of oids that we register ourselves as understanding binary for at
a session level. That would require a protocol change, though.

I absolutely agree something is missing. I'd go more with whitelisting
approach as Tom said above: a session-level list of OIDs for which binary
encoding is requested for all cases throughout the session. It would cause
somewhat less breakage - no actual message format is changed (although you
suddenly start to get binary return values in "unexpected" places, i.e.
results of extended queries which requested "all text"). It would solve the
problem in an ideal way.
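Absent such protocol support, the closest a driver gets today is a client-side whitelist consulted per column when the RowDescription is already known, i.e. for prepared statements whose Describe response is cached (a hypothetical sketch; the OIDs listed are the standard built-in ones):

```python
# Types this driver knows how to decode in binary (hypothetical whitelist;
# 20 = int8, 23 = int4, 701 = float8, 1114 = timestamp).
BINARY_CAPABLE = {20, 23, 701, 1114}

def result_format_codes(row_description_oids):
    # One Bind format code per column: 1 (binary) for whitelisted types,
    # 0 (text) for everything else, including unknown extension types.
    return [1 if oid in BINARY_CAPABLE else 0 for oid in row_description_oids]
```

For one-shot unnamed statements the RowDescription isn't available in time, which is exactly the gap a session-level whitelist would close.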

Shay

On Tue, Sep 30, 2014 at 7:46 AM, Craig Ringer <craig@2ndquadrant.com> wrote:

On 09/30/2014 01:09 PM, Shay Rojansky wrote:

The idea of using extended query protocol for non-prepared queries
raised another "best practices" question, which I'm hoping you can help
with. To get rid of text encoding (which is locale-dependent,
inefficient etc.) for certain fields, it seems that we have to get rid
of it for *all* fields. This is because we send queries without knowing
their result columns in advance, and would like to pack all messages
into a single packet to avoid roundtrips. In other words, when we send a
Bind message we don't yet have a RowDescription, and so we can't specify
on an individual columnar basis which fields should be returned as
binary and which as text. We're left with the option of either going
"full binary" or having to support text encoding for *all* fields - the
abolition of which was the whole point.

Even if you can't get rid of text support, dropping simple query
protocol support and the need to support client-side parameter binding
may well be a pleasant improvement.

We can split the query process into two roundtrips - wait for the
RowDescription and only then send Bind - in order to pick-and-choose
text/binary on a columnar basis, but this is a non-trivial performance
hit which we want to avoid.

It may be worth doing exactly this if you're doing large batches where
it might be a real win to use as many binary parameters as possible, but
I agree that you wouldn't want to do it for one-shot queries.

Regarding the option of going "full binary"; selecting all types from
pg_type shows hundreds of types. We'd have to provide binary
encode/decode implementation for most (?) of them.

All of them except 'internal' and 'unknown', really. There are some you
may not see in common queries, but you'll still run into them when
you're looking at the system catalogs.

Quite a few are binary compatible with each other, though, so you'll
need fewer individual implementations than you might expect. Take a look
at the castmethod in pg_cast to identify groups of binary compatible types.

In addition, say a
new type is added (via an extension for example); text encoding at least
had the advantage of not forcing us to support everything: the unknown
type would be transferred as text and we'd provide that text to the user
as an unprocessed string. Going full binary seems to eliminate this
possibility.

It does.

I'd like to understand what we're supposed to do, as a Postgresql
driver. Are we supposed to:
1) Go full binary and implement all types (but what to do about unknown
ones)?

It's also possible for a type not to have send/recv functions, i.e. to
support text-only use.

From the docs
(http://www.postgresql.org/docs/9.3/static/sql-createtype.html):

"The support functions input_function and output_function are required,
while the functions receive_function, send_function, ... are optional."

However, no built-in types lack binary I/O functions.

You could reasonably require that all user defined extension types must
support binary I/O. This will probably be fine in practice. As you said,
though, users would then have to install a plugin for nPgSQL for each
custom type they wished to use because nPgSQL won't otherwise know what
to do with the binary data.

2) Do two roundtrips for queries, thereby hurting performance?
3) Support text encoding for all fields, and manage somehow with issues
such as locale variations (by forcing the locale to be culture
invariant, as we do now)?

That's pretty much what PgJDBC does, playing with extra_float_digits,
client_encoding, TimeZone, etc.

It's not lovely.

I would like to be able to specify a top-level option at Bind/Execute
time that asks the server to send binary for built-in types only, or for
a list of oids that we register ourselves as understanding binary for at
a session level. That would require a protocol change, though.

It might be worth taking some time to think about how we can help
clients get best use out of the binary protocol and noting it on the
TODO page for when we do protocol revision 4.

--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

#14 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Shay Rojansky (#13)
Re: Fwd: Proper query implementation for Postgresql driver

[ too tired to respond to the other points, but: ]

Shay Rojansky <roji@roji.org> writes:

It's also possible for a type not to have send/recv functions, i.e. to
support text-only use.

In that case, what would be the behavior of selecting such a type with an
extended query that specifies "all results in binary"? A PG error?

Yup.

    if (!OidIsValid(pt->typsend))
        ereport(ERROR,
                (errcode(ERRCODE_UNDEFINED_FUNCTION),
                 errmsg("no binary output function available for type %s",
                        format_type_be(type))));

There's an exactly parallel error if you try to send a parameter in
binary when its datatype hasn't got a typreceive function.

regards, tom lane


#15 Robert Haas
robertmhaas@gmail.com
In reply to: Craig Ringer (#6)
Re: Proper query implementation for Postgresql driver

On Tue, Sep 30, 2014 at 1:20 AM, Craig Ringer <craig@2ndquadrant.com> wrote:

Frankly, I suggest dropping "simple" entirely and using only the
parse/bind/describe/execute flow in the v3 protocol.

The last time I checked, that was significantly slower.

/messages/by-id/CA+TgmoYJKfnMrtMhODwhNoj1jwcgzs_H1R70erCEcrWJM65DUQ@mail.gmail.com

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
