Request for comment on setting binary format output per session

pgsql@j-davis.com

about 3 years ago

In reply to: Dave Cramer (#1)

Re: Request for comment on setting binary format output per session

On Thu, 2023-03-02 at 09:13 -0500, Dave Cramer wrote:

I'd like to open up this discussion again so that we can
move forward. I prefer the GUC as it is relatively simple and as
Peter mentioned it works, but I'm not married to the idea.

It's not very friendly to extensions, where the types are not
guaranteed to have stable OIDs. Did you consider any proposals that
work with type names?

Regards,
Jeff Davis

pg@fastcrypt.com

about 3 years ago

In reply to: Jeff Davis (#2)

Re: Request for comment on setting binary format output per session

Dave Cramer

On Sat, 4 Mar 2023 at 11:35, Jeff Davis <pgsql@j-davis.com> wrote:

On Thu, 2023-03-02 at 09:13 -0500, Dave Cramer wrote:

I'd like to open up this discussion again so that we can
move forward. I prefer the GUC as it is relatively simple and as
Peter mentioned it works, but I'm not married to the idea.

It's not very friendly to extensions, where the types are not
guaranteed to have stable OIDs. Did you consider any proposals that
work with type names?

I had not.
Most of the clients know how to decode the builtin types. I'm not sure
there is a use case for binary encode types that the clients don't have a
priori knowledge of.

Dave

Show quoted text

Regards,
Jeff Davis

pgsql@j-davis.com

about 3 years ago

In reply to: Dave Cramer (#3)

Re: Request for comment on setting binary format output per session

On Sat, 2023-03-04 at 18:04 -0500, Dave Cramer wrote:

Most of the clients know how to decode the builtin types. I'm not
sure there is a use case for binary encode types that the clients
don't have a priori knowledge of.

The client could, in theory, have a priori knowledge of a non-builtin
type.

I don't have a great solution for that, though. Maybe it's only
practical for builtin types.

Regards,
Jeff Davis

tgl@sss.pgh.pa.us

about 3 years ago

In reply to: Jeff Davis (#4)

Re: Request for comment on setting binary format output per session

Jeff Davis <pgsql@j-davis.com> writes:

On Sat, 2023-03-04 at 18:04 -0500, Dave Cramer wrote:

Most of the clients know how to decode the builtin types. I'm not
sure there is a use case for binary encode types that the clients
don't have a priori knowledge of.

The client could, in theory, have a priori knowledge of a non-builtin
type.

I don't see what's "in theory" about that. There seems plenty of
use for binary I/O of, say, PostGIS types. Even for built-in types,
do we really want to encourage people to hard-wire their OIDs into
applications?

I don't see a big problem with driving this off a GUC, but I think
it should be a list of type names not OIDs. We already have plenty
of precedent for dealing with that sort of thing; see search_path
for the canonical example. IIRC, there's similar caching logic
for temp_tablespaces.

regards, tom lane

David G. Johnston

david.g.johnston@gmail.com

about 3 years ago

In reply to: Tom Lane (#5)

Re: Request for comment on setting binary format output per session

On Sat, Mar 4, 2023 at 5:07 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

Jeff Davis <pgsql@j-davis.com> writes:

On Sat, 2023-03-04 at 18:04 -0500, Dave Cramer wrote:

Most of the clients know how to decode the builtin types. I'm not
sure there is a use case for binary encode types that the clients
don't have a priori knowledge of.

The client could, in theory, have a priori knowledge of a non-builtin
type.

I don't see what's "in theory" about that. There seems plenty of
use for binary I/O of, say, PostGIS types. Even for built-in types,
do we really want to encourage people to hard-wire their OIDs into
applications?

I don't see a big problem with driving this off a GUC, but I think
it should be a list of type names not OIDs. We already have plenty
of precedent for dealing with that sort of thing; see search_path
for the canonical example. IIRC, there's similar caching logic
for temp_tablespaces.

This seems slightly different since types depend upon schemas whereas
search_path is top-level and tablespaces are global. But I agree that
names should be accepted, maybe in addition to OIDs, the latter, for core
types in particular, being a way to not have to worry about masking in
user-space.

David J.

pg@fastcrypt.com

about 3 years ago

In reply to: Tom Lane (#5)

Re: Request for comment on setting binary format output per session

On Sat, 4 Mar 2023 at 19:06, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Jeff Davis <pgsql@j-davis.com> writes:

On Sat, 2023-03-04 at 18:04 -0500, Dave Cramer wrote:

Most of the clients know how to decode the builtin types. I'm not
sure there is a use case for binary encode types that the clients
don't have a priori knowledge of.

The client could, in theory, have a priori knowledge of a non-builtin
type.

I don't see what's "in theory" about that. There seems plenty of
use for binary I/O of, say, PostGIS types. Even for built-in types,
do we really want to encourage people to hard-wire their OIDs into
applications?

How does a client read these? I'm pretty narrowly focussed. The JDBC API
doesn't really have a way to read a non built-in type. There is a facility
to read a UDT, but the user would have to provide that transcoder. I guess
I'm curious how other clients read binary UDT's ?

I don't see a big problem with driving this off a GUC, but I think
it should be a list of type names not OIDs. We already have plenty
of precedent for dealing with that sort of thing; see search_path
for the canonical example. IIRC, there's similar caching logic
for temp_tablespaces.

I have no issue with allowing names, OID's were compact, but we could
easily support both

Dave

pg@fastcrypt.com

about 3 years ago

In reply to: Dave Cramer (#7)

Re: Request for comment on setting binary format output per session

Dave Cramer

On Sat, 4 Mar 2023 at 19:39, Dave Cramer <davecramer@gmail.com> wrote:

On Sat, 4 Mar 2023 at 19:06, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Jeff Davis <pgsql@j-davis.com> writes:

On Sat, 2023-03-04 at 18:04 -0500, Dave Cramer wrote:

Most of the clients know how to decode the builtin types. I'm not
sure there is a use case for binary encode types that the clients
don't have a priori knowledge of.

The client could, in theory, have a priori knowledge of a non-builtin
type.

I don't see what's "in theory" about that. There seems plenty of
use for binary I/O of, say, PostGIS types. Even for built-in types,
do we really want to encourage people to hard-wire their OIDs into
applications?

How does a client read these? I'm pretty narrowly focussed. The JDBC API
doesn't really have a way to read a non built-in type. There is a facility
to read a UDT, but the user would have to provide that transcoder. I guess
I'm curious how other clients read binary UDT's ?

I don't see a big problem with driving this off a GUC, but I think
it should be a list of type names not OIDs. We already have plenty
of precedent for dealing with that sort of thing; see search_path
for the canonical example. IIRC, there's similar caching logic
for temp_tablespaces.

I have no issue with allowing names, OID's were compact, but we could
easily support both

Attached is a preliminary patch that takes a list of OID's. I'd like to
know if this is going in the right direction.

Next step would be to deal with type names as opposed to OID's.
This will be a bit more challenging as type names are schema specific.

Dave

Show quoted text

pgsql@j-davis.com

about 3 years ago

In reply to: Dave Cramer (#8)

Re: Request for comment on setting binary format output per session

On Mon, 2023-03-13 at 16:33 -0400, Dave Cramer wrote:

Attached is a preliminary patch that takes a list of OID's. I'd like
to know if this is going in the right direction.

I found a few issues:

1. Some kind of memory error:

SET format_binary='25,1082,1184';
WARNING: problem in alloc set PortalContext: detected write past
chunk end in block 0x55ba7b5f7610, chunk 0x55ba7b5f7a48
...
SET

2. Easy to confuse psql:

CREATE TABLE a(d date, t timestamptz);
SET format_binary='25,1082,1184';
SELECT * FROM a;
d | t
---+---
! |
(1 row)

3. Some style issues
- use of "//" comments
- findOid should return bool, not int

When you add support for user-defined types, that introduces a couple
other issues:

4. The format_binary GUC would depend on the search_path GUC, which
isn't great.

5. There's a theoretical invalidation problem. It might also be a
practical problem in some testing setups with long-lived connections
that are recreating user-defined types.

We've had this problem with binary for a long time, and it seems
desirable to solve it. But I'm not sure GUCs are the right way.

How hard did you try to solve it in the protocol rather than with a
GUC? I see that the startup message allows protocol extensions by
prefixing a parameter name with "_pq_". Are protocol extensions
documented somewhere and would that be a reasonable thing to do here?

Also, if we're going to make the binary format more practical to use,
can we document the expectations better? It seems the expecatation is
that the binary format just never changes, and that if it does, that's
a new type name.

Regards,
Jeff Davis

#10

pg@fastcrypt.com

about 3 years ago

In reply to: Jeff Davis (#9)

Re: Request for comment on setting binary format output per session

+Paul Ramsey

On Mon, 20 Mar 2023 at 13:05, Jeff Davis <pgsql@j-davis.com> wrote:

On Mon, 2023-03-13 at 16:33 -0400, Dave Cramer wrote:

Attached is a preliminary patch that takes a list of OID's. I'd like
to know if this is going in the right direction.

Thanks for the review. I'm curious what system you are running on as I
don't see any of these errors.

I found a few issues:

1. Some kind of memory error:

SET format_binary='25,1082,1184';
WARNING: problem in alloc set PortalContext: detected write past
chunk end in block 0x55ba7b5f7610, chunk 0x55ba7b5f7a48
...
SET

2. Easy to confuse psql:

CREATE TABLE a(d date, t timestamptz);
SET format_binary='25,1082,1184';
SELECT * FROM a;
d | t
---+---
! |
(1 row)

Well I'm guessing psql doesn't know how to read date or timestamptz in

binary. This is not a failing of the code.

3. Some style issues
- use of "//" comments
- findOid should return bool, not int

Sure will fix see attached patch

When you add support for user-defined types, that introduces a couple
other issues:

4. The format_binary GUC would depend on the search_path GUC, which
isn't great.

This is an interesting question. If the type isn't visible then it's not
visible to the query so

5. There's a theoretical invalidation problem. It might also be a
practical problem in some testing setups with long-lived connections
that are recreating user-defined types.

UDT's seem to be a problem here which candidly have very little use case
for binary output.

We've had this problem with binary for a long time, and it seems
desirable to solve it. But I'm not sure GUCs are the right way.

How hard did you try to solve it in the protocol rather than with a
GUC? I see that the startup message allows protocol extensions by
prefixing a parameter name with "_pq_". Are protocol extensions
documented somewhere and would that be a reasonable thing to do here?

I didn't try to solve it as Tom was OK with using a GUC. Using a startup
GUC is interesting,
but how would that work with pools where we want to reset the connection
when we return it and then
set the binary format on borrow ? By using a GUC when a client borrows a
connection from a pool the client
can reconfigure the oids it wants formatted in binary.

Also, if we're going to make the binary format more practical to use,
can we document the expectations better?

Yes we can do that.

It seems the expecatation is
that the binary format just never changes, and that if it does, that's
a new type name.

I really hadn't considered supporting type names. I have asked Paul

Ramsey about PostGIS and he doesn't see PostGIS using this.

Show quoted text

Regards,
Jeff Davis

#11

pgsql@j-davis.com

about 3 years ago

In reply to: Jeff Davis (#9)

Re: Request for comment on setting binary format output per session

On Mon, 2023-03-20 at 10:04 -0700, Jeff Davis wrote:

CREATE TABLE a(d date, t timestamptz);
SET format_binary='25,1082,1184';
SELECT * FROM a;
d | t
---+---
! |
(1 row)

Oops, missing the following statement after the CREATE TABLE:

INSERT INTO a VALUES('1234-01-01', '2023-03-20 09:00:00');

Regards,
Jeff Davis

#12

tgl@sss.pgh.pa.us

about 3 years ago

In reply to: Dave Cramer (#10)

Re: Request for comment on setting binary format output per session

Dave Cramer <davecramer@gmail.com> writes:

On Mon, 20 Mar 2023 at 13:05, Jeff Davis <pgsql@j-davis.com> wrote:

2. Easy to confuse psql:

CREATE TABLE a(d date, t timestamptz);
SET format_binary='25,1082,1184';
SELECT * FROM a;
d | t
---+---
! |
(1 row)

Well I'm guessing psql doesn't know how to read date or timestamptz in
binary. This is not a failing of the code.

What it is is a strong suggestion that controlling this via a GUC is
not a great choice. There are many inappropriate (wrong abstraction
level) ways to change a GUC and thereby break a client that's not
expecting binary output. I think Jeff's suggestion that we should
treat this as a protocol extension might be a good idea.

If I recall the protocol-extension design correctly, such a setting
could only be set at session start, which could be annoying --- at the
very least we'd have to tolerate entries for unrecognized data types,
since clients couldn't be expected to have checked the list against
the current server in advance.

regards, tom lane

#13

pgsql@j-davis.com

about 3 years ago

In reply to: Dave Cramer (#10)

Re: Request for comment on setting binary format output per session

On Mon, 2023-03-20 at 14:36 -0400, Dave Cramer wrote:

Thanks for the review. I'm curious what system you are running on as
I don't see any of these errors.

Are asserts enabled?

Well I'm guessing psql doesn't know how to read date or timestamptz
in binary. This is not a failing of the code.

It seems strange, and potentially dangerous, to send binary data to a
client that's not expecting it. It feels too easy to cause confusion by
changing the GUC mid-session.

Also, it seems like DISCARD ALL is not resetting it, which I think is a
bug.

This is an interesting question. If the type isn't visible then it's
not visible to the query so

I don't think that's true -- the type could be in a different schema
from the table.

5. There's a theoretical invalidation problem. It might also be a
practical problem in some testing setups with long-lived
connections
that are recreating user-defined types.

UDT's seem to be a problem here which candidly have very little use
case for binary output.

I mostly agree with that, but it also might not be hard to support
UDTs. Is there a design problem here or is it "just a matter of code"?

I didn't try to solve it as Tom was OK with using a GUC. Using a
startup GUC is interesting,
but how would that work with pools where we want to reset the
connection when we return it and then
set the binary format on borrow ? By using a GUC when a client
borrows a connection from a pool the client
can reconfigure the oids it wants formatted in binary.

That's a good point. How common is it to share a connection pool
between different clients (some of which might support a binary format,
and others which don't)? And would the connection pool put connections
with and without the property in different pools?

I really hadn't considered supporting type names. I have asked Paul
Ramsey about PostGIS and he doesn't see PostGIS using this.

One of the things I like about Postgres is that the features all work
together, and that user-defined objects are generally as good as built-
in ones. Sometimes there's a reason to make a special case (e.g. syntax
support or something), but in this case it seems like we could support
user-defined types just fine, right? It's also just more friendly and
readable to use type names, especially if it's a GUC.

Regards,
Jeff Davis

#14

mmoncure@gmail.com

about 3 years ago

In reply to: Dave Cramer (#8)

Re: Request for comment on setting binary format output per session

On Mon, Mar 13, 2023 at 3:33 PM Dave Cramer <davecramer@gmail.com> wrote:

Dave Cramer

On Sat, 4 Mar 2023 at 19:39, Dave Cramer <davecramer@gmail.com> wrote:

On Sat, 4 Mar 2023 at 19:06, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Jeff Davis <pgsql@j-davis.com> writes:

On Sat, 2023-03-04 at 18:04 -0500, Dave Cramer wrote:

Most of the clients know how to decode the builtin types. I'm not
sure there is a use case for binary encode types that the clients
don't have a priori knowledge of.

The client could, in theory, have a priori knowledge of a non-builtin
type.

I don't see what's "in theory" about that. There seems plenty of
use for binary I/O of, say, PostGIS types. Even for built-in types,
do we really want to encourage people to hard-wire their OIDs into
applications?

How does a client read these? I'm pretty narrowly focussed. The JDBC API
doesn't really have a way to read a non built-in type. There is a facility
to read a UDT, but the user would have to provide that transcoder. I guess
I'm curious how other clients read binary UDT's ?

I don't see a big problem with driving this off a GUC, but I think
it should be a list of type names not OIDs. We already have plenty
of precedent for dealing with that sort of thing; see search_path
for the canonical example. IIRC, there's similar caching logic
for temp_tablespaces.

I have no issue with allowing names, OID's were compact, but we could
easily support both

Attached is a preliminary patch that takes a list of OID's. I'd like to
know if this is going in the right direction.

Next step would be to deal with type names as opposed to OID's.
This will be a bit more challenging as type names are schema specific.

OIDs are a pain to deal with IMO. They will not survive a dump style
restore, and are hard to keep synchronized between databases...type names
don't have this problem. OIDs are an implementation artifact that ought
not need any extra dependency.

This seems like a protocol or even a driver issue rather than a GUC issue.
Why does the server need to care what format the client might want to
prefer on a query by query basis? I just don't see it. The resultformat
switch in libpq works pretty well, except that it's "all in" on getting
data from the server, with the dead simple workaround of casting to text
which might even be able to be managed from within the driver itself.

merlin

#15

pg@fastcrypt.com

about 3 years ago

In reply to: Merlin Moncure (#14)

Re: Request for comment on setting binary format output per session

On Mon, 20 Mar 2023 at 19:10, Merlin Moncure <mmoncure@gmail.com> wrote:

On Mon, Mar 13, 2023 at 3:33 PM Dave Cramer <davecramer@gmail.com> wrote:

Dave Cramer

On Sat, 4 Mar 2023 at 19:39, Dave Cramer <davecramer@gmail.com> wrote:

On Sat, 4 Mar 2023 at 19:06, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Jeff Davis <pgsql@j-davis.com> writes:

On Sat, 2023-03-04 at 18:04 -0500, Dave Cramer wrote:

Most of the clients know how to decode the builtin types. I'm not
sure there is a use case for binary encode types that the clients
don't have a priori knowledge of.

The client could, in theory, have a priori knowledge of a non-builtin
type.

I don't see what's "in theory" about that. There seems plenty of
use for binary I/O of, say, PostGIS types. Even for built-in types,
do we really want to encourage people to hard-wire their OIDs into
applications?

How does a client read these? I'm pretty narrowly focussed. The JDBC API
doesn't really have a way to read a non built-in type. There is a facility
to read a UDT, but the user would have to provide that transcoder. I guess
I'm curious how other clients read binary UDT's ?

I don't see a big problem with driving this off a GUC, but I think
it should be a list of type names not OIDs. We already have plenty
of precedent for dealing with that sort of thing; see search_path
for the canonical example. IIRC, there's similar caching logic
for temp_tablespaces.

I have no issue with allowing names, OID's were compact, but we could
easily support both

Attached is a preliminary patch that takes a list of OID's. I'd like to
know if this is going in the right direction.

Next step would be to deal with type names as opposed to OID's.
This will be a bit more challenging as type names are schema specific.

OIDs are a pain to deal with IMO. They will not survive a dump style
restore, and are hard to keep synchronized between databases...type names
don't have this problem. OIDs are an implementation artifact that ought
not need any extra dependency.

AFAIK, OID's for built-in types don't change.
Clearly we need more thought on how to deal with UDT's

This seems like a protocol or even a driver issue rather than a GUC issue.
Why does the server need to care what format the client might want to
prefer on a query by query basis?

Actually this isn't a query by query basis. The point of this is that the
client wants all the results for given OID's in binary.

Show quoted text

I just don't see it. The resultformat switch in libpq works pretty well,
except that it's "all in" on getting data from the server, with the dead
simple workaround of casting to text which might even be able to be managed
from within the driver itself.

merlin

#16

pg@fastcrypt.com

about 3 years ago

In reply to: Tom Lane (#12)

Re: Request for comment on setting binary format output per session

Dave Cramer

On Mon, 20 Mar 2023 at 15:09, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Dave Cramer <davecramer@gmail.com> writes:

On Mon, 20 Mar 2023 at 13:05, Jeff Davis <pgsql@j-davis.com> wrote:

2. Easy to confuse psql:

CREATE TABLE a(d date, t timestamptz);
SET format_binary='25,1082,1184';
SELECT * FROM a;
d | t
---+---
! |
(1 row)

Well I'm guessing psql doesn't know how to read date or timestamptz in
binary. This is not a failing of the code.

What it is is a strong suggestion that controlling this via a GUC is
not a great choice. There are many inappropriate (wrong abstraction
level) ways to change a GUC and thereby break a client that's not
expecting binary output. I think Jeff's suggestion that we should
treat this as a protocol extension might be a good idea.

If I recall the protocol-extension design correctly, such a setting
could only be set at session start, which could be annoying --- at the
very least we'd have to tolerate entries for unrecognized data types,
since clients couldn't be expected to have checked the list against
the current server in advance.

As mentioned for connection pools we need to be able to set these after the
session starts.
I'm not sure how useful the protocol extension mechanism works given that
it can only be used on startup.

Show quoted text

regards, tom lane

#17

pg@fastcrypt.com

about 3 years ago

In reply to: Jeff Davis (#13)

Re: Request for comment on setting binary format output per session

On Mon, 20 Mar 2023 at 15:09, Jeff Davis <pgsql@j-davis.com> wrote:

On Mon, 2023-03-20 at 14:36 -0400, Dave Cramer wrote:

Thanks for the review. I'm curious what system you are running on as
I don't see any of these errors.

Are asserts enabled?

Well I'm guessing psql doesn't know how to read date or timestamptz
in binary. This is not a failing of the code.

It seems strange, and potentially dangerous, to send binary data to a
client that's not expecting it. It feels too easy to cause confusion by
changing the GUC mid-session.

Also, it seems like DISCARD ALL is not resetting it, which I think is a
bug.

Thanks yes, this is a bug

This is an interesting question. If the type isn't visible then it's
not visible to the query so

I don't think that's true -- the type could be in a different schema
from the table.

Good point. This seems to be the very difficult part.

5. There's a theoretical invalidation problem. It might also be a
practical problem in some testing setups with long-lived
connections
that are recreating user-defined types.

UDT's seem to be a problem here which candidly have very little use
case for binary output.

I mostly agree with that, but it also might not be hard to support
UDTs. Is there a design problem here or is it "just a matter of code"?

I didn't try to solve it as Tom was OK with using a GUC. Using a
startup GUC is interesting,
but how would that work with pools where we want to reset the
connection when we return it and then
set the binary format on borrow ? By using a GUC when a client
borrows a connection from a pool the client
can reconfigure the oids it wants formatted in binary.

That's a good point. How common is it to share a connection pool
between different clients (some of which might support a binary format,
and others which don't)? And would the connection pool put connections
with and without the property in different pools?

For JAVA pools it's probably OK, but for pools like pgbouncer we have no
control of who is going to get the connection next.

Show quoted text

I really hadn't considered supporting type names. I have asked Paul
Ramsey about PostGIS and he doesn't see PostGIS using this.

One of the things I like about Postgres is that the features all work
together, and that user-defined objects are generally as good as built-
in ones. Sometimes there's a reason to make a special case (e.g. syntax
support or something), but in this case it seems like we could support
user-defined types just fine, right? It's also just more friendly and
readable to use type names, especially if it's a GUC.

Regards,
Jeff Davis

#18

mmoncure@gmail.com

about 3 years ago

In reply to: Dave Cramer (#15)

Re: Request for comment on setting binary format output per session

On Mon, Mar 20, 2023 at 7:11 PM Dave Cramer <davecramer@gmail.com> wrote:

On Mon, 20 Mar 2023 at 19:10, Merlin Moncure <mmoncure@gmail.com> wrote:

On Mon, Mar 13, 2023 at 3:33 PM Dave Cramer <davecramer@gmail.com> wrote:

OIDs are a pain to deal with IMO. They will not survive a dump style

restore, and are hard to keep synchronized between databases...type names
don't have this problem. OIDs are an implementation artifact that ought
not need any extra dependency.

AFAIK, OID's for built-in types don't change.
Clearly we need more thought on how to deal with UDT's

Yeah. Not having a solution that handles arrays and composites though
would feel pretty incomplete since they would be the one of the main
beneficiaries from a performance standpoint. I guess minimally you'd
need to expose some mechanic to look up oids, but being able to
specify "foo"."bar", in the GUC would be pretty nice (albeit a lot more
work).

This seems like a protocol or even a driver issue rather than a GUC issue.

Why does the server need to care what format the client might want to
prefer on a query by query basis?

Actually this isn't a query by query basis. The point of this is that the
client wants all the results for given OID's in binary.

Yep. Your rationale is starting to click. How would this interact with
existing code bases? I get that JDBC is the main target, but how does this
interact with libpq code that explicitly sets resultformat? Perhaps the
answer should be as it shouldn't change documented behavior, and a
hypothetical resultformat=2 could be reserved to default to text but allow
for server control, and 3 as the same but default to binary.

merlin

#19

pg@fastcrypt.com

about 3 years ago

In reply to: Merlin Moncure (#18)

Re: Request for comment on setting binary format output per session

On Tue, 21 Mar 2023 at 07:35, Merlin Moncure <mmoncure@gmail.com> wrote:

On Mon, Mar 20, 2023 at 7:11 PM Dave Cramer <davecramer@gmail.com> wrote:

On Mon, 20 Mar 2023 at 19:10, Merlin Moncure <mmoncure@gmail.com> wrote:

On Mon, Mar 13, 2023 at 3:33 PM Dave Cramer <davecramer@gmail.com>
wrote:

OIDs are a pain to deal with IMO. They will not survive a dump style

restore, and are hard to keep synchronized between databases...type names
don't have this problem. OIDs are an implementation artifact that ought
not need any extra dependency.

AFAIK, OID's for built-in types don't change.
Clearly we need more thought on how to deal with UDT's

Yeah. Not having a solution that handles arrays and composites though
would feel pretty incomplete since they would be the one of the main
beneficiaries from a performance standpoint.

I don't think arrays of built-in types are a problem; drivers already know
how to deal with these.

I guess minimally you'd need to expose some mechanic to look up oids, but
being able to specify "foo"."bar", in the GUC would be pretty nice (albeit
a lot more work).

As Jeff mentioned there is a visibility problem if the search path is
changed. The simplest solution IMO is to look up the OID at the time the
format is requested and use the OID going forward to format the output as
binary. If the search path changes and a type with the same name is now
first in the search path then the data would be returned in text.

This seems like a protocol or even a driver issue rather than a GUC

issue. Why does the server need to care what format the client might want
to prefer on a query by query basis?

Actually this isn't a query by query basis. The point of this is that the
client wants all the results for given OID's in binary.

Yep. Your rationale is starting to click. How would this interact with
existing code bases?

Actually JDBC wasn't the first to ask for this. Default result formats
should be settable per session · postgresql-interfaces/enhancement-ideas ·
Discussion #5 (github.com)
<https://github.com/postgresql-interfaces/enhancement-ideas/discussions/5> I've
tested it with JDBC and it requires no code changes on our end. Jack tested
it and it required no code changes on his end either. He did some
performance tests and found "At 100 rows the text format takes 48% longer
than the binary format."
https://github.com/postgresql-interfaces/enhancement-ideas/discussions/5#discussioncomment-3188599

I get that JDBC is the main target, but how does this interact with libpq

code that explicitly sets resultformat?

Honestly I have no idea how it would function with libpq. I presume if the
client did not request binary format then things would work as they do
today.

Dave

Show quoted text

#20

mmoncure@gmail.com

about 3 years ago

In reply to: Dave Cramer (#19)

Re: Request for comment on setting binary format output per session

On Tue, Mar 21, 2023 at 8:22 AM Dave Cramer <davecramer@gmail.com> wrote:

On Tue, 21 Mar 2023 at 07:35, Merlin Moncure <mmoncure@gmail.com> wrote:

On Mon, Mar 20, 2023 at 7:11 PM Dave Cramer <davecramer@gmail.com> wrote:

On Mon, 20 Mar 2023 at 19:10, Merlin Moncure <mmoncure@gmail.com> wrote:

On Mon, Mar 13, 2023 at 3:33 PM Dave Cramer <davecramer@gmail.com>
wrote:

OIDs are a pain to deal with IMO. They will not survive a dump style

restore, and are hard to keep synchronized between databases...type names
don't have this problem. OIDs are an implementation artifact that ought
not need any extra dependency.

AFAIK, OID's for built-in types don't change.
Clearly we need more thought on how to deal with UDT's

Yeah. Not having a solution that handles arrays and composites though
would feel pretty incomplete since they would be the one of the main
beneficiaries from a performance standpoint.

I don't think arrays of built-in types are a problem; drivers already know
how to deal with these.

I guess minimally you'd need to expose some mechanic to look up oids, but
being able to specify "foo"."bar", in the GUC would be pretty nice (albeit
a lot more work).

As Jeff mentioned there is a visibility problem if the search path is
changed.

Only if the name is not fully qualified. By allowing OID to bypass
visibility, it stands to reason visibility ought to be bypassed for type
requests as well, or at least be able to be. If we are setting things in
GUC, that suggests we can establish things in postgresql.conf, and oids
feel out of place there.

Yep. Your rationale is starting to click. How would this interact with

existing code bases?

Actually JDBC wasn't the first to ask for this. Default result formats
should be settable per session · postgresql-interfaces/enhancement-ideas ·
Discussion #5 (github.com)
<https://github.com/postgresql-interfaces/enhancement-ideas/discussions/5> I've
tested it with JDBC and it requires no code changes on our end. Jack tested
it and it required no code changes on his end either. He did some
performance tests and found "At 100 rows the text format takes 48% longer
than the binary format."
https://github.com/postgresql-interfaces/enhancement-ideas/discussions/5#discussioncomment-3188599

Yeah, the general need is very clear IMO.

I get that JDBC is the main target, but how does this interact with libpq

code that explicitly sets resultformat?

Honestly I have no idea how it would function with libpq. I presume if the
client did not request binary format then things would work as they do
today.

I see your argument here, but IMO this is another can of nudge away from
GUC, unless you're willing to establish that behavior. Thinking here is
that the GUC wouldn't do anything for libpq, uses cases, and couldn't,
since resultformat would be overriding the behavior in all interesting
cases...it seems odd to implement server side specified behavior that the
client library doesn't implement.

merlin

#21

pgsql@j-davis.com

about 3 years ago

In reply to: Dave Cramer (#17)

#22

pg@fastcrypt.com

about 3 years ago

In reply to: Jeff Davis (#21)

#23

pgsql@j-davis.com

about 3 years ago

In reply to: Dave Cramer (#19)

#24

peter_e@gmx.net

about 3 years ago

In reply to: Jeff Davis (#2)

#25

peter_e@gmx.net

about 3 years ago

In reply to: Jeff Davis (#9)

#26

peter_e@gmx.net

about 3 years ago

In reply to: Jeff Davis (#9)

#27

pg@fastcrypt.com

about 3 years ago

In reply to: Jeff Davis (#23)

#28

pg@fastcrypt.com

about 3 years ago

In reply to: Tom Lane (#12)

#29

pgsql@j-davis.com

about 3 years ago

In reply to: Peter Eisentraut (#26)

#30

pgsql@j-davis.com

about 3 years ago

In reply to: Peter Eisentraut (#24)

#31

tgl@sss.pgh.pa.us

about 3 years ago

In reply to: Jeff Davis (#29)

#32

pgsql@j-davis.com

about 3 years ago

In reply to: Tom Lane (#31)

#33

pg@fastcrypt.com

about 3 years ago

In reply to: Jeff Davis (#32)

#34

mmoncure@gmail.com

about 3 years ago

In reply to: Jeff Davis (#23)

#35

pgsql@j-davis.com

about 3 years ago

In reply to: Merlin Moncure (#34)

#36

pgsql@j-davis.com

about 3 years ago

In reply to: Dave Cramer (#33)

#37

pg@fastcrypt.com

about 3 years ago

In reply to: Jeff Davis (#36)

#38

pgsql@j-davis.com

about 3 years ago

In reply to: Dave Cramer (#37)

#39

pg@fastcrypt.com

about 3 years ago

In reply to: Jeff Davis (#38)

#40

tgl@sss.pgh.pa.us

about 3 years ago

In reply to: Dave Cramer (#39)

#41

pg@fastcrypt.com

about 3 years ago

In reply to: Tom Lane (#40)

#42

tgl@sss.pgh.pa.us

about 3 years ago

In reply to: Dave Cramer (#41)

#43

pg@fastcrypt.com

about 3 years ago

In reply to: Tom Lane (#42)

#44

Gregory Stark (as CFM)

stark.cfm@gmail.com

about 3 years ago

In reply to: Dave Cramer (#43)

#45

pgsql@j-davis.com

about 3 years ago

In reply to: Dave Cramer (#43)

#46

pg@fastcrypt.com

about 3 years ago

In reply to: Jeff Davis (#45)

#47

pgsql@j-davis.com

about 3 years ago

In reply to: Dave Cramer (#46)

#48

mmoncure@gmail.com

about 3 years ago

In reply to: Jeff Davis (#47)

#49

pgsql@j-davis.com

about 3 years ago

In reply to: Merlin Moncure (#48)

#50

pg@fastcrypt.com

about 3 years ago

In reply to: Jeff Davis (#49)

#51

pg@fastcrypt.com

about 3 years ago

In reply to: Dave Cramer (#50)

#52

mmoncure@gmail.com

about 3 years ago

In reply to: Dave Cramer (#51)

#53

robertmhaas@gmail.com

about 3 years ago

In reply to: Jeff Davis (#47)

#54

pgsql@j-davis.com

about 3 years ago

In reply to: Robert Haas (#53)

#55

robertmhaas@gmail.com

about 3 years ago

In reply to: Jeff Davis (#54)

#56

tgl@sss.pgh.pa.us

about 3 years ago

In reply to: Robert Haas (#55)

#57

robertmhaas@gmail.com

about 3 years ago

In reply to: Tom Lane (#56)

#58

tgl@sss.pgh.pa.us

about 3 years ago

In reply to: Robert Haas (#57)

#59

robertmhaas@gmail.com

about 3 years ago

In reply to: Tom Lane (#58)

#60

pg@fastcrypt.com

about 3 years ago

In reply to: Robert Haas (#59)

#61

Bruce Momjian

bruce@momjian.us

about 3 years ago

In reply to: Tom Lane (#56)

#62

robertmhaas@gmail.com

about 3 years ago

In reply to: Bruce Momjian (#61)

#63

pg@fastcrypt.com

about 3 years ago

In reply to: Dave Cramer (#60)

#64

pg@fastcrypt.com

about 3 years ago

In reply to: Robert Haas (#53)

#65

mmoncure@gmail.com

about 3 years ago

In reply to: Dave Cramer (#63)

#66

pg@fastcrypt.com

about 3 years ago

In reply to: Merlin Moncure (#65)

#67

pg@fastcrypt.com

about 3 years ago

In reply to: Dave Cramer (#66)

#68

Daniel Gustafsson

daniel@yesql.se

almost 3 years ago

In reply to: Dave Cramer (#67)

#69

pg@fastcrypt.com

almost 3 years ago

In reply to: Daniel Gustafsson (#68)

#70

peter_e@gmx.net

over 2 years ago

In reply to: Dave Cramer (#69)

#71

mmoncure@gmail.com

over 2 years ago

In reply to: Peter Eisentraut (#70)

#72

pg@fastcrypt.com

over 2 years ago

In reply to: Peter Eisentraut (#70)

#73

robertmhaas@gmail.com

over 2 years ago

In reply to: Peter Eisentraut (#70)

#74

peter_e@gmx.net

over 2 years ago

In reply to: Merlin Moncure (#71)

#75

peter_e@gmx.net

over 2 years ago

In reply to: Dave Cramer (#72)

#76

peter_e@gmx.net

over 2 years ago

In reply to: Robert Haas (#73)

#77

postgres@jeltef.nl

over 2 years ago

In reply to: Robert Haas (#73)

#78

postgres@jeltef.nl

over 2 years ago

In reply to: Peter Eisentraut (#75)

#79

robertmhaas@gmail.com

over 2 years ago

In reply to: Jelte Fennema-Nio (#78)

#80

pg@fastcrypt.com

over 2 years ago

In reply to: Robert Haas (#79)

#81

pgsql@j-davis.com

over 2 years ago

In reply to: Robert Haas (#73)

#82

postgres@jeltef.nl

over 2 years ago

In reply to: Robert Haas (#79)

#83

postgres@jeltef.nl

over 2 years ago

In reply to: Jeff Davis (#81)

#84

postgres@jeltef.nl

over 2 years ago

In reply to: Dave Cramer (#80)

#85

pg@fastcrypt.com

over 2 years ago

In reply to: Jelte Fennema-Nio (#84)

#86

robertmhaas@gmail.com

over 2 years ago

In reply to: Jeff Davis (#81)

#87

pg@fastcrypt.com

over 2 years ago

In reply to: Robert Haas (#86)

#88

robertmhaas@gmail.com

over 2 years ago

In reply to: Jelte Fennema-Nio (#83)

#89