request for database identifier in the startup packet

Started by Dave Cramer · about 2 years ago · 9 messages · hackers

pg@fastcrypt.com

about 2 years ago

Greetings,

The JDBC driver is currently keeping a per connection cache of types in the
driver. We are seeing cases where the number of columns is quite high. In
one case (pgjdbc issue #3241, "Prevent fetchFieldMetaData() from being run
when unnecessary", <https://github.com/pgjdbc/pgjdbc/issues/3241>) there
were 2.6 million columns.

If we knew that we were connecting to the same database we could use a
single cache across connections.

I think we would require a server/database identifier in the startup
message.

Dave Cramer

David G. Johnston

david.g.johnston@gmail.com

about 2 years ago

In reply to: Dave Cramer (#1)

Re: request for database identifier in the startup packet

On Thursday, May 9, 2024, Dave Cramer <davecramer@gmail.com> wrote:

Greetings,

The JDBC driver is currently keeping a per connection cache of types in
the driver. We are seeing cases where the number of columns is quite high.
In one case (pgjdbc issue #3241, "Prevent fetchFieldMetaData() from being
run when unnecessary", <https://github.com/pgjdbc/pgjdbc/issues/3241>)
there were 2.6 million columns.

If we knew that we were connecting to the same database we could use a
single cache across connections.

I think we would require a server/database identifier in the startup
message.

I feel like pgbouncer ruins this plan.

But maybe you can construct a lookup key from some combination of data
provided by these functions:
https://www.postgresql.org/docs/current/functions-info.html#FUNCTIONS-INFO-SESSION

David J.
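
A minimal Java sketch of the lookup-key idea above, under stated assumptions. It combines the cluster's system identifier, read from pg_control_system(), with the OID of the current database. The class and method names are hypothetical and not part of pgjdbc. It also assumes that the connecting role is allowed to call pg_control_system(), and that "system identifier plus database OID" is a good enough notion of "same database" for a type cache. Behind a pooler such as pgbouncer, the key describes whichever backend happened to run the query.

```java
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

// Hypothetical helper, not part of pgjdbc: derives a cache key after connecting.
final class DatabaseIdentity {
    // One round trip: cluster system identifier plus OID of the current database.
    static final String IDENTITY_QUERY =
        "SELECT (SELECT system_identifier::text FROM pg_control_system()), "
      + "(SELECT oid::text FROM pg_database WHERE datname = current_database())";

    // Pure function so the key format can be checked without a server.
    static String buildKey(String systemIdentifier, String databaseOid) {
        if (systemIdentifier == null || databaseOid == null) {
            throw new IllegalArgumentException("both identity parts are required");
        }
        return systemIdentifier + "/" + databaseOid;
    }

    // Runs the identity query on an open connection and returns the key.
    static String fetchKey(Connection conn) throws SQLException {
        try (Statement st = conn.createStatement();
             ResultSet rs = st.executeQuery(IDENTITY_QUERY)) {
            if (!rs.next()) {
                throw new SQLException("identity query returned no row");
            }
            return buildKey(rs.getString(1), rs.getString(2));
        }
    }
}
```

Two connections that produce the same key could then be treated as talking to the same database for caching purposes.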

Robert Haas

robertmhaas@gmail.com

about 2 years ago

In reply to: Dave Cramer (#1)

Re: request for database identifier in the startup packet

On Thu, May 9, 2024 at 8:06 AM Dave Cramer <davecramer@gmail.com> wrote:

The JDBC driver is currently keeping a per connection cache of types in the driver. We are seeing cases where the number of columns is quite high. In one case (pgjdbc issue #3241, "Prevent fetchFieldMetaData() from being run when unnecessary") there were 2.6 million columns.

If we knew that we were connecting to the same database we could use a single cache across connections.

I think we would require a server/database identifier in the startup message.

I understand the desire to share the cache, but not why that would
require any kind of change to the wire protocol.

--
Robert Haas
EDB: http://www.enterprisedb.com

Dave Cramer

pg@fastcrypt.com

about 2 years ago

In reply to: Robert Haas (#3)

Re: request for database identifier in the startup packet

On Thu, 9 May 2024 at 12:22, Robert Haas <robertmhaas@gmail.com> wrote:

On Thu, May 9, 2024 at 8:06 AM Dave Cramer <davecramer@gmail.com> wrote:

The JDBC driver is currently keeping a per connection cache of types in
the driver. We are seeing cases where the number of columns is quite high.
In one case (pgjdbc issue #3241, "Prevent fetchFieldMetaData() from being
run when unnecessary") there were 2.6 million columns.

If we knew that we were connecting to the same database we could use a
single cache across connections.

I think we would require a server/database identifier in the startup
message.

I understand the desire to share the cache, but not why that would
require any kind of change to the wire protocol.

The server identity is actually useful for many things, such as knowing
which instance of a cluster you are connected to.
For the cache, however, we can't use the IP address to determine which
server we are connected to, as we could be connected to a pooler.
Knowing exactly which server/database we are talking to makes it relatively
easy to have a common cache across connections. The startup message seems
like a good place to get that.

Dave

Andres Freund

andres@anarazel.de

about 2 years ago

In reply to: Dave Cramer (#4)

Re: request for database identifier in the startup packet

Hi,

On 2024-05-09 14:20:49 -0400, Dave Cramer wrote:

On Thu, 9 May 2024 at 12:22, Robert Haas <robertmhaas@gmail.com> wrote:

On Thu, May 9, 2024 at 8:06 AM Dave Cramer <davecramer@gmail.com> wrote:

The JDBC driver is currently keeping a per connection cache of types in
the driver. We are seeing cases where the number of columns is quite high.
In one case (pgjdbc issue #3241, "Prevent fetchFieldMetaData() from being
run when unnecessary") there were 2.6 million columns.

If we knew that we were connecting to the same database we could use a
single cache across connections.

I think we would require a server/database identifier in the startup
message.

I understand the desire to share the cache, but not why that would
require any kind of change to the wire protocol.

The server identity is actually useful for many things, such as knowing
which instance of a cluster you are connected to.
For the cache, however, we can't use the IP address to determine which
server we are connected to, as we could be connected to a pooler.
Knowing exactly which server/database we are talking to makes it relatively
easy to have a common cache across connections. The startup message seems
like a good place to get that.

ISTM that you could just as well query the information you'd like after
connecting. And that's going to be a lot more flexible than having to have
precisely the right information in the startup message, and most clients not
needing it.

Greetings,

Andres Freund
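
A rough Java sketch of how a driver might share one type cache across connections once it has obtained some identifier by querying after connecting. Everything here is an assumption for illustration: the class name, the use of a plain string as the database key, and the choice of a map from type OID to type name as the cached content. It is not how pgjdbc is implemented.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical process-wide registry of type caches, one per database key.
final class SharedTypeCache {
    // Outer map: database key -> cache. Inner map: type OID -> type name.
    private static final Map<String, Map<Integer, String>> CACHES =
        new ConcurrentHashMap<>();

    // Returns the cache for this database, creating it on first use.
    // computeIfAbsent guarantees that concurrent callers get the same map.
    static Map<Integer, String> forDatabase(String databaseKey) {
        return CACHES.computeIfAbsent(databaseKey, k -> new ConcurrentHashMap<>());
    }
}
```

Each new connection would pay one extra query to learn its key, then reuse whatever earlier connections to the same database had already cached. Invalidation when types change on the server is deliberately left out of this sketch.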

Robert Haas

robertmhaas@gmail.com

about 2 years ago

In reply to: Andres Freund (#5)

Re: request for database identifier in the startup packet

On Thu, May 9, 2024 at 3:14 PM Andres Freund <andres@anarazel.de> wrote:

ISTM that you could just as well query the information you'd like after
connecting. And that's going to be a lot more flexible than having to have
precisely the right information in the startup message, and most clients not
needing it.

I agree with this.

--
Robert Haas
EDB: http://www.enterprisedb.com

Dave Cramer

pg@fastcrypt.com

about 2 years ago

In reply to: Robert Haas (#6)

Re: request for database identifier in the startup packet

On Thu, 9 May 2024 at 15:19, Robert Haas <robertmhaas@gmail.com> wrote:

On Thu, May 9, 2024 at 3:14 PM Andres Freund <andres@anarazel.de> wrote:

ISTM that you could just as well query the information you'd like after
connecting. And that's going to be a lot more flexible than having to have
precisely the right information in the startup message, and most clients
not needing it.

I agree with this.

Well, other than the extra round trip.

Thanks,
Dave

Robert Haas

robertmhaas@gmail.com

about 2 years ago

In reply to: Dave Cramer (#7)

Re: request for database identifier in the startup packet

On Thu, May 9, 2024 at 3:33 PM Dave Cramer <davecramer@gmail.com> wrote:

On Thu, 9 May 2024 at 15:19, Robert Haas <robertmhaas@gmail.com> wrote:

On Thu, May 9, 2024 at 3:14 PM Andres Freund <andres@anarazel.de> wrote:

ISTM that you could just as well query the information you'd like after
connecting. And that's going to be a lot more flexible than having to have
precisely the right information in the startup message, and most clients not
needing it.

I agree with this.

Well other than the extra round trip.

I mean, sure, but we can't avoid that for everyone for everything.
There might be some way of doing something like this with, for
example, the infrastructure that was proposed to dynamically add stuff
to the list of PGC_REPORT GUCs, if the values you need are GUCs
already, or were made so. But I think it's just not workable to
unconditionally add a bunch of things to the startup packet. It'll
just grow and grow.

--
Robert Haas
EDB: http://www.enterprisedb.com
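
For context on the reporting mechanism mentioned above: settings marked for reporting (the GUC_REPORT flag in the server source) are sent to the client in ParameterStatus messages during startup and whenever they change, so no extra round trip is needed to read them. The Java sketch below parses one such message as laid out in the protocol documentation: the byte 'S', a 32-bit length that counts itself, then a null-terminated name and a null-terminated value. The class name is hypothetical, and decoding as UTF-8 is an assumption. Whether a server or database identifier would ever be delivered this way is exactly what the thread leaves open.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.AbstractMap;
import java.util.Map;

// Hypothetical parser for a single ParameterStatus ('S') protocol message.
final class ParameterStatusParser {
    // Reads one message from the buffer and returns (name, value).
    static Map.Entry<String, String> parse(ByteBuffer buf) {
        byte type = buf.get();
        if (type != 'S') {
            throw new IllegalArgumentException("not a ParameterStatus message");
        }
        int length = buf.getInt();            // includes these 4 bytes, not the type byte
        int end = buf.position() + length - 4;
        String name = readCString(buf, end);
        String value = readCString(buf, end);
        return new AbstractMap.SimpleImmutableEntry<>(name, value);
    }

    // Reads bytes up to the next zero byte, without running past the message end.
    private static String readCString(ByteBuffer buf, int end) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        while (buf.position() < end) {
            byte b = buf.get();
            if (b == 0) {
                return new String(out.toByteArray(), StandardCharsets.UTF_8);
            }
            out.write(b);
        }
        throw new IllegalArgumentException("unterminated string in message");
    }
}
```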

Dave Cramer

pg@fastcrypt.com

about 2 years ago

In reply to: Robert Haas (#8)

Re: request for database identifier in the startup packet

On Thu, 9 May 2024 at 15:39, Robert Haas <robertmhaas@gmail.com> wrote:

On Thu, May 9, 2024 at 3:33 PM Dave Cramer <davecramer@gmail.com> wrote:

On Thu, 9 May 2024 at 15:19, Robert Haas <robertmhaas@gmail.com> wrote:

On Thu, May 9, 2024 at 3:14 PM Andres Freund <andres@anarazel.de> wrote:

ISTM that you could just as well query the information you'd like after
connecting. And that's going to be a lot more flexible than having to have
precisely the right information in the startup message, and most clients
not needing it.

I agree with this.

Well other than the extra round trip.

I mean, sure, but we can't avoid that for everyone for everything.
There might be some way of doing something like this with, for
example, the infrastructure that was proposed to dynamically add stuff
to the list of PGC_REPORT GUCs, if the values you need are GUCs
already, or were made so. But I think it's just not workable to
unconditionally add a bunch of things to the startup packet. It'll
just grow and grow.

I don't think this is unconditional. These are real-world situations where
having this information is useful.
That said, adding fields every time I ask for them would make the startup
packet grow uncontrollably. This seems like a decent discussion to have
with others.

Dave