Named Prepared statement problems and possible solutions

Started by Dave Cramerover 2 years ago17 messages
#1Dave Cramer
davecramer@gmail.com

Greetings,

At pgcon last week I was speaking to some people about the problem we have
with connection pools and named prepared statements.

For context pgjdbc (and others) use un-named statements and then switch to
named statements after using the statement N (default 5) times. In session
mode this is not a problem. When the connection is closed by the
application the pools generally issue "DISCARD ALL" and close all prepared
statements. The next time the connection is opened the statement is
prepared and all works as it should.

However one of the more interesting use cases for pgbouncer is to use
"TRANSACTION MODE" to manage idle sessions. In transaction mode the
connection is returned to the pool after each transaction. There are usage
patterns in large applications where clients have client pools and
subsequently have large numbers of connections open. Sometimes in the
thousands, unfortunately many of these are idle connections. Using
transaction mode reduces the number of real connections to the database in
many cases by orders of magnitude.

Unfortunately this is incompatible with named prepared statements. From the
client's point of view they have one session and named prepared statements
are session objects. From one transaction to the next the physical
connection can change along with the attached prepared statements.

The idea that was discussed is when we prepare the statement we cache it in
a statement cache and return a queryid much like the queryid used in
pg_stat_statements. Instead of executing the statement name we would
execute the queryid.

If the queryid did not exist, attempting to execute it would cause an error
and cause the running transaction to fail. Retrieving the statement from
the query cache would have to happen before the attempt to execute it and
return an error to the client subsequently the client could re-prepare the
statement and execute. This would have to happen in such a way as to not
cause the transaction to fail.

The one other idea that was proposed was to cache the statements in the
client. However this does nothing to address the issue of managing idle
connections.

Regards,
Dave Cramer

#2Konstantin Knizhnik
knizhnik@garret.ru
In reply to: Dave Cramer (#1)
Re: Named Prepared statement problems and possible solutions

On 07.06.2023 10:48 PM, Dave Cramer wrote:

Greetings,

At pgcon last week I was speaking to some people about the problem we
have with connection pools and named prepared statements.

For context pgjdbc (and others) use un-named statements and then
switch to named statements after using the statement N (default 5)
times. In session mode this is not a problem. When the connection is
closed by the application the pools generally issue "DISCARD ALL" and
close all prepared statements. The next time the connection is opened
the statement is prepared and all works as it should.

However one of the more interesting use cases for pgbouncer is to use
"TRANSACTION MODE" to manage idle sessions. In transaction mode the
connection is returned to the pool after each transaction. There are
usage patterns in large applications where clients have client pools
and subsequently have large numbers of connections open. Sometimes in
the thousands, unfortunately many of these are idle connections. Using
transaction mode reduces the number of real connections to the
database in many cases by orders of magnitude.

Unfortunately this is incompatible with named prepared statements.
From the client's point of view they have one session and named
prepared statements are session objects. From one transaction to the
next the physical connection can change along with the attached
prepared statements.

The idea that was discussed is when we prepare the statement we cache
it in a statement cache and return a queryid much like the queryid
used in pg_stat_statements. Instead of executing the statement name we
would execute the queryid.

If the queryid did not exist, attempting to execute it would cause an
error and cause the running transaction to fail. Retrieving the
statement from the query cache would have to happen before the attempt
to execute it and return an error to the client subsequently the
client could re-prepare the statement and execute. This would have to
happen in such a way as to not cause the transaction to fail.

The one other idea that was proposed was to cache the statements in
the client. However this does nothing to address the issue of managing
idle connections.

Regards,
Dave Cramer

There is a PR with support of prepared statement support to pgbouncer:
https://github.com/pgbouncer/pgbouncer/pull/845
any feedback, reviews and suggestions are welcome.

#3Dave Cramer
davecramer@gmail.com
In reply to: Konstantin Knizhnik (#2)
Re: Named Prepared statement problems and possible solutions

Hi Konstantin,

Yes, I ran into Euler at pgcon and he mentioned this. I intend to test it.
I'd still like to see my proposal in the server.

Dave Cramer

On Thu, 8 Jun 2023 at 02:15, Konstantin Knizhnik <knizhnik@garret.ru> wrote:

Show quoted text

On 07.06.2023 10:48 PM, Dave Cramer wrote:

Greetings,

At pgcon last week I was speaking to some people about the problem we have
with connection pools and named prepared statements.

For context pgjdbc (and others) use un-named statements and then switch to
named statements after using the statement N (default 5) times. In session
mode this is not a problem. When the connection is closed by the
application the pools generally issue "DISCARD ALL" and close all prepared
statements. The next time the connection is opened the statement is
prepared and all works as it should.

However one of the more interesting use cases for pgbouncer is to use
"TRANSACTION MODE" to manage idle sessions. In transaction mode the
connection is returned to the pool after each transaction. There are usage
patterns in large applications where clients have client pools and
subsequently have large numbers of connections open. Sometimes in the
thousands, unfortunately many of these are idle connections. Using
transaction mode reduces the number of real connections to the database in
many cases by orders of magnitude.

Unfortunately this is incompatible with named prepared statements. From
the client's point of view they have one session and named prepared
statements are session objects. From one transaction to the next the
physical connection can change along with the attached prepared statements.

The idea that was discussed is when we prepare the statement we cache it
in a statement cache and return a queryid much like the queryid used in
pg_stat_statements. Instead of executing the statement name we would
execute the queryid.

If the queryid did not exist, attempting to execute it would cause an
error and cause the running transaction to fail. Retrieving the statement
from the query cache would have to happen before the attempt to execute it
and return an error to the client subsequently the client could re-prepare
the statement and execute. This would have to happen in such a way as to
not cause the transaction to fail.

The one other idea that was proposed was to cache the statements in the
client. However this does nothing to address the issue of managing idle
connections.

Regards,
Dave Cramer

There is a PR with support of prepared statement support to pgbouncer:
https://github.com/pgbouncer/pgbouncer/pull/845
any feedback, reviews and suggestions are welcome.

#4Jan Wieck
jan@wi3ck.info
In reply to: Konstantin Knizhnik (#2)
Re: Named Prepared statement problems and possible solutions

On 6/8/23 02:15, Konstantin Knizhnik wrote:

There is a PR with support of prepared statement support to pgbouncer:
https://github.com/pgbouncer/pgbouncer/pull/845
any feedback, reviews and suggestions are welcome.

I was about to say that the support would have to come from the pooler
as it is possible to have multiple applications in different languages
connecting to the same pool(s).

I can certainly give this a try, possibly over the weekend. I have a
TPC-C that can use prepared statements plus pause/resume. That might be
a good stress for it.

Best Regards, Jan

#5Dave Cramer
davecramer@gmail.com
In reply to: Jan Wieck (#4)
Re: Named Prepared statement problems and possible solutions

On Thu, Jun 8, 2023 at 8:43 AM Jan Wieck <jan@wi3ck.info> wrote:

On 6/8/23 02:15, Konstantin Knizhnik wrote:

There is a PR with support of prepared statement support to pgbouncer:
https://github.com/pgbouncer/pgbouncer/pull/845
any feedback, reviews and suggestions are welcome.

I was about to say that the support would have to come from the pooler
as it is possible to have multiple applications in different languages
connecting to the same pool(s).

Why from the pooler? If it were done at the server every client could use
it?

Dave

--

Dave Cramer

#6Konstantin Knizhnik
knizhnik@garret.ru
In reply to: Jan Wieck (#4)
Re: Named Prepared statement problems and possible solutions

On 08.06.2023 3:43 PM, Jan Wieck wrote:

On 6/8/23 02:15, Konstantin Knizhnik wrote:

There is a PR with support of prepared statement support to pgbouncer:
https://github.com/pgbouncer/pgbouncer/pull/845
any feedback, reviews and suggestions are welcome.

I was about to say that the support would have to come from the pooler
as it is possible to have multiple applications in different languages
connecting to the same pool(s)

Ideally, support should be provided by both sides: only pooler knows
mapping between clients and postgres backends and only server knows
which queries require session semantic and which not (in principle it is
possible to make connection pooler to determine it, but it is very
non-trivial).

.

I can certainly give this a try, possibly over the weekend. I have a
TPC-C that can use prepared statements plus pause/resume. That might
be a good stress for it.

By the way, I have done some small benchmarking of different connection
poolers for Postgres.
Benchmark was very simple: I just create small pgbench database with
scale 10 and then
run read-only queries with 100 clients:

pgbench -c 100 -P 10 -T 100 -S -M prepared postgres

Number of connections to the database was limited in an all pooler
configurations to 10. I have tested only transaction mode. If pooler
supports prepared statements, I have also tested them.
Just for reference I also include results with direct connection to
Postgres.
All benchamrking was done at my notebook, so it is not quite
representative scenario.

Direct:
Connections Prepared TPS
10 yes 135507
10 no 73218
100 yes 79042
100 no 59245

Pooler: (100 client connections, 10 server connections, transaction mode)
Pooler Prepared TPS
pgbouncer no 65029
pgbouncer-ps no 65570
pgbouncer-ps yes 65825
odyssey yes 18351
odyssey no 21299
pgagrol no 29673
pgcat no 23247

#7Jan Wieck
jan@wi3ck.info
In reply to: Dave Cramer (#5)
Re: Named Prepared statement problems and possible solutions

On 6/8/23 09:21, Dave Cramer wrote:

On Thu, Jun 8, 2023 at 8:43 AM Jan Wieck <jan@wi3ck.info
<mailto:jan@wi3ck.info>> wrote:

On 6/8/23 02:15, Konstantin Knizhnik wrote:

There is a PR with support of prepared statement support to

pgbouncer:

https://github.com/pgbouncer/pgbouncer/pull/845

<https://github.com/pgbouncer/pgbouncer/pull/845&gt;

any feedback, reviews and suggestions are welcome.

I was about to say that the support would have to come from the pooler
as it is possible to have multiple applications in different languages
connecting to the same pool(s).

Why from the pooler? If it were done at the server every client could
use it?

The server doesn't know about all the clients of the pooler, does it? It
has no way of telling if/when a client disconnects from the pooler.

Jan

#8Jan Wieck
jan@wi3ck.info
In reply to: Jan Wieck (#7)
Re: Named Prepared statement problems and possible solutions

On 6/8/23 09:53, Jan Wieck wrote:

On 6/8/23 09:21, Dave Cramer wrote:
The server doesn't know about all the clients of the pooler, does it? It
has no way of telling if/when a client disconnects from the pooler.

Another problem that complicates doing it in the server is that the
information require to (re-)prepare a statement in a backend that
currently doesn't have it needs to be kept in shared memory. This
includes the query string itself. Doing that without shared memory in a
pooler that is multi-threaded or based on async-IO is much simpler and
allows for easy ballooning.

Jan

#9Dave Cramer
davecramer@gmail.com
In reply to: Jan Wieck (#7)
Re: Named Prepared statement problems and possible solutions

On Thu, 8 Jun 2023 at 09:53, Jan Wieck <jan@wi3ck.info> wrote:

On 6/8/23 09:21, Dave Cramer wrote:

On Thu, Jun 8, 2023 at 8:43 AM Jan Wieck <jan@wi3ck.info
<mailto:jan@wi3ck.info>> wrote:

On 6/8/23 02:15, Konstantin Knizhnik wrote:

There is a PR with support of prepared statement support to

pgbouncer:

https://github.com/pgbouncer/pgbouncer/pull/845

<https://github.com/pgbouncer/pgbouncer/pull/845&gt;

any feedback, reviews and suggestions are welcome.

I was about to say that the support would have to come from the

pooler

as it is possible to have multiple applications in different

languages

connecting to the same pool(s).

Why from the pooler? If it were done at the server every client could
use it?

The server doesn't know about all the clients of the pooler, does it? It
has no way of telling if/when a client disconnects from the pooler.

Why does it have to know if the client disconnects ? It just keeps a cache
of prepared statements.
In large apps it is very likely there will be another client wanting to use
the statement

Dave

Show quoted text
#10Dave Cramer
davecramer@gmail.com
In reply to: Jan Wieck (#8)
Re: Named Prepared statement problems and possible solutions

On Thu, 8 Jun 2023 at 10:31, Jan Wieck <jan@wi3ck.info> wrote:

On 6/8/23 09:53, Jan Wieck wrote:

On 6/8/23 09:21, Dave Cramer wrote:
The server doesn't know about all the clients of the pooler, does it? It
has no way of telling if/when a client disconnects from the pooler.

Another problem that complicates doing it in the server is that the
information require to (re-)prepare a statement in a backend that
currently doesn't have it needs to be kept in shared memory. This
includes the query string itself. Doing that without shared memory in a
pooler that is multi-threaded or based on async-IO is much simpler and
allows for easy ballooning.

I don't expect the server to re-prepare the statement. If the server
responds with "statement doesn't exist" the client would send a prepare.

Dave

#11Jan Wieck
jan@wi3ck.info
In reply to: Dave Cramer (#10)
Re: Named Prepared statement problems and possible solutions

On 6/8/23 10:56, Dave Cramer wrote:

On Thu, 8 Jun 2023 at 10:31, Jan Wieck <jan@wi3ck.info
<mailto:jan@wi3ck.info>> wrote:

On 6/8/23 09:53, Jan Wieck wrote:

On 6/8/23 09:21, Dave Cramer wrote:
The server doesn't know about all the clients of the pooler, does

it? It

has no way of telling if/when a client disconnects from the pooler.

Another problem that complicates doing it in the server is that the
information require to (re-)prepare a statement in a backend that
currently doesn't have it needs to be kept in shared memory. This
includes the query string itself. Doing that without shared memory in a
pooler that is multi-threaded or based on async-IO is much simpler and
allows for easy ballooning.

I don't expect the server to re-prepare the statement. If the server
responds with "statement doesn't exist" the client would send a prepare.

Are you proposing a new libpq protocol version?

Jan

#12Dave Cramer
davecramer@gmail.com
In reply to: Jan Wieck (#11)
Re: Named Prepared statement problems and possible solutions

On Thu, 8 Jun 2023 at 11:15, Jan Wieck <jan@wi3ck.info> wrote:

On 6/8/23 10:56, Dave Cramer wrote:

On Thu, 8 Jun 2023 at 10:31, Jan Wieck <jan@wi3ck.info
<mailto:jan@wi3ck.info>> wrote:

On 6/8/23 09:53, Jan Wieck wrote:

On 6/8/23 09:21, Dave Cramer wrote:
The server doesn't know about all the clients of the pooler, does

it? It

has no way of telling if/when a client disconnects from the

pooler.

Another problem that complicates doing it in the server is that the
information require to (re-)prepare a statement in a backend that
currently doesn't have it needs to be kept in shared memory. This
includes the query string itself. Doing that without shared memory

in a

pooler that is multi-threaded or based on async-IO is much simpler

and

allows for easy ballooning.

I don't expect the server to re-prepare the statement. If the server
responds with "statement doesn't exist" the client would send a prepare.

Are you proposing a new libpq protocol version?

I believe we would need to add this to the protocol, yes.

Dave

Show quoted text

Jan

#13Konstantin Knizhnik
knizhnik@garret.ru
In reply to: Dave Cramer (#12)
Re: Named Prepared statement problems and possible solutions

On 08.06.2023 6:18 PM, Dave Cramer wrote:

On Thu, 8 Jun 2023 at 11:15, Jan Wieck <jan@wi3ck.info> wrote:

On 6/8/23 10:56, Dave Cramer wrote:

On Thu, 8 Jun 2023 at 10:31, Jan Wieck <jan@wi3ck.info
<mailto:jan@wi3ck.info>> wrote:

     On 6/8/23 09:53, Jan Wieck wrote:
      > On 6/8/23 09:21, Dave Cramer wrote:
      > The server doesn't know about all the clients of the

pooler, does

     it? It
      > has no way of telling if/when a client disconnects from

the pooler.

     Another problem that complicates doing it in the server is

that the

     information require to (re-)prepare a statement in a backend

that

     currently doesn't have it needs to be kept in shared memory.

This

     includes the query string itself. Doing that without shared

memory in a

     pooler that is multi-threaded or based on async-IO is much

simpler and

     allows for easy ballooning.

I don't expect the server to re-prepare the statement. If the

server

responds with "statement doesn't exist" the client would send a

prepare.

Are you proposing a new libpq protocol version?

I believe we would need to add this to the protocol, yes.

So it will be responsibility of client to remember text of prepared
query to be able to resend it when statement doesn't exists at server?
IMHO very strange decision. Why not to handle it in connection pooler
(doesn't matter - external or embedded)?

#14Dave Cramer
davecramer@gmail.com
In reply to: Konstantin Knizhnik (#13)
Re: Named Prepared statement problems and possible solutions

On Thu, 8 Jun 2023 at 11:22, Konstantin Knizhnik <knizhnik@garret.ru> wrote:

On 08.06.2023 6:18 PM, Dave Cramer wrote:

On Thu, 8 Jun 2023 at 11:15, Jan Wieck <jan@wi3ck.info> wrote:

On 6/8/23 10:56, Dave Cramer wrote:

On Thu, 8 Jun 2023 at 10:31, Jan Wieck <jan@wi3ck.info
<mailto:jan@wi3ck.info>> wrote:

On 6/8/23 09:53, Jan Wieck wrote:

On 6/8/23 09:21, Dave Cramer wrote:
The server doesn't know about all the clients of the pooler, does

it? It

has no way of telling if/when a client disconnects from the

pooler.

Another problem that complicates doing it in the server is that the
information require to (re-)prepare a statement in a backend that
currently doesn't have it needs to be kept in shared memory. This
includes the query string itself. Doing that without shared memory

in a

pooler that is multi-threaded or based on async-IO is much simpler

and

allows for easy ballooning.

I don't expect the server to re-prepare the statement. If the server
responds with "statement doesn't exist" the client would send a prepare.

Are you proposing a new libpq protocol version?

I believe we would need to add this to the protocol, yes.

So it will be responsibility of client to remember text of prepared query
to be able to resend it when statement doesn't exists at server?
IMHO very strange decision. Why not to handle it in connection pooler
(doesn't matter - external or embedded)?

I may be myopic but in the JDBC world and I assume others we have a
`PreparedStatement` object which has the text of the query.
The text is readily available to us.

Also again from the JDBC point of view we have use un-named statements
normally and then name them after 5 uses so we already have embedded logic
on how to deal with PreparedStatements

Dave

#15Jan Wieck
jan@wi3ck.info
In reply to: Dave Cramer (#14)
Re: Named Prepared statement problems and possible solutions

On 6/8/23 13:31, Dave Cramer wrote:

On Thu, 8 Jun 2023 at 11:22, Konstantin Knizhnik <knizhnik@garret.ru
<mailto:knizhnik@garret.ru>> wrote:

So it will be responsibility of client to remember text of prepared
query to be able to resend it when statement doesn't exists at server?
IMHO very strange decision. Why not to handle it in connection
pooler (doesn't matter - external or embedded)?

I may be myopic but in the JDBC world and I assume others we have a
`PreparedStatement` object which has the text of the query.
The text is readily available to us.

Also again from the JDBC point of view we have use un-named statements
normally and then name them after 5 uses so we already have embedded
logic on how to deal with PreparedStatements

The entire problem only surfaces when using a connection pool of one
sort or another. Without one the session is persistent to the client.

At some point I created a "functional" proof of concept for a connection
pool that did a mapping of the client side name to a pool managed server
side name. It kept track of which query was known by a server. It kept a
hashtable of poolname+username+query MD5 sums. On each prepare request
it would look up if that query is known, add a query-client reference in
another hashtable and so on. On a Bind/Exec message it would check that
the server has the query prepared and issue a P message if not. What was
missing was to keep track of no longer needed queries and deallocate them.

As said, it was a POC. Since it was implemented in Tcl it performed
miserable, but I got it to the point of being able to pause & resume and
the whole thing did work with prepared statements on the transaction
level. So it was a full functioning POC.

What makes this design appealing to me is that it is completely
transparent to every existing client that uses the extended query
protocol for server side prepared statements.

Jan

#16Dave Cramer
davecramer@gmail.com
In reply to: Jan Wieck (#15)
Re: Named Prepared statement problems and possible solutions

On Thu, 8 Jun 2023 at 15:49, Jan Wieck <jan@wi3ck.info> wrote:

On 6/8/23 13:31, Dave Cramer wrote:

On Thu, 8 Jun 2023 at 11:22, Konstantin Knizhnik <knizhnik@garret.ru
<mailto:knizhnik@garret.ru>> wrote:

So it will be responsibility of client to remember text of prepared
query to be able to resend it when statement doesn't exists at

server?

IMHO very strange decision. Why not to handle it in connection
pooler (doesn't matter - external or embedded)?

I may be myopic but in the JDBC world and I assume others we have a
`PreparedStatement` object which has the text of the query.
The text is readily available to us.

Also again from the JDBC point of view we have use un-named statements
normally and then name them after 5 uses so we already have embedded
logic on how to deal with PreparedStatements

The entire problem only surfaces when using a connection pool of one
sort or another. Without one the session is persistent to the client.

At some point I created a "functional" proof of concept for a connection
pool that did a mapping of the client side name to a pool managed server
side name. It kept track of which query was known by a server. It kept a
hashtable of poolname+username+query MD5 sums. On each prepare request
it would look up if that query is known, add a query-client reference in
another hashtable and so on. On a Bind/Exec message it would check that
the server has the query prepared and issue a P message if not. What was
missing was to keep track of no longer needed queries and deallocate them.

As said, it was a POC. Since it was implemented in Tcl it performed
miserable, but I got it to the point of being able to pause & resume and
the whole thing did work with prepared statements on the transaction
level. So it was a full functioning POC.

What makes this design appealing to me is that it is completely
transparent to every existing client that uses the extended query
protocol for server side prepared statements.

Apparently this is coming in pgbouncer Support of prepared statements by
knizhnik · Pull Request #845 · pgbouncer/pgbouncer (github.com)
<https://github.com/pgbouncer/pgbouncer/pull/845&gt;

Dave

Show quoted text

Jan

#17Jan Wieck
jan@wi3ck.info
In reply to: Dave Cramer (#16)
Re: Named Prepared statement problems and possible solutions

On 6/8/23 15:57, Dave Cramer wrote:

Apparently this is coming in pgbouncer Support of prepared statements by
knizhnik · Pull Request #845 · pgbouncer/pgbouncer (github.com)
<https://github.com/pgbouncer/pgbouncer/pull/845&gt;

I am quite interested in that patch. Considering how pgbouncer works
internally I am very curious.

Jan