Question on session timeout

Started by sudover 1 year ago7 messagesgeneral
Jump to latest
#1sud
suds1434@gmail.com

Hello,
We are frequently seeing the total DB connection reaching ~2000+ whereas
the total number of active sessions in pg_stat_activity staying <100 at any
point in time. And when we see the sessions from DB side they are showing
most of the sessions with state as 'idle' having backend_start/xact_start
showing date ~10days older. We do use application level connection pooling,
and we have ~120 sets as both the "max idle" and "max active" connection
count and "maxage" as 7 days, so does this suggest any issue at connection
pool setup?

We do see keep alive queries in the DB (select 1), not sure if that is
making this scenario. When checking the
"idle_in_transaction_session_timeout" it is set as 24hours and
"idle_session_timeout" set as "0". So my question is , should we set the
parameter to a lesser value in DB cluster level like ~5minutes or so, so as
not to keep the idle sessions lying so long in the database and what would
be the advisable value for these parameters?

Regards
Sud

#2Adrian Klaver
adrian.klaver@aklaver.com
In reply to: sud (#1)
Re: Question on session timeout

On 9/30/24 13:01, sud wrote:

Hello,
We are frequently seeing the total DB connection reaching ~2000+ whereas

Where are you getting the ~2000 count from?

the total number of active sessions in pg_stat_activity staying <100 at
any point in time. And when we see the sessions from DB side they are
showing most of the sessions with state as 'idle' having
backend_start/xact_start showing date ~10days older. We do use
application level connection pooling, and we have ~120 sets as both the

What do you mean by ~120 sets, in particular what is a set?

"max idle" and "max active" connection count and "maxage" as 7 days, so
does this suggest any issue at connection pool setup?

Using what pooler?

We do see keep alive queries in the DB (select 1), not sure if that is
making this scenario. When checking the

How often do to keep alive queries run?

"idle_in_transaction_session_timeout" it is set as 24hours and
"idle_session_timeout" set as "0". So my question is , should we set the
parameter to a lesser value in DB cluster level like ~5minutes or so, so
as not to keep the idle sessions lying so long in the database and what

'"idle_in_transaction_session_timeout" it is set as 24hours' is a foot
gun as explained here:

https://www.postgresql.org/docs/current/runtime-config-client.html

idle_in_transaction_session_timeout (integer)

[...]

"This option can be used to ensure that idle sessions do not hold locks
for an unreasonable amount of time. Even when no significant locks are
held, an open transaction prevents vacuuming away recently-dead tuples
that may be visible only to this transaction; so remaining idle for a
long time can contribute to table bloat. See Section 24.1 for more details."

With '"idle_session_timeout" set as "0"' a session without an open
transaction is not going to timeout.

would be the advisable value for these parameters?

Regards
Sud

--
Adrian Klaver
adrian.klaver@aklaver.com

#3sud
suds1434@gmail.com
In reply to: Adrian Klaver (#2)
Re: Question on session timeout

On Tue, Oct 1, 2024 at 4:10 AM Adrian Klaver <adrian.klaver@aklaver.com>
wrote:

On 9/30/24 13:01, sud wrote:

Hello,
We are frequently seeing the total DB connection reaching ~2000+ whereas

Where are you getting the ~2000 count from?

the total number of active sessions in pg_stat_activity staying <100 at
any point in time. And when we see the sessions from DB side they are
showing most of the sessions with state as 'idle' having
backend_start/xact_start showing date ~10days older. We do use
application level connection pooling, and we have ~120 sets as both the

What do you mean by ~120 sets, in particular what is a set?

"max idle" and "max active" connection count and "maxage" as 7 days, so
does this suggest any issue at connection pool setup?

Using what pooler?

We do see keep alive queries in the DB (select 1), not sure if that is
making this scenario. When checking the

How often do to keep alive queries run?

"idle_in_transaction_session_timeout" it is set as 24hours and
"idle_session_timeout" set as "0". So my question is , should we set the
parameter to a lesser value in DB cluster level like ~5minutes or so, so
as not to keep the idle sessions lying so long in the database and what

'"idle_in_transaction_session_timeout" it is set as 24hours' is a foot
gun as explained here:

https://www.postgresql.org/docs/current/runtime-config-client.html

idle_in_transaction_session_timeout (integer)

[...]

"This option can be used to ensure that idle sessions do not hold locks
for an unreasonable amount of time. Even when no significant locks are
held, an open transaction prevents vacuuming away recently-dead tuples
that may be visible only to this transaction; so remaining idle for a
long time can contribute to table bloat. See Section 24.1 for more
details."

With '"idle_session_timeout" set as "0"' a session without an open
transaction is not going to timeout.

*Where are you getting the ~2000 count from?*
Seeing this in the "performance insights" dashboard and also its matching
when I query the count of sessions from pg_stat_activity.

*What do you mean by ~120 sets, in particular what is a set?*These are the
values set as mentioned in the properties file which the application team
uses for connection pooling.

*Using what pooler?*I need to check on this as Its Java application(jdbc
driver for connecting to DB), so I thought it must be using standard
connection pooling. Will double check.

*How often do to keep alive queries run?*Need to check. But I am not sure,
in general , if these "keep alive" queries are used for keeping a
transaction alive or a session alive?

As you described, a long time open transaction with a session state as
"idle" will be threatening as that will cause locking and "transaction ID
wrap around" issues to surface whereas having "idle sessions" of a closed
transaction may not cause any issue as they will do no harm. Does it mean
we can have any number of idle sessions or we should also have some non
zero "timeout" setup for the "ide_session_timeout" parameter too
(maybe ~1hr or so)?

Is it correct to assume the session in pg_stat_activity with very old
XACT_START are the one which are part of long running open transaction(i.e.
driven by idle_in_transaction_session_timeout) whereas the ones with older
BACKEND_START or QUERY_START are the one are just the idle session(driven
by idle_session_timeout) but not tied to any open transaction?

Few observations:-

I do see, "MaximumUsedTransactionIDs" staying consistently ~200M for a long
time then coming down. And its matching to "autovacuum_freeze_max_age"
which is set as 200M. Hope it's expected. We have max_connections set as
5000.

"Database connection" touching ~2000 then coming down till 200. And we see
uneven spikes in those, it seems to be matching with the pattern , when we
have some errors occurring during the insert queries which are submitted by
the Java application to insert the data into the tables.

#4Greg Sabino Mullane
greg@turnstep.com
In reply to: sud (#3)
Re: Question on session timeout

On Tue, Oct 1, 2024 at 1:57 AM sud <suds1434@gmail.com> wrote:

*Where are you getting the ~2000 count from?*
Seeing this in the "performance insights" dashboard and also its matching
when I query the count of sessions from pg_stat_activity.

So I'm guessing this is perhaps RDS or Aurora? Stating that up front can be
helpful.

As you described, a long time open transaction with a session state as
"idle" will be threatening as that will cause locking

No, idle is fine, "idle in transaction" is bad. :)

Is it correct to assume the session in pg_stat_activity with very old

XACT_START are the one which are part of long running

<snip rest of question>

You need to look at the "state" column as your primary bit of information.
Second most important is how long something has been in that state, which
you can find with now() - state_change. The best way to learn all of this
is to open a few concurrent sessions in psql and experiment.

We have max_connections set as 5000.

That's quite high. But if you never reach that high, it doesn't matter a
whole lot.

"Database connection" touching ~2000 then coming down till 200. And we see

uneven spikes in those, it seems to be matching with the pattern , when we
have some errors occurring during the insert queries which are submitted by
the Java application to insert the data into the tables.

(What sort of errors?) 2000 is high. Clearly, you are not pooling
connections, or not pooling them well. If you are using plain Postgres,
look into setting up pgbouncer. If using something managed (e.g. RDS) look
into their particular pooling solution. Or fix your application-level
pooling.

Cheers,
Greg

#5Adrian Klaver
adrian.klaver@aklaver.com
In reply to: sud (#3)
Re: Question on session timeout

On 9/30/24 22:57, sud wrote:

On Tue, Oct 1, 2024 at 4:10 AM Adrian Klaver <adrian.klaver@aklaver.com
<mailto:adrian.klaver@aklaver.com>> wrote:

/Where are you getting the ~2000  count from?/
Seeing this in the "performance insights" dashboard and also its
matching when I query the count of sessions from pg_stat_activity.

From your OP:

"... whereas the total number of active sessions in pg_stat_activity
staying <100 at any point in time."

So:

1) They don't match.

2) ""performance insights" dashboard" does not actually tell us
anything. What program and what is it actually measuring?

/What do you mean by ~120 sets, in particular what is a set?
/These are the values set as mentioned in the properties file which the
application team uses for connection pooling

Again this does not tell us anything.

1) A set of what?

2) What properties file?

/Using what pooler?
/I need to check on this as Its Java application(jdbc driver for
connecting to DB), so I thought it must be using standard connection
pooling. Will double check.

Since pooling is what you are concerned with this is should be the
starting point of your investigation.

/How often do to keep alive queries run?
/Need to check. But I am not sure, in general , if these "keep alive"
queries are used for keeping a transaction alive or a session alive?

With an idle_in_transaction_session_timeout of 24 hrs I don't see that
it makes a difference.

As you described, a long time open transaction with a session state as
"idle" will be threatening as that will cause locking and "transaction

That would be idle_in_transaction.

ID wrap around" issues to surface whereas having "idle sessions" of a
closed transaction may not cause any issue as they will do no harm. Does
it mean we can have any number of idle sessions or we should also have
some non zero "timeout" setup for the "ide_session_timeout" parameter
too (maybe ~1hr or so)?

Other then it takes up connections.

Is it correct to assume the session in pg_stat_activity with very old
XACT_START are the one which are part of long running open
transaction(i.e. driven by idle_in_transaction_session_timeout) whereas
the ones with older BACKEND_START or QUERY_START are the one are just
the idle session(driven by idle_session_timeout) but not tied to any
open transaction?

I would read this descriptions here:

https://www.postgresql.org/docs/current/monitoring-stats.html#MONITORING-PG-STAT-ACTIVITY-VIEW

Few observations:-

I do see, "MaximumUsedTransactionIDs" staying consistently ~200M for a
long time then coming down. And its matching to
"autovacuum_freeze_max_age" which is set as 200M. Hope it's expected. We
have max_connections set as 5000.

"Database connection" touching ~2000 then coming down till 200. And we
see uneven spikes in those, it seems to be matching with the pattern ,
when we have some errors occurring during the insert queries which are
submitted by the Java application to insert the data into the tables.

This would have been a good thing to lead with.

--
Adrian Klaver
adrian.klaver@aklaver.com

#6sud
suds1434@gmail.com
In reply to: Greg Sabino Mullane (#4)
Re: Question on session timeout

On Tue, Oct 1, 2024 at 5:45 PM Greg Sabino Mullane <htamfids@gmail.com>
wrote:

On Tue, Oct 1, 2024 at 1:57 AM sud <suds1434@gmail.com> wrote:

*Where are you getting the ~2000 count from?*
Seeing this in the "performance insights" dashboard and also its matching
when I query the count of sessions from pg_stat_activity.

So I'm guessing this is perhaps RDS or Aurora? Stating that up front can
be helpful.

As you described, a long time open transaction with a session state as
"idle" will be threatening as that will cause locking

No, idle is fine, "idle in transaction" is bad. :)

Is it correct to assume the session in pg_stat_activity with very old

XACT_START are the one which are part of long running

<snip rest of question>

You need to look at the "state" column as your primary bit of information.
Second most important is how long something has been in that state, which
you can find with now() - state_change. The best way to learn all of this
is to open a few concurrent sessions in psql and experiment.

We have max_connections set as 5000.

That's quite high. But if you never reach that high, it doesn't matter a
whole lot.

"Database connection" touching ~2000 then coming down till 200. And we see

uneven spikes in those, it seems to be matching with the pattern , when we
have some errors occurring during the insert queries which are submitted by
the Java application to insert the data into the tables.

(What sort of errors?) 2000 is high. Clearly, you are not pooling
connections, or not pooling them well. If you are using plain Postgres,
look into setting up pgbouncer. If using something managed (e.g. RDS) look
into their particular pooling solution. Or fix your application-level
pooling.

Thanks Greg.
It's a third party app and the application team confirmed they are using
connection pooling at their side. But as you mentioned, the number of
connections *"2000 is high"* . But , isn't it possible because they may be
having a max connection pool size limit set as ~2000 which is why we see
that many connections during peak window. So in that case is it advisable
to reduce the number of Max connections, because we have a number of cores
-32 for this instance.

And yes it's RDS. The errors which we were seeing were related to the data
bit not related to connections.

#7Adrian Klaver
adrian.klaver@aklaver.com
In reply to: sud (#6)
Re: Question on session timeout

On 10/4/24 12:36, sud wrote:

And yes it's RDS. The errors which we were seeing were related to the
data bit not related to connections.

The errors occur in a connection so they are related.

--
Adrian Klaver
adrian.klaver@aklaver.com