Add session statistics to pg_stat_database

Started by Laurenz Albealmost 6 years ago35 messageshackers
Jump to latest
#1Laurenz Albe
laurenz.albe@cybertec.at

Here is a patch that adds the following to pg_stat_database:
- number of connections
- number of sessions that were not disconnected regularly
- total time spent in database sessions
- total time spent executing queries
- total idle in transaction time

This is useful to check if connection pooling is working.
It also helps to estimate the size of the connection pool
required to keep the database busy, which depends on the
percentage of the transaction time that is spent idling.

Yours,
Laurenz Albe

Attachments:

0001-Add-session-statistics-to-pg_stat_database.patchtext/x-patch; charset=UTF-8; name=0001-Add-session-statistics-to-pg_stat_database.patchDownload+324-8
#2Ahsan Hadi
ahsan.hadi@gmail.com
In reply to: Laurenz Albe (#1)
Re: Add session statistics to pg_stat_database

On Wed, Jul 8, 2020 at 4:17 PM Laurenz Albe <laurenz.albe@cybertec.at>
wrote:

Here is a patch that adds the following to pg_stat_database:
- number of connections

Is it expected behaviour to not count idle connections? The connection is
included after it is aborted but not while it was idle.

- number of sessions that were not disconnected regularly
- total time spent in database sessions
- total time spent executing queries
- total idle in transaction time

This is useful to check if connection pooling is working.
It also helps to estimate the size of the connection pool
required to keep the database busy, which depends on the
percentage of the transaction time that is spent idling.

Yours,
Laurenz Albe

--
Highgo Software (Canada/China/Pakistan)
URL : http://www.highgo.ca
ADDR: 10318 WHALLEY BLVD, Surrey, BC
EMAIL: mailto: ahsan.hadi@highgo.ca

#3Laurenz Albe
laurenz.albe@cybertec.at
In reply to: Ahsan Hadi (#2)
Re: Add session statistics to pg_stat_database

On Thu, 2020-07-23 at 18:16 +0500, Ahsan Hadi wrote:

On Wed, Jul 8, 2020 at 4:17 PM Laurenz Albe <laurenz.albe@cybertec.at> wrote:

Here is a patch that adds the following to pg_stat_database:
- number of connections

Is it expected behaviour to not count idle connections? The connection is included after it is aborted but not while it was idle.

Thanks for looking.

Currently, the patch counts connections when they close.
I could change the behavior that they are counted immediately.

Yours,
Laurenz Albe

#4Laurenz Albe
laurenz.albe@cybertec.at
In reply to: Laurenz Albe (#3)
Re: Add session statistics to pg_stat_database

On Tue, 2020-08-11 at 13:53 +0200, I wrote:

On Thu, 2020-07-23 at 18:16 +0500, Ahsan Hadi wrote:

On Wed, Jul 8, 2020 at 4:17 PM Laurenz Albe <laurenz.albe@cybertec.at> wrote:

Here is a patch that adds the following to pg_stat_database:
- number of connections

Is it expected behaviour to not count idle connections? The connection is included after it is aborted but not while it was idle.

Currently, the patch counts connections when they close.

I could change the behavior that they are counted immediately.

I have changed the code so that connections are counted immediately.

Attached is a new version.

Yours,
Laurenz Albe

Attachments:

0001-Add-session-statistics-to-pg_stat_database.v2.patchtext/x-patch; charset=UTF-8; name=0001-Add-session-statistics-to-pg_stat_database.v2.patchDownload+332-8
#5Soumyadeep Chakraborty
soumyadeep2007@gmail.com
In reply to: Laurenz Albe (#4)
Re: Add session statistics to pg_stat_database

Hello Laurenz,

Thanks for submitting this! Please find my feedback below.

* Are we trying to capture ONLY client initiated disconnects in
m_aborted (we are not handling other disconnects by not accounting for
EOF..like if psql was killed)? If yes, why?

* pgstat_send_connstats(): How about renaming the "force" argument to
"disconnected"?

*

static TimestampTz pgStatActiveStart = DT_NOBEGIN;
static PgStat_Counter pgStatActiveTime = 0;
static TimestampTz pgStatTransactionIdleStart = DT_NOBEGIN;
static PgStat_Counter pgStatTransactionIdleTime = 0;
static bool pgStatSessionReported = false;
bool pgStatSessionDisconnected = false;

I think we can house all of these globals inside PgBackendStatus and can
follow the protocol for reading/writing fields in PgBackendStatus.
Refer: PGSTAT_{BEGIN|END}_WRITE_ACTIVITY

Also, some of these fields are not required:

I don't think we need pgStatActiveStart and pgStatTransactionIdleStart -
instead of these two we could use
PgBackendStatus.st_state_start_timestamp which marks the beginning TS of
the backend's current state (st_state). We can look at that field along
with the current and to-be-transitioned-to states inside
pgstat_report_activity() when there is a transition away from
STATE_RUNNING, STATE_IDLEINTRANSACTION or
STATE_IDLEINTRANSACTION_ABORTED, in order to update pgStatActiveTime and
pgStatTransactionIdleTime. We would also need to update those counters
on disconnect/PGSTAT_STAT_INTERVAL timeout if the backend's current
state was STATE_RUNNING, STATE_IDLEINTRANSACTION or
STATE_IDLEINTRANSACTION_ABORTED (in pgstat_send_connstats())

pgStatSessionDisconnected is not required as it can be determined if a
session has been disconnected by looking at the force argument to
pgstat_report_stat() [unless we would want to distinguish between
client-initiated disconnects, which I am not sure why, as I have
brought up above].

pgStatSessionReported is not required. We can glean this information by
checking if the function local static last_report in
pgstat_report_stat() is 0 and passing this on as another param
"first_report" to pgstat_send_connstats().

* PGSTAT_FILE_FORMAT_ID needs to be updated when a stats collector data
structure changes and we had a change in PgStat_StatDBEntry.

* We can directly use PgBackendStatus.st_proc_start_timestamp for
calculating m_session_time. We can also choose to report session uptime
even when the report is for the not-disconnect case
(PGSTAT_STAT_INTERVAL elapsed). No reason why not. Then we would need to
pass in the value of last_report to pgstat_send_connstats() -> calculate
m_session_time to be number of time units from
PgBackendStatus.st_proc_start_timestamp for the first report and then
number of time units from the last_report for all subsequent reports.

* We would need to bump the catalog version since we have made
changes to system views. Refer: #define CATALOG_VERSION_NO

Regards,
Soumyadeep (VMware)

#6Laurenz Albe
laurenz.albe@cybertec.at
In reply to: Soumyadeep Chakraborty (#5)
Re: Add session statistics to pg_stat_database

On Thu, 2020-09-24 at 14:38 -0700, Soumyadeep Chakraborty wrote:

Thanks for submitting this! Please find my feedback below.

Thanks for the thorough review.

Before I update the patch, I have a few comments and questions.

* Are we trying to capture ONLY client initiated disconnects in
m_aborted (we are not handling other disconnects by not accounting for
EOF..like if psql was killed)? If yes, why?

I thought it was interesting to know how many database sessions are
ended regularly as opposed to ones that get killed or end by unexpected
client death.

* pgstat_send_connstats(): How about renaming the "force" argument to
"disconnected"?

Yes, that might be better. I'll do that.

*

static TimestampTz pgStatActiveStart = DT_NOBEGIN;
static PgStat_Counter pgStatActiveTime = 0;
static TimestampTz pgStatTransactionIdleStart = DT_NOBEGIN;
static PgStat_Counter pgStatTransactionIdleTime = 0;
static bool pgStatSessionReported = false;
bool pgStatSessionDisconnected = false;

I think we can house all of these globals inside PgBackendStatus and can
follow the protocol for reading/writing fields in PgBackendStatus.
Refer: PGSTAT_{BEGIN|END}_WRITE_ACTIVITY

Are you sure that is the right way to go?

Correct me if I am wrong, but isn't PgBackendStatus for relevant status
information that other processes can access?
I'd assume that it is not the correct place to store backend-private data
that are not relevant to others.
Besides, if data is written to this structure more often, readers would
have deal with more contention, which could affect performance.

But I agree with the following:

Also, some of these fields are not required:

I don't think we need pgStatActiveStart and pgStatTransactionIdleStart -
instead of these two we could use
PgBackendStatus.st_state_start_timestamp which marks the beginning TS of
the backend's current state (st_state). We can look at that field along
with the current and to-be-transitioned-to states inside
pgstat_report_activity() when there is a transition away from
STATE_RUNNING, STATE_IDLEINTRANSACTION or
STATE_IDLEINTRANSACTION_ABORTED, in order to update pgStatActiveTime and
pgStatTransactionIdleTime. We would also need to update those counters
on disconnect/PGSTAT_STAT_INTERVAL timeout if the backend's current
state was STATE_RUNNING, STATE_IDLEINTRANSACTION or
STATE_IDLEINTRANSACTION_ABORTED (in pgstat_send_connstats())

Yes, that would be better.

pgStatSessionDisconnected is not required as it can be determined if a
session has been disconnected by looking at the force argument to
pgstat_report_stat() [unless we would want to distinguish between
client-initiated disconnects, which I am not sure why, as I have
brought up above].

But wouldn't that mean that we count *every* end of a session as regular
disconnection, even if the backend was killed?

I personally would want all my database connections to be closed by
the client, unless something unexpected happens.

pgStatSessionReported is not required. We can glean this information by
checking if the function local static last_report in
pgstat_report_stat() is 0 and passing this on as another param
"first_report" to pgstat_send_connstats().

Yes, that is better.

* PGSTAT_FILE_FORMAT_ID needs to be updated when a stats collector data
structure changes and we had a change in PgStat_StatDBEntry.

I think that should be left to the committer.

* We can directly use PgBackendStatus.st_proc_start_timestamp for
calculating m_session_time. We can also choose to report session uptime
even when the report is for the not-disconnect case
(PGSTAT_STAT_INTERVAL elapsed). No reason why not. Then we would need to
pass in the value of last_report to pgstat_send_connstats() -> calculate
m_session_time to be number of time units from
PgBackendStatus.st_proc_start_timestamp for the first report and then
number of time units from the last_report for all subsequent reports.

Yes, that would make for better statistics, since client connections
can last quite long.

* We would need to bump the catalog version since we have made
changes to system views. Refer: #define CATALOG_VERSION_NO

Again, I think that's up to the committer.

Thanks again!

Yours,
Laurenz Albe

#7Soumyadeep Chakraborty
soumyadeep2007@gmail.com
In reply to: Laurenz Albe (#6)
Re: Add session statistics to pg_stat_database

On Tue, Sep 29, 2020 at 2:44 AM Laurenz Albe <laurenz.albe@cybertec.at> wrote:

* Are we trying to capture ONLY client initiated disconnects in
m_aborted (we are not handling other disconnects by not accounting for
EOF..like if psql was killed)? If yes, why?

I thought it was interesting to know how many database sessions are
ended regularly as opposed to ones that get killed or end by unexpected
client death.

It may very well be. It would also be interesting to find out how many
connections are still open on the database (something we could easily
glean if we had the number of all disconnects, client-initiated or
unnatural). Maybe we could have both?

m_sessions_disconnected;
m_sessions_killed;

*

static TimestampTz pgStatActiveStart = DT_NOBEGIN;
static PgStat_Counter pgStatActiveTime = 0;
static TimestampTz pgStatTransactionIdleStart = DT_NOBEGIN;
static PgStat_Counter pgStatTransactionIdleTime = 0;
static bool pgStatSessionReported = false;
bool pgStatSessionDisconnected = false;

I think we can house all of these globals inside PgBackendStatus and can
follow the protocol for reading/writing fields in PgBackendStatus.
Refer: PGSTAT_{BEGIN|END}_WRITE_ACTIVITY

Are you sure that is the right way to go?

Correct me if I am wrong, but isn't PgBackendStatus for relevant status
information that other processes can access?
I'd assume that it is not the correct place to store backend-private data
that are not relevant to others.
Besides, if data is written to this structure more often, readers would
have deal with more contention, which could affect performance.

You are absolutely right! PgBackendStatus is not the place for any of
these fields. We could place them in LocalPgBackendStatus perhaps. But
I don't feel too strongly about that now, having looked at similar fields
such as pgStatXactCommit, pgStatXactRollback etc. If we decide to stick
with the globals, let's isolate and decorate them with a comment such as
this example from the source:

/*
* Updated by pgstat_count_buffer_*_time macros
*/
extern PgStat_Counter pgStatBlockReadTime;
extern PgStat_Counter pgStatBlockWriteTime;

pgStatSessionDisconnected is not required as it can be determined if a
session has been disconnected by looking at the force argument to
pgstat_report_stat() [unless we would want to distinguish between
client-initiated disconnects, which I am not sure why, as I have
brought up above].

But wouldn't that mean that we count *every* end of a session as regular
disconnection, even if the backend was killed?

See my comment above about client-initiated and unnatural disconnects.

* PGSTAT_FILE_FORMAT_ID needs to be updated when a stats collector data
structure changes and we had a change in PgStat_StatDBEntry.

I think that should be left to the committer.

Fair.

* We would need to bump the catalog version since we have made
changes to system views. Refer: #define CATALOG_VERSION_NO

Again, I think that's up to the committer.

Fair.

Regards,
Soumyadeep (VMware)

#8Masahiro Ikeda
ikedamsh@oss.nttdata.com
In reply to: Laurenz Albe (#4)
Re: Add session statistics to pg_stat_database

On 2020-09-05 00:50, Laurenz Albe wrote:

I have changed the code so that connections are counted immediately.
Attached is a new version.

Thanks for making a patch.
I'm interested in this feature.

I think to add the number of login failures is good for security.
Although we can see the event from log files, it's useful to know the
overview
if the database may be attached or not.

By the way, could you rebase the patch since the latest patches
failed to be applied to the master branch?

Regards,
--
Masahiro Ikeda
NTT DATA CORPORATION

#9Laurenz Albe
laurenz.albe@cybertec.at
In reply to: Soumyadeep Chakraborty (#7)
Re: Add session statistics to pg_stat_database

On Fri, 2020-10-02 at 15:10 -0700, Soumyadeep Chakraborty wrote:

On Tue, Sep 29, 2020 at 2:44 AM Laurenz Albe <laurenz.albe@cybertec.at> wrote:

* Are we trying to capture ONLY client initiated disconnects in
m_aborted (we are not handling other disconnects by not accounting for
EOF..like if psql was killed)? If yes, why?

I thought it was interesting to know how many database sessions are
ended regularly as opposed to ones that get killed or end by unexpected
client death.

It may very well be. It would also be interesting to find out how many
connections are still open on the database (something we could easily
glean if we had the number of all disconnects, client-initiated or
unnatural). Maybe we could have both?

m_sessions_disconnected;
m_sessions_killed;

We already have "numbackends" in "pg_stat_database", so we know the number
of active connections, right?

You are absolutely right! PgBackendStatus is not the place for any of
these fields. We could place them in LocalPgBackendStatus perhaps. But
I don't feel too strongly about that now, having looked at similar fields
such as pgStatXactCommit, pgStatXactRollback etc. If we decide to stick
with the globals, let's isolate and decorate them with a comment such as
this example from the source:

/*
* Updated by pgstat_count_buffer_*_time macros
*/
extern PgStat_Counter pgStatBlockReadTime;
extern PgStat_Counter pgStatBlockWriteTime;

I have reduced the number of variables with my latest patch; I think
the rewrite based on your review is definitely an improvement.

The comment you quote is from "pgstat.h", and my only global variable
has a comment there.

pgStatSessionDisconnected is not required as it can be determined if a
session has been disconnected by looking at the force argument to
pgstat_report_stat() [unless we would want to distinguish between
client-initiated disconnects, which I am not sure why, as I have
brought up above].

But wouldn't that mean that we count *every* end of a session as regular
disconnection, even if the backend was killed?

See my comment above about client-initiated and unnatural disconnects.

I decided to leave the functionality as it is; I think it is interesting
information to know if your clients disconnect cleanly or not.

Masahiro Ikeda wrote:

I think to add the number of login failures is good for security.
Although we can see the event from log files, it's useful to know the
overview if the database may be attached or not.

I don't think login failures can be reasonably reported in
"pg_stat_database", since authentication happens before the session is
attached to a database.

What if somebody attempts to connect to a non-existing database?

I agree that this is interesting information, but I don't think it
belongs into this patch.

By the way, could you rebase the patch since the latest patches
failed to be applied to the master branch?

Yes, the patch has bit-rotted.

Attached is v3 with improvements.

Yours,
Laurenz Albe

Attachments:

0001-Add-session-statistics-to-pg_stat_database.v3.patchtext/x-patch; charset=UTF-8; name=0001-Add-session-statistics-to-pg_stat_database.v3.patchDownload+290-7
#10Justin Pryzby
pryzby@telsasoft.com
In reply to: Laurenz Albe (#9)
Re: Add session statistics to pg_stat_database

On Tue, Oct 13, 2020 at 01:44:41PM +0200, Laurenz Albe wrote:

Attached is v3 with improvements.

+      <para>
+       Time spent in database sessions in this database, in milliseconds.
+      </para></entry>

Should say "Total time spent *by* DB sessions..." ?

I think these counters are only accurate as of the last state change, right?
So a session which has been idle for 1hr, that 1hr is not included. I think
the documentation should explain that, or (ideally) the implementation would be
more precise. Maybe the timestamps should only be updated after a session
terminates (and the docs should say so).

+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>connections</structfield> <type>bigint</type>
+      </para>
+      <para>
+       Number of connections established to this database.

*Total* number of connections established, otherwise it sounds like it might
mean "the number of sessions [currently] established".

+       Number of database sessions to this database that did not end
+       with a regular client disconnection.

Does that mean "sessions which ended irregularly" ? Or does it also include
"sessions which have not ended" ?

+ msg.m_aborted = (!disconnect || pgStatSessionDisconnected) ? 0 : 1;

I think this can be just:
msg.m_aborted = (bool) (disconnect && !pgStatSessionDisconnected);

+       if ((dbentry = pgstat_fetch_stat_dbentry(dbid)) == NULL)
+               result = 0;
+       else
+               result = ((double) dbentry->n_session_time) / 1000.0;

I think these can say:
|double result = 0;
|if ((dbentry=..) != NULL)
| result = (double) ..;

That not only uses fewer LOC, but also the assignment to zero is (known to be)
done at compile time (BSS) rather than runtime.

#11Laurenz Albe
laurenz.albe@cybertec.at
In reply to: Justin Pryzby (#10)
Re: Add session statistics to pg_stat_database

Thanks for the --- as always --- valuable review!

On Tue, 2020-10-13 at 17:55 -0500, Justin Pryzby wrote:

On Tue, Oct 13, 2020 at 01:44:41PM +0200, Laurenz Albe wrote:

Attached is v3 with improvements.

+      <para>
+       Time spent in database sessions in this database, in milliseconds.
+      </para></entry>

Should say "Total time spent *by* DB sessions..." ?

That is indeed better. Fixed.

I think these counters are only accurate as of the last state change, right?
So a session which has been idle for 1hr, that 1hr is not included. I think
the documentation should explain that, or (ideally) the implementation would be
more precise. Maybe the timestamps should only be updated after a session
terminates (and the docs should say so).

I agree, and I have added an explanation that the value doesn't include
the duration of the current state.

Of course it would be nice to have totally accurate values, but I think
that the statistics are by nature inaccurate (datagrams can get lost),
and more frequent statistics updates increase the work load.
I don't think that is worth the effort.

+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>connections</structfield> <type>bigint</type>
+      </para>
+      <para>
+       Number of connections established to this database.

*Total* number of connections established, otherwise it sounds like it might
mean "the number of sessions [currently] established".

Fixed like that.

+       Number of database sessions to this database that did not end
+       with a regular client disconnection.

Does that mean "sessions which ended irregularly" ? Or does it also include
"sessions which have not ended" ?

I have added an explanation for that.

+ msg.m_aborted = (!disconnect || pgStatSessionDisconnected) ? 0 : 1;

I think this can be just:
msg.m_aborted = (bool) (disconnect && !pgStatSessionDisconnected);

I mulled over this and finally decided to leave it as it is.

Since "m_aborted" gets added to the total counter, I'd prefer to
have it be an "int".

Your proposed code works (the cast is actually not necessary, right?).
But I think that my version is more readable if you think of
"m_aborted" as a counter rather than a flag.

+       if ((dbentry = pgstat_fetch_stat_dbentry(dbid)) == NULL)
+               result = 0;
+       else
+               result = ((double) dbentry->n_session_time) / 1000.0;

I think these can say:

double result = 0;
if ((dbentry=..) != NULL)
result = (double) ..;

That not only uses fewer LOC, but also the assignment to zero is (known to be)
done at compile time (BSS) rather than runtime.

I didn't know about the performance difference.
Concise code (if readable) is good, so I changed the code like you propose.

The code pattern is actually copied from neighboring functions,
which then should also be changed like this, but that is outside
the scope of this patch.

Attached is v4 of the patch.

Yours,
Laurenz Albe

Attachments:

0001-Add-session-statistics-to-pg_stat_database.v4.patchtext/x-patch; charset=UTF-8; name=0001-Add-session-statistics-to-pg_stat_database.v4.patchDownload+283-7
#12Ahsan Hadi
ahsan.hadi@gmail.com
In reply to: Laurenz Albe (#11)
Re: Add session statistics to pg_stat_database

Hi Laurenz,

I have applied the latest patch on master, all the regression test cases
are passing and the implemented functionality is also looking fine. The
point that I raised about idle connection not included is also addressed.

thanks,
Ahsan

On Wed, Oct 14, 2020 at 2:28 PM Laurenz Albe <laurenz.albe@cybertec.at>
wrote:

Thanks for the --- as always --- valuable review!

On Tue, 2020-10-13 at 17:55 -0500, Justin Pryzby wrote:

On Tue, Oct 13, 2020 at 01:44:41PM +0200, Laurenz Albe wrote:

Attached is v3 with improvements.

+      <para>
+       Time spent in database sessions in this database, in

milliseconds.

+ </para></entry>

Should say "Total time spent *by* DB sessions..." ?

That is indeed better. Fixed.

I think these counters are only accurate as of the last state change,

right?

So a session which has been idle for 1hr, that 1hr is not included. I

think

the documentation should explain that, or (ideally) the implementation

would be

more precise. Maybe the timestamps should only be updated after a

session

terminates (and the docs should say so).

I agree, and I have added an explanation that the value doesn't include
the duration of the current state.

Of course it would be nice to have totally accurate values, but I think
that the statistics are by nature inaccurate (datagrams can get lost),
and more frequent statistics updates increase the work load.
I don't think that is worth the effort.

+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>connections</structfield> <type>bigint</type>
+      </para>
+      <para>
+       Number of connections established to this database.

*Total* number of connections established, otherwise it sounds like it

might

mean "the number of sessions [currently] established".

Fixed like that.

+       Number of database sessions to this database that did not end
+       with a regular client disconnection.

Does that mean "sessions which ended irregularly" ? Or does it also

include

"sessions which have not ended" ?

I have added an explanation for that.

+ msg.m_aborted = (!disconnect || pgStatSessionDisconnected) ? 0 :

1;

I think this can be just:
msg.m_aborted = (bool) (disconnect && !pgStatSessionDisconnected);

I mulled over this and finally decided to leave it as it is.

Since "m_aborted" gets added to the total counter, I'd prefer to
have it be an "int".

Your proposed code works (the cast is actually not necessary, right?).
But I think that my version is more readable if you think of
"m_aborted" as a counter rather than a flag.

+       if ((dbentry = pgstat_fetch_stat_dbentry(dbid)) == NULL)
+               result = 0;
+       else
+               result = ((double) dbentry->n_session_time) / 1000.0;

I think these can say:

double result = 0;
if ((dbentry=..) != NULL)
result = (double) ..;

That not only uses fewer LOC, but also the assignment to zero is (known

to be)

done at compile time (BSS) rather than runtime.

I didn't know about the performance difference.
Concise code (if readable) is good, so I changed the code like you propose.

The code pattern is actually copied from neighboring functions,
which then should also be changed like this, but that is outside
the scope of this patch.

Attached is v4 of the patch.

Yours,
Laurenz Albe

--
Highgo Software (Canada/China/Pakistan)
URL : http://www.highgo.ca
ADDR: 10318 WHALLEY BLVD, Surrey, BC
EMAIL: mailto: ahsan.hadi@highgo.ca

#13Georgios Kokolatos
gkokolatos@protonmail.com
In reply to: Ahsan Hadi (#12)
Re: Add session statistics to pg_stat_database

Hi,

I noticed that the cfbot fails for this patch.
For this, I am setting the status to: 'Waiting on Author'.

Cheers,
//Georgios

The new status of this patch is: Waiting on Author

#14Laurenz Albe
laurenz.albe@cybertec.at
In reply to: Georgios Kokolatos (#13)
Re: Add session statistics to pg_stat_database

On Tue, 2020-11-10 at 15:03 +0000, Georgios Kokolatos wrote:

I noticed that the cfbot fails for this patch.

For this, I am setting the status to: 'Waiting on Author'.

Thanks for noticing, it was only the documentation build.

Version 5 attached, status changed back to "waiting for review".

Yours,
Laurenz Albe

Attachments:

0001-Add-session-statistics-to-pg_stat_database.v5.patchtext/x-patch; charset=UTF-8; name=0001-Add-session-statistics-to-pg_stat_database.v5.patchDownload+283-7
#15Laurenz Albe
laurenz.albe@cybertec.at
In reply to: Laurenz Albe (#14)
Re: Add session statistics to pg_stat_database

I wrote:

On Tue, 2020-11-10 at 15:03 +0000, Georgios Kokolatos wrote:

I noticed that the cfbot fails for this patch.

For this, I am setting the status to: 'Waiting on Author'.

Thanks for noticing, it was only the documentation build.

Version 5 attached, status changed back to "waiting for review".

The patch is still failing, so I looked again:

make[3]: Entering directory '/home/travis/build/postgresql-cfbot/postgresql/doc/src/sgml'
{ \
echo "<!ENTITY version \"14devel\">"; \
echo "<!ENTITY majorversion \"14\">"; \
} > version.sgml
'/usr/bin/perl' ./mk_feature_tables.pl YES ../../../src/backend/catalog/sql_feature_packages.txt ../../../src/backend/catalog/sql_features.txt > features-supported.sgml
'/usr/bin/perl' ./mk_feature_tables.pl NO ../../../src/backend/catalog/sql_feature_packages.txt ../../../src/backend/catalog/sql_features.txt > features-unsupported.sgml
'/usr/bin/perl' ./generate-errcodes-table.pl ../../../src/backend/utils/errcodes.txt > errcodes-table.sgml
'/usr/bin/perl' ./generate-keywords-table.pl . > keywords-table.sgml
/usr/bin/xmllint --path . --noout --valid postgres.sgml
error : Unknown IO error
postgres.sgml:21: /usr/bin/bison -Wno-deprecated -d -o gram.c gram.y
warning: failed to load external entity "http://www.oasis-open.org/docbook/xml/4.5/docbookx.dtd&quot;
]>
^
postgres.sgml:23: element book: validity error : No declaration for attribute id of element book
<book id="postgres">
^
postgres.sgml:24: element title: validity error : No declaration for element title
<title>PostgreSQL &version; Documentation</title>

I have the impression that this is not the fault of my patch, something seems to be
wrong with the cfbot.

I see that other patches are failing with the same error.

Yours,
Laurenz Albe

#16Georgios Kokolatos
gkokolatos@protonmail.com
In reply to: Laurenz Albe (#15)
Re: Add session statistics to pg_stat_database

‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
On Thursday, November 12, 2020 9:31 AM, Laurenz Albe <laurenz.albe@cybertec.at> wrote:

I wrote:

On Tue, 2020-11-10 at 15:03 +0000, Georgios Kokolatos wrote:

I noticed that the cfbot fails for this patch.
For this, I am setting the status to: 'Waiting on Author'.

Thanks for noticing, it was only the documentation build.
Version 5 attached, status changed back to "waiting for review".

The patch is still failing, so I looked again:

make[3]: Entering directory '/home/travis/build/postgresql-cfbot/postgresql/doc/src/sgml'
{ \
echo "<!ENTITY version \"14devel\">"; \

echo "<!ENTITY majorversion \\"14\\">"; \\

} > version.sgml
'/usr/bin/perl' ./mk_feature_tables.pl YES ../../../src/backend/catalog/sql_feature_packages.txt ../../../src/backend/catalog/sql_features.txt > features-supported.sgml
'/usr/bin/perl' ./mk_feature_tables.pl NO ../../../src/backend/catalog/sql_feature_packages.txt ../../../src/backend/catalog/sql_features.txt > features-unsupported.sgml
'/usr/bin/perl' ./generate-errcodes-table.pl ../../../src/backend/utils/errcodes.txt > errcodes-table.sgml
'/usr/bin/perl' ./generate-keywords-table.pl . > keywords-table.sgml
/usr/bin/xmllint --path . --noout --valid postgres.sgml
error : Unknown IO error
postgres.sgml:21: /usr/bin/bison -Wno-deprecated -d -o gram.c gram.y
warning: failed to load external entity "http://www.oasis-open.org/docbook/xml/4.5/docbookx.dtd&quot;
]>

^

postgres.sgml:23: element book: validity error : No declaration for attribute id of element book
<book id="postgres">

^

postgres.sgml:24: element title: validity error : No declaration for element title
<title>PostgreSQL &version; Documentation</title>

I have the impression that this is not the fault of my patch, something seems to be
wrong with the cfbot.

I see that other patches are failing with the same error.

You are indeed correct. Unfortunately the cfbot is a bit unstable due
to some issues related to the documentation. I alerted a contributor
and he was quick to try to address the issue in pgsql-www [1]/messages/by-id/E2EE6B76-2D96-408A-B961-CAE47D1A86F0@yesql.se.

Thank you very much for looking and apologies for the chatter.

Yours,
Laurenz Albe

[1]: /messages/by-id/E2EE6B76-2D96-408A-B961-CAE47D1A86F0@yesql.se

#17Laurenz Albe
laurenz.albe@cybertec.at
In reply to: Ahsan Hadi (#12)
Re: Add session statistics to pg_stat_database

On Fri, 2020-10-16 at 16:24 +0500, Ahsan Hadi wrote:

I have applied the latest patch on master, all the regression test cases are passing
and the implemented functionality is also looking fine. The point that I raised about
idle connection not included is also addressed.

If you think that the patch is ready to go, you could mark it as
"ready for committer" in the commitfest app.

Yours,
Laurenz Albe

#18Magnus Hagander
magnus@hagander.net
In reply to: Laurenz Albe (#17)
Re: Add session statistics to pg_stat_database

On Tue, Nov 17, 2020 at 4:22 PM Laurenz Albe <laurenz.albe@cybertec.at>
wrote:

On Fri, 2020-10-16 at 16:24 +0500, Ahsan Hadi wrote:

I have applied the latest patch on master, all the regression test cases

are passing

and the implemented functionality is also looking fine. The point that

I raised about

idle connection not included is also addressed.

If you think that the patch is ready to go, you could mark it as
"ready for committer" in the commitfest app.

I've taken a look as well, and here are a few short notes:

* It talks about "number of connections" but "number of aborted sessions".
We should probably be consistent about talking either about connections or
sessions? In particular, connections seems wrong in this case, because it
only starts counting after authentication is complete (since otherwise we
send no stats)? (This goes for both docs and actual function names)

* Is there a reason we're counting active and idle in transaction
(including aborted), but not fastpath? In particular, we seem to ignore
fastpath -- if we don't want to single it out specifically, it should
probably be included in active?

* pgstat_send_connstat() but pgstat_recv_connection(). Let's call both
connstat or both connection (I'd vote connstat)?

* Is this actually a fix that's independent of the new stats? It seems in
general to be changing the behaviour of "force", which is more generic?
-               !have_function_stats)
+               !have_function_stats && !force)

* in pgstat_send_connstat() you pass the parameter "force" in as
"disconnect". That behaviour at least requires a comment saying why, I
think. My understanding is it relies on that "force" means this is
a "backend is shutting down", but that is not actually documented anywhere.
Maybe the "force" parameter should actually be renamed to indicate this is
really what it means, to avoid a future mistake in the area? But even with
that, how does that turn into disconnect?

* Maybe rename pgStatSessionDisconnected
to pgStatSessionNormalDisconnected? To avoid having to go back to the
setting point and look it up in a comment.

I wonder if there would also be a way to count "sessions that crashed" as
well. That is,the ones that failed in a way that caused the postmaster to
restart the system. But that's information we'd have to send from the
postmaster, but I'm actually unsure if we're "allowed" to send things to
the stats collector from the postmaster. But I think it could be quite
useful information to have. Maybe we can find some way to piggyback on the
fact that we're restarting the stats collector as a result?

--
Magnus Hagander
Me: https://www.hagander.net/ <http://www.hagander.net/&gt;
Work: https://www.redpill-linpro.com/ <http://www.redpill-linpro.com/&gt;

#19Laurenz Albe
laurenz.albe@cybertec.at
In reply to: Magnus Hagander (#18)
Re: Add session statistics to pg_stat_database

On Tue, 2020-11-17 at 17:33 +0100, Magnus Hagander wrote:

I've taken a look as well, and here are a few short notes:

Much appreciated!

* It talks about "number of connections" but "number of aborted sessions". We should probably
be consistent about talking either about connections or sessions? In particular, connections
seems wrong in this case, because it only starts counting after authentication is complete
(since otherwise we send no stats)? (This goes for both docs and actual function names)

Yes, that is true. I have changed "connections" to "sessions" and renamed the new
column "connections" to "session_count".

I think that most people will understand a session as started after a successful
connection.

* Is there a reason we're counting active and idle in transaction (including aborted),
but not fastpath? In particular, we seem to ignore fastpath -- if we don't want to single
it out specifically, it should probably be included in active?

The only reason is that I didn't think of it. Fixed.

* pgstat_send_connstat() but pgstat_recv_connection(). Let's call both connstat or both
connection (I'd vote connstat)?

Agreed, done.

* Is this actually a fix that's independent of the new stats? It seems in general to be
changing the behaviour of "force", which is more generic?
-               !have_function_stats)
+               !have_function_stats && !force)

The comment right above that reads:
/* Don't expend a clock check if nothing to do */
So it is just a quick exit if there is nothing to do.

But with that patch we have something to do if "force" (see below) is true:
Report the remaining session duration and if the session was closed normally.

Thus the additional check.

* in pgstat_send_connstat() you pass the parameter "force" in as "disconnect".
That behaviour at least requires a comment saying why, I think. My understanding is
it relies on that "force" means this is a "backend is shutting down", but that is not
actually documented anywhere. Maybe the "force" parameter should actually be renamed
to indicate this is really what it means, to avoid a future mistake in the area?
But even with that, how does that turn into disconnect?

"pgstat_report_stat(true)" is only called from "pgstat_beshutdown_hook()", so
it is currently only called when the backend is about to exit.

According the the comments the flag means that "caller wants to force stats out".
I guess that the author thought that there may arise other reasons to force sending
statistics in the future (commit 641912b4d from 2007).

However, since that has not happened, I have renamed the flag to "disconnect" and
adapted the documentation. This doesn't change the current behavior, but establishes
a new rule.

* Maybe rename pgStatSessionDisconnected to pgStatSessionNormalDisconnected?
To avoid having to go back to the setting point and look it up in a comment.

Long, descriptive names are a good thing.
I have decided to use "pgStatSessionDisconnectedNormally", since that is even longer
and seems to fit the "yes or no" category better.

I wonder if there would also be a way to count "sessions that crashed" as well.
That is,the ones that failed in a way that caused the postmaster to restart the system.
But that's information we'd have to send from the postmaster, but I'm actually unsure
if we're "allowed" to send things to the stats collector from the postmaster.
But I think it could be quite useful information to have. Maybe we can find some way
to piggyback on the fact that we're restarting the stats collector as a result?

Sure, a crash count would be useful. I don't know if it is easy for the stats collector
to tell the difference between a start after a backend crash and - say - starting from
a base backup.

Patch v6 attached.

I think that that would be material for another patch, and I don't think it should go
to "pg_stat_database", because a) it might be hard to tell to which database the crashed
backend was attached, b) it might be a background process that doesn't belong to a database
and c) if the crash were caused by - say - corruption in a shared catalog, it would be
misleading.

Yours,
Laurenz Albe

Attachments:

0001-Add-session-statistics-to-pg_stat_database.v6.patchtext/x-patch; charset=UTF-8; name=0001-Add-session-statistics-to-pg_stat_database.v6.patchDownload+292-10
#20Magnus Hagander
magnus@hagander.net
In reply to: Laurenz Albe (#19)
Re: Add session statistics to pg_stat_database

On Fri, Nov 20, 2020 at 3:41 PM Laurenz Albe <laurenz.albe@cybertec.at>
wrote:

On Tue, 2020-11-17 at 17:33 +0100, Magnus Hagander wrote:

I've taken a look as well, and here are a few short notes:

Much appreciated!

Sorry about the delay in getting back to you on this one. FYI, while the
patch has been bumped to the next CF by now, I do intend to continue
working on it before that starts.

* It talks about "number of connections" but "number of aborted
sessions". We should probably

be consistent about talking either about connections or sessions? In

particular, connections

seems wrong in this case, because it only starts counting after

authentication is complete

(since otherwise we send no stats)? (This goes for both docs and

actual function names)

Yes, that is true. I have changed "connections" to "sessions" and renamed
the new
column "connections" to "session_count".

I think that most people will understand a session as started after a
successful
connection.

Yeah, I agree, and as long as it's consistent we don't need more
explanations than that.

Further int he views, it's a bit strange to have session_count and
aborted_session, but I'm not sure what to suggest. "aborted_session_count"
seems too long. Maybe just "sessions" instead of "session_count" -- no
other counters actually have the "_count" suffix.

* Is this actually a fix that's independent of the new stats? It seems in
general to be

changing the behaviour of "force", which is more generic?
-               !have_function_stats)
+               !have_function_stats && !force)

The comment right above that reads:
/* Don't expend a clock check if nothing to do */
So it is just a quick exit if there is nothing to do.

But with that patch we have something to do if "force" (see below) is true:
Report the remaining session duration and if the session was closed
normally.

Thus the additional check.

Ah yeah, makes sense. It becomes more clear with the rename.

* in pgstat_send_connstat() you pass the parameter "force" in as
"disconnect".

That behaviour at least requires a comment saying why, I think. My

understanding is

it relies on that "force" means this is a "backend is shutting down",

but that is not

actually documented anywhere. Maybe the "force" parameter should

actually be renamed

to indicate this is really what it means, to avoid a future mistake in

the area?

But even with that, how does that turn into disconnect?

"pgstat_report_stat(true)" is only called from "pgstat_beshutdown_hook()",
so
it is currently only called when the backend is about to exit.

According the the comments the flag means that "caller wants to force
stats out".
I guess that the author thought that there may arise other reasons to
force sending
statistics in the future (commit 641912b4d from 2007).

However, since that has not happened, I have renamed the flag to
"disconnect" and
adapted the documentation. This doesn't change the current behavior, but
establishes
a new rule.

That makes it a lot more clear. And I agree, if nobody came up with a
reason since 2007, then we are free to repurpose it :)

* Maybe rename pgStatSessionDisconnected to
pgStatSessionNormalDisconnected?

To avoid having to go back to the setting point and look it up in a

comment.

Long, descriptive names are a good thing.
I have decided to use "pgStatSessionDisconnectedNormally", since that is
even longer
and seems to fit the "yes or no" category better.

WFM.

I wonder if there would also be a way to count "sessions that crashed" as
well.

That is,the ones that failed in a way that caused the postmaster to

restart the system.

But that's information we'd have to send from the postmaster, but I'm

actually unsure

if we're "allowed" to send things to the stats collector from the

postmaster.

But I think it could be quite useful information to have. Maybe we can

find some way

to piggyback on the fact that we're restarting the stats collector as a

result?

Sure, a crash count would be useful. I don't know if it is easy for the
stats collector
to tell the difference between a start after a backend crash and - say -
starting from
a base backup.

Patch v6 attached.

I think that that would be material for another patch, and I don't think
it should go
to "pg_stat_database", because a) it might be hard to tell to which
database the crashed
backend was attached, b) it might be a background process that doesn't
belong to a database
and c) if the crash were caused by - say - corruption in a shared catalog,
it would be
misleading

I'm not sure it is outside the scope of this patch, because I think it
might be easier to do than I (and I think you) first thought. We don't need
to track which database crashed -- if we track all *other* ways a database
exits, then crashes are all that remains.

So in fact, we *almost* have all the data we need already. We have the
number of sessions started. We have the number of sessions "aborted". if we
also had the number of sessions that were closed normally, then whatever is
"left" would be the number of sessions crashed. And we do already, in your
patch, send the message in the case of both aborted and non-aborted
sessions. So we just need to keep track of both in the statsfile (which we
don't now), and we'd more or less have it, wouldn't we?

However, some thinking around that also leads me to another question which
is very much in scope for this patch regardless, which is what about
shutdown and admin termination. Right now, when you do a "pg_ctl stop" on
the database, all sessions count as aborted. Same thing for a
pg_terminate_backend(). I wonder if this is also a case that would be
useful to track as a separate thing? One could argue that the docs in your
patch say aborted means "terminated by something else than a regular client
disconnection". But that's true for a "shutdown", but not for a crash, so
whichever way we go with crashes it's slightly incorrect.

But thinking from a usability perspective, wouldn't what we want more be
something like <closed by correct disconnect>, <closed by abnormal
disconnect>, <closed by admin>, <crash>?

What do you think of adapting it to that?

Basically, that would change pgStatSessionDisconnectedNormally into instead
being an enum of reasons, which could be normal disconnect, abnormal
disconnect and admin. And we'd track all those three as separate numbers in
the stats file, meaning we could then calculate the crash by subtracting
all three from the total number of sessions?

(Let me know if you think the idea could work and would prefer it if I
worked up a complete suggestion based on it rather than just spitting ideas)

--
Magnus Hagander
Me: https://www.hagander.net/ <http://www.hagander.net/&gt;
Work: https://www.redpill-linpro.com/ <http://www.redpill-linpro.com/&gt;

#21Laurenz Albe
laurenz.albe@cybertec.at
In reply to: Magnus Hagander (#20)
#22Laurenz Albe
laurenz.albe@cybertec.at
In reply to: Laurenz Albe (#21)
#23Laurenz Albe
laurenz.albe@cybertec.at
In reply to: Laurenz Albe (#22)
#24Magnus Hagander
magnus@hagander.net
In reply to: Laurenz Albe (#23)
#25Laurenz Albe
laurenz.albe@cybertec.at
In reply to: Magnus Hagander (#24)
#26Laurenz Albe
laurenz.albe@cybertec.at
In reply to: Laurenz Albe (#25)
#27Masahiro Ikeda
ikedamsh@oss.nttdata.com
In reply to: Laurenz Albe (#26)
#28Laurenz Albe
laurenz.albe@cybertec.at
In reply to: Masahiro Ikeda (#27)
#29Laurenz Albe
laurenz.albe@cybertec.at
In reply to: Masahiro Ikeda (#27)
#30Masahiro Ikeda
ikedamsh@oss.nttdata.com
In reply to: Laurenz Albe (#29)
#31Laurenz Albe
laurenz.albe@cybertec.at
In reply to: Masahiro Ikeda (#30)
#32Masahiro Ikeda
ikedamsh@oss.nttdata.com
In reply to: Laurenz Albe (#31)
#33Magnus Hagander
magnus@hagander.net
In reply to: Laurenz Albe (#31)
#34Laurenz Albe
laurenz.albe@cybertec.at
In reply to: Magnus Hagander (#33)
#35Magnus Hagander
magnus@hagander.net
In reply to: Laurenz Albe (#34)