Log connection establishment timings
Hi,
Users wishing to debug slow connection establishment have little
visibility into which steps take the most time. We don't expose any
stats and none of the logging includes durations.
The attached patch logs durations for authentication, forking the
Postgres backend, and the end-to-end connection establishment duration
starting from when the postmaster calls accept() and ending the first
time the forked backend is ready for query.
As an example of how this could be useful, I artificially slowed down
authentication and you can see that that is now visible in logs:
LOG: backend ready for query. pid=86341. socket=10. connection
establishment times (ms): total: 8, fork: 0, authentication: 0
LOG: backend ready for query. pid=86794. socket=10. connection
establishment times (ms): total: 108, fork: 0, authentication: 100
Two notes on implementation:
To make this portable, the timestamps captured in the postmaster
(socket creation time, fork initiation time) are passed through the
ClientSocket and BackendStartupData structures instead of simply saved
in backend local memory inherited by the child process.
Secondly, I used TimestampTz for the times, but it is pretty unclear
to me when instr_time should be used vs TimestampTz for such things.
instr_time says it is more portable, but there are all sorts of places
that use TimestampTz that would probably need to be portable.
Also, INSTR_TIME_SUBTRACT() isn't overflow safe, which must be because
the actual data type of ticks is platform dependent and so the caller
is responsible for not passing it values that would overflow when
subtracted.
- Melanie
Attachments:
v1-0001-Add-connection-establishment-duration-logging.patchtext/x-patch; charset=US-ASCII; name=v1-0001-Add-connection-establishment-duration-logging.patchDownload+100-1
On Mon, 16 Dec 2024 at 22:00, Melanie Plageman
<melanieplageman@gmail.com> wrote:
Users wishing to debug slow connection establishment have little
visibility into which steps take the most time. We don't expose any
stats and none of the logging includes durations.
Two thoughts:
1. Would it make sense to also expose these timings in some pg_stat_xyz view?
2. As a user I'd be curious to know how much of the time is spent on
the network/client vs inside postgres. For example for the scram/sasl
handshake, how much of the authentication_time is spent waiting on the
first "read" after the server has called sendAuthRequest.
Hello,
Le lun. 16 déc. 2024 à 22:00, Melanie Plageman <melanieplageman@gmail.com>
a écrit :
Hi,
Users wishing to debug slow connection establishment have little
visibility into which steps take the most time. We don't expose any
stats and none of the logging includes durations.The attached patch logs durations for authentication, forking the
Postgres backend, and the end-to-end connection establishment duration
starting from when the postmaster calls accept() and ending the first
time the forked backend is ready for query.As an example of how this could be useful, I artificially slowed down
authentication and you can see that that is now visible in logs:LOG: backend ready for query. pid=86341. socket=10. connection
establishment times (ms): total: 8, fork: 0, authentication: 0
LOG: backend ready for query. pid=86794. socket=10. connection
establishment times (ms): total: 108, fork: 0, authentication: 100Two notes on implementation:
To make this portable, the timestamps captured in the postmaster
(socket creation time, fork initiation time) are passed through the
ClientSocket and BackendStartupData structures instead of simply saved
in backend local memory inherited by the child process.Secondly, I used TimestampTz for the times, but it is pretty unclear
to me when instr_time should be used vs TimestampTz for such things.
instr_time says it is more portable, but there are all sorts of places
that use TimestampTz that would probably need to be portable.
Also, INSTR_TIME_SUBTRACT() isn't overflow safe, which must be because
the actual data type of ticks is platform dependent and so the caller
is responsible for not passing it values that would overflow when
subtracted.
Just a quick note. I like this patch a lot and it would help to diagnose
some situations our customers may have. I tried the patch on my laptop, and
it works fine. I'll find some time to read the code as well, but in the
meantime, this looks like a nice thing to have in PostgreSQL.
Thanks Melanie.
--
Guillaume.
Hi,
On Mon, Jan 06, 2025 at 10:08:36PM +0100, Guillaume Lelarge wrote:
Hello,
Le lun. 16 d�c. 2024 � 22:00, Melanie Plageman <melanieplageman@gmail.com>
a �crit :Hi,
Users wishing to debug slow connection establishment have little
visibility into which steps take the most time. We don't expose any
stats and none of the logging includes durations.The attached patch logs durations for authentication, forking the
Postgres backend, and the end-to-end connection establishment duration
starting from when the postmaster calls accept() and ending the first
time the forked backend is ready for query.As an example of how this could be useful, I artificially slowed down
authentication and you can see that that is now visible in logs:LOG: backend ready for query. pid=86341. socket=10. connection
establishment times (ms): total: 8, fork: 0, authentication: 0
LOG: backend ready for query. pid=86794. socket=10. connection
establishment times (ms): total: 108, fork: 0, authentication: 100Two notes on implementation:
To make this portable, the timestamps captured in the postmaster
(socket creation time, fork initiation time) are passed through the
ClientSocket and BackendStartupData structures instead of simply saved
in backend local memory inherited by the child process.Secondly, I used TimestampTz for the times, but it is pretty unclear
to me when instr_time should be used vs TimestampTz for such things.
instr_time says it is more portable, but there are all sorts of places
that use TimestampTz that would probably need to be portable.
Also, INSTR_TIME_SUBTRACT() isn't overflow safe, which must be because
the actual data type of ticks is platform dependent and so the caller
is responsible for not passing it values that would overflow when
subtracted.Just a quick note. I like this patch a lot and it would help to diagnose
some situations our customers may have. I tried the patch on my laptop, and
it works fine. I'll find some time to read the code as well, but in the
meantime, this looks like a nice thing to have in PostgreSQL.
Also +1 on the idea.
The patch needed a rebase due to 34486b6092e. I did it in v2 attached (it's
a minor rebase due to the AmRegularBackendProcess() introduction in miscadmin.h).
v2 could rely on AmRegularBackendProcess() instead of AmClientBackendProcess()
but I kept it with AmClientBackendProcess() to reduce "my" changes as compared to
v1.
Regarding the TimestampTz vs instr_time choice, we have things like:
+ TimestampTz fork_time = ((BackendStartupData *) startup_data)->fork_time;
+ TimestampTz cur_time = GetCurrentTimestamp();
+
+ conn_timing.fork_duration = TimestampDifferenceMicroseconds(fork_time,
+ cur_time);
but looking at TimestampDifferenceMicroseconds():
"
/* If start_time is in the future or now, no time has elapsed */
if (start_time >= stop_time)
return 0;
"
I think that it can happen due to time changes.
So with TimestampTz, we would:
1. return 0 if we moved the time backward
2. provide an inflated duration including the time jump (if the time move
forward).
But with instr_time (and on systems that support CLOCK_MONOTONIC) then
pg_clock_gettime_ns() should not be affected by system time change IIUC.
Though time changes are "rare", given the fact that those metrics could provide
"inaccurate" measurements during that particular moment (time change) then it
might be worth considering instr_time instead for this particular metric.
Regards,
--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
Attachments:
v2-0001-Add-connection-establishment-duration-logging.patchtext/x-diff; charset=us-asciiDownload+100-1
On Mon, Jan 20, 2025 at 7:01 AM Bertrand Drouvot
<bertranddrouvot.pg@gmail.com> wrote:
Though time changes are "rare", given the fact that those metrics could provide
"inaccurate" measurements during that particular moment (time change) then it
might be worth considering instr_time instead for this particular metric.
+1, I think a CLOCK_MONOTONIC source should be used for this if we've got it.
--
For the EXEC_BACKEND case (which, to be honest, I don't know much
about), I originally wondered if the fork_duration should include any
of the shared memory manipulations or library reloads that are done to
match Unix behavior. But I think I prefer the way the patch does it.
Can the current timestamp be recorded right at the beginning of
SubPostmasterMain(), to avoid counting the time setting up GUCs and
reading the variables file, or do we have to wait?
nit: conn_timing is currently declared in the "interrupts and crit
section" part of miscadmin.h; should it be moved down to the
general-purpose globals?
Thanks,
--Jacob
On Mon, Dec 16, 2024 at 6:26 PM Jelte Fennema-Nio <postgres@jeltef.nl> wrote:
Two thoughts:
1. Would it make sense to also expose these timings in some pg_stat_xyz view?
So, right now there isn't an obvious place this would fit. There are
dynamic statistics views that are per connection -- like
pg_stat_ssl/gssapi. We could perhaps add authentication duration to
one of these. But I don't think this really helps users with the use
case I imagine for connection establishment logging. I thought that
people would look at the logs to see how their connection times have
changed over time -- for example if authentication started taking
longer. This also means a cumulative statistics view wouldn't be very
helpful. We could add a view to calculate connection establishment
median, standard deviation, average, etc, but that doesn't seem as
useful as the history of timings and how they have changed over that
time.
2. As a user I'd be curious to know how much of the time is spent on
the network/client vs inside postgres. For example for the scram/sasl
handshake, how much of the authentication_time is spent waiting on the
first "read" after the server has called sendAuthRequest.
Well, with the current patch you can compare authentication duration
to total connection establishment duration to see how much time was
spent on authentication vs the rest of connection establishment.
Or do you mean adding another more granular level of timing: after
sendAuthRequest until PerformAuthentication() returns?
I'm not totally clear what "read" means in this context.
- Melanie
Thanks for taking a look!
On Mon, Jan 20, 2025 at 10:01 AM Bertrand Drouvot
<bertranddrouvot.pg@gmail.com> wrote:
The patch needed a rebase due to 34486b6092e. I did it in v2 attached (it's
a minor rebase due to the AmRegularBackendProcess() introduction in miscadmin.h).v2 could rely on AmRegularBackendProcess() instead of AmClientBackendProcess()
but I kept it with AmClientBackendProcess() to reduce "my" changes as compared to
v1.
Thanks for doing this! I have implemented your suggestion in attached v3.
Regarding the TimestampTz vs instr_time choice, we have things like:
< -- snip -- >
So with TimestampTz, we would:
1. return 0 if we moved the time backward
2. provide an inflated duration including the time jump (if the time move
forward).But with instr_time (and on systems that support CLOCK_MONOTONIC) then
pg_clock_gettime_ns() should not be affected by system time change IIUC.Though time changes are "rare", given the fact that those metrics could provide
"inaccurate" measurements during that particular moment (time change) then it
might be worth considering instr_time instead for this particular metric.
Great point. This all makes sense. I've switched to using instr_time in v3.
- Melanie
Attachments:
v3-0001-Add-connection-establishment-duration-logging.patchtext/x-patch; charset=US-ASCII; name=v3-0001-Add-connection-establishment-duration-logging.patchDownload+80-1
Thanks for taking a look!
On Mon, Jan 20, 2025 at 12:53 PM Jacob Champion
<jacob.champion@enterprisedb.com> wrote:
On Mon, Jan 20, 2025 at 7:01 AM Bertrand Drouvot
<bertranddrouvot.pg@gmail.com> wrote:Though time changes are "rare", given the fact that those metrics could provide
"inaccurate" measurements during that particular moment (time change) then it
might be worth considering instr_time instead for this particular metric.+1, I think a CLOCK_MONOTONIC source should be used for this if we've got it.
Done in v3 (see [1]/messages/by-id/CAAKRu_YrNsA7-v5L9d318XZu+tPqcxp+ctCGy2EGYrSt69ON=A@mail.gmail.com).
For the EXEC_BACKEND case (which, to be honest, I don't know much
about), I originally wondered if the fork_duration should include any
of the shared memory manipulations or library reloads that are done to
match Unix behavior. But I think I prefer the way the patch does it.
You mean if the EXEC_BACKEND not defined version should calculate the
end of fork_duration basically at the end of
postmaster_child_launch()?
Can the current timestamp be recorded right at the beginning of
SubPostmasterMain(), to avoid counting the time setting up GUCs and
reading the variables file, or do we have to wait?
We actually don't have the startup data until after we
read_backend_variables(), so I did it as soon as I could after that.
You are right that this will include timing from some extra steps.
But, some of these steps are overhead unique to the slow process of
"forking" a backend when EXEC_BACKEND is defined anyway, so it
probably makes sense for them to be included in the timing of "fork"
for these backends.
nit: conn_timing is currently declared in the "interrupts and crit
section" part of miscadmin.h; should it be moved down to the
general-purpose globals?
Whoops, it would help if I read comments and stuff. Thanks! Fixed in v3 in [1]/messages/by-id/CAAKRu_YrNsA7-v5L9d318XZu+tPqcxp+ctCGy2EGYrSt69ON=A@mail.gmail.com.
- Melanie
[1]: /messages/by-id/CAAKRu_YrNsA7-v5L9d318XZu+tPqcxp+ctCGy2EGYrSt69ON=A@mail.gmail.com
On Tue, Feb 25, 2025 at 3:23 PM Melanie Plageman
<melanieplageman@gmail.com> wrote:
Thanks for doing this! I have implemented your suggestion in attached v3.
I missed an include in the EXEC_BACKEND not defined case. attached v4
is fixed up.
- Melanie
Attachments:
v4-0001-Add-connection-establishment-duration-logging.patchtext/x-patch; charset=US-ASCII; name=v4-0001-Add-connection-establishment-duration-logging.patchDownload+82-1
On 2025/02/26 6:36, Melanie Plageman wrote:
On Tue, Feb 25, 2025 at 3:23 PM Melanie Plageman
<melanieplageman@gmail.com> wrote:Thanks for doing this! I have implemented your suggestion in attached v3.
I missed an include in the EXEC_BACKEND not defined case. attached v4
is fixed up.
Thanks for updating the patch!
+ /* Capture time Postmaster initiates fork for logging */
+ if (child_type == B_BACKEND)
+ INSTR_TIME_SET_CURRENT(((BackendStartupData *) startup_data)->fork_time);
When log_connections is enabled, walsender connections are also logged.
However, with the patch, it seems the connection time for walsenders isn't captured.
Is this intentional?
With the current patch, when log_connections is enabled, the connection time is always
captured, and which might introduce performance overhead. No? Some users who enable
log_connections may not want this extra detail and want to avoid such overhead.
So, would it make sense to extend log_connections with a new option like "timing" and
log the connection time only when "timing" is specified?
+ ereport(LOG,
+ errmsg("backend ready for query. pid=%d. socket=%d. connection establishment times (ms): total: %ld, fork: %ld, authentication: %ld",
+ MyProcPid,
+ (int) MyClientSocket->sock,
Why expose the socket's file descriptor? I'm not sure how users would use that information.
Including the PID seems unnecessary since it's already available via log_line_prefix with %p?
Regards,
--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION
Hi,
On Wed, Feb 26, 2025 at 01:46:19PM +0900, Fujii Masao wrote:
On 2025/02/26 6:36, Melanie Plageman wrote:
On Tue, Feb 25, 2025 at 3:23 PM Melanie Plageman
<melanieplageman@gmail.com> wrote:Thanks for doing this! I have implemented your suggestion in attached v3.
Thanks for the new patch version!
+ /* Capture time Postmaster initiates fork for logging */ + if (child_type == B_BACKEND) + INSTR_TIME_SET_CURRENT(((BackendStartupData *) startup_data)->fork_time);When log_connections is enabled, walsender connections are also logged.
However, with the patch, it seems the connection time for walsenders isn't captured.
Is this intentional?
Good point. I'm tempted to say that it should also be, specially because a
connection done as "psql replication=database" is of "walsender" backend type and
would not be reported.
With the current patch, when log_connections is enabled, the connection time is always
captured, and which might introduce performance overhead. No? Some users who enable
log_connections may not want this extra detail and want to avoid such overhead.
So, would it make sense to extend log_connections with a new option like "timing" and
log the connection time only when "timing" is specified?
+1, I also think it's a good idea to let users decide if they want the timing
measurement overhead (and it's common practice with track_io_timing,
track_wal_io_timing, the newly track_cost_delay_timing for example)
+ ereport(LOG, + errmsg("backend ready for query. pid=%d. socket=%d. connection establishment times (ms): total: %ld, fork: %ld, authentication: %ld", + MyProcPid, + (int) MyClientSocket->sock,Why expose the socket's file descriptor? I'm not sure how users would use that information.
Including the PID seems unnecessary since it's already available via log_line_prefix with %p?
Yeah, we would get things like:
[1111539]: LOG: backend ready for query. pid=1111539. socket=9. connection establishment times (ms): total: 2, fork: 0, authentication: 0
[1111539]: LOG: backend ready for query. pid=1111539. socket=9. connection establishment times (ms): total: 2, fork: 0, authentication: 0
[1111539]: LOG: backend ready for query. pid=1111539. socket=9. connection establishment times (ms): total: 2, fork: 0, authentication: 0
[1111539]: LOG: backend ready for query. pid=1111539. socket=9. connection establishment times (ms): total: 2, fork: 0, authentication: 0
I also wonder if "backend ready for query" is worth it. Maybe something like:
2025-02-26 06:44:23.265 UTC [1111539]LOG: backend ready for query. pid=1111539. socket=9. connection establishment times (ms): total: 2, fork: 0, authentication: 0 LOG: connection establishment times (ms): total: 2, fork: 0, authentication: 0
would be good enough?
A few random comments:
=== 1
+typedef struct ConnectionTiming
+{
+ instr_time fork_duration;
+ instr_time auth_duration;
+} ConnectionTiming;
As it's all about instr_time, I wonder if we could use an enum + array instead.
That's probably just a matter of taste but that sounds more flexible to extend
(should we want to add more timing in the future).
=== 2
+ConnectionTiming conn_timing = {0};
There is no padding in ConnectionTiming and anyway we just access its fields
so that's ok to initialize that way.
=== 3
Add a few words in the log_connections GUC doc? (anyway we will have to if
Fujii-san idea above about the timing is implemented)
=== 4
+ /* Calculate total fork duration in child backend for logging */
+ if (child_type == B_BACKEND)
+ {
+ INSTR_TIME_SET_CURRENT(conn_timing.fork_duration);
+ INSTR_TIME_SUBTRACT(conn_timing.fork_duration,
+ ((BackendStartupData *) startup_data)->fork_time);
+ }
+
/* Close the postmaster's sockets */
ClosePostmasterPorts(child_type == B_LOGGER);
@@ -618,6 +630,14 @@ SubPostmasterMain(int argc, char *argv[])
/* Read in the variables file */
read_backend_variables(argv[2], &startup_data, &startup_data_len);
+ /* Calculate total fork duration in child backend for logging */
+ if (child_type == B_BACKEND)
+ {
+ INSTR_TIME_SET_CURRENT(conn_timing.fork_duration);
+ INSTR_TIME_SUBTRACT(conn_timing.fork_duration,
+ ((BackendStartupData *) startup_data)->fork_time);
+ }
worth to add a helper function to avoid code duplication?
Regards,
--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
On 26/02/2025 08:41, Bertrand Drouvot wrote:
Hi,
On Wed, Feb 26, 2025 at 01:46:19PM +0900, Fujii Masao wrote:
On 2025/02/26 6:36, Melanie Plageman wrote:
On Tue, Feb 25, 2025 at 3:23 PM Melanie Plageman
<melanieplageman@gmail.com> wrote:Thanks for doing this! I have implemented your suggestion in attached v3.
Thanks for the new patch version!
+1
+ /* Capture time Postmaster initiates fork for logging */ + if (child_type == B_BACKEND) + INSTR_TIME_SET_CURRENT(((BackendStartupData *) startup_data)->fork_time);When log_connections is enabled, walsender connections are also logged.
However, with the patch, it seems the connection time for walsenders isn't captured.
Is this intentional?Good point. I'm tempted to say that it should also be, specially because a
connection done as "psql replication=database" is of "walsender" backend type and
would not be reported.
Agreed.
With the current patch, when log_connections is enabled, the connection time is always
captured, and which might introduce performance overhead. No? Some users who enable
log_connections may not want this extra detail and want to avoid such overhead.
So, would it make sense to extend log_connections with a new option like "timing" and
log the connection time only when "timing" is specified?+1, I also think it's a good idea to let users decide if they want the timing
measurement overhead (and it's common practice with track_io_timing,
track_wal_io_timing, the newly track_cost_delay_timing for example)
track_connection_delay_timing ? I'm fine with this, but I'm a bit afraid
that it will lead us to an awful lot of GUCs for simple things.
+ ereport(LOG, + errmsg("backend ready for query. pid=%d. socket=%d. connection establishment times (ms): total: %ld, fork: %ld, authentication: %ld", + MyProcPid, + (int) MyClientSocket->sock,Why expose the socket's file descriptor? I'm not sure how users would use that information.
Including the PID seems unnecessary since it's already available via log_line_prefix with %p?
Yeah, we would get things like:
[1111539] LOG: connection received: host=[local]
[1111539] LOG: connection authenticated: user="postgres" method=trust (/home/postgres/postgresql/pg_installed/pg18/data/pg_hba.conf:117)
[1111539] LOG: connection authorized: user=postgres database=postgres application_name=psql
[1111539] LOG: backend ready for query. pid=1111539. socket=9. connection establishment times (ms): total: 2, fork: 0, authentication: 0I also wonder if "backend ready for query" is worth it. Maybe something like:
2025-02-26 06:44:23.265 UTC [1111539] LOG: connection establishment times (ms): total: 2, fork: 0, authentication: 0
would be good enough?
Sounds definitely better to me.
A few random comments:
=== 1
+typedef struct ConnectionTiming +{ + instr_time fork_duration; + instr_time auth_duration; +} ConnectionTiming;As it's all about instr_time, I wonder if we could use an enum + array instead.
That's probably just a matter of taste but that sounds more flexible to extend
(should we want to add more timing in the future).=== 2
+ConnectionTiming conn_timing = {0};
There is no padding in ConnectionTiming and anyway we just access its fields
so that's ok to initialize that way.=== 3
Add a few words in the log_connections GUC doc? (anyway we will have to if
Fujii-san idea above about the timing is implemented)=== 4
+ /* Calculate total fork duration in child backend for logging */ + if (child_type == B_BACKEND) + { + INSTR_TIME_SET_CURRENT(conn_timing.fork_duration); + INSTR_TIME_SUBTRACT(conn_timing.fork_duration, + ((BackendStartupData *) startup_data)->fork_time); + } + /* Close the postmaster's sockets */ ClosePostmasterPorts(child_type == B_LOGGER);@@ -618,6 +630,14 @@ SubPostmasterMain(int argc, char *argv[])
/* Read in the variables file */
read_backend_variables(argv[2], &startup_data, &startup_data_len);+ /* Calculate total fork duration in child backend for logging */ + if (child_type == B_BACKEND) + { + INSTR_TIME_SET_CURRENT(conn_timing.fork_duration); + INSTR_TIME_SUBTRACT(conn_timing.fork_duration, + ((BackendStartupData *) startup_data)->fork_time); + }worth to add a helper function to avoid code duplication?
Regards,
--
Guillaume Lelarge
Consultant
https://dalibo.com
Thanks for the review!
On Tue, Feb 25, 2025 at 11:46 PM Fujii Masao
<masao.fujii@oss.nttdata.com> wrote:
On 2025/02/26 6:36, Melanie Plageman wrote:
On Tue, Feb 25, 2025 at 3:23 PM Melanie Plageman
<melanieplageman@gmail.com> wrote:+ /* Capture time Postmaster initiates fork for logging */ + if (child_type == B_BACKEND) + INSTR_TIME_SET_CURRENT(((BackendStartupData *) startup_data)->fork_time);When log_connections is enabled, walsender connections are also logged.
However, with the patch, it seems the connection time for walsenders isn't captured.
Is this intentional?
Ah, great point. It was not intentional. I've added a check for wal
sender to the attached v5.
Are these the only backend types that establish connections from outside?
This makes me wonder if I don't need these checks (for backend type)
before capturing the current time in PerformAuthentication() -- that
is, if they are performing authentication, aren't they inherently one
of these backend types?
With the current patch, when log_connections is enabled, the connection time is always
captured, and which might introduce performance overhead. No? Some users who enable
log_connections may not want this extra detail and want to avoid such overhead.
So, would it make sense to extend log_connections with a new option like "timing" and
log the connection time only when "timing" is specified?
Ah yes, I did this intentionally originally because I thought someone
might change the value of log_connections in between the start and end
of the duration. I now see you cannot change log_connections after
connection start. So, I can guard these behind log_connections (done
in attached v5).
I hesitate to have a separate guc controlling whether or not we log
connection timing. If we add far more instances of getting the current
time, then perhaps it makes sense. But, as it is, we are adding six
system calls that take on the order of nanoseconds (esp if using
clock_gettime()), whereas emitting each log message -- which
log_connections allows -- will take on the order of micro or even
milliseconds.
+ ereport(LOG, + errmsg("backend ready for query. pid=%d. socket=%d. connection establishment times (ms): total: %ld, fork: %ld, authentication: %ld", + MyProcPid, + (int) MyClientSocket->sock,Why expose the socket's file descriptor? I'm not sure how users would use that information.
I originally included the socket fd because I thought we might end up
printing the times instead of the durations and then users would have
to parse them to get the durations and would need a way to uniquely
identify a connection. This would ideally be a combination of client
address, client port, server address, server port -- but those are
harder to print out (due to IP versions, etc) and harder to parse.
Also, I did notice other places printing the socket (like
BackendStartup() after forking).
Since this version is just printing one message, I have removed the socket fd.
Including the PID seems unnecessary since it's already available via log_line_prefix with %p?
Ah, great point. I've removed that from the log message in the attached version
- Melanie
Attachments:
v5-0001-Add-connection-establishment-duration-logging.patchtext/x-patch; charset=US-ASCII; name=v5-0001-Add-connection-establishment-duration-logging.patchDownload+92-1
Thanks for the continued review!
On Wed, Feb 26, 2025 at 2:41 AM Bertrand Drouvot
<bertranddrouvot.pg@gmail.com> wrote:
On Wed, Feb 26, 2025 at 01:46:19PM +0900, Fujii Masao wrote:
With the current patch, when log_connections is enabled, the connection time is always
captured, and which might introduce performance overhead. No? Some users who enable
log_connections may not want this extra detail and want to avoid such overhead.
So, would it make sense to extend log_connections with a new option like "timing" and
log the connection time only when "timing" is specified?+1, I also think it's a good idea to let users decide if they want the timing
measurement overhead (and it's common practice with track_io_timing,
track_wal_io_timing, the newly track_cost_delay_timing for example)
It seems to me like the extra timing collected and the one additional
log message isn't enough overhead to justify its own guc (for now).
Including the PID seems unnecessary since it's already available via log_line_prefix with %p?
Yeah, we would get things like:
[1111539] LOG: connection received: host=[local]
[1111539] LOG: connection authenticated: user="postgres" method=trust (/home/postgres/postgresql/pg_installed/pg18/data/pg_hba.conf:117)
[1111539] LOG: connection authorized: user=postgres database=postgres application_name=psql
[1111539] LOG: backend ready for query. pid=1111539. socket=9. connection establishment times (ms): total: 2, fork: 0, authentication: 0I also wonder if "backend ready for query" is worth it. Maybe something like:
2025-02-26 06:44:23.265 UTC [1111539] LOG: connection establishment times (ms): total: 2, fork: 0, authentication: 0
would be good enough?
Yes, thank you. v5 attached in [1]/messages/by-id/CAAKRu_Y9sgZAWCiQoHtpwx6Mv28fBGav5ztrWyeSrx+B=ACN6g@mail.gmail.com changes the wording as you recommend..
+typedef struct ConnectionTiming +{ + instr_time fork_duration; + instr_time auth_duration; +} ConnectionTiming;As it's all about instr_time, I wonder if we could use an enum + array instead.
That's probably just a matter of taste but that sounds more flexible to extend
(should we want to add more timing in the future).
I think we can change it later if we add many more. For now I prefer
the clarity of accessing members by name. Especially because we don't
have any code yet that loops through all of them or anything like
that.
+ConnectionTiming conn_timing = {0};
There is no padding in ConnectionTiming and anyway we just access its fields
so that's ok to initialize that way.
Yes, this properly zero initializes the struct. In fact it shouldn't
be needed since a global like this should be zero initialized. But all
of the globals defined above conn_timing zero initialize themselves,
so I thought I would be consistent with them.
Add a few words in the log_connections GUC doc? (anyway we will have to if
Fujii-san idea above about the timing is implemented)
I forgot to do this in v5 attached in [1]/messages/by-id/CAAKRu_Y9sgZAWCiQoHtpwx6Mv28fBGav5ztrWyeSrx+B=ACN6g@mail.gmail.com. Let me go ahead and do this next.
+ /* Calculate total fork duration in child backend for logging */ + if (child_type == B_BACKEND) + { + INSTR_TIME_SET_CURRENT(conn_timing.fork_duration); + INSTR_TIME_SUBTRACT(conn_timing.fork_duration, + ((BackendStartupData *) startup_data)->fork_time); + } + /* Close the postmaster's sockets */ ClosePostmasterPorts(child_type == B_LOGGER);@@ -618,6 +630,14 @@ SubPostmasterMain(int argc, char *argv[])
/* Read in the variables file */
read_backend_variables(argv[2], &startup_data, &startup_data_len);+ /* Calculate total fork duration in child backend for logging */ + if (child_type == B_BACKEND) + { + INSTR_TIME_SET_CURRENT(conn_timing.fork_duration); + INSTR_TIME_SUBTRACT(conn_timing.fork_duration, + ((BackendStartupData *) startup_data)->fork_time); + }worth to add a helper function to avoid code duplication?
I've added INSTR_TIME_GET_DURATION_SINCE(start_time). Which I like
because it seems generally useful. It does not however cut down on
LOC, so I'm somewhat on the fence.
- Melanie
[1]: /messages/by-id/CAAKRu_Y9sgZAWCiQoHtpwx6Mv28fBGav5ztrWyeSrx+B=ACN6g@mail.gmail.com
On Wed, Feb 26, 2025 at 1:45 PM Melanie Plageman
<melanieplageman@gmail.com> wrote:
Thanks for the continued review!
On Wed, Feb 26, 2025 at 2:41 AM Bertrand Drouvot
<bertranddrouvot.pg@gmail.com> wrote:Add a few words in the log_connections GUC doc? (anyway we will have to if
Fujii-san idea above about the timing is implemented)I forgot to do this in v5 attached in [1]. Let me go ahead and do this next.
I took a stab at this in attached v6. I feel that what I have is a bit
stilted, but I'm not sure how to fix it.
- Melanie
Attachments:
v6-0001-Add-connection-establishment-duration-logging.patchtext/x-patch; charset=US-ASCII; name=v6-0001-Add-connection-establishment-duration-logging.patchDownload+93-1
Hi,
On Wed, Feb 26, 2025 at 01:45:39PM -0500, Melanie Plageman wrote:
Thanks for the continued review!
On Wed, Feb 26, 2025 at 2:41 AM Bertrand Drouvot
<bertranddrouvot.pg@gmail.com> wrote:On Wed, Feb 26, 2025 at 01:46:19PM +0900, Fujii Masao wrote:
With the current patch, when log_connections is enabled, the connection time is always
captured, and which might introduce performance overhead. No? Some users who enable
log_connections may not want this extra detail and want to avoid such overhead.
So, would it make sense to extend log_connections with a new option like "timing" and
log the connection time only when "timing" is specified?+1, I also think it's a good idea to let users decide if they want the timing
measurement overhead (and it's common practice with track_io_timing,
track_wal_io_timing, the newly track_cost_delay_timing for example)It seems to me like the extra timing collected and the one additional
log message isn't enough overhead to justify its own guc (for now).
Agree. IIUC, I think that Fujii-san's idea was to extend log_connections with
a new option "timing" (i.e move it from ConfigureNamesBool to say
ConfigureNamesEnum with say on, off and timing?). I think that's a good idea.
I just did a quick check and changing a GUC from ConfigureNamesBool to
ConfigureNamesEnum is something that has already been done in the past (see
240067b3b0f and ffd37740ee6 for example).
In my previous up-thead message, I did not mean to suggest to add a new GUC,
just saying that when new "timing" is measured then users have the choice to
enable or disable it.
Including the PID seems unnecessary since it's already available via log_line_prefix with %p?
Yeah, we would get things like:
[1111539] LOG: connection received: host=[local]
[1111539] LOG: connection authenticated: user="postgres" method=trust (/home/postgres/postgresql/pg_installed/pg18/data/pg_hba.conf:117)
[1111539] LOG: connection authorized: user=postgres database=postgres application_name=psql
[1111539] LOG: backend ready for query. pid=1111539. socket=9. connection establishment times (ms): total: 2, fork: 0, authentication: 0I also wonder if "backend ready for query" is worth it. Maybe something like:
2025-02-26 06:44:23.265 UTC [1111539] LOG: connection establishment times (ms): total: 2, fork: 0, authentication: 0
would be good enough?
Yes, thank you. v5 attached in [1] changes the wording as you recommend..
Thanks for the updated version!
+typedef struct ConnectionTiming +{ + instr_time fork_duration; + instr_time auth_duration; +} ConnectionTiming;As it's all about instr_time, I wonder if we could use an enum + array instead.
That's probably just a matter of taste but that sounds more flexible to extend
(should we want to add more timing in the future).I think we can change it later if we add many more. For now I prefer
the clarity of accessing members by name. Especially because we don't
have any code yet that loops through all of them or anything like
that.
Yeah makes sense.
Add a few words in the log_connections GUC doc? (anyway we will have to if
Fujii-san idea above about the timing is implemented)I took a stab at this in attached v6. I feel that what I have is a bit
stilted, but I'm not sure how to fix it.
Yeah, that might be easier to reason about if we're going with Fujii-san
suggestion to extend log_connections with a new option?
+ /* Calculate total fork duration in child backend for logging */ + if (child_type == B_BACKEND) + { + INSTR_TIME_SET_CURRENT(conn_timing.fork_duration); + INSTR_TIME_SUBTRACT(conn_timing.fork_duration, + ((BackendStartupData *) startup_data)->fork_time); + } + /* Close the postmaster's sockets */ ClosePostmasterPorts(child_type == B_LOGGER);@@ -618,6 +630,14 @@ SubPostmasterMain(int argc, char *argv[])
/* Read in the variables file */
read_backend_variables(argv[2], &startup_data, &startup_data_len);+ /* Calculate total fork duration in child backend for logging */ + if (child_type == B_BACKEND) + { + INSTR_TIME_SET_CURRENT(conn_timing.fork_duration); + INSTR_TIME_SUBTRACT(conn_timing.fork_duration, + ((BackendStartupData *) startup_data)->fork_time); + }worth to add a helper function to avoid code duplication?
I've added INSTR_TIME_GET_DURATION_SINCE(start_time). Which I like
because it seems generally useful.
Great idea! Could probably be used in other places but I did not check and
it's outside the scope of this thread anyway.
It does not however cut down on LOC, so I'm somewhat on the fence.
I think that's somehow also around code maintenance (not only LOC), say for example
if we want to add more "child_type" in the check (no need to remember to update both
locations).
Regards,
--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
On Thu, Feb 27, 2025 at 1:50 AM Bertrand Drouvot
<bertranddrouvot.pg@gmail.com> wrote:
Agree. IIUC, I think that Fujii-san's idea was to extend log_connections with
a new option "timing" (i.e move it from ConfigureNamesBool to say
ConfigureNamesEnum with say on, off and timing?). I think that's a good idea.I just did a quick check and changing a GUC from ConfigureNamesBool to
ConfigureNamesEnum is something that has already been done in the past (see
240067b3b0f and ffd37740ee6 for example).In my previous up-thead message, I did not mean to suggest to add a new GUC,
just saying that when new "timing" is measured then users have the choice to
enable or disable it.
I was just talking to Andres off-list and he mentioned that the volume
of log_connections messages added in recent releases can really be a
problem for users. He said ideally we would emit one message which
consolidated these (and make sure we did so for failed connections too
detailing the successfully completed stages).
However, since that is a bigger project (with more refactoring, etc),
he suggested that we change log_connections to a GUC_LIST
(ConfigureNamesString) with options like "none", "received,
"authenticated", "authorized", "all".
Then we could add one like "established" for the final message and
timings my patch set adds. I think the overhead of an additional log
message being emitted probably outweighs the overhead of taking those
additional timings.
String GUCs are a lot more work than enum GUCs, so I was thinking if
there is a way to do it as an enum.
I think we want the user to be able to specify a list of all the log
messages they want included, not just have each one include the
previous ones. So, then it probably has to be a list right? There is
no good design that would fit as an enum.
I've added INSTR_TIME_GET_DURATION_SINCE(start_time). Which I like
because it seems generally useful.Great idea! Could probably be used in other places but I did not check and
it's outside the scope of this thread anyway.It does not however cut down on LOC, so I'm somewhat on the fence.
I think that's somehow also around code maintenance (not only LOC), say for example
if we want to add more "child_type" in the check (no need to remember to update both
locations).
I didn't include checking the child_type in that function since it is
unrelated to instr_time, so it sadly wouldn't help with that. We could
macro-ize the child_type check were we to add another child_type.
- Melanie
Hi,
On 2025-02-27 06:50:41 +0000, Bertrand Drouvot wrote:
On Wed, Feb 26, 2025 at 01:45:39PM -0500, Melanie Plageman wrote:
Thanks for the continued review!
On Wed, Feb 26, 2025 at 2:41 AM Bertrand Drouvot
<bertranddrouvot.pg@gmail.com> wrote:On Wed, Feb 26, 2025 at 01:46:19PM +0900, Fujii Masao wrote:
With the current patch, when log_connections is enabled, the connection time is always
captured, and which might introduce performance overhead. No? Some users who enable
log_connections may not want this extra detail and want to avoid such overhead.
So, would it make sense to extend log_connections with a new option like "timing" and
log the connection time only when "timing" is specified?+1, I also think it's a good idea to let users decide if they want the timing
measurement overhead (and it's common practice with track_io_timing,
track_wal_io_timing, the newly track_cost_delay_timing for example)It seems to me like the extra timing collected and the one additional
log message isn't enough overhead to justify its own guc (for now).Agree. IIUC, I think that Fujii-san's idea was to extend log_connections with
a new option "timing" (i.e move it from ConfigureNamesBool to say
ConfigureNamesEnum with say on, off and timing?). I think that's a good idea.
I don't think the timing overhead is a relevant factor here - compared to the
fork of a new connection or performing authentication the cost of taking a few
timestamps is neglegible. A timestamp costs 10s to 100s of cycles, a fork many
many millions. Even if you have a really slow timestamp function, it's still
going to be way way cheaper.
Greetings,
Andres Freund
Hi,
On 2025-02-27 11:08:04 -0500, Melanie Plageman wrote:
I was just talking to Andres off-list and he mentioned that the volume
of log_connections messages added in recent releases can really be a
problem for users.
For some added color: I've seen plenty systems where almost all the log volume
is log_connection messages, which they *have* to enable for various regulatory
reasons. It'd still be a lot if we just emitted one message for each
connection, but logging three (and possibly four with $thread), for each
connection makes it substantially worse.
He said ideally we would emit one message which consolidated these (and make
sure we did so for failed connections too detailing the successfully
completed stages).
A combined message would also not *quite* replace all use-cases, e.g. if you
want to debug arriving connections or auth problems you do want the additional
messages. But yea, for normal operation, I do think most folks want just one
message.
However, since that is a bigger project (with more refactoring, etc),
he suggested that we change log_connections to a GUC_LIST
(ConfigureNamesString) with options like "none", "received,
"authenticated", "authorized", "all".
Yep.
Then we could add one like "established" for the final message and
timings my patch set adds. I think the overhead of an additional log
message being emitted probably outweighs the overhead of taking those
additional timings.
To bikeshed a bit: "established" could be the TCP connection establishment
just as well. I'd go for "completed" or "timings".
String GUCs are a lot more work than enum GUCs, so I was thinking if
there is a way to do it as an enum.I think we want the user to be able to specify a list of all the log
messages they want included, not just have each one include the
previous ones. So, then it probably has to be a list right? There is
no good design that would fit as an enum.
I don't see a way to comfortably shove this into an enum either.
Greetings,
Andres Freund
On Thu, Feb 27, 2025 at 11:30 AM Andres Freund <andres@anarazel.de> wrote:
On 2025-02-27 11:08:04 -0500, Melanie Plageman wrote:
However, since that is a bigger project (with more refactoring, etc),
he suggested that we change log_connections to a GUC_LIST
(ConfigureNamesString) with options like "none", "received,
"authenticated", "authorized", "all".Yep.
I've done a draft of this in attached v7 (see 0001). It still needs
polishing (e.g. I need to figure out where to put the new guc hook
functions and enums and such), but I want to see if this is a viable
direction forward.
I'm worried the list of possible connection log messages could get
unwieldy if we add many more.
- Melanie