Patch proposal: New hooks in the connection path
Hi hackers,
While commit 960869da08 added some information about connections that
have been successfully authenticated, there is no metrics for
connections that have not (or did not reached the authentication stage).
Adding metrics about failed connections attempts could also help, for
example with proper sampling, to:
* detect spikes in failed login attempts
* check if there is a correlation between spikes in successful and
failed connection attempts
While the number of successful connections could also already been
tracked with the ClientAuthentication_hook (and also the ones that
failed the authentication) we are missing metrics about:
* why the connection failed (could be bad password, bad database, bad
user, missing CONNECT privilege...)
* number of times the authentication stage has not been reached
* why the authentication stage has not been reached (bad startup
packets, timeout while processing startup packet,...)
Those missing metrics (in addition to the ones that can be already
gathered) could provide value for:
* security investigations
* anomalies detections
* tracking application misconfigurations
In an attempt to be able to provide those metrics, please find attached
a patch proposal to add new hooks in the connection path, that would be
fired if:
* there is a bad startup packet
* there is a timeout while processing the startup packet
* user does not have CONNECT privilege
* database does not exist
For safety those hooks request the use of a const Port parameter, so
that they could be used only for reporting purpose (for example, we are
working on an extension to record detailed login metrics counters).
Another option could be to add those metrics in the engine itself
(instead of providing new hooks to get them), but the new hooks option
gives more flexibility on how to render and exploit them (there is a lot
of information in the Port Struct that one could be interested with).
I’m adding this patch proposal to the commitfest.
Looking forward to your feedback,
Regards,
Bertrand
Attachments:
v1-0001-connection_hooks.patchtext/plain; charset=UTF-8; name=v1-0001-connection_hooks.patchDownload+45-1
On Thu, Jun 30, 2022 at 1:31 PM Drouvot, Bertrand <bdrouvot@amazon.com> wrote:
Hi hackers,
While commit 960869da08 added some information about connections that have been successfully authenticated, there is no metrics for connections that have not (or did not reached the authentication stage).
Adding metrics about failed connections attempts could also help, for example with proper sampling, to:
detect spikes in failed login attempts
check if there is a correlation between spikes in successful and failed connection attemptsWhile the number of successful connections could also already been tracked with the ClientAuthentication_hook (and also the ones that failed the authentication) we are missing metrics about:
why the connection failed (could be bad password, bad database, bad user, missing CONNECT privilege...)
number of times the authentication stage has not been reached
why the authentication stage has not been reached (bad startup packets, timeout while processing startup packet,...)Those missing metrics (in addition to the ones that can be already gathered) could provide value for:
security investigations
anomalies detections
tracking application misconfigurationsIn an attempt to be able to provide those metrics, please find attached a patch proposal to add new hooks in the connection path, that would be fired if:
there is a bad startup packet
there is a timeout while processing the startup packet
user does not have CONNECT privilege
database does not existFor safety those hooks request the use of a const Port parameter, so that they could be used only for reporting purpose (for example, we are working on an extension to record detailed login metrics counters).
Another option could be to add those metrics in the engine itself (instead of providing new hooks to get them), but the new hooks option gives more flexibility on how to render and exploit them (there is a lot of information in the Port Struct that one could be interested with).
I’m adding this patch proposal to the commitfest.
Looking forward to your feedback,
+1 for the idea. I've seen numerous cases where the login metrics
(especially failed connections) are handy in analyzing stuff. And I'm
okay with the hook approach than the postgres emitting the necessary
metrics. However, I'm personally not okay with having multiple hooks
as proposed in the v1 patch. Can we think of having a single hook or
enhancing the existing ClientAuthentication_hook where we pass a
PURPOSE parameter (CONN_SUCCESS, CONN_FAILURE, CONN_FOO, CONN_BAR
....) tp the hook? With this approach, we don't need to spread out the
postgres code with many hooks and the hook implementers will look at
the PURPOSE parameter and deal with it accordingly.
On the security aspect, we must ensure we don't leak any sensitive
information such as password or SSH key to the new hook - if PGPORT
has this information, maybe we need to mask that structure a bit
before handing it off to the hook.
Regards,
Bharath Rupireddy.
Hi,
On 6/30/22 11:23 AM, Bharath Rupireddy wrote:
On Thu, Jun 30, 2022 at 1:31 PM Drouvot, Bertrand <bdrouvot@amazon.com> wrote:
Hi hackers,
While commit 960869da08 added some information about connections that have been successfully authenticated, there is no metrics for connections that have not (or did not reached the authentication stage).
Adding metrics about failed connections attempts could also help, for example with proper sampling, to:
detect spikes in failed login attempts
check if there is a correlation between spikes in successful and failed connection attemptsWhile the number of successful connections could also already been tracked with the ClientAuthentication_hook (and also the ones that failed the authentication) we are missing metrics about:
why the connection failed (could be bad password, bad database, bad user, missing CONNECT privilege...)
number of times the authentication stage has not been reached
why the authentication stage has not been reached (bad startup packets, timeout while processing startup packet,...)Those missing metrics (in addition to the ones that can be already gathered) could provide value for:
security investigations
anomalies detections
tracking application misconfigurationsIn an attempt to be able to provide those metrics, please find attached a patch proposal to add new hooks in the connection path, that would be fired if:
there is a bad startup packet
there is a timeout while processing the startup packet
user does not have CONNECT privilege
database does not existFor safety those hooks request the use of a const Port parameter, so that they could be used only for reporting purpose (for example, we are working on an extension to record detailed login metrics counters).
Another option could be to add those metrics in the engine itself (instead of providing new hooks to get them), but the new hooks option gives more flexibility on how to render and exploit them (there is a lot of information in the Port Struct that one could be interested with).
I’m adding this patch proposal to the commitfest.
Looking forward to your feedback,+1 for the idea. I've seen numerous cases where the login metrics
(especially failed connections) are handy in analyzing stuff. And I'm
okay with the hook approach than the postgres emitting the necessary
metrics.
Thanks for looking at it!
However, I'm personally not okay with having multiple hooks
as proposed in the v1 patch.
I agree that it would be great to reduce the number of proposed hooks.
But,
Can we think of having a single hook
The proposed hooks are triggered during errors (means that the
connection attempt break) and:
- In the connection paths that will not reach the
ClientAuthentication_hook at all: those are the ones related to the bad
startup packet and timeout while processing the startup packet.
or
- After the ClientAuthentication_hook is fired: those are the bad db
oid, bad db name and bad perm ones.
So, It does look like having only one hook would require refactoring in
the connection path and I'm not sure if this is worth it.
or
enhancing the existing ClientAuthentication_hook where we pass a
PURPOSE parameter (CONN_SUCCESS, CONN_FAILURE, CONN_FOO, CONN_BAR
....) tp the hook?
I think one could already "predict" the bad db and bad perm errors
within the current ClientAuthentication_hook.
But in case of multiple "possible" errors (within the same connection
attempt) how could we know for sure the one that will be actually
reported? That's why i think the best way is to put new hooks as close
as possible to the place where the related errors are reported.
What do you think?
With this approach, we don't need to spread out the
postgres code with many hooks and the hook implementers will look at
the PURPOSE parameter and deal with it accordingly.On the security aspect, we must ensure we don't leak any sensitive
information such as password or SSH key to the new hook - if PGPORT
has this information, maybe we need to mask that structure a bit
before handing it off to the hook.
Yeah good point, I'll have a closer look if there is any information
that could be present in the Port struct before the
ClientAuthentication_hook is fired (means during the ones that are
related to the startup packet) and that we would want to mask.
Regards,
Bertrand
On Fri, Jul 01, 2022 at 09:48:40AM +0200, Drouvot, Bertrand wrote:
However, I'm personally not okay with having multiple hooks
as proposed in the v1 patch.I agree that it would be great to reduce the number of proposed hooks.
But,
Can we think of having a single hook
The proposed hooks are triggered during errors (means that the connection
attempt break) and:- In the connection paths that will not reach the ClientAuthentication_hook
at all: those are the ones related to the bad startup packet and timeout
while processing the startup packet.or
- After the ClientAuthentication_hook is fired: those are the bad db oid,
bad db name and bad perm ones.So, It does look like having only one hook would require refactoring in the
connection path and I'm not sure if this is worth it.or
enhancing the existing ClientAuthentication_hook where we pass a
PURPOSE parameter (CONN_SUCCESS, CONN_FAILURE, CONN_FOO, CONN_BAR
....) tp the hook?I think one could already "predict" the bad db and bad perm errors within
the current ClientAuthentication_hook.But in case of multiple "possible" errors (within the same connection
attempt) how could we know for sure the one that will be actually reported?
That's why i think the best way is to put new hooks as close as possible to
the place where the related errors are reported.What do you think?
Could we model this after fmgr_hook? The first argument in that hook
indicates where it is being called from. This doesn't alleviate the need
for several calls to the hook in the authentication logic, but extension
authors would only need to define one hook.
That being said, I don't see why this information couldn't be provided in a
system view. IMO it is generically useful.
--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
On Fri, Jul 1, 2022 at 5:00 PM Nathan Bossart <nathandbossart@gmail.com>
wrote:
That being said, I don't see why this information couldn't be provided in a
system view. IMO it is generically useful.
+1 for a system view with appropriate permissions, in addition to the
hooks.
That would make the information easily accessible to a number or monitoring
systems besides the admin.
Roberto
—
Crunchy Data — passion for open source PostgreSQL
Show quoted text
Hi,
On 7/2/22 1:00 AM, Nathan Bossart wrote:
Could we model this after fmgr_hook? The first argument in that hook
indicates where it is being called from. This doesn't alleviate the need
for several calls to the hook in the authentication logic, but extension
authors would only need to define one hook.
I like the idea and indeed fmgr.h looks a good place to model it.
Attached a new patch version doing so.
Thanks
--
Bertrand Drouvot
Amazon Web Services: https://aws.amazon.com
Attachments:
v2-0001-connection_hooks.patchtext/plain; charset=UTF-8; name=v2-0001-connection_hooks.patchDownload+42-1
Hi,
On 7/2/22 2:49 AM, Roberto Mello wrote:
On Fri, Jul 1, 2022 at 5:00 PM Nathan Bossart
<nathandbossart@gmail.com> wrote:That being said, I don't see why this information couldn't be
provided in a
system view. IMO it is generically useful.+1 for a system view with appropriate permissions, in addition to the
hooks.That would make the information easily accessible to a number or
monitoring systems besides the admin.
Agree about that.
I'll start another thread and propose a dedicated patch for the
"internal counters" and how to expose them.
Thanks for the feedback,
--
Bertrand Drouvot
Amazon Web Services:https://aws.amazon.com
On Mon, Jul 4, 2022 at 5:54 AM Drouvot, Bertrand <bdrouvot@amazon.com>
wrote:
Hi,
On 7/2/22 1:00 AM, Nathan Bossart wrote:
Could we model this after fmgr_hook? The first argument in that hook
indicates where it is being called from. This doesn't alleviate the need
for several calls to the hook in the authentication logic, but extension
authors would only need to define one hook.I like the idea and indeed fmgr.h looks a good place to model it.
Attached a new patch version doing so.
Thanks
--
Bertrand Drouvot
Amazon Web Services: https://aws.amazon.com
Hi,
+ FCET_SPT, /* startup packet timeout */
+ FCET_BSP, /* bad startup packet */
Looking at existing enum type, such as FmgrHookEventType, the part after
underscore is a word.
I think it would be good to follow existing practice and make the enums
more readable.
Cheers
On Mon, Jul 4, 2022 at 6:29 PM Drouvot, Bertrand <bdrouvot@amazon.com> wrote:
On 7/2/22 2:49 AM, Roberto Mello wrote:
On Fri, Jul 1, 2022 at 5:00 PM Nathan Bossart <nathandbossart@gmail.com> wrote:
That being said, I don't see why this information couldn't be provided in a
system view. IMO it is generically useful.+1 for a system view with appropriate permissions, in addition to the hooks.
That would make the information easily accessible to a number or monitoring systems besides the admin.
Agree about that.
Are we going to have it as a part of shared memory stats? Or a
separate shared memory for connection stats exposing these via a
function and a view can be built on this function like
pg_get_replication_slots and pg_replication_slots?
I'll start another thread and propose a dedicated patch for the "internal counters" and how to expose them.
IMHO, let's have the discussion here in this thread and the patch can be 0002.
Regards,
Bharath Rupireddy.
Hi,
On 7/4/22 3:12 PM, Zhihong Yu wrote:
Hi, + FCET_SPT, /* startup packet timeout */ + FCET_BSP, /* bad startup packet */Looking at existing enum type, such as FmgrHookEventType, the part
after underscore is a word.
I think it would be good to follow existing practice and make the
enums more readable.
Fair point.
Attached a new version which makes the enums more readable.
Thanks for the feedback
Regards
--
Bertrand Drouvot
Amazon Web Services:https://aws.amazon.com
Attachments:
v2-0002-connection_hooks.patchtext/plain; charset=UTF-8; name=v2-0002-connection_hooks.patchDownload+42-1
On Mon, Jul 4, 2022 at 6:23 PM Drouvot, Bertrand <bdrouvot@amazon.com> wrote:
Hi,
On 7/2/22 1:00 AM, Nathan Bossart wrote:
Could we model this after fmgr_hook? The first argument in that hook
indicates where it is being called from. This doesn't alleviate the need
for several calls to the hook in the authentication logic, but extension
authors would only need to define one hook.I like the idea and indeed fmgr.h looks a good place to model it.
Attached a new patch version doing so.
Thanks for the patch. Can we think of enhancing
ClientAuthentication_hook_type itself i.e. make it a generic hook for
all sorts of authentication metrics, info etc. with the type parameter
embedded to it instead of new hook FailedConnection_hook?We can either
add a new parameter for the "event" (the existing
ClientAuthentication_hook_type implementers will have problems), or
embed/multiplex the "event" info to existing Port structure or status
variable (macro or enum) (existing implementers will not have
compatibility problems). IMO, this looks cleaner going forward.
On the v2 patch:
1. Why do we need to place the hook and structures in fmgr.h? Why not in auth.h?
2. Timeout Handler is a signal handler, called as part of SIGALRM
signal handler, most of the times, signal handlers ought to be doing
small things, now that we are handing off the control to hook, which
can do any long running work (writing to a remote storage, file,
aggregate etc.), I don't think it's the right thing to do here.
static void
StartupPacketTimeoutHandler(void)
{
+ if (FailedConnection_hook)
+ (*FailedConnection_hook) (FCET_STARTUP_PACKET_TIMEOUT, MyProcPort);
+ ereport(COMMERROR,
+ (errcode(ERRCODE_PROTOCOL_VIOLATION),
+ errmsg("timeout while processing startup packet")));
3. On "not letting these hooks (ClientAuthentication_hook_type or
FailedConnection_hook_type) expose sensitive info via Port structure"
- it seems like the Port structure has sensitive info like HbaLine,
host address, username, etc. but that's what it is so I think we are
okay with the structure as-is.
Regards,
Bharath Rupireddy.
On 6/30/22 5:23 AM, Bharath Rupireddy wrote:
<snip>
On the security aspect, we must ensure we don't leak any sensitive
information such as password or SSH key to the new hook - if PGPORT
has this information, maybe we need to mask that structure a bit
before handing it off to the hook.
Can you elaborate more on why you see this as necessary? Extensions run
in-process and have no real memory access limits, "masking", which
really means copying data to another struct, is just extra work and
overhead with no actual security gain, IMO.
On 7/5/22 03:37, Bharath Rupireddy wrote:
On Mon, Jul 4, 2022 at 6:23 PM Drouvot, Bertrand <bdrouvot@amazon.com> wrote:
On 7/2/22 1:00 AM, Nathan Bossart wrote:
Could we model this after fmgr_hook? The first argument in that hook
indicates where it is being called from. This doesn't alleviate the need
for several calls to the hook in the authentication logic, but extension
authors would only need to define one hook.I like the idea and indeed fmgr.h looks a good place to model it.
Attached a new patch version doing so.
I was thinking along the same lines, so +1 for the general approach
Thanks for the patch. Can we think of enhancing
ClientAuthentication_hook_type itself i.e. make it a generic hook for
all sorts of authentication metrics, info etc. with the type parameter
embedded to it instead of new hook FailedConnection_hook?We can either
add a new parameter for the "event" (the existing
ClientAuthentication_hook_type implementers will have problems), or
embed/multiplex the "event" info to existing Port structure or status
variable (macro or enum) (existing implementers will not have
compatibility problems). IMO, this looks cleaner going forward.
Not sure I like this though -- I'll have to think about that
On the v2 patch:
1. Why do we need to place the hook and structures in fmgr.h? Why not in auth.h?
agreed -- it does not belong in fmgr.h
2. Timeout Handler is a signal handler, called as part of SIGALRM signal handler, most of the times, signal handlers ought to be doing small things, now that we are handing off the control to hook, which can do any long running work (writing to a remote storage, file, aggregate etc.), I don't think it's the right thing to do here. static void StartupPacketTimeoutHandler(void) { + if (FailedConnection_hook) + (*FailedConnection_hook) (FCET_STARTUP_PACKET_TIMEOUT, MyProcPort); + ereport(COMMERROR, + (errcode(ERRCODE_PROTOCOL_VIOLATION), + errmsg("timeout while processing startup packet")));
Why add the ereport()?
But more to Bharath's point, perhaps this is a case that is better
served by incrementing a stat counter and not exposed as a hook?
Also, a teeny nit:
8<--------------
+ if (status != STATUS_OK) {
+ if (FailedConnection_hook)
8<--------------
does not follow usual practice and probably should be:
8<--------------
+ if (status != STATUS_OK)
+ {
+ if (FailedConnection_hook)
8<--------------
--
Joe Conway
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
Hi,
On 7/6/22 12:11 AM, Joe Conway wrote:
On 7/5/22 03:37, Bharath Rupireddy wrote:
On Mon, Jul 4, 2022 at 6:23 PM Drouvot, Bertrand
<bdrouvot@amazon.com> wrote:On 7/2/22 1:00 AM, Nathan Bossart wrote:
Could we model this after fmgr_hook? The first argument in that hook
indicates where it is being called from. This doesn't alleviatethe need
for several calls to the hook in the authentication logic, but
extension
authors would only need to define one hook.
I like the idea and indeed fmgr.h looks a good place to model it.
Attached a new patch version doing so.
I was thinking along the same lines, so +1 for the general approach
Thanks for the review!
Thanks for the patch. Can we think of enhancing
ClientAuthentication_hook_type itself i.e. make it a generic hook for
all sorts of authentication metrics, info etc. with the type parameter
embedded to it instead of new hook FailedConnection_hook?We can either
add a new parameter for the "event" (the existing
ClientAuthentication_hook_type implementers will have problems), or
embed/multiplex the "event" info to existing Port structure or status
variable (macro or enum) (existing implementers will not have
compatibility problems). IMO, this looks cleaner going forward.Not sure I like this though -- I'll have to think about that
Not sure about this one neither.
The "enhanced" ClientAuthentication_hook will have to be fired at the
same places as the new FailedConnection_hook is, but i think those
places are not necessary linked to real authentication per say (making
the name confusing).
On the v2 patch:
1. Why do we need to place the hook and structures in fmgr.h? Why not
in auth.h?agreed -- it does not belong in fmgr.h
Moved to auth.h.
2. Timeout Handler is a signal handler, called as part of SIGALRM signal handler, most of the times, signal handlers ought to be doing small things, now that we are handing off the control to hook, which can do any long running work (writing to a remote storage, file, aggregate etc.), I don't think it's the right thing to do here. static void StartupPacketTimeoutHandler(void) { + if (FailedConnection_hook) + (*FailedConnection_hook) (FCET_STARTUP_PACKET_TIMEOUT, MyProcPort); + ereport(COMMERROR, + (errcode(ERRCODE_PROTOCOL_VIOLATION), + errmsg("timeout while processing startup packet")));Why add the ereport()?
removed it.
But more to Bharath's point, perhaps this is a case that is better
served by incrementing a stat counter and not exposed as a hook?
I think that the advantage of the hook is that it gives the extension
author the ability/flexibility to aggregate the counter based on
information available in the Port Struct (say the client addr for
example) at this stage.
What about to aggregate the stat counter based on the client addr? (Not
sure if there is more useful information (than the client addr) at this
stage though)
That said, i agree that having a hook in a time out handler might not be
the right thing to do (even if at the end that would be to the extension
author responsibility to do "small things" in it), so it has been
removed in the new attached version.
Also, a teeny nit: 8<-------------- + if (status != STATUS_OK) { + if (FailedConnection_hook) 8<--------------does not follow usual practice and probably should be:
8<-------------- + if (status != STATUS_OK) + { + if (FailedConnection_hook) 8<--------------
Thanks!, fixed.
--
Bertrand Drouvot
Amazon Web Services: https://aws.amazon.com
Attachments:
v2-0003-connection_hooks.patchtext/plain; charset=UTF-8; name=v2-0003-connection_hooks.patchDownload+35-0
Hi,
On 7/5/22 9:07 AM, Bharath Rupireddy wrote:
On Mon, Jul 4, 2022 at 6:29 PM Drouvot, Bertrand <bdrouvot@amazon.com> wrote:
On 7/2/22 2:49 AM, Roberto Mello wrote:
On Fri, Jul 1, 2022 at 5:00 PM Nathan Bossart <nathandbossart@gmail.com> wrote:
That being said, I don't see why this information couldn't be provided in a
system view. IMO it is generically useful.+1 for a system view with appropriate permissions, in addition to the hooks.
That would make the information easily accessible to a number or monitoring systems besides the admin.
Agree about that.
Are we going to have it as a part of shared memory stats? Or a
separate shared memory for connection stats exposing these via a
function and a view can be built on this function like
pg_get_replication_slots and pg_replication_slots?
Thanks for looking at it.
I don't have any proposal yet, I'll have to look at it.
--
Bertrand Drouvot
Amazon Web Services: https://aws.amazon.com
On 7/6/22 04:13, Drouvot, Bertrand wrote:
On 7/6/22 12:11 AM, Joe Conway wrote:
On 7/5/22 03:37, Bharath Rupireddy wrote:
2. Timeout Handler is a signal handler, called as part of SIGALRM signal handler, most of the times, signal handlers ought to be doing small things, now that we are handing off the control to hook, which can do any long running work (writing to a remote storage, file, aggregate etc.), I don't think it's the right thing to do here. static void StartupPacketTimeoutHandler(void) { + if (FailedConnection_hook) + (*FailedConnection_hook) (FCET_STARTUP_PACKET_TIMEOUT, MyProcPort);
But more to Bharath's point, perhaps this is a case that is better
served by incrementing a stat counter and not exposed as a hook?I think that the advantage of the hook is that it gives the extension
author the ability/flexibility to aggregate the counter based on
information available in the Port Struct (say the client addr for
example) at this stage.What about to aggregate the stat counter based on the client addr? (Not
sure if there is more useful information (than the client addr) at this
stage though)That said, i agree that having a hook in a time out handler might not be
the right thing to do (even if at the end that would be to the extension
author responsibility to do "small things" in it), so it has been
removed in the new attached version.
It isn't clear to me if having a hook in the timeout handler is a
nonstarter -- perhaps a comment with suitable warning for prospective
extension authors is enough? Anyone else want to weigh in on this issue
specifically?
--
Joe Conway
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
Joe Conway <mail@joeconway.com> writes:
It isn't clear to me if having a hook in the timeout handler is a
nonstarter -- perhaps a comment with suitable warning for prospective
extension authors is enough? Anyone else want to weigh in on this issue
specifically?
It doesn't seem like a great place for a hook, because the list of stuff
you could safely do there would be mighty short, possibly the empty set.
Write to shared memory? Not too safe. Write to a file? Even less.
Write to local memory? Pointless, because we're about to _exit(1).
Pretty much anything I can think of that you'd want to do is something
we've already decided the core code can't safely do, and putting it
in a hook won't make it safer.
If someone wants to argue for this hook, I'd like to see a credible
example of a *safe* use-case, keeping in mind the points raised in
the comments in BackendInitialize and process_startup_packet_die.
regards, tom lane
On Fri, Jul 8, 2022 at 1:40 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Joe Conway <mail@joeconway.com> writes:
It isn't clear to me if having a hook in the timeout handler is a
nonstarter -- perhaps a comment with suitable warning for prospective
extension authors is enough? Anyone else want to weigh in on this issue
specifically?It doesn't seem like a great place for a hook, because the list of stuff
you could safely do there would be mighty short, possibly the empty set.
Write to shared memory? Not too safe. Write to a file? Even less.
Write to local memory? Pointless, because we're about to _exit(1).
Pretty much anything I can think of that you'd want to do is something
we've already decided the core code can't safely do, and putting it
in a hook won't make it safer.
I agree with this. But, all of the areas that v2-0003 touched for
connectivity failures, they typically are emitting
ereport(FATAL,/ereport(COMMERROR, (in ProcessStartupPacket) and we
have emit_log_hook already being exposed and the implementers can,
literally, do anything the hook.
Looking at v2-0003 patch and emit_log_hook, how about we filter out
for those connectivity errors either based on error codes and if they
aren't unique, perhaps passing special flags to ereport API indicating
that it's a connectivity error and in the emit_log_hook we can look
for those connectivity error codes or flags to collect the stats about
the failure connections (with MyProcPort being present in
emit_log_hook)? This way, we don't need a new hook. Thoughts?
Regards,
Bharath Rupireddy.
Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> writes:
On Fri, Jul 8, 2022 at 1:40 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
It doesn't seem like a great place for a hook, because the list of stuff
you could safely do there would be mighty short, possibly the empty set.
I agree with this. But, all of the areas that v2-0003 touched for
connectivity failures, they typically are emitting
ereport(FATAL,/ereport(COMMERROR, (in ProcessStartupPacket) and we
have emit_log_hook already being exposed and the implementers can,
literally, do anything the hook.
This is utterly off-point, because those calls are not inside
signal handlers.
regards, tom lane
Hi,
On 7/7/22 10:10 PM, Tom Lane wrote:
Joe Conway <mail@joeconway.com> writes:
It isn't clear to me if having a hook in the timeout handler is a
nonstarter -- perhaps a comment with suitable warning for prospective
extension authors is enough? Anyone else want to weigh in on this issue
specifically?It doesn't seem like a great place for a hook, because the list of stuff
you could safely do there would be mighty short, possibly the empty set.
Write to shared memory? Not too safe. Write to a file? Even less.
Write to local memory? Pointless, because we're about to _exit(1).
Pretty much anything I can think of that you'd want to do is something
we've already decided the core code can't safely do, and putting it
in a hook won't make it safer.If someone wants to argue for this hook, I'd like to see a credible
example of a *safe* use-case, keeping in mind the points raised in
the comments in BackendInitialize and process_startup_packet_die.
The use case would be to increment a counter in shared memory (or most
probably within an hash table) to reflect the number of times a startup
packet timeout occurred.
Reading the comments in/related to BackendInitialize() I understand
that's definitely not safe to write in shared memory for the
EXEC_BACKEND case, but wouldn't it be safe for the non EXEC_BACKEND case?
BTW, it makes me realize that the hook being fired in the bad startup
packet case:
/*
* Stop here if it was bad or a cancel packet. ProcessStartupPacket
* already did any appropriate error reporting.
*/
if (status != STATUS_OK)
+ {
+ if (FailedConnection_hook)
+ (*FailedConnection_hook)
(FCET_BAD_STARTUP_PACKET, port);
proc_exit(0);
+ }
is not safe for the EXEC_BACKEND case.
Regards,
--
Bertrand Drouvot
Amazon Web Services: https://aws.amazon.com