[PATCH] Expose port->authn_id to extensions and triggers
Hi all,
Stephen pointed out [1]/messages/by-id/CAOuzzgpFpuroNRabEvB9kST_TSyS2jFicBNoXvW7G2pZFixyBw@mail.gmail.com that the authenticated identity that's stored
in MyProcPort can't be retrieved by extensions or triggers. Attached is
a patch that provides both a C API and a SQL function for retrieving
it.
GetAuthenticatedIdentityString() is a mouthful but I wanted to
differentiate it from the existing GetAuthenticatedUserId(); better
names welcome. It only exists as an accessor because I wasn't sure if
extensions outside of contrib were encouraged to rely on the internal
layout of struct Port. (If they can, then that call can go away
entirely.)
Thanks,
--Jacob
[1]: /messages/by-id/CAOuzzgpFpuroNRabEvB9kST_TSyS2jFicBNoXvW7G2pZFixyBw@mail.gmail.com
Attachments:
0001-Add-APIs-to-retrieve-authn_id-from-C-and-SQL.patchtext/x-patch; name=0001-Add-APIs-to-retrieve-authn_id-from-C-and-SQL.patchDownload+43-3
On Thu, Feb 24, 2022 at 12:15:40AM +0000, Jacob Champion wrote:
Stephen pointed out [1] that the authenticated identity that's stored
in MyProcPort can't be retrieved by extensions or triggers. Attached is
a patch that provides both a C API and a SQL function for retrieving
it.GetAuthenticatedIdentityString() is a mouthful but I wanted to
differentiate it from the existing GetAuthenticatedUserId(); better
names welcome. It only exists as an accessor because I wasn't sure if
extensions outside of contrib were encouraged to rely on the internal
layout of struct Port. (If they can, then that call can go away
entirely.)
+char *
+GetAuthenticatedIdentityString(void)
+{
+ if (!MyProcPort || !MyProcPort->authn_id)
+ return NULL;
+
+ return pstrdup(MyProcPort->authn_id);
I don't quite see the additional value that this API brings as
MyProcPort is directly accessible, and contrib modules like
postgres_fdw and sslinfo just use that to find the data of the current
backend. Cannot a module like pgaudit, through the elog hook, do its
work with what we have already in place?
What's the use case for a given session to be able to report back
only its own authn through SQL?
I could still see a use case for that at a more global level with
beentrys, but it looked like there was not much interest the last time
I dropped this idea.
--
Michael
On Thu, 2022-02-24 at 20:39 +0900, Michael Paquier wrote:
I don't quite see the additional value that this API brings as
MyProcPort is directly accessible, and contrib modules like
postgres_fdw and sslinfo just use that to find the data of the current
backend.
Right -- I just didn't know if third-party modules were actually able
to rely on the internal layout of struct Port. Is that guaranteed to
remain constant for a major release line? If so, this new API is
superfluous.
Cannot a module like pgaudit, through the elog hook, do its
work with what we have already in place?
Given the above, I would hope so. Stephen mentioned that pgaudit only
had access to the logged-in role, and when I proposed a miscadmin.h API
he said that would help. CC'ing him to see what he meant; I don't know
if pgaudit has additional constraints.
What's the use case for a given session to be able to report back
only its own authn through SQL?
That's for triggers to be able to grab the current ID for
logging/auditing. Is there a better way to expose this for that use
case?
I could still see a use case for that at a more global level with
beentrys, but it looked like there was not much interest the last time
I dropped this idea.
I agree that this would be useful to have in the stats. From my outside
perspective, it seems like it's difficult to get strings of arbitrary
length in there; is that accurate?
Thanks,
--Jacob
Hi,
On Thu, Feb 24, 2022 at 04:50:59PM +0000, Jacob Champion wrote:
On Thu, 2022-02-24 at 20:39 +0900, Michael Paquier wrote:
I don't quite see the additional value that this API brings as
MyProcPort is directly accessible, and contrib modules like
postgres_fdw and sslinfo just use that to find the data of the current
backend.Right -- I just didn't know if third-party modules were actually able
to rely on the internal layout of struct Port. Is that guaranteed to
remain constant for a major release line? If so, this new API is
superfluous.
Yes, third-party can rely on Port layout. We don't break ABI between minor
release. In some occasions we can add additional fields at the end of a
struct, but nothing more.
I could still see a use case for that at a more global level with
beentrys, but it looked like there was not much interest the last time
I dropped this idea.I agree that this would be useful to have in the stats. From my outside
perspective, it seems like it's difficult to get strings of arbitrary
length in there; is that accurate?
Yes, as it's all in shared memory. The only workaround is using something like
track_activity_query_size, but it's not great.
On Fri, 2022-02-25 at 01:15 +0800, Julien Rouhaud wrote:
On Thu, Feb 24, 2022 at 04:50:59PM +0000, Jacob Champion wrote:
On Thu, 2022-02-24 at 20:39 +0900, Michael Paquier wrote:
I don't quite see the additional value that this API brings as
MyProcPort is directly accessible, and contrib modules like
postgres_fdw and sslinfo just use that to find the data of the current
backend.Right -- I just didn't know if third-party modules were actually able
to rely on the internal layout of struct Port. Is that guaranteed to
remain constant for a major release line? If so, this new API is
superfluous.Yes, third-party can rely on Port layout. We don't break ABI between minor
release. In some occasions we can add additional fields at the end of a
struct, but nothing more.
That simplifies things. PFA a smaller v2; if pgaudit can't make use of
libpq-be.h for some reason, then I guess we can tailor a fix to that
use case.
I could still see a use case for that at a more global level with
beentrys, but it looked like there was not much interest the last time
I dropped this idea.I agree that this would be useful to have in the stats. From my outside
perspective, it seems like it's difficult to get strings of arbitrary
length in there; is that accurate?Yes, as it's all in shared memory. The only workaround is using something like
track_activity_query_size, but it's not great.
Yeah... I was following a similar track with the initial work last
year, but I dropped it when the cost of implementation started to grow
considerably. At the time, though, it looked like some overhauls to the
stats framework were being pursued? I should read up on that thread.
--Jacob
Attachments:
v2-0001-Add-API-to-retrieve-authn_id-from-SQL.patchtext/x-patch; name=v2-0001-Add-API-to-retrieve-authn_id-from-SQL.patchDownload+26-3
On Thu, Feb 24, 2022 at 08:44:08PM +0000, Jacob Champion wrote:
Yeah... I was following a similar track with the initial work last
year, but I dropped it when the cost of implementation started to grow
considerably. At the time, though, it looked like some overhauls to the
stats framework were being pursued? I should read up on that thread.
Do you mean the shared memory stats patchset? IIUC this is unrelated, as the
beentry stuff Michael was talking about is a different infrastructure
(backend_status.[ch]), and I don't think there are any plans to move that to
dynamic shared memory.
Hi,
On 2022-02-25 13:01:26 +0800, Julien Rouhaud wrote:
On Thu, Feb 24, 2022 at 08:44:08PM +0000, Jacob Champion wrote:
Yeah... I was following a similar track with the initial work last
year, but I dropped it when the cost of implementation started to grow
considerably. At the time, though, it looked like some overhauls to the
stats framework were being pursued? I should read up on that thread.Do you mean the shared memory stats patchset? IIUC this is unrelated, as the
beentry stuff Michael was talking about is a different infrastructure
(backend_status.[ch]), and I don't think there are any plans to move that to
dynamic shared memory.
Until a year ago the backend_status.c stuff was in in pgstat.c - I just split
them out because of the shared memory status work - so it'd not be surprising
for them to mentally be thrown in one bucket.
Basically the type of stats we're trying to move to dynamic shared memory is
about counters that should persist for a while and are accumulated across
connections etc. Whereas backend_status.c is more about tracking the current
state (what query is a backend running, what user, etc).
They're not unrelated though: backend_status.c feeds pgstat.c with some
information occasionally.
Greetings,
Andres Freund
Hi,
On Thu, Feb 24, 2022 at 09:18:26PM -0800, Andres Freund wrote:
On 2022-02-25 13:01:26 +0800, Julien Rouhaud wrote:
On Thu, Feb 24, 2022 at 08:44:08PM +0000, Jacob Champion wrote:
Yeah... I was following a similar track with the initial work last
year, but I dropped it when the cost of implementation started to grow
considerably. At the time, though, it looked like some overhauls to the
stats framework were being pursued? I should read up on that thread.Do you mean the shared memory stats patchset? IIUC this is unrelated, as the
beentry stuff Michael was talking about is a different infrastructure
(backend_status.[ch]), and I don't think there are any plans to move that to
dynamic shared memory.Until a year ago the backend_status.c stuff was in in pgstat.c - I just split
them out because of the shared memory status work - so it'd not be surprising
for them to mentally be thrown in one bucket.
Right.
Basically the type of stats we're trying to move to dynamic shared memory is
about counters that should persist for a while and are accumulated across
connections etc. Whereas backend_status.c is more about tracking the current
state (what query is a backend running, what user, etc).
But would it be acceptable to use dynamic shared memory in backend_status and
e.g. have a dsa_pointer rather than a fixed length array? That seems like a
bad idea for query text in general, but for authn_id for instance it seems less
likely to hold gigantic strings, and also have more or less consistent size
when provided.
On Thu, 2022-02-24 at 20:44 +0000, Jacob Champion wrote:
That simplifies things. PFA a smaller v2; if pgaudit can't make use of
libpq-be.h for some reason, then I guess we can tailor a fix to that
use case.
Ha, opr_sanity caught my use of cstring. I'll work on a fix later
today.
--Jacob
On Fri, 2022-02-25 at 16:28 +0000, Jacob Champion wrote:
Ha, opr_sanity caught my use of cstring. I'll work on a fix later
today.
Fixed in v3.
--Jacob
Attachments:
v3-0001-Add-API-to-retrieve-authn_id-from-SQL.patchtext/x-patch; name=v3-0001-Add-API-to-retrieve-authn_id-from-SQL.patchDownload+26-3
On 2022-02-25 20:19:24 +0000, Jacob Champion wrote:
From 2fde60a6bc3739f1894c8c264120e4fa0f04df64 Mon Sep 17 00:00:00 2001
From: Jacob Champion <pchampion@vmware.com>
Date: Mon, 14 Feb 2022 08:10:53 -0800
Subject: [PATCH v3] Add API to retrieve authn_id from SQL
The authn_id field in MyProcPort is currently only accessible to the
backend itself. Add a SQL function, session_authn_id(), to expose the
field to triggers that may want to make use of it.
Looks to me like authn_id isn't synchronized to parallel workers right now. So
the function will return the wrong thing when executed as part of a parallel
query.
I don't think we should add further functions not prefixed with pg_.
Perhaps a few tests for less trivial authn_ids could be worthwhile?
E.g. certificate DNs.
Greetings,
Andres Freund
Hi,
On 2022-02-25 14:25:52 +0800, Julien Rouhaud wrote:
Basically the type of stats we're trying to move to dynamic shared memory is
about counters that should persist for a while and are accumulated across
connections etc. Whereas backend_status.c is more about tracking the current
state (what query is a backend running, what user, etc).But would it be acceptable to use dynamic shared memory in backend_status and
e.g. have a dsa_pointer rather than a fixed length array?
Might be OK, but it does add a fair bit of complexity. Suddenly there's a
bunch more initialization order dependencies that you don't have right
now. I'd not go there for just this.
Greetings,
Andres Freund
On Fri, Feb 25, 2022 at 01:23:49PM -0800, Andres Freund wrote:
Looks to me like authn_id isn't synchronized to parallel workers right now. So
the function will return the wrong thing when executed as part of a parallel
query.
FWIW, I am not completely sure what's the use case for being able to
see the authn of the current session through a trigger. We expose
that when log_connections is enabled, for audit purposes. I can also
get behind something more central so as one can get a full picture of
the authn used by a bunch of session, particularly with complex HBA
policies, but this looks rather limited to me in usability. Perhaps
that's not enough to stand as an objection, though, and the patch is
dead simple.
I don't think we should add further functions not prefixed with pg_.
Yep.
Perhaps a few tests for less trivial authn_ids could be worthwhile?
E.g. certificate DNs.
Yes, src/test/ssl would handle that just fine. Now, this stuff
already looks after authn results with log_connections=on, so that
feels like a duplicate.
--
Michael
On Fri, Feb 25, 2022 at 01:23:49PM -0800, Andres Freund wrote:
Looks to me like authn_id isn't synchronized to parallel workers right now. So
the function will return the wrong thing when executed as part of a parallel
query.
Thanks for the catch. It looks like MyProcPort is left empty, and other
functions that rely on like inet_server_addr() are marked parallel-
restricted, so I've done the same in v4.
On Sat, 2022-02-26 at 14:39 +0900, Michael Paquier wrote:
FWIW, I am not completely sure what's the use case for being able to
see the authn of the current session through a trigger. We expose
that when log_connections is enabled, for audit purposes. I can also
get behind something more central so as one can get a full picture of
the authn used by a bunch of session, particularly with complex HBA
policies, but this looks rather limited to me in usability. Perhaps
that's not enough to stand as an objection, though, and the patch is
dead simple.
I'm primarily motivated by the linked thread -- if the gap between
builtin roles and authn_id are going to be used as ammo against other
security features, then let's close that gap. But I think it's fair to
say that if someone is already using triggers to exhaustively audit a
table, it'd be nice to have this info in the same place too.
I don't think we should add further functions not prefixed with pg_.
Yep.
Fixed.
Perhaps a few tests for less trivial authn_ids could be worthwhile?
E.g. certificate DNs.Yes, src/test/ssl would handle that just fine. Now, this stuff
already looks after authn results with log_connections=on, so that
feels like a duplicate.
It was easy enough to add, so I added it. I suppose it does protect
against any reimplementations of pg_session_authn_id() that can't
handle longer ID strings, though I admit that's a stretch.
Thanks,
--Jacob
Greetings,
* Jacob Champion (pchampion@vmware.com) wrote:
On Fri, Feb 25, 2022 at 01:23:49PM -0800, Andres Freund wrote:
Looks to me like authn_id isn't synchronized to parallel workers right now. So
the function will return the wrong thing when executed as part of a parallel
query.Thanks for the catch. It looks like MyProcPort is left empty, and other
functions that rely on like inet_server_addr() are marked parallel-
restricted, so I've done the same in v4.
That's probably alright.
On Sat, 2022-02-26 at 14:39 +0900, Michael Paquier wrote:
FWIW, I am not completely sure what's the use case for being able to
see the authn of the current session through a trigger. We expose
that when log_connections is enabled, for audit purposes. I can also
get behind something more central so as one can get a full picture of
the authn used by a bunch of session, particularly with complex HBA
policies, but this looks rather limited to me in usability. Perhaps
that's not enough to stand as an objection, though, and the patch is
dead simple.I'm primarily motivated by the linked thread -- if the gap between
builtin roles and authn_id are going to be used as ammo against other
security features, then let's close that gap. But I think it's fair to
say that if someone is already using triggers to exhaustively audit a
table, it'd be nice to have this info in the same place too.
Yeah, we really should make this available to trigger-based auditing
systems too and not just through log_connections which involves a great
deal more log parsing and work external to the database to figure out
who did what.
I don't think we should add further functions not prefixed with pg_.
Yep.
Fixed.
That's fine.
Perhaps a few tests for less trivial authn_ids could be worthwhile?
E.g. certificate DNs.Yes, src/test/ssl would handle that just fine. Now, this stuff
already looks after authn results with log_connections=on, so that
feels like a duplicate.It was easy enough to add, so I added it. I suppose it does protect
against any reimplementations of pg_session_authn_id() that can't
handle longer ID strings, though I admit that's a stretch.Thanks,
--Jacob
commit efec9f040843d1de2fc52f5ce0d020478a5bc75d
Author: Jacob Champion <pchampion@vmware.com>
Date: Mon Feb 28 10:28:51 2022 -0800squash! Add API to retrieve authn_id from SQL
Bleh. :) Squash indeed.
Subject: [PATCH v4] Add API to retrieve authn_id from SQL
The authn_id field in MyProcPort is currently only accessible to the
backend itself. Add a SQL function, pg_session_authn_id(), to expose
the field to triggers that may want to make use of it.
Only did a quick look but generally looks reasonable to me.
Thanks,
Stephen
On Mon, 2022-02-28 at 16:00 -0500, Stephen Frost wrote:
commit efec9f040843d1de2fc52f5ce0d020478a5bc75d
Author: Jacob Champion <pchampion@vmware.com>
Date: Mon Feb 28 10:28:51 2022 -0800squash! Add API to retrieve authn_id from SQL
Bleh. :) Squash indeed.
Ha, I wasn't sure if anyone read the since-diffs :) I'll start
wordsmithing them more in the future.
Subject: [PATCH v4] Add API to retrieve authn_id from SQL
The authn_id field in MyProcPort is currently only accessible to the
backend itself. Add a SQL function, pg_session_authn_id(), to expose
the field to triggers that may want to make use of it.Only did a quick look but generally looks reasonable to me.
Thanks!
--Jacob
On Mon, Feb 28, 2022 at 04:00:36PM -0500, Stephen Frost wrote:
* Jacob Champion (pchampion@vmware.com) wrote:
On Fri, Feb 25, 2022 at 01:23:49PM -0800, Andres Freund wrote:
Looks to me like authn_id isn't synchronized to parallel workers right now. So
the function will return the wrong thing when executed as part of a parallel
query.Thanks for the catch. It looks like MyProcPort is left empty, and other
functions that rely on like inet_server_addr() are marked parallel-
restricted, so I've done the same in v4.That's probably alright.
I'd say as well that this is right as-is. If it happens that there is
a use-case for making this parallel aware in the future, it could be
done. Now, it may be a bit weird to make parallel workers inherit the
authn ID of the parent as these did not go through authentication, no?
Letting this function being run only by the leader feels intuitive.
Yeah, we really should make this available to trigger-based auditing
systems too and not just through log_connections which involves a great
deal more log parsing and work external to the database to figure out
who did what.
Okay, I won't fight hard on that if all of you think that this is
useful for a given session.
Subject: [PATCH v4] Add API to retrieve authn_id from SQL
The authn_id field in MyProcPort is currently only accessible to the
backend itself. Add a SQL function, pg_session_authn_id(), to expose
the field to triggers that may want to make use of it.Only did a quick look but generally looks reasonable to me.
The function and the test are fine, pgperltidy complains a bit about
the format of the tests.
Ayway, this function needs to be documented. I think that you should
just add that in "Session Information Functions" in func.sgml, same
area as current_user(). The last time we talked about the authn ID,
one thing we discussed about was how to describe that in a good way to
the user, which is why the section of log_connections was reworked a
bit. And we don't have yet any references to what an authenticated
identity is in the docs.
There is no need to update catversion.h in the patch, committers
usually take care of that and that's an area of the code that
conflicts a lot.
--
Michael
Greetings,
* Michael Paquier (michael@paquier.xyz) wrote:
On Mon, Feb 28, 2022 at 04:00:36PM -0500, Stephen Frost wrote:
* Jacob Champion (pchampion@vmware.com) wrote:
On Fri, Feb 25, 2022 at 01:23:49PM -0800, Andres Freund wrote:
Looks to me like authn_id isn't synchronized to parallel workers right now. So
the function will return the wrong thing when executed as part of a parallel
query.Thanks for the catch. It looks like MyProcPort is left empty, and other
functions that rely on like inet_server_addr() are marked parallel-
restricted, so I've done the same in v4.That's probably alright.
I'd say as well that this is right as-is. If it happens that there is
a use-case for making this parallel aware in the future, it could be
done. Now, it may be a bit weird to make parallel workers inherit the
authn ID of the parent as these did not go through authentication, no?
Letting this function being run only by the leader feels intuitive.
I'm not really sure why we're arguing about this, but clearly the authn
ID of the leader process is what should be used because that's the
authentication under which the parallel worker is running, just as much
as the effective role is the authorization. Having this be available in
worker processes would certainly be good as it would allow more query
plans to be considered when these functions are used. At this time, I
don't think that outweighs the complications around having it and I'm
not suggesting that Jacob needs to go do that, but surely it would be
better.
Subject: [PATCH v4] Add API to retrieve authn_id from SQL
The authn_id field in MyProcPort is currently only accessible to the
backend itself. Add a SQL function, pg_session_authn_id(), to expose
the field to triggers that may want to make use of it.Only did a quick look but generally looks reasonable to me.
The function and the test are fine, pgperltidy complains a bit about
the format of the tests.Ayway, this function needs to be documented. I think that you should
just add that in "Session Information Functions" in func.sgml, same
area as current_user(). The last time we talked about the authn ID,
one thing we discussed about was how to describe that in a good way to
the user, which is why the section of log_connections was reworked a
bit. And we don't have yet any references to what an authenticated
identity is in the docs.
Agreed that it should be documented and that location seems reasonable
to me.
There is no need to update catversion.h in the patch, committers
usually take care of that and that's an area of the code that
conflicts a lot.
Yeah, best to let committers handle catversion bumps.
Thanks,
Stephen
On 25.02.22 21:19, Jacob Champion wrote:
On Fri, 2022-02-25 at 16:28 +0000, Jacob Champion wrote:
Ha, opr_sanity caught my use of cstring. I'll work on a fix later
today.Fixed in v3.
This patch contains no documentation. I'm having a hard time
understanding what the name "session_authn_id" is supposed to convey.
The comment for the Port.authn_id field says this is the "system
username", which sounds like a clearer terminology.
On Tue, 2022-03-01 at 08:35 -0500, Stephen Frost wrote:
* Michael Paquier (michael@paquier.xyz) wrote:
Ayway, this function needs to be documented. I think that you should
just add that in "Session Information Functions" in func.sgml, same
area as current_user(). The last time we talked about the authn ID,
one thing we discussed about was how to describe that in a good way to
the user, which is why the section of log_connections was reworked a
bit. And we don't have yet any references to what an authenticated
identity is in the docs.Agreed that it should be documented and that location seems reasonable
to me.
Added a first draft in v5, alongside the perltidy fixups mentioned by
Michael.
There is no need to update catversion.h in the patch, committers
usually take care of that and that's an area of the code that
conflicts a lot.Yeah, best to let committers handle catversion bumps.
Heh, that was added for my benefit -- I was tired of forgetting to
initdb after switching dev branches -- but I've dropped it from the
patch and will just carry that diff locally.
Thanks,
--Jacob