System username in pg_stat_activity

Started by Magnus Haganderover 2 years ago37 messageshackers

magnus@hagander.net

over 2 years ago

The attached patch adds a column "authuser" to pg_stat_activity which
contains the username of the externally authenticated user, being the
same value as the SYSTEM_USER keyword returns in a backend.

This overlaps with for example the values in pg_stat_gss, but it will
include values for authentication methods that don't have their own
view such as peer/ident. gss/ssl info will of course still be shown,
it is just in more than one place.

I was originally thinking this column should be "sysuser" to map to
the keyword, but since we already have "usesysid" as a column name in
pg_stat_activity I figured that could be confusing since it actually
means something completely different. But happy to change that back if
people think that's better.

--
Magnus Hagander
Me: https://www.hagander.net/
Work: https://www.redpill-linpro.com/

Aleksander Alekseev

aleksander@timescale.com

over 2 years ago

In reply to: Magnus Hagander (#1)

Re: System username in pg_stat_activity

Hi,

Thanks for the patch.

The attached patch adds a column "authuser" to pg_stat_activity which
contains the username of the externally authenticated user, being the
same value as the SYSTEM_USER keyword returns in a backend.

I believe what was meant is "authname", not "authuser".

This overlaps with for example the values in pg_stat_gss, but it will
include values for authentication methods that don't have their own
view such as peer/ident. gss/ssl info will of course still be shown,
it is just in more than one place.

I was originally thinking this column should be "sysuser" to map to
the keyword, but since we already have "usesysid" as a column name in
pg_stat_activity I figured that could be confusing since it actually
means something completely different. But happy to change that back if
people think that's better.

This part of the documentation is wrong:

```
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>authname</structfield> <type>name</type>
+      </para>
```

Actually the type is `text`:

It hurts my sense of beauty that usename and authname are of different
types. But if I'm the only one, maybe we can close our eyes on this.
Also I suspect that placing usename and authname in a close proximity
can be somewhat confusing. Perhaps adding authname as the last column
of the view will solve both nitpicks?

```
+    /* Information about the authenticated user */
+    char        st_authuser[NAMEDATALEN];
```

Well, here it's called "authuser" and it looks like the intention was
to use `name` datatype... I suggest using "authname" everywhere for
consistency.

Since the patch affects pg_proc.dat I believe the commit message
should remind bumping the catalog version.

--
Best regards,
Aleksander Alekseev

Magnus Hagander

magnus@hagander.net

over 2 years ago

In reply to: Aleksander Alekseev (#2)

Re: System username in pg_stat_activity

On Wed, Jan 10, 2024 at 1:44 PM Aleksander Alekseev
<aleksander@timescale.com> wrote:

Hi,

Thanks for the patch.

The attached patch adds a column "authuser" to pg_stat_activity which
contains the username of the externally authenticated user, being the
same value as the SYSTEM_USER keyword returns in a backend.

I believe what was meant is "authname", not "authuser".

This overlaps with for example the values in pg_stat_gss, but it will
include values for authentication methods that don't have their own
view such as peer/ident. gss/ssl info will of course still be shown,
it is just in more than one place.

I was originally thinking this column should be "sysuser" to map to
the keyword, but since we already have "usesysid" as a column name in
pg_stat_activity I figured that could be confusing since it actually
means something completely different. But happy to change that back if
people think that's better.

This part of the documentation is wrong:
```
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>authname</structfield> <type>name</type>
+      </para>
```
Actually the type is `text`:

```
=# \d pg_stat_activity ;
View "pg_catalog.pg_stat_activity"
Column | Type | Collation | Nullable | Default
------------------+--------------------------+-----------+----------+---------
datid | oid | | |
datname | name | | |
pid | integer | | |
leader_pid | integer | | |
usesysid | oid | | |
usename | name | | |
authname | text | | |
```

It hurts my sense of beauty that usename and authname are of different
types. But if I'm the only one, maybe we can close our eyes on this.
Also I suspect that placing usename and authname in a close proximity
can be somewhat confusing. Perhaps adding authname as the last column
of the view will solve both nitpicks?

But it should probably actually be name, given that's the underlying
datatype. I kept changing it around and ended up half way in
between...

```
+    /* Information about the authenticated user */
+    char        st_authuser[NAMEDATALEN];
```
Well, here it's called "authuser" and it looks like the intention was
to use `name` datatype... I suggest using "authname" everywhere for
consistency.

Yeah, I flipped back and forth a few times and clearly got stuck in
the middle. They should absolutely be the same everywhere - whatever
name is used it should be consistent.

Since the patch affects pg_proc.dat I believe the commit message
should remind bumping the catalog version.

Yes.

--
Magnus Hagander
Me: https://www.hagander.net/
Work: https://www.redpill-linpro.com/

Dagfinn Ilmari Mannsåker

ilmari@ilmari.org

over 2 years ago

In reply to: Magnus Hagander (#3)

Re: System username in pg_stat_activity

Magnus Hagander <magnus@hagander.net> writes:

On Wed, Jan 10, 2024 at 1:44 PM Aleksander Alekseev
<aleksander@timescale.com> wrote:

It hurts my sense of beauty that usename and authname are of different
types. But if I'm the only one, maybe we can close our eyes on this.
Also I suspect that placing usename and authname in a close proximity
can be somewhat confusing. Perhaps adding authname as the last column
of the view will solve both nitpicks?

But it should probably actually be name, given that's the underlying
datatype. I kept changing it around and ended up half way in
between...

https://www.postgresql.org/docs/current/functions-info.html#FUNCTIONS-INFO-SESSION-TABLE
(and pg_typeof(system_user)) says it's text. Which makes sense, since
it can easily be longer than 63 bytes, e.g. in the case of a TLS client
certificate DN.

- ilmari

Magnus Hagander

magnus@hagander.net

over 2 years ago

In reply to: Dagfinn Ilmari Mannsåker (#4)

Re: System username in pg_stat_activity

On Wed, Jan 10, 2024 at 2:27 PM Dagfinn Ilmari Mannsåker
<ilmari@ilmari.org> wrote:

Magnus Hagander <magnus@hagander.net> writes:

On Wed, Jan 10, 2024 at 1:44 PM Aleksander Alekseev
<aleksander@timescale.com> wrote:

It hurts my sense of beauty that usename and authname are of different
types. But if I'm the only one, maybe we can close our eyes on this.
Also I suspect that placing usename and authname in a close proximity
can be somewhat confusing. Perhaps adding authname as the last column
of the view will solve both nitpicks?

But it should probably actually be name, given that's the underlying
datatype. I kept changing it around and ended up half way in
between...

https://www.postgresql.org/docs/current/functions-info.html#FUNCTIONS-INFO-SESSION-TABLE
(and pg_typeof(system_user)) says it's text. Which makes sense, since
it can easily be longer than 63 bytes, e.g. in the case of a TLS client
certificate DN.

We already truncate all those to NAMEDATALEN in pg_stat_ssl for
example. so I think the truncation part of those should be OK. We'll
truncate "a little bit more" since we also have the 'cert:', but not
significantly so I think.

but yeah, conceptually it should probably be text because name is
supposedly a *postgres identifier*, which this is not.

--
Magnus Hagander
Me: https://www.hagander.net/
Work: https://www.redpill-linpro.com/

Bertrand Drouvot

bertranddrouvot.pg@gmail.com

over 2 years ago

In reply to: Magnus Hagander (#3)

Re: System username in pg_stat_activity

Hi,

On Wed, Jan 10, 2024 at 02:08:03PM +0100, Magnus Hagander wrote:

On Wed, Jan 10, 2024 at 1:44 PM Aleksander Alekseev
<aleksander@timescale.com> wrote:

Hi,

Thanks for the patch.

This overlaps with for example the values in pg_stat_gss, but it will
include values for authentication methods that don't have their own
view such as peer/ident. gss/ssl info will of course still be shown,
it is just in more than one place.

Yeah, I think that's a good idea.

It hurts my sense of beauty that usename and authname are of different
types. But if I'm the only one, maybe we can close our eyes on this.
Also I suspect that placing usename and authname in a close proximity
can be somewhat confusing. Perhaps adding authname as the last column
of the view will solve both nitpicks?

But it should probably actually be name, given that's the underlying
datatype. I kept changing it around and ended up half way in
between...
```
+    /* Information about the authenticated user */
+    char        st_authuser[NAMEDATALEN];
```
Well, here it's called "authuser" and it looks like the intention was
to use `name` datatype... I suggest using "authname" everywhere for
consistency.

I think it depends what we want the new field to reflect. If it is the exact
same thing as the SYSTEM_USER then I think it has to be text (as the SYSTEM_USER
is made of "auth_method:identity"). Now if we want it to be "only" the identity
part of it, then the `name` datatype would be fine. I'd vote for the exact same
thing as the SYSTEM_USER (means auth_method:identity).

+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>authname</structfield> <type>name</type>
+      </para>
+      <para>
+       The authentication method and identity (if any) that the user
+       used to log in. It contains the same value as
+       <xref linkend="system-user" /> returns in the backend.
+      </para></entry>
+     </row>

I'm fine with auth_method:identity.

+ S.authname,

What about using system_user as the field name? (because if we keep
auth_method:identity it's not really the authname anyway).

Also, what about adding a test in say 003_peer.pl to check that the value matches
the SYSTEM_USER one?

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Magnus Hagander

magnus@hagander.net

over 2 years ago

In reply to: Bertrand Drouvot (#6)

Re: System username in pg_stat_activity

On Wed, Jan 10, 2024 at 2:56 PM Bertrand Drouvot
<bertranddrouvot.pg@gmail.com> wrote:

Hi,

On Wed, Jan 10, 2024 at 02:08:03PM +0100, Magnus Hagander wrote:

On Wed, Jan 10, 2024 at 1:44 PM Aleksander Alekseev
<aleksander@timescale.com> wrote:

Hi,

Thanks for the patch.

+1

This overlaps with for example the values in pg_stat_gss, but it will
include values for authentication methods that don't have their own
view such as peer/ident. gss/ssl info will of course still be shown,
it is just in more than one place.

Yeah, I think that's a good idea.
It hurts my sense of beauty that usename and authname are of different
types. But if I'm the only one, maybe we can close our eyes on this.
Also I suspect that placing usename and authname in a close proximity
can be somewhat confusing. Perhaps adding authname as the last column
of the view will solve both nitpicks?

But it should probably actually be name, given that's the underlying
datatype. I kept changing it around and ended up half way in
between...
```
+    /* Information about the authenticated user */
+    char        st_authuser[NAMEDATALEN];
```
Well, here it's called "authuser" and it looks like the intention was
to use `name` datatype... I suggest using "authname" everywhere for
consistency.
I think it depends what we want the new field to reflect. If it is the exact
same thing as the SYSTEM_USER then I think it has to be text (as the SYSTEM_USER
is made of "auth_method:identity"). Now if we want it to be "only" the identity
part of it, then the `name` datatype would be fine. I'd vote for the exact same
thing as the SYSTEM_USER (means auth_method:identity).

I definitely think it should be the same. If it's not exactly the
same, then it should be *two* columns, one with auth method and one
with the name.

And thinking more about it maybe that's cleaner, because that makes it
easier to do things like filter based on auth method?

+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>authname</structfield> <type>name</type>
+      </para>
+      <para>
+       The authentication method and identity (if any) that the user
+       used to log in. It contains the same value as
+       <xref linkend="system-user" /> returns in the backend.
+      </para></entry>
+     </row>
I'm fine with auth_method:identity.

+ S.authname,

What about using system_user as the field name? (because if we keep
auth_method:identity it's not really the authname anyway).

I was worried system_user or sysuser would both be confusing with the
fact that we have usesysid -- which would reference a *different*
sys...

Also, what about adding a test in say 003_peer.pl to check that the value matches
the SYSTEM_USER one?

Yeah, that's a good idea I think. I quickly looked over for tests and
couldn't really find any for pg_stat_activity, btu we can definitely
piggyback them in one or more of the auth tests.

--
Magnus Hagander
Me: https://www.hagander.net/
Work: https://www.redpill-linpro.com/

Bertrand Drouvot

bertranddrouvot.pg@gmail.com

over 2 years ago

In reply to: Magnus Hagander (#7)

Re: System username in pg_stat_activity

Hi,

On Wed, Jan 10, 2024 at 02:59:42PM +0100, Magnus Hagander wrote:

On Wed, Jan 10, 2024 at 2:56 PM Bertrand Drouvot
I definitely think it should be the same. If it's not exactly the
same, then it should be *two* columns, one with auth method and one
with the name.

And thinking more about it maybe that's cleaner, because that makes it
easier to do things like filter based on auth method?

Yeah, that's sounds even better.

+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>authname</structfield> <type>name</type>
+      </para>
+      <para>
+       The authentication method and identity (if any) that the user
+       used to log in. It contains the same value as
+       <xref linkend="system-user" /> returns in the backend.
+      </para></entry>
+     </row>
I'm fine with auth_method:identity.

+ S.authname,

What about using system_user as the field name? (because if we keep
auth_method:identity it's not really the authname anyway).
I was worried system_user or sysuser would both be confusing with the
fact that we have usesysid -- which would reference a *different*
sys...

If we go the 2 fields way, then what about auth_identity and auth_method then?

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Joe Conway

mail@joeconway.com

over 2 years ago

In reply to: Magnus Hagander (#7)

Re: System username in pg_stat_activity

On 1/10/24 08:59, Magnus Hagander wrote:

On Wed, Jan 10, 2024 at 2:56 PM Bertrand Drouvot

I think it depends what we want the new field to reflect. If it is the exact
same thing as the SYSTEM_USER then I think it has to be text (as the SYSTEM_USER
is made of "auth_method:identity"). Now if we want it to be "only" the identity
part of it, then the `name` datatype would be fine. I'd vote for the exact same
thing as the SYSTEM_USER (means auth_method:identity).

I definitely think it should be the same. If it's not exactly the
same, then it should be *two* columns, one with auth method and one
with the name.

And thinking more about it maybe that's cleaner, because that makes it
easier to do things like filter based on auth method?

+1 for the overall feature and +1 for two columns

+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>authname</structfield> <type>name</type>
+      </para>
+      <para>
+       The authentication method and identity (if any) that the user
+       used to log in. It contains the same value as
+       <xref linkend="system-user" /> returns in the backend.
+      </para></entry>
+     </row>
I'm fine with auth_method:identity.

+ S.authname,

What about using system_user as the field name? (because if we keep
auth_method:identity it's not really the authname anyway).
I was worried system_user or sysuser would both be confusing with the
fact that we have usesysid -- which would reference a *different*
sys...

I think if it is exactly "system_user" it would be pretty clearly a
match for SYSTEM_USER

--
Joe Conway
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

#10

Magnus Hagander

magnus@hagander.net

over 2 years ago

In reply to: Bertrand Drouvot (#8)

Re: System username in pg_stat_activity

On Wed, Jan 10, 2024 at 3:12 PM Bertrand Drouvot
<bertranddrouvot.pg@gmail.com> wrote:

Hi,

On Wed, Jan 10, 2024 at 02:59:42PM +0100, Magnus Hagander wrote:

On Wed, Jan 10, 2024 at 2:56 PM Bertrand Drouvot
I definitely think it should be the same. If it's not exactly the
same, then it should be *two* columns, one with auth method and one
with the name.

And thinking more about it maybe that's cleaner, because that makes it
easier to do things like filter based on auth method?

Yeah, that's sounds even better.
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>authname</structfield> <type>name</type>
+      </para>
+      <para>
+       The authentication method and identity (if any) that the user
+       used to log in. It contains the same value as
+       <xref linkend="system-user" /> returns in the backend.
+      </para></entry>
+     </row>
I'm fine with auth_method:identity.

+ S.authname,

What about using system_user as the field name? (because if we keep
auth_method:identity it's not really the authname anyway).
I was worried system_user or sysuser would both be confusing with the
fact that we have usesysid -- which would reference a *different*
sys...
If we go the 2 fields way, then what about auth_identity and auth_method then?

Here is an updated patch based on this idea.

--
Magnus Hagander
Me: https://www.hagander.net/
Work: https://www.redpill-linpro.com/

#11

Bertrand Drouvot

bertranddrouvot.pg@gmail.com

over 2 years ago

In reply to: Magnus Hagander (#10)

Re: System username in pg_stat_activity

Hi,

On Thu, Jan 11, 2024 at 02:24:58PM +0100, Magnus Hagander wrote:

On Wed, Jan 10, 2024 at 3:12 PM Bertrand Drouvot
<bertranddrouvot.pg@gmail.com> wrote:

If we go the 2 fields way, then what about auth_identity and auth_method then?

Here is an updated patch based on this idea.

Thanks!

+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>auth_method</structfield> <type>text</type>
+      </para>
+      <para>
+       The authentication method used for authenticating the connection, or
+       NULL for background processes.
+      </para></entry>

I'm wondering if it would make sense to populate it for parallel workers too.
I think it's doable thanks to d951052, but I'm not sure it's worth it (one could
join based on the leader_pid though). OTOH that would be consistent with
how the SYSTEM_USER behaves with parallel workers (it's populated).

+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>auth_identity</structfield> <type>text</type>
+      </para>
+      <para>
+       The identity (if any) that the user presented during the authentication
+       cycle before they were assigned a database role.  Contains the same
+       value as <xref linkend="system-user" />

Same remark regarding the parallel workers case +:

- Would it be better to use the `name` datatype for auth_identity?
- what about "Contains the same value as the identity part in <xref linkend="system-user" />"?

+                       /*
+                        * Trust doesn't set_authn_id(), but we still need to store the
+                        * auth_method
+                        */
+                       MyClientConnectionInfo.auth_method = uaTrust;

+1, I think it is useful here to provide "trust" and not a NULL value in the
context of this patch.

+# pg_stat_activity shold contain trust and empty string for trust auth

typo: s/shold/should/

+# Users with md5 auth should show both auth method and name in pg_stat_activity

what about "show both auth method and identity"?

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

#12

Magnus Hagander

magnus@hagander.net

over 2 years ago

In reply to: Bertrand Drouvot (#11)

Re: System username in pg_stat_activity

On Thu, Jan 11, 2024 at 5:55 PM Bertrand Drouvot
<bertranddrouvot.pg@gmail.com> wrote:

Hi,

On Thu, Jan 11, 2024 at 02:24:58PM +0100, Magnus Hagander wrote:

On Wed, Jan 10, 2024 at 3:12 PM Bertrand Drouvot
<bertranddrouvot.pg@gmail.com> wrote:

If we go the 2 fields way, then what about auth_identity and auth_method then?

Here is an updated patch based on this idea.

Thanks!
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>auth_method</structfield> <type>text</type>
+      </para>
+      <para>
+       The authentication method used for authenticating the connection, or
+       NULL for background processes.
+      </para></entry>
I'm wondering if it would make sense to populate it for parallel workers too.
I think it's doable thanks to d951052, but I'm not sure it's worth it (one could
join based on the leader_pid though). OTOH that would be consistent with
how the SYSTEM_USER behaves with parallel workers (it's populated).

I guess one could conceptually argue that "authentication happens int
he leader". But we do populate it with the other user records, and
it'd be weird if this one was excluded.

The tricky thing is that pgstat_bestart() is called long before we
deserialize the data. But from what I can tell it should be safe to
change it per the attached? That should be AFAICT an extremely short
window of time longer before we report it, not enough to matter.

+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>auth_identity</structfield> <type>text</type>
+      </para>
+      <para>
+       The identity (if any) that the user presented during the authentication
+       cycle before they were assigned a database role.  Contains the same
+       value as <xref linkend="system-user" />

Same remark regarding the parallel workers case +:

- Would it be better to use the `name` datatype for auth_identity?

I've been going back and forth. And I think my conclusion is that it's
not a postgres identifier, so it shouldn't be. See the earlier
discussion, and for example that that's what we do for cert names when
SSL is used.

- what about "Contains the same value as the identity part in <xref linkend="system-user" />"?
+                       /*
+                        * Trust doesn't set_authn_id(), but we still need to store the
+                        * auth_method
+                        */
+                       MyClientConnectionInfo.auth_method = uaTrust;
+1, I think it is useful here to provide "trust" and not a NULL value in the
context of this patch.

Yeah, that's probably "independently correct", but actually useful here.

+# pg_stat_activity shold contain trust and empty string for trust auth

typo: s/shold/should/

Ops.

+# Users with md5 auth should show both auth method and name in pg_stat_activity

what about "show both auth method and identity"?

Good spot, yeah, I changed it over to identity everywhere else so it
should be here as well.

--
Magnus Hagander
Me: https://www.hagander.net/
Work: https://www.redpill-linpro.com/

#13

Bertrand Drouvot

bertranddrouvot.pg@gmail.com

over 2 years ago

In reply to: Magnus Hagander (#12)

Re: System username in pg_stat_activity

Hi,

On Fri, Jan 12, 2024 at 05:16:53PM +0100, Magnus Hagander wrote:

On Thu, Jan 11, 2024 at 5:55 PM Bertrand Drouvot
<bertranddrouvot.pg@gmail.com> wrote:

I'm wondering if it would make sense to populate it for parallel workers too.
I think it's doable thanks to d951052, but I'm not sure it's worth it (one could
join based on the leader_pid though). OTOH that would be consistent with
how the SYSTEM_USER behaves with parallel workers (it's populated).

I guess one could conceptually argue that "authentication happens int
he leader". But we do populate it with the other user records, and
it'd be weird if this one was excluded.

The tricky thing is that pgstat_bestart() is called long before we
deserialize the data. But from what I can tell it should be safe to
change it per the attached? That should be AFAICT an extremely short
window of time longer before we report it, not enough to matter.

Thanks! Yeah, that seems reasonable to me. Also, I think we should remove the
"MyProcPort" test here then (looking at v3):

+       if (MyProcPort && MyClientConnectionInfo.authn_id)
+               strlcpy(lbeentry.st_auth_identity, MyClientConnectionInfo.authn_id, NAMEDATALEN);
+       else
+               MemSet(&lbeentry.st_auth_identity, 0, sizeof(lbeentry.st_auth_identity));

to get the st_auth_identity propagated to the parallel workers.

Same remark regarding the parallel workers case +:

- Would it be better to use the `name` datatype for auth_identity?

I've been going back and forth. And I think my conclusion is that it's
not a postgres identifier, so it shouldn't be. See the earlier
discussion, and for example that that's what we do for cert names when
SSL is used.

Yeah, Okay let's keep text then.

- what about "Contains the same value as the identity part in <xref linkend="system-user" />"?

Not sure, but looks like you missed this comment?

+                       /*
+                        * Trust doesn't set_authn_id(), but we still need to store the
+                        * auth_method
+                        */
+                       MyClientConnectionInfo.auth_method = uaTrust;
+1, I think it is useful here to provide "trust" and not a NULL value in the
context of this patch.
Yeah, that's probably "independently correct", but actually useful here.

+# Users with md5 auth should show both auth method and name in pg_stat_activity

what about "show both auth method and identity"?

Good spot, yeah, I changed it over to identity everywhere else so it
should be here as well.

Did you forget to share the new revision (aka v4)? I can only see the
"reorder_parallel_worker_bestart.patch" attached.

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

#14

Magnus Hagander

magnus@hagander.net

over 2 years ago

In reply to: Bertrand Drouvot (#13)

Re: System username in pg_stat_activity

On Mon, Jan 15, 2024 at 11:17 AM Bertrand Drouvot
<bertranddrouvot.pg@gmail.com> wrote:

Hi,

On Fri, Jan 12, 2024 at 05:16:53PM +0100, Magnus Hagander wrote:

On Thu, Jan 11, 2024 at 5:55 PM Bertrand Drouvot
<bertranddrouvot.pg@gmail.com> wrote:

I'm wondering if it would make sense to populate it for parallel workers too.
I think it's doable thanks to d951052, but I'm not sure it's worth it (one could
join based on the leader_pid though). OTOH that would be consistent with
how the SYSTEM_USER behaves with parallel workers (it's populated).

I guess one could conceptually argue that "authentication happens int
he leader". But we do populate it with the other user records, and
it'd be weird if this one was excluded.

The tricky thing is that pgstat_bestart() is called long before we
deserialize the data. But from what I can tell it should be safe to
change it per the attached? That should be AFAICT an extremely short
window of time longer before we report it, not enough to matter.

Thanks! Yeah, that seems reasonable to me. Also, I think we should remove the
"MyProcPort" test here then (looking at v3):
+       if (MyProcPort && MyClientConnectionInfo.authn_id)
+               strlcpy(lbeentry.st_auth_identity, MyClientConnectionInfo.authn_id, NAMEDATALEN);
+       else
+               MemSet(&lbeentry.st_auth_identity, 0, sizeof(lbeentry.st_auth_identity));
to get the st_auth_identity propagated to the parallel workers.

Yup, I had done that in v4 which as you noted further down, I forgot to post.

- what about "Contains the same value as the identity part in <xref linkend="system-user" />"?

Not sure, but looks like you missed this comment?

I did. Agree with your comment, and updated now.

+# Users with md5 auth should show both auth method and name in pg_stat_activity

what about "show both auth method and identity"?

Good spot, yeah, I changed it over to identity everywhere else so it
should be here as well.

Did you forget to share the new revision (aka v4)? I can only see the
"reorder_parallel_worker_bestart.patch" attached.

I did. Here it is, and also including that suggested docs fix as well
as a rebase on current master.

--
Magnus Hagander
Me: https://www.hagander.net/
Work: https://www.redpill-linpro.com/

#15

Bertrand Drouvot

bertranddrouvot.pg@gmail.com

over 2 years ago

In reply to: Magnus Hagander (#14)

Re: System username in pg_stat_activity

Hi,

On Thu, Jan 18, 2024 at 04:01:33PM +0100, Magnus Hagander wrote:

On Mon, Jan 15, 2024 at 11:17 AM Bertrand Drouvot

Did you forget to share the new revision (aka v4)? I can only see the
"reorder_parallel_worker_bestart.patch" attached.

I did. Here it is, and also including that suggested docs fix as well
as a rebase on current master.

Thanks!

Just a few comments:

1 ===

+       The authentication method used for authenticating the connection, or
+       NULL for background processes.

What about? "NULL for background processes (except for parallel workers which
inherit this information from their leader process)"

2 ===

+       cycle before they were assigned a database role.  Contains the same
+       value as the identity part in <xref linkend="system-user" />, or NULL
+       for background processes.

Same comment about parallel workers.

3 ===

+# pg_stat_activity should contain trust and empty string for trust auth
+$res = $node->safe_psql(
+       'postgres',
+       "SELECT auth_method, auth_identity='' FROM pg_stat_activity WHERE pid=pg_backend_pid()",
+       connstr => "user=scram_role");
+is($res, 'trust|t', 'Users with trust authentication should show correctly in pg_stat_activity');
+
+# pg_stat_activity should contain NULL for auth of background processes
+# (test is a bit out of place here, but this is where the other pg_stat_activity.auth* tests are)
+$res = $node->safe_psql(
+       'postgres',
+       "SELECT auth_method IS NULL, auth_identity IS NULL FROM pg_stat_activity WHERE backend_type='checkpointer'",
+);
+is($res, 't|t', 'Background processes should show NULL for auth in pg_stat_activity');

What do you think about testing the parallel workers cases too? (I'm not 100%
sure it's worth the extra complexity though).

Apart from those 3, it looks good to me.

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

#16

Aleksander Alekseev

aleksander@timescale.com

over 2 years ago

In reply to: Magnus Hagander (#14)

Re: System username in pg_stat_activity

Hi,

Did you forget to share the new revision (aka v4)? I can only see the
"reorder_parallel_worker_bestart.patch" attached.

I did. Here it is, and also including that suggested docs fix as well
as a rebase on current master.

```
+    lbeentry.st_auth_method = MyClientConnectionInfo.auth_method;
+    if (MyClientConnectionInfo.authn_id)
+        strlcpy(lbeentry.st_auth_identity,
MyClientConnectionInfo.authn_id, NAMEDATALEN);
+    else
+        MemSet(&lbeentry.st_auth_identity, 0,
sizeof(lbeentry.st_auth_identity));
```

I believe using sizeof(lbeentry.st_auth_identity) instead of
NAMEDATALEN is generally considered a better practice.

Calling MemSet for a CString seems to be an overkill. I suggest
setting lbeentry.st_auth_identity[0] to zero.

Except for these nitpicks v4 LGTM.

--
Best regards,
Aleksander Alekseev

#17

Julien Rouhaud

rjuju123@gmail.com

over 2 years ago

In reply to: Magnus Hagander (#14)

Re: System username in pg_stat_activity

Hi,

On Thu, Jan 18, 2024 at 11:01 PM Magnus Hagander <magnus@hagander.net> wrote:

I did. Here it is, and also including that suggested docs fix as well
as a rebase on current master.

+    if (MyClientConnectionInfo.authn_id)
+        strlcpy(lbeentry.st_auth_identity,
MyClientConnectionInfo.authn_id, NAMEDATALEN);
+    else
+        MemSet(&lbeentry.st_auth_identity, 0,
sizeof(lbeentry.st_auth_identity));

Should we use pg_mbcliplen() here? I don't think there's any strong
guarantee that no multibyte character can be used. I also agree with
the nearby comment about MemSet being overkill.

+ value as the identity part in <xref linkend="system-user" />, or NULL
I was looking at
https://www.postgresql.org/docs/current/auth-username-maps.html and
noticed that this page is switching between system-user and
system-username. Should we clean that up while at it?

#18

Magnus Hagander

magnus@hagander.net

about 2 years ago

In reply to: Bertrand Drouvot (#15)

Re: System username in pg_stat_activity

On Fri, Jan 19, 2024 at 7:20 AM Bertrand Drouvot
<bertranddrouvot.pg@gmail.com> wrote:

Hi,

On Thu, Jan 18, 2024 at 04:01:33PM +0100, Magnus Hagander wrote:

On Mon, Jan 15, 2024 at 11:17 AM Bertrand Drouvot

Did you forget to share the new revision (aka v4)? I can only see the
"reorder_parallel_worker_bestart.patch" attached.

I did. Here it is, and also including that suggested docs fix as well
as a rebase on current master.

Thanks!

Just a few comments:

1 ===
+       The authentication method used for authenticating the connection, or
+       NULL for background processes.
What about? "NULL for background processes (except for parallel workers which
inherit this information from their leader process)"

Ugh. That doesn't read very well at all to me. Maybe just "NULL for
background processes without a user"?

2 ===

+       cycle before they were assigned a database role.  Contains the same
+       value as the identity part in <xref linkend="system-user" />, or NULL
+       for background processes.

Same comment about parallel workers.

3 ===

+# pg_stat_activity should contain trust and empty string for trust auth
+$res = $node->safe_psql(
+       'postgres',
+       "SELECT auth_method, auth_identity='' FROM pg_stat_activity WHERE pid=pg_backend_pid()",
+       connstr => "user=scram_role");
+is($res, 'trust|t', 'Users with trust authentication should show correctly in pg_stat_activity');
+
+# pg_stat_activity should contain NULL for auth of background processes
+# (test is a bit out of place here, but this is where the other pg_stat_activity.auth* tests are)
+$res = $node->safe_psql(
+       'postgres',
+       "SELECT auth_method IS NULL, auth_identity IS NULL FROM pg_stat_activity WHERE backend_type='checkpointer'",
+);
+is($res, 't|t', 'Background processes should show NULL for auth in pg_stat_activity');

What do you think about testing the parallel workers cases too? (I'm not 100%
sure it's worth the extra complexity though).

I'm leaning towards not worth that.

--
Magnus Hagander
Me: https://www.hagander.net/
Work: https://www.redpill-linpro.com/

#19

Magnus Hagander

magnus@hagander.net

about 2 years ago

In reply to: Aleksander Alekseev (#16)

Re: System username in pg_stat_activity

On Fri, Jan 19, 2024 at 12:33 PM Aleksander Alekseev
<aleksander@timescale.com> wrote:

Hi,

Did you forget to share the new revision (aka v4)? I can only see the
"reorder_parallel_worker_bestart.patch" attached.

I did. Here it is, and also including that suggested docs fix as well
as a rebase on current master.
```
+    lbeentry.st_auth_method = MyClientConnectionInfo.auth_method;
+    if (MyClientConnectionInfo.authn_id)
+        strlcpy(lbeentry.st_auth_identity,
MyClientConnectionInfo.authn_id, NAMEDATALEN);
+    else
+        MemSet(&lbeentry.st_auth_identity, 0,
sizeof(lbeentry.st_auth_identity));
```
I believe using sizeof(lbeentry.st_auth_identity) instead of
NAMEDATALEN is generally considered a better practice.

We use the NAMEDATALEN method in the rest of the function, so I did it
the same way for consistency. I think if we want to change that, we
should change the whole function at once to keep it consistent.

Calling MemSet for a CString seems to be an overkill. I suggest
setting lbeentry.st_auth_identity[0] to zero.

Fair enough. Will make that change.

//Magnus

#20

Magnus Hagander

magnus@hagander.net

about 2 years ago

In reply to: Julien Rouhaud (#17)

Re: System username in pg_stat_activity

On Fri, Jan 19, 2024 at 1:43 PM Julien Rouhaud <rjuju123@gmail.com> wrote:

Hi,

On Thu, Jan 18, 2024 at 11:01 PM Magnus Hagander <magnus@hagander.net> wrote:

I did. Here it is, and also including that suggested docs fix as well
as a rebase on current master.
+    if (MyClientConnectionInfo.authn_id)
+        strlcpy(lbeentry.st_auth_identity,
MyClientConnectionInfo.authn_id, NAMEDATALEN);
+    else
+        MemSet(&lbeentry.st_auth_identity, 0,
sizeof(lbeentry.st_auth_identity));
Should we use pg_mbcliplen() here? I don't think there's any strong
guarantee that no multibyte character can be used. I also agree with
the nearby comment about MemSet being overkill.

Hm. Good question. I don't think there is such a guarantee, no. So
something like attached v5?

Also, wouldn't that problem already exist a few lines down for the SSL parts?

+ value as the identity part in <xref linkend="system-user" />, or NULL
I was looking at
https://www.postgresql.org/docs/current/auth-username-maps.html and
noticed that this page is switching between system-user and
system-username. Should we clean that up while at it?

Seems like something we should clean up yes, but not as part of this patch.

--
Magnus Hagander
Me: https://www.hagander.net/
Work: https://www.redpill-linpro.com/