libpq: Fix lots of discrepancies in PQtrace

Started by Jelte Fennema-Nioover 1 year ago14 messages
#1Jelte Fennema-Nio
Jelte Fennema-Nio
postgres@jeltef.nl
8 attachment(s)

After initially trying to add trace support for
StartupMessage/SSLRequest/GSSENCRequest[1]/messages/by-id/CAGECzQTTN5aGqtDaRifJXPyd_O5qHWQcOxsHJsDSVNqMugGNEA@mail.gmail.com I realized there were many
more cases where PQtrace was not correctly tracing messages, or not
even tracing them at all. These patches fix all of the issues that I
was able to find.

0001 is some cleanup after f4b54e1ed9
0002 does some preparatory changes for 0004 & 0007

All the others improve the tracing, and apart from 0004 and 0007
depending on 0002, none of these patches depend on each other.
Although you could argue that 0007 and 0008 depend on 0006, because
without 0006 the code added by 0007 and 0008 won't ever really be
executed.

To test you can add a PQreset(conn) call to the start of the
test_cancel function in:
src/test/modules/libpq_pipeline/libpq_pipeline.c.

And then run:
ninja -C build all install-quiet &&
build/src/test/modules/libpq_pipeline/libpq_pipeline cancel
'port=5432' -t test.trace

And then look at the top of test.trace

[1]: /messages/by-id/CAGECzQTTN5aGqtDaRifJXPyd_O5qHWQcOxsHJsDSVNqMugGNEA@mail.gmail.com

Attachments:

v1-0002-libpq-Add-suppress-argument-to-pqTraceOutputNchar.patchapplication/x-patch; name=v1-0002-libpq-Add-suppress-argument-to-pqTraceOutputNchar.patch
v1-0001-libpq-Use-PqMsg-macros-in-fe-auth.c.patchapplication/x-patch; name=v1-0001-libpq-Use-PqMsg-macros-in-fe-auth.c.patch
v1-0003-libpq-Trace-StartupMessage-SSLRequest-GSSENCReque.patchapplication/x-patch; name=v1-0003-libpq-Trace-StartupMessage-SSLRequest-GSSENCReque.patch
v1-0004-libpq-Trace-frontend-authentication-challenges.patchapplication/x-patch; name=v1-0004-libpq-Trace-frontend-authentication-challenges.patch
v1-0005-libpq-Trace-responses-to-SSLRequest-and-GSSENCReq.patchapplication/x-patch; name=v1-0005-libpq-Trace-responses-to-SSLRequest-and-GSSENCReq.patch
v1-0006-libpq-Trace-all-messages-received-from-the-server.patchapplication/x-patch; name=v1-0006-libpq-Trace-all-messages-received-from-the-server.patch
v1-0007-libpq-Trace-server-Authentication-messages-in-det.patchapplication/x-patch; name=v1-0007-libpq-Trace-server-Authentication-messages-in-det.patch
v1-0008-libpq-Trace-NegotiateProtocolVersion-correctly.patchapplication/x-patch; name=v1-0008-libpq-Trace-NegotiateProtocolVersion-correctly.patch
#2Nathan Bossart
Nathan Bossart
nathandbossart@gmail.com
In reply to: Jelte Fennema-Nio (#1)
Re: libpq: Fix lots of discrepancies in PQtrace

On Fri, Jun 21, 2024 at 11:22:05AM +0200, Jelte Fennema-Nio wrote:

0001 is some cleanup after f4b54e1ed9

Oops. I'll plan on committing this after the 17beta2 release freeze is
lifted.

--
nathan

#3Nathan Bossart
Nathan Bossart
nathandbossart@gmail.com
In reply to: Nathan Bossart (#2)
Re: libpq: Fix lots of discrepancies in PQtrace

On Fri, Jun 21, 2024 at 04:01:55PM -0500, Nathan Bossart wrote:

On Fri, Jun 21, 2024 at 11:22:05AM +0200, Jelte Fennema-Nio wrote:

0001 is some cleanup after f4b54e1ed9

Oops. I'll plan on committing this after the 17beta2 release freeze is
lifted.

Committed 0001.

--
nathan

#4Alvaro Herrera
Alvaro Herrera
alvherre@alvh.no-ip.org
In reply to: Nathan Bossart (#3)
Re: libpq: Fix lots of discrepancies in PQtrace

On 2024-Jun-26, Nathan Bossart wrote:

On Fri, Jun 21, 2024 at 04:01:55PM -0500, Nathan Bossart wrote:

On Fri, Jun 21, 2024 at 11:22:05AM +0200, Jelte Fennema-Nio wrote:

0001 is some cleanup after f4b54e1ed9

Oops. I'll plan on committing this after the 17beta2 release freeze is
lifted.

Committed 0001.

Thanks, Nathan. I'm holding myself responsible for the rest ... will
handle soon after the branch.

--
Álvaro Herrera Breisgau, Deutschland — https://www.EnterpriseDB.com/
"The problem with the future is that it keeps turning into the present"
(Hobbes)

#5Jelte Fennema-Nio
Jelte Fennema-Nio
postgres@jeltef.nl
In reply to: Alvaro Herrera (#4)
Re: libpq: Fix lots of discrepancies in PQtrace

On Wed, 26 Jun 2024 at 18:36, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:

Thanks, Nathan. I'm holding myself responsible for the rest ... will
handle soon after the branch.

Sounds great. Out of curiosity, what is the backpatching policy for
something like this? Honestly most of these patches could be
considered bugfixes in PQtrace, so backpatching might make sense. OTOH
I don't think PQtrace is used very much in practice, so backpatching
might carry more risk than it's worth.

#6Michael Paquier
Michael Paquier
michael@paquier.xyz
In reply to: Jelte Fennema-Nio (#5)
Re: libpq: Fix lots of discrepancies in PQtrace

On Wed, Jun 26, 2024 at 10:02:08PM +0200, Jelte Fennema-Nio wrote:

On Wed, 26 Jun 2024 at 18:36, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:

Thanks, Nathan. I'm holding myself responsible for the rest ... will
handle soon after the branch.

Sounds great. Out of curiosity, what is the backpatching policy for
something like this? Honestly most of these patches could be
considered bugfixes in PQtrace, so backpatching might make sense. OTOH
I don't think PQtrace is used very much in practice, so backpatching
might carry more risk than it's worth.

0001 getting on HEAD after the feature freeze as a cleanup piece
cleanup is no big deal. That's cosmetic, still OK.

Looking at the whole, the rest of the patch set qualifies as a new
feature, even if they're aimed at closing existing gaps.
Particularly, you have bits of new infrastructure introduced in libpq
like the current_auth_response business in 0004, making it a new
feature by structure.

+	conn->current_auth_response = AUTH_RESP_PASSWORD;
 	ret = pqPacketSend(conn, PqMsg_PasswordMessage, pwd_to_send, strlen(pwd_to_send) + 1);
+	conn->current_auth_response = AUTH_RESP_NONE;

It's a surprising approach. Future callers of pqPacketSend() and
pqPutMsgEnd() would easily miss that this flag should be set, as much
as reset. Isn't that something that should be added in input of these
functions?

+	AuthResponseType current_auth_response;
I'd recommend to document what this flag is here for, with a comment.
--
Michael
#7Jelte Fennema-Nio
Jelte Fennema-Nio
postgres@jeltef.nl
In reply to: Michael Paquier (#6)
7 attachment(s)
Re: libpq: Fix lots of discrepancies in PQtrace

On Thu, 27 Jun 2024 at 07:39, Michael Paquier <michael@paquier.xyz> wrote:

Looking at the whole, the rest of the patch set qualifies as a new
feature, even if they're aimed at closing existing gaps.

Alright, seems reasonable. To be clear, I don't care at all about this
being backported personally.

Particularly, you have bits of new infrastructure introduced in libpq
like the current_auth_response business in 0004, making it a new
feature by structure.

+       conn->current_auth_response = AUTH_RESP_PASSWORD;
ret = pqPacketSend(conn, PqMsg_PasswordMessage, pwd_to_send, strlen(pwd_to_send) + 1);
+       conn->current_auth_response = AUTH_RESP_NONE;

It's a surprising approach. Future callers of pqPacketSend() and
pqPutMsgEnd() would easily miss that this flag should be set, as much
as reset. Isn't that something that should be added in input of these
functions?

Yeah, I'm not entirely happy about it either. But adding an argument
to pqPutMsgEnd and pqPutPacketSend would mean all the existing call
sites would need to change, even though only 4 of them would care
about the new argument. You could argue that it's the better solution,
but it would at least greatly increase the size of the diff. Of course
to reduce the diff size you could make the old functions a wrapper
around a new one with the extra argument, but I couldn't think of a
good name for those functions. Overall I went for the chosen approach
here, because it only impacted code at the call sites for these auth
packets (which are the only v3 packets in the protocol that you cannot
interpret based on their contents alone).

I think your worry about easily missing to set/clear the flag is not a
huge problem in practice. We almost never add new authentication
messages and it's only needed there. Also the clearing is not even
strictly necessary for the tracing to behave correctly, but it seemed
like the right thing to do.

+ AuthResponseType current_auth_response;
I'd recommend to document what this flag is here for, with a comment.

Oops, yeah I forgot about that. Done now.

Attachments:

v2-0002-libpq-Add-suppress-argument-to-pqTraceOutputNchar.patchapplication/octet-stream; name=v2-0002-libpq-Add-suppress-argument-to-pqTraceOutputNchar.patch
v2-0003-libpq-Trace-StartupMessage-SSLRequest-GSSENCReque.patchapplication/octet-stream; name=v2-0003-libpq-Trace-StartupMessage-SSLRequest-GSSENCReque.patch
v2-0004-libpq-Trace-frontend-authentication-challenges.patchapplication/octet-stream; name=v2-0004-libpq-Trace-frontend-authentication-challenges.patch
v2-0005-libpq-Trace-responses-to-SSLRequest-and-GSSENCReq.patchapplication/octet-stream; name=v2-0005-libpq-Trace-responses-to-SSLRequest-and-GSSENCReq.patch
v2-0006-libpq-Trace-all-messages-received-from-the-server.patchapplication/octet-stream; name=v2-0006-libpq-Trace-all-messages-received-from-the-server.patch
v2-0007-libpq-Trace-server-Authentication-messages-in-det.patchapplication/octet-stream; name=v2-0007-libpq-Trace-server-Authentication-messages-in-det.patch
v2-0008-libpq-Trace-NegotiateProtocolVersion-correctly.patchapplication/octet-stream; name=v2-0008-libpq-Trace-NegotiateProtocolVersion-correctly.patch
#8Alvaro Herrera
Alvaro Herrera
alvherre@alvh.no-ip.org
In reply to: Jelte Fennema-Nio (#7)
Re: libpq: Fix lots of discrepancies in PQtrace

Pushed 0002 and 0003. On the latter: I decided against using int32 to
print the request identifiers; by splitting into two int16's, we see
that the numbers match the values in the PG_PROTOCOL() declarations:

2024-08-09 17:37:38.364622 F 8 SSLRequest 1234 5679
and
2024-08-09 17:37:38.422109 F 16 CancelRequest 1234 5678 NNNN NNNN

(I didn't verify GSSEncRequest directly.)

I also verified that in non-regress mode, the values printed by
CancelRequest match those in the BackendKeyData message,
2024-08-09 17:34:27.544686 B 12 BackendKeyData NNNN NNNN

I also added suppression in regress mode for the backend key in the
CancelRequest message, since they would be different each time.

There are no tests for this code. We could add a trace file for the
connection packet in libpq_pipeline by changing PQconnectdb() to
PQconnectStart() and then do PQtrace before polling until the connection
is ready; we would have to have it match for the TAP test. Not sure
this is worth the effort. But doing this in a very crude way allowed me
to verify that, at least on my machine, this code is doing what's
expected.

Thank you,

--
Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/

#9Alvaro Herrera
Alvaro Herrera
alvherre@alvh.no-ip.org
In reply to: Alvaro Herrera (#8)
1 attachment(s)
Re: libpq: Fix lots of discrepancies in PQtrace

Regarding 0004:

I don't want to add 4 bytes to struct pg_conn for tracing support. I'm
tempted to make the new struct member a plain 'char' to reduce overhead
for a feature that almost nobody is going to use. According to pahole
we have a 3 bytes hole in that position of the struct, so if we make it
a 1- or 2-byte member, there's no storage overhead whatsoever.

Also, why not have pqTraceOutputMessage() responsible for resetting the
byte after printing the message? It seems to cause less undesirable
detritus.

I propose something like the attached, but it's as yet untested. What
do you think?

--
Álvaro Herrera 48°01'N 7°57'E — https://www.EnterpriseDB.com/
"El sentido de las cosas no viene de las cosas, sino de
las inteligencias que las aplican a sus problemas diarios
en busca del progreso." (Ernesto Hernández-Novich)

Attachments:

0001-libpq-Trace-frontend-authentication-challenges.patchtext/x-diff; charset=utf-8
#10Jelte Fennema-Nio
Jelte Fennema-Nio
postgres@jeltef.nl
In reply to: Alvaro Herrera (#9)
Re: libpq: Fix lots of discrepancies in PQtrace

On Sat, 10 Aug 2024 at 01:08, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:

I don't want to add 4 bytes to struct pg_conn for tracing support. I'm
tempted to make the new struct member a plain 'char' to reduce overhead
for a feature that almost nobody is going to use. According to pahole
we have a 3 bytes hole in that position of the struct, so if we make it
a 1- or 2-byte member, there's no storage overhead whatsoever.

Sounds fine to me.

Also, why not have pqTraceOutputMessage() responsible for resetting the
byte after printing the message? It seems to cause less undesirable
detritus.

Yeah, that's indeed much nicer.

I propose something like the attached, but it's as yet untested. What
do you think?

Looks good, but I haven't tested it yet either.

#11Alvaro Herrera
Alvaro Herrera
alvherre@alvh.no-ip.org
In reply to: Jelte Fennema-Nio (#10)
Re: libpq: Fix lots of discrepancies in PQtrace

On 2024-Aug-10, Jelte Fennema-Nio wrote:

On Sat, 10 Aug 2024 at 01:08, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:

I propose something like the attached, but it's as yet untested. What
do you think?

Looks good, but I haven't tested it yet either.

I tested the SASL exchange and it looks OK. Didn't test the other ones.

Thanks!

--
Álvaro Herrera Breisgau, Deutschland — https://www.EnterpriseDB.com/

#12Alvaro Herrera
Alvaro Herrera
alvherre@alvh.no-ip.org
In reply to: Jelte Fennema-Nio (#7)
4 attachment(s)
Re: libpq: Fix lots of discrepancies in PQtrace

I gave another look to the remaining patches; here they are again. I
propose some changes:

- to 0005 I change your pqTraceOutputEncryptionRequestResponse()
function name to pqTraceOutputCharResponse and instead of attaching
the "Response" literal in the outpuer to the name given in the
function call, just pass the whole string as argument to the function.

- to 0006 I change function name pqFinishParsingMessage() to
pqParseDone() and reworded the commentary; also moved it to fe-misc.c.
Looks good otherwise.

- 0008 to fix NegotiateProtocolVersion looks correct per [1]https://www.postgresql.org/docs/16/protocol-message-formats.html#PROTOCOL-MESSAGE-FORMATS-NEGOTIATEPROTOCOLVERSION, but I
don't know how to test it. Suggestions?

I didn't look at 0007.

[1]: https://www.postgresql.org/docs/16/protocol-message-formats.html#PROTOCOL-MESSAGE-FORMATS-NEGOTIATEPROTOCOLVERSION

--
Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/
"No hay hombre que no aspire a la plenitud, es decir,
la suma de experiencias de que un hombre es capaz"

Attachments:

v3-0005-libpq-Trace-responses-to-SSLRequest-and-GSSENCReq.patchtext/x-diff; charset=utf-8
v3-0006-libpq-Trace-all-messages-received-from-the-server.patchtext/x-diff; charset=utf-8
v3-0007-libpq-Trace-server-Authentication-messages-in-det.patchtext/x-diff; charset=utf-8
v3-0008-libpq-Trace-NegotiateProtocolVersion-correctly.patchtext/x-diff; charset=utf-8
#13Jelte Fennema-Nio
Jelte Fennema-Nio
postgres@jeltef.nl
In reply to: Alvaro Herrera (#12)
Re: libpq: Fix lots of discrepancies in PQtrace

On Wed, 14 Aug 2024 at 19:37, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:

- to 0005 I change your pqTraceOutputEncryptionRequestResponse()
function name to pqTraceOutputCharResponse and instead of attaching
the "Response" literal in the outpuer to the name given in the
function call, just pass the whole string as argument to the function.

Fine by me

- to 0006 I change function name pqFinishParsingMessage() to
pqParseDone() and reworded the commentary; also moved it to fe-misc.c.
Looks good otherwise.

The following removed comments seems useful to keep (I realize I
already removed them in a previous version of the patch, but I don't
think I did that on purpose)

- /* Drop the processed message and loop around for another */

- /* consume the message and exit */

- /* Completed this message, keep going */
- /* trust the specified message length as what to skip */

- 0008 to fix NegotiateProtocolVersion looks correct per [1], but I
don't know how to test it. Suggestions?

Two options:
1. Manually change code to make sure SendNegotiateProtocolVersion is
called in src/backend/tcop/backend_startup.c
2. Apply my patches from this thread[2]/messages/by-id/CAGECzQTyXDNtMXdq2L-Wp=OvOCPa07r6+U_MGb==h90MrfT+fQ@mail.gmail.com and use
max_protocol_version=latest in the connection string while connecting
to an older postgres server.

[2]: /messages/by-id/CAGECzQTyXDNtMXdq2L-Wp=OvOCPa07r6+U_MGb==h90MrfT+fQ@mail.gmail.com

#14Alvaro Herrera
Alvaro Herrera
alvherre@alvh.no-ip.org
In reply to: Jelte Fennema-Nio (#13)
Re: libpq: Fix lots of discrepancies in PQtrace

Hello,

On 2024-Aug-14, Jelte Fennema-Nio wrote:

The following removed comments seems useful to keep (I realize I
already removed them in a previous version of the patch, but I don't
think I did that on purpose)
[...]

Ah, yeah, I agree. I put them back, and pushed 0005, 6 and 7 as a
single commit. It didn't seem worth pushing each separately, really. I
added two lines for the CopyData message as well, since otherwise the
output shows the "mismatched length" error when getting COPY data.

I'm leaving 0008 to whoever is doing the NegotiateProtocolVersion stuff;
maybe post that one in that thread you mentioned. I'll mark this CF
entry committed.

Many thanks!

--
Álvaro Herrera Breisgau, Deutschland — https://www.EnterpriseDB.com/