protocol support for labels

Started by Jeremy Schneider10 months ago8 messages
#1Jeremy Schneider
schneider@ardentperf.com

pgconf.dev is coming up soon. I won't be able to make it to Montreal,
but I saw that Dave Cramer posted on twitter asking about postgres
protocol topics.

while I don't have a patch, I wanted to send an email about something:
it'd be great to have a place - perhaps like application_name - for
arbitrary labels; and we need protocol support to pass those labels
inline with the query instead of having to require a separate round
trip. it would also be good if these labels were somehow accessible in
logging formats.

observability frameworks like OpenTelemetry support tracing through all
layers of a stack, and trace_ids can even be passed into sql with
extensions like sqlcommenter. however sqlcommenter puts the trace_id
into a comment which effectively breaks all the utility of
pg_stat_statements since each query has a unique trace_id.

if protocol enhancements are on the table, I think it would be great
for this topic to get a little discussion.

-Jeremy

#2Kirill Reshke
reshkekirill@gmail.com
In reply to: Jeremy Schneider (#1)
Re: protocol support for labels

On Tue, 11 Mar 2025 at 11:09, Jeremy Schneider <schneider@ardentperf.com> wrote:

observability frameworks like OpenTelemetry support tracing through all
layers of a stack, and trace_ids can even be passed into sql with
extensions like sqlcommenter. however sqlcommenter puts the trace_id
into a comment which effectively breaks all the utility of
pg_stat_statements since each query has a unique trace_id.

There are some other use-cases:
1) RO-RW routing. Users can pass target-session-attrs to the server
within query labels to hint about its need. Useful when some kind of
proxy (odyssey,pgbouncer,spqr,pgpool II, pgcat, pg_doorman) is used.
2) pg_comment_stats uses comments in the query to accumulate statistics. [0]git@github.com:munakoiso/pg_comment_stats.git

However, I don't think PostgreSQL community is open for this big
change for much (non-big) matters.

[0]: git@github.com:munakoiso/pg_comment_stats.git

--
Best regards,
Kirill Reshke

#3Jeremy Schneider
schneider@ardentperf.com
In reply to: Kirill Reshke (#2)
Re: protocol support for labels

On Mar 11, 2025, at 3:03 AM, Kirill Reshke <reshkekirill@gmail.com> wrote:

On Tue, 11 Mar 2025 at 11:09, Jeremy Schneider <schneider@ardentperf.com> wrote:

observability frameworks like OpenTelemetry support tracing through all
layers of a stack, and trace_ids can even be passed into sql with
extensions like sqlcommenter. however sqlcommenter puts the trace_id
into a comment which effectively breaks all the utility of
pg_stat_statements since each query has a unique trace_id.

There are some other use-cases:
1) RO-RW routing. Users can pass target-session-attrs to the server
within query labels to hint about its need. Useful when some kind of
proxy (odyssey,pgbouncer,spqr,pgpool II, pgcat, pg_doorman) is used.
2) pg_comment_stats uses comments in the query to accumulate statistics. [0]

Thinking a bit more, my root issue is specifically around pg_stat_statements so maybe it’s also solvable with some changes to how query jumbling is done

But that topic seems like one where we’d never get consensus

Should query jumbling for calculating query_id be customizable somehow? How would we resolve multiple concurrent opinions about how queries should be jumbled (eg if comment_stats needs different tweaks than sqlcommenter)? Was there previous discussion about this already? I’ll need to go search mailing list history a bit

-Jeremy

Sent from my TI-83

#4Frits Hoogland
frits.hoogland@gmail.com
In reply to: Jeremy Schneider (#3)
Re: protocol support for labels

The usecase that I think might be useful is to have a database client send metadata along with a query.
This partially is possible today by setting application_name, but that is a separate request, it would be great if that could be sent along with the query in one go.
Another option to pass metadata is to add a comment (/* .. */), but a comment cannot be set for a prepared statement, because the statement is prepared first and then later invoked on runtime, which executes a query that is fixed.

Frits Hoogland

Show quoted text

On 11 Mar 2025, at 15:49, Jeremy Schneider <schneider@ardentperf.com> wrote:

On Mar 11, 2025, at 3:03 AM, Kirill Reshke <reshkekirill@gmail.com> wrote:

On Tue, 11 Mar 2025 at 11:09, Jeremy Schneider <schneider@ardentperf.com> wrote:

observability frameworks like OpenTelemetry support tracing through all
layers of a stack, and trace_ids can even be passed into sql with
extensions like sqlcommenter. however sqlcommenter puts the trace_id
into a comment which effectively breaks all the utility of
pg_stat_statements since each query has a unique trace_id.

There are some other use-cases:
1) RO-RW routing. Users can pass target-session-attrs to the server
within query labels to hint about its need. Useful when some kind of
proxy (odyssey,pgbouncer,spqr,pgpool II, pgcat, pg_doorman) is used.
2) pg_comment_stats uses comments in the query to accumulate statistics. [0]

Thinking a bit more, my root issue is specifically around pg_stat_statements so maybe it’s also solvable with some changes to how query jumbling is done

But that topic seems like one where we’d never get consensus

Should query jumbling for calculating query_id be customizable somehow? How would we resolve multiple concurrent opinions about how queries should be jumbled (eg if comment_stats needs different tweaks than sqlcommenter)? Was there previous discussion about this already? I’ll need to go search mailing list history a bit

-Jeremy

Sent from my TI-83

#5Dave Cramer
davecramer@postgres.rocks
In reply to: Frits Hoogland (#4)
Re: protocol support for labels

Dave Cramer
www.postgres.rocks

On Tue, 11 Mar 2025 at 12:23, Frits Hoogland <frits.hoogland@gmail.com>
wrote:

The usecase that I think might be useful is to have a database client send
metadata along with a query.
This partially is possible today by setting application_name, but that is
a separate request, it would be great if that could be sent along with the
query in one go.
Another option to pass metadata is to add a comment (/* .. */), but a
comment cannot be set for a prepared statement, because the statement is
prepared first and then later invoked on runtime, which executes a query
that is fixed.

*Frits Hoogland*

On 11 Mar 2025, at 15:49, Jeremy Schneider <schneider@ardentperf.com>
wrote:

On Mar 11, 2025, at 3:03 AM, Kirill Reshke <reshkekirill@gmail.com> wrote:

On Tue, 11 Mar 2025 at 11:09, Jeremy Schneider <schneider@ardentperf.com>
wrote:

observability frameworks like OpenTelemetry support tracing through all
layers of a stack, and trace_ids can even be passed into sql with
extensions like sqlcommenter. however sqlcommenter puts the trace_id
into a comment which effectively breaks all the utility of
pg_stat_statements since each query has a unique trace_id.

There are some other use-cases:
1) RO-RW routing. Users can pass target-session-attrs to the server
within query labels to hint about its need. Useful when some kind of
proxy (odyssey,pgbouncer,spqr,pgpool II, pgcat, pg_doorman) is used.
2) pg_comment_stats uses comments in the query to accumulate statistics.
[0]

Thinking a bit more, my root issue is specifically around
pg_stat_statements so maybe it’s also solvable with some changes to how
query jumbling is done

But that topic seems like one where we’d never get consensus

Should query jumbling for calculating query_id be customizable somehow?
How would we resolve multiple concurrent opinions about how queries should
be jumbled (eg if comment_stats needs different tweaks than sqlcommenter)?
Was there previous discussion about this already? I’ll need to go search
mailing list history a bit

-Jeremy

Sent from my TI-83

Jeremy,

Thanks for this. I am hoping to get consensus of ideas for changes to the
protocol.
Please join the discussion here https://discord.gg/bWum3hbM as well.

Krill, change starts with requests and without them nothing will happen.
Please post your ideas.

Dave

#6Nico Williams
nico@cryptonector.com
In reply to: Frits Hoogland (#4)
Re: protocol support for labels

On Tue, Mar 11, 2025 at 05:23:14PM +0100, Frits Hoogland wrote:

The usecase that I think might be useful is to have a database client send metadata along with a query.
This partially is possible today by setting application_name, but that is a separate request, it would be great if that could be sent along with the query in one go.
Another option to pass metadata is to add a comment (/* .. */), but a comment cannot be set for a prepared statement, because the statement is prepared first and then later invoked on runtime, which executes a query that is fixed.

How about using a `set_config()` to deonte the "application_name" (and
any other details) for the _next_ query, then have those details appear
in the pg_stat_statements rows and logs?

Clients would send a `SELECT set_config(...)` and also the next query
one after the other without waiting for the response to the first. The
server could similarly batch the two responses.

Look ma', no protocol change.

Nico
--

#7Jeremy Schneider
schneider@ardentperf.com
In reply to: Nico Williams (#6)
Re: protocol support for labels

On Tue, 11 Mar 2025 14:03:12 -0500
Nico Williams <nico@cryptonector.com> wrote:

How about using a `set_config()` to deonte the "application_name" (and
any other details) for the _next_ query, then have those details
appear in the pg_stat_statements rows and logs?

Clients would send a `SELECT set_config(...)` and also the next query
one after the other without waiting for the response to the first.
The server could similarly batch the two responses.

Isn't multiple queries in one packet only possible with the simple
protocol, but not possible with the extended protocol? So this would be
entirely incompatible with prepared/parameterized statements?

-Jeremy

#8Nico Williams
nico@cryptonector.com
In reply to: Jeremy Schneider (#7)
Re: protocol support for labels

On Wed, Mar 12, 2025 at 11:39:57PM -0700, Jeremy Schneider wrote:

Isn't multiple queries in one packet only possible with the simple
protocol, but not possible with the extended protocol? So this would be
entirely incompatible with prepared/parameterized statements?

It's TCP. Packet and segment boundaries are not exposed to the
application, so PG doesn't know that you sent "two queries in one
packet" (or in one segment). It just reads one query, processes it,
then it reads the next one, processes it, etc.

So all that's needed here is for a well-defined GUC that can be set with
set_config() and whose purpose is to decorate subsequent queries in
pg_stat_statements and/or logs. You could set it once for all your
queries or once per-query -- whatever.

As long as the client-side API can batch one of these calls to
set_config() followed by the actual query you want to do in the same
write() on the socket (TCP_CORK helps) then this idea will be barely
noticeable performance-wise.

Nico
--