compute_query_id and pg_stat_statements

Started by Fujii Masaoover 4 years ago158 messages

masao.fujii@oss.nttdata.com

over 4 years ago

When compute_query_id is not enabled (this is the default setting),
pg_stat_statements doesn't track any statements. This means that
we will see no entries in pg_stat_statements by default. I'm afraid that
users may easily forget to enable compute_query_id
when using pg_stat_statements (because this setting was not necessary
in v13 or before), and finally may have noticed the mis-configuration
and failure of statements tracking after many queries were executed.
For example, we already have one report about this issue, in [1]/messages/by-id/1953aec168224b95b0c962a622bef0794da6ff40.camel@moonset.ru.

Shouldn't we do something so that users can avoid such mis-configuration?

One idea is to change the default value of compute_query_id from false to true.
If enabling compute_query_id doesn't incur any performance penalty,
IMO this idea is very simple and enough.

Another idea is to change pg_stat_statements so that it emits an error
at the server startup (i.e., prevents the server from starting up)
if compute_query_id is not enabled. In this case, users can easily notice
the mis-configuration from the error message in the server log,
enable compute_query_id, and then restart the server.

IMO the former is better if there is no such performance risk. Otherwise
we should adopt the latter approach. Or you have the better idea?

Thought?

[1]: /messages/by-id/1953aec168224b95b0c962a622bef0794da6ff40.camel@moonset.ru
/messages/by-id/1953aec168224b95b0c962a622bef0794da6ff40.camel@moonset.ru

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION

Julien Rouhaud

rjuju123@gmail.com

over 4 years ago

In reply to: Fujii Masao (#1)

Re: compute_query_id and pg_stat_statements

Le sam. 24 avr. 2021 à 22:54, Fujii Masao <masao.fujii@oss.nttdata.com> a
écrit :

For example, we already have one report about this issue, in [1].

this report was only a few days after the patch changing the behavior was
committed, unless you've been following the original thread (which has been
going on for 2 years), that's kind of expected. release notes for pg14
should highlight that change, so hopefully people upgrading will see it.
I'll also try to write some blog article about it to add more warnings.

Shouldn't we do something so that users can avoid such mis-configuration?

One idea is to change the default value of compute_query_id from false to
true.
If enabling compute_query_id doesn't incur any performance penalty,
IMO this idea is very simple and enough.

it adds some noticeable overhead in oltp style workloads. I think that I
did some benchmarks in the original thread, and we decided not to enable it
by default

Another idea is to change pg_stat_statements so that it emits an error

at the server startup (i.e., prevents the server from starting up)
if compute_query_id is not enabled. In this case, users can easily notice
the mis-configuration from the error message in the server log,
enable compute_query_id, and then restart the server.

that's also not an option, as one can now use pg_stat_statetements with a
different queryid calculation. see for instance
https://github.com/rjuju/pg_queryid for a proof a concept extension for
that. I think it's clear that multiple people will want to use a different
calculation as they have been asking for that for years.

IMO the former is better if there is no such performance risk. Otherwise

we should adopt the latter approach. Or you have the better idea?

I'm not sure how to address that, as temporarily disabling queryId
calculation should be allowed. maybe we could raise a warning once per
backend if pgss sees a dml query without queryId? but it could end up
creating more problems than it solves.

for the record people have also raised bugs on the powa project because
planning counters are not tracked by default, so compute_query_id will
probably add a bit of traffic.

Show quoted text

Bruce Momjian

bruce@momjian.us

over 4 years ago

In reply to: Fujii Masao (#1)

Re: compute_query_id and pg_stat_statements

On Sat, Apr 24, 2021 at 11:54:25PM +0900, Fujii Masao wrote:

When compute_query_id is not enabled (this is the default setting),
pg_stat_statements doesn't track any statements. This means that
we will see no entries in pg_stat_statements by default. I'm afraid that
users may easily forget to enable compute_query_id
when using pg_stat_statements (because this setting was not necessary
in v13 or before), and finally may have noticed the mis-configuration
and failure of statements tracking after many queries were executed.
For example, we already have one report about this issue, in [1].

Shouldn't we do something so that users can avoid such mis-configuration?

One idea is to change the default value of compute_query_id from false to true.
If enabling compute_query_id doesn't incur any performance penalty,
IMO this idea is very simple and enough.

I think the query overhead was too high (2%) to enable it by default:

/messages/by-id/20201016160355.GA31474@alvherre.pgsql

Another idea is to change pg_stat_statements so that it emits an error
at the server startup (i.e., prevents the server from starting up)
if compute_query_id is not enabled. In this case, users can easily notice
the mis-configuration from the error message in the server log,
enable compute_query_id, and then restart the server.

I think it throws an error in the server logs, but preventing server
start seems extreme. Also, compute_query_id is PGC_SUSET, meaning it
can be changed by the super-user, so you could enable compute_query_id
without a server restart, which makes failing on start kind of odd.

--
Bruce Momjian <bruce@momjian.us> https://momjian.us
EDB https://enterprisedb.com

If only the physical world exists, free will is an illusion.

Magnus Hagander

magnus@hagander.net

over 4 years ago

In reply to: Bruce Momjian (#3)

Re: compute_query_id and pg_stat_statements

On Sat, Apr 24, 2021 at 5:22 PM Bruce Momjian <bruce@momjian.us> wrote:

On Sat, Apr 24, 2021 at 11:54:25PM +0900, Fujii Masao wrote:

When compute_query_id is not enabled (this is the default setting),
pg_stat_statements doesn't track any statements. This means that
we will see no entries in pg_stat_statements by default. I'm afraid that
users may easily forget to enable compute_query_id
when using pg_stat_statements (because this setting was not necessary
in v13 or before), and finally may have noticed the mis-configuration
and failure of statements tracking after many queries were executed.
For example, we already have one report about this issue, in [1].

Shouldn't we do something so that users can avoid such mis-configuration?

One idea is to change the default value of compute_query_id from false to true.
If enabling compute_query_id doesn't incur any performance penalty,
IMO this idea is very simple and enough.

I think the query overhead was too high (2%) to enable it by default:

/messages/by-id/20201016160355.GA31474@alvherre.pgsql

Personally I'd say 2% is not too high to turn it on by default, as it
goes down when you move past trivial queries, which is what most
people do. And since you can easily turn it off.

Another idea is to change pg_stat_statements so that it emits an error
at the server startup (i.e., prevents the server from starting up)
if compute_query_id is not enabled. In this case, users can easily notice
the mis-configuration from the error message in the server log,
enable compute_query_id, and then restart the server.

I think it throws an error in the server logs, but preventing server
start seems extreme. Also, compute_query_id is PGC_SUSET, meaning it
can be changed by the super-user, so you could enable compute_query_id
without a server restart, which makes failing on start kind of odd.

How about turning it into an enum instead of a boolean, that can be:

off = always off
auto = pg_stat_statments turns it on when it's loaded in
shared_preload_libraries. Other extensions using it can do that to.
But it remains off if you haven't installed any *extension* that needs
it
on = always on (if you want it in pg_stat_activity regardless of extensions)

The default would be "auto", which means that pg_stat_statements would
work as expected, but those who haven't installed it (or another
extension that changes it) would not have to pay the overhead.

--
Magnus Hagander
Me: https://www.hagander.net/
Work: https://www.redpill-linpro.com/

Bruce Momjian

bruce@momjian.us

over 4 years ago

In reply to: Magnus Hagander (#4)

Re: compute_query_id and pg_stat_statements

On Sat, Apr 24, 2021 at 06:48:53PM +0200, Magnus Hagander wrote:

I think the query overhead was too high (2%) to enable it by default:

/messages/by-id/20201016160355.GA31474@alvherre.pgsql

Personally I'd say 2% is not too high to turn it on by default, as it
goes down when you move past trivial queries, which is what most
people do. And since you can easily turn it off.

We would do a lot of work to reduce overhead by 2% on every query, and
to add 2% for a hash that previously was only used by pg_stat_statements
seems unwise.

How about turning it into an enum instead of a boolean, that can be:

off = always off
auto = pg_stat_statments turns it on when it's loaded in
shared_preload_libraries. Other extensions using it can do that to.
But it remains off if you haven't installed any *extension* that needs
it
on = always on (if you want it in pg_stat_activity regardless of extensions)

The default would be "auto", which means that pg_stat_statements would
work as expected, but those who haven't installed it (or another
extension that changes it) would not have to pay the overhead.

That's a pretty weird API. I think we just need people to turn it on
like they are doing when the configure pg_stat_statements anyway.
pg_stat_statements already requires configuration anyway.

--
Bruce Momjian <bruce@momjian.us> https://momjian.us
EDB https://enterprisedb.com

If only the physical world exists, free will is an illusion.

Tom Lane

tgl@sss.pgh.pa.us

over 4 years ago

In reply to: Bruce Momjian (#5)

Re: compute_query_id and pg_stat_statements

Bruce Momjian <bruce@momjian.us> writes:

That's a pretty weird API. I think we just need people to turn it on
like they are doing when the configure pg_stat_statements anyway.
pg_stat_statements already requires configuration anyway.

Agreed. If pg_stat_statements were zero-configuration today then
this would be an annoying new burden, but it isn't.

I haven't looked, but did we put anything into pg_stat_statements
to make it easy to tell if you've messed up this setting?

regards, tom lane

Julien Rouhaud

rjuju123@gmail.com

over 4 years ago

In reply to: Tom Lane (#6)

Re: compute_query_id and pg_stat_statements

On Sat, Apr 24, 2021 at 01:43:51PM -0400, Tom Lane wrote:

I haven't looked, but did we put anything into pg_stat_statements
to make it easy to tell if you've messed up this setting?

You mean apart from from having pg_stat_statements' view/SRFs returning
nothing?

I think it's a reasonable use case to sometime disable query_id calculation,
eg. if you know that it will only lead to useless bloat in the entry and that
you won't need the info, so spamming warnings if there are no queryid could
cause some pain.

Tom Lane

tgl@sss.pgh.pa.us

over 4 years ago

In reply to: Julien Rouhaud (#7)

Re: compute_query_id and pg_stat_statements

Julien Rouhaud <rjuju123@gmail.com> writes:

On Sat, Apr 24, 2021 at 01:43:51PM -0400, Tom Lane wrote:

I haven't looked, but did we put anything into pg_stat_statements
to make it easy to tell if you've messed up this setting?

You mean apart from from having pg_stat_statements' view/SRFs returning
nothing?

I think it's a reasonable use case to sometime disable query_id calculation,
eg. if you know that it will only lead to useless bloat in the entry and that
you won't need the info, so spamming warnings if there are no queryid could
cause some pain.

I agree repeated warnings would be bad news. I was wondering if we could
arrange a single warning at the time pg_stat_statements is preloaded into
the postmaster. In this way, if you tried to use a configuration file
that used to work, you'd hopefully get some notice about why it no longer
does what you want. Also, if you are preloading pg_stat_statements, it
seems reasonable to assume that you'd like the global value of the flag
to be "on", even if there are use-cases for transiently disabling it.

I think the way to detect "being loaded into the postmaster" is
if (IsPostmasterEnvironment && !IsUnderPostmaster)

regards, tom lane

Julien Rouhaud

rjuju123@gmail.com

over 4 years ago

In reply to: Tom Lane (#8)

Re: compute_query_id and pg_stat_statements

On Sun, Apr 25, 2021 at 11:39:55AM -0400, Tom Lane wrote:

I agree repeated warnings would be bad news. I was wondering if we could
arrange a single warning at the time pg_stat_statements is preloaded into
the postmaster. In this way, if you tried to use a configuration file
that used to work, you'd hopefully get some notice about why it no longer
does what you want. Also, if you are preloading pg_stat_statements, it
seems reasonable to assume that you'd like the global value of the flag
to be "on", even if there are use-cases for transiently disabling it.

What about people who wants to use pg_stat_statements but are not ok with our
query_id heuristics and use a third-party plugin for that?

#10

Tom Lane

tgl@sss.pgh.pa.us

over 4 years ago

In reply to: Julien Rouhaud (#9)

Re: compute_query_id and pg_stat_statements

Julien Rouhaud <rjuju123@gmail.com> writes:

On Sun, Apr 25, 2021 at 11:39:55AM -0400, Tom Lane wrote:

I agree repeated warnings would be bad news. I was wondering if we could
arrange a single warning at the time pg_stat_statements is preloaded into
the postmaster. In this way, if you tried to use a configuration file
that used to work, you'd hopefully get some notice about why it no longer
does what you want. Also, if you are preloading pg_stat_statements, it
seems reasonable to assume that you'd like the global value of the flag
to be "on", even if there are use-cases for transiently disabling it.

What about people who wants to use pg_stat_statements but are not ok with our
query_id heuristics and use a third-party plugin for that?

They're still going to want the GUC set to something other than "off",
no? In any case it's just a one-time log message, so it's not likely
to be *that* annoying.

regards, tom lane

#11

Julien Rouhaud

rjuju123@gmail.com

over 4 years ago

In reply to: Tom Lane (#10)

Re: compute_query_id and pg_stat_statements

On Sun, Apr 25, 2021 at 01:17:03PM -0400, Tom Lane wrote:

Julien Rouhaud <rjuju123@gmail.com> writes:

On Sun, Apr 25, 2021 at 11:39:55AM -0400, Tom Lane wrote:

I agree repeated warnings would be bad news. I was wondering if we could
arrange a single warning at the time pg_stat_statements is preloaded into
the postmaster. In this way, if you tried to use a configuration file
that used to work, you'd hopefully get some notice about why it no longer
does what you want. Also, if you are preloading pg_stat_statements, it
seems reasonable to assume that you'd like the global value of the flag
to be "on", even if there are use-cases for transiently disabling it.

What about people who wants to use pg_stat_statements but are not ok with our
query_id heuristics and use a third-party plugin for that?

They're still going to want the GUC set to something other than "off",
no?

They will want compute_query_id to be off. And they actually will *need* that,
as we recommend third-party plugins computing alternative query_id to error out
if they see a that a query_id has already been generated, to avoid any problem
if compute_query_id is being temporarily toggled. That's what I did in the POC
plugin for external query_id at [1]https://github.com/rjuju/pg_queryid/blob/master/pg_queryid.c#L172.

In any case it's just a one-time log message, so it's not likely
to be *that* annoying.

In that case it should be phrased in a way that makes it clear that
pg_stat_statements can work without enabling compute_query_id, something like:

"compute_query_id is disabled. This module won't track any activity unless you
configured a third-party extension that computes query identifiers"

[1]: https://github.com/rjuju/pg_queryid/blob/master/pg_queryid.c#L172

#12

Peter Eisentraut

peter.eisentraut@enterprisedb.com

over 4 years ago

In reply to: Tom Lane (#6)

Re: compute_query_id and pg_stat_statements

On 24.04.21 19:43, Tom Lane wrote:

Bruce Momjian <bruce@momjian.us> writes:

That's a pretty weird API. I think we just need people to turn it on
like they are doing when the configure pg_stat_statements anyway.
pg_stat_statements already requires configuration anyway.

Agreed. If pg_stat_statements were zero-configuration today then
this would be an annoying new burden, but it isn't.

I think people can understand "add pg_stat_statements to
shared_preload_libraries" and "install the extension". You have to turn
it on somehow after all.

Now there is the additional burden of turning on this weird setting that
no one understands. That's a 50% increase in burden.

And almost no one will want to use a nondefault setting.

pg_stat_statements is pretty popular. I think leaving in this
requirement will lead to widespread confusion and complaints.

#13

Christoph Berg

myon@debian.org

over 4 years ago

In reply to: Peter Eisentraut (#12)

Re: compute_query_id and pg_stat_statements

Re: Peter Eisentraut

Agreed. If pg_stat_statements were zero-configuration today then
this would be an annoying new burden, but it isn't.

I think people can understand "add pg_stat_statements to
shared_preload_libraries" and "install the extension". You have to turn it
on somehow after all.

Fwiw, I'd claim that pg_stat_statements *is* zero-configuration today.
You just have to load the module (= shared_preload_libraries), and it
will start working. Later you can run CREATE EXTENSION to actually see
the stats, but they are already being collected in the background.

Now there is the additional burden of turning on this weird setting that no
one understands. That's a 50% increase in burden.

And almost no one will want to use a nondefault setting.

pg_stat_statements is pretty popular. I think leaving in this requirement
will lead to widespread confusion and complaints.

Ack, please make the default config (i.e. after setting shared_preload_libraries)
do something sensible. Having to enable some "weird" internal other setting
will be very hard to explain to users.

Fwiw, I'd even want to have pg_stat_statements enabled in core by
default. That would awesome UX. (And turning off could be as simple as
setting compute_query_id=off.)

Christoph

#14

Bruce Momjian

bruce@momjian.us

over 4 years ago

In reply to: Christoph Berg (#13)

Re: compute_query_id and pg_stat_statements

On Mon, Apr 26, 2021 at 05:34:30PM +0200, Christoph Berg wrote:

Re: Peter Eisentraut

Agreed. If pg_stat_statements were zero-configuration today then
this would be an annoying new burden, but it isn't.

I think people can understand "add pg_stat_statements to
shared_preload_libraries" and "install the extension". You have to turn it
on somehow after all.

Fwiw, I'd claim that pg_stat_statements *is* zero-configuration today.
You just have to load the module (= shared_preload_libraries), and it
will start working. Later you can run CREATE EXTENSION to actually see
the stats, but they are already being collected in the background.

Now there is the additional burden of turning on this weird setting that no
one understands. That's a 50% increase in burden.

And almost no one will want to use a nondefault setting.

pg_stat_statements is pretty popular. I think leaving in this requirement
will lead to widespread confusion and complaints.

Ack, please make the default config (i.e. after setting shared_preload_libraries)
do something sensible. Having to enable some "weird" internal other setting
will be very hard to explain to users.

Fwiw, I'd even want to have pg_stat_statements enabled in core by
default. That would awesome UX. (And turning off could be as simple as
setting compute_query_id=off.)

Techically, pg_stat_statements can turn on compute_query_id when it is
loaded, even if it is 'off' in postgresql.conf, right? And
pg_stat_statements would know if an alternate hash method is being used,
right?

This is closer to Magnus's idea of having a three-value
compute_query_id, except is it more controlled by pg_stat_statements.
Another idea would be to throw a user-visible warning if the
pg_stat_statements extension is loaded and compute_query_id is off.

--
Bruce Momjian <bruce@momjian.us> https://momjian.us
EDB https://enterprisedb.com

If only the physical world exists, free will is an illusion.

#15

Stephen Frost

sfrost@snowman.net

over 4 years ago

In reply to: Bruce Momjian (#14)

Re: compute_query_id and pg_stat_statements

Greetings,

* Bruce Momjian (bruce@momjian.us) wrote:

On Mon, Apr 26, 2021 at 05:34:30PM +0200, Christoph Berg wrote:

Re: Peter Eisentraut

Agreed. If pg_stat_statements were zero-configuration today then
this would be an annoying new burden, but it isn't.

I think people can understand "add pg_stat_statements to
shared_preload_libraries" and "install the extension". You have to turn it
on somehow after all.

Fwiw, I'd claim that pg_stat_statements *is* zero-configuration today.
You just have to load the module (= shared_preload_libraries), and it
will start working. Later you can run CREATE EXTENSION to actually see
the stats, but they are already being collected in the background.

Now there is the additional burden of turning on this weird setting that no
one understands. That's a 50% increase in burden.

And almost no one will want to use a nondefault setting.

pg_stat_statements is pretty popular. I think leaving in this requirement
will lead to widespread confusion and complaints.

Ack, please make the default config (i.e. after setting shared_preload_libraries)
do something sensible. Having to enable some "weird" internal other setting
will be very hard to explain to users.

Fwiw, I'd even want to have pg_stat_statements enabled in core by
default. That would awesome UX. (And turning off could be as simple as
setting compute_query_id=off.)

Techically, pg_stat_statements can turn on compute_query_id when it is
loaded, even if it is 'off' in postgresql.conf, right? And
pg_stat_statements would know if an alternate hash method is being used,
right?

+1 on this approach. I agree that we should avoid having to make every
new user and every user who is upgrading with pg_stat_statements
installed have to go twiddle this parameter.

Thanks,

Stephen

#16

Tom Lane

tgl@sss.pgh.pa.us

over 4 years ago

In reply to: Stephen Frost (#15)

Re: compute_query_id and pg_stat_statements

Stephen Frost <sfrost@snowman.net> writes:

* Bruce Momjian (bruce@momjian.us) wrote:

Techically, pg_stat_statements can turn on compute_query_id when it is
loaded, even if it is 'off' in postgresql.conf, right? And
pg_stat_statements would know if an alternate hash method is being used,
right?

+1 on this approach.

That'd make it impossible to turn off or adjust afterwards, wouldn't it?
I'm afraid the confusion stemming from that would outweigh any simplicity.

I would be in favor of logging a message at startup to the effect of
"this is misconfigured" (as per upthread discussion), although whether
people would see that is uncertain.

In the end, it's not like this is the first time we've ever made an
incompatible change in configuration needs; and it won't be the last
either. I don't buy the argument that pg_stat_statements users can't
cope with adding the additional setting. (Of course, we should be
careful to call it out as an incompatible change in the release notes.)

regards, tom lane

#17

Bruce Momjian

bruce@momjian.us

over 4 years ago

In reply to: Tom Lane (#16)

Re: compute_query_id and pg_stat_statements

On Mon, Apr 26, 2021 at 12:56:13PM -0400, Tom Lane wrote:

Stephen Frost <sfrost@snowman.net> writes:

* Bruce Momjian (bruce@momjian.us) wrote:

Techically, pg_stat_statements can turn on compute_query_id when it is
loaded, even if it is 'off' in postgresql.conf, right? And
pg_stat_statements would know if an alternate hash method is being used,
right?

+1 on this approach.

That'd make it impossible to turn off or adjust afterwards, wouldn't it?
I'm afraid the confusion stemming from that would outweigh any simplicity.

I would be in favor of logging a message at startup to the effect of
"this is misconfigured" (as per upthread discussion), although whether
people would see that is uncertain.

I think a user-visible warning at CREATE EXNTENSION would help too.

In the end, it's not like this is the first time we've ever made an
incompatible change in configuration needs; and it won't be the last
either. I don't buy the argument that pg_stat_statements users can't
cope with adding the additional setting. (Of course, we should be
careful to call it out as an incompatible change in the release notes.)

Agreed.

--
Bruce Momjian <bruce@momjian.us> https://momjian.us
EDB https://enterprisedb.com

If only the physical world exists, free will is an illusion.

#18

Magnus Hagander

magnus@hagander.net

over 4 years ago

In reply to: Tom Lane (#16)

Re: compute_query_id and pg_stat_statements

On Mon, Apr 26, 2021 at 6:56 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

Stephen Frost <sfrost@snowman.net> writes:

* Bruce Momjian (bruce@momjian.us) wrote:

Techically, pg_stat_statements can turn on compute_query_id when it is
loaded, even if it is 'off' in postgresql.conf, right? And
pg_stat_statements would know if an alternate hash method is being used,
right?

+1 on this approach.

That'd make it impossible to turn off or adjust afterwards, wouldn't it?
I'm afraid the confusion stemming from that would outweigh any simplicity.

Thatäs why I suggested the three value one. Default to a mode where
it's automatic, which is what the majority is going to want, but have
a way to explicitly turn it on.

I would be in favor of logging a message at startup to the effect of
"this is misconfigured" (as per upthread discussion), although whether
people would see that is uncertain.

Some people would. Many wouldn't, and sadly many hours would be spent
on debugging things before they got there -- based on experience of
how many people actually read the logs..

In the end, it's not like this is the first time we've ever made an
incompatible change in configuration needs; and it won't be the last
either. I don't buy the argument that pg_stat_statements users can't
cope with adding the additional setting. (Of course, we should be
careful to call it out as an incompatible change in the release notes.)

The fact that we've made changes before that complicated our users
experience isn't in itself an argument for doing it again though...

--
Magnus Hagander
Me: https://www.hagander.net/
Work: https://www.redpill-linpro.com/

#19

Julien Rouhaud

rjuju123@gmail.com

over 4 years ago

In reply to: Tom Lane (#16)

Re: compute_query_id and pg_stat_statements

On Tue, Apr 27, 2021 at 12:56 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:

Stephen Frost <sfrost@snowman.net> writes:

* Bruce Momjian (bruce@momjian.us) wrote:

Techically, pg_stat_statements can turn on compute_query_id when it is
loaded, even if it is 'off' in postgresql.conf, right? And
pg_stat_statements would know if an alternate hash method is being used,
right?

+1 on this approach.

That'd make it impossible to turn off or adjust afterwards, wouldn't it?

I think so, which would also make it impossible to use an external
query_id plugin.

Enabling compute_query_id by default or raising a WARNING in
pg_stat_statements' PG_INIT seems like the only 2 sensible options.

#20

Magnus Hagander

magnus@hagander.net

over 4 years ago

In reply to: Bruce Momjian (#17)

Re: compute_query_id and pg_stat_statements

On Mon, Apr 26, 2021 at 7:00 PM Bruce Momjian <bruce@momjian.us> wrote:

On Mon, Apr 26, 2021 at 12:56:13PM -0400, Tom Lane wrote:

Stephen Frost <sfrost@snowman.net> writes:

* Bruce Momjian (bruce@momjian.us) wrote:

Techically, pg_stat_statements can turn on compute_query_id when it is
loaded, even if it is 'off' in postgresql.conf, right? And
pg_stat_statements would know if an alternate hash method is being used,
right?

+1 on this approach.

That'd make it impossible to turn off or adjust afterwards, wouldn't it?
I'm afraid the confusion stemming from that would outweigh any simplicity.

I would be in favor of logging a message at startup to the effect of
"this is misconfigured" (as per upthread discussion), although whether
people would see that is uncertain.

I think a user-visible warning at CREATE EXNTENSION would help too.

It would help a bit, but actually logging it would probably help more.
Most people don't run the CREATE EXTENSION commands manually, it's all
done as part of either system install scripts or of application
migrations.

But that doesn't mean it wouldn't be useful to do it for those that
*do* run things manually, it just wouldn't be sufficient in itself.

--
Magnus Hagander
Me: https://www.hagander.net/
Work: https://www.redpill-linpro.com/

#21

Julien Rouhaud

rjuju123@gmail.com

over 4 years ago

In reply to: Magnus Hagander (#18)

Re: compute_query_id and pg_stat_statements

On Tue, Apr 27, 2021 at 1:04 AM Magnus Hagander <magnus@hagander.net> wrote:

Thatäs why I suggested the three value one. Default to a mode where
it's automatic, which is what the majority is going to want, but have
a way to explicitly turn it on.

Agreed, that also sounds like a sensible default.

#22

Stephen Frost

sfrost@snowman.net

over 4 years ago

In reply to: Magnus Hagander (#18)

Re: compute_query_id and pg_stat_statements

Greetings,

* Magnus Hagander (magnus@hagander.net) wrote:

On Mon, Apr 26, 2021 at 6:56 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

Stephen Frost <sfrost@snowman.net> writes:

* Bruce Momjian (bruce@momjian.us) wrote:

Techically, pg_stat_statements can turn on compute_query_id when it is
loaded, even if it is 'off' in postgresql.conf, right? And
pg_stat_statements would know if an alternate hash method is being used,
right?

+1 on this approach.

That'd make it impossible to turn off or adjust afterwards, wouldn't it?
I'm afraid the confusion stemming from that would outweigh any simplicity.

I don't know that it actually would, but ...

Thatäs why I suggested the three value one. Default to a mode where
it's automatic, which is what the majority is going to want, but have
a way to explicitly turn it on.

This is certainly fine with me too, though it seems a bit surprising to
me that we couldn't just figure out what the user actually wants based
on what's installed/running for any given combination.

In the end, it's not like this is the first time we've ever made an
incompatible change in configuration needs; and it won't be the last
either. I don't buy the argument that pg_stat_statements users can't
cope with adding the additional setting. (Of course, we should be
careful to call it out as an incompatible change in the release notes.)

The fact that we've made changes before that complicated our users
experience isn't in itself an argument for doing it again though...

I'm generally a proponent of sensible changes across major versions even
if it means that the user has to adjust things, but this seems like a
case where we're punting on something that we really should just be able
to figure out the right answer to and that seems like a step backwards.

Thanks,

Stephen

#23

Tom Lane

tgl@sss.pgh.pa.us

over 4 years ago

In reply to: Stephen Frost (#22)

Re: compute_query_id and pg_stat_statements

Stephen Frost <sfrost@snowman.net> writes:

* Magnus Hagander (magnus@hagander.net) wrote:

Thatäs why I suggested the three value one. Default to a mode where
it's automatic, which is what the majority is going to want, but have
a way to explicitly turn it on.

This is certainly fine with me too, though it seems a bit surprising to
me that we couldn't just figure out what the user actually wants based
on what's installed/running for any given combination.

I'd be on board with having pg_stat_statement's pg_init function do
something to adjust the setting, if we can figure out how to do that
in a way that's not confusing in itself. I'm not sure though that
the GUC engine offers a good way.

regards, tom lane

#24

Alvaro Herrera

alvherre@alvh.no-ip.org

over 4 years ago

In reply to: Tom Lane (#23)

Re: compute_query_id and pg_stat_statements

On 2021-Apr-26, Tom Lane wrote:

Stephen Frost <sfrost@snowman.net> writes:

* Magnus Hagander (magnus@hagander.net) wrote:

Thatï¿½s why I suggested the three value one. Default to a mode where
it's automatic, which is what the majority is going to want, but have
a way to explicitly turn it on.

This is certainly fine with me too, though it seems a bit surprising to
me that we couldn't just figure out what the user actually wants based
on what's installed/running for any given combination.

I'd be on board with having pg_stat_statement's pg_init function do
something to adjust the setting, if we can figure out how to do that
in a way that's not confusing in itself. I'm not sure though that
the GUC engine offers a good way.

I think it's straightforward, if we decouple the tri-valued enum used
for guc.c purposes from a separate boolean that actually enables the
feature. GUC sets the boolean to "off" initially when it sees the enum
as "auto", and then pg_stat_statement's _PG_init modifies it during its
own startup as needed.

So the user can turn the GUC off, and then pg_stat_statement does
nothing and there is no performance drawback; or leave it "auto" and
it'll only compute query_id if pg_stat_statement is loaded; or leave it
on if they want the query_id for other purposes.

--
ï¿½lvaro Herrera 39ï¿½49'30"S 73ï¿½17'W
"El miedo atento y previsor es la madre de la seguridad" (E. Burke)

#25

Stephen Frost

sfrost@snowman.net

over 4 years ago

In reply to: Tom Lane (#23)

Re: compute_query_id and pg_stat_statements

Greetings,

* Tom Lane (tgl@sss.pgh.pa.us) wrote:

Stephen Frost <sfrost@snowman.net> writes:

* Magnus Hagander (magnus@hagander.net) wrote:

Thatäs why I suggested the three value one. Default to a mode where
it's automatic, which is what the majority is going to want, but have
a way to explicitly turn it on.

This is certainly fine with me too, though it seems a bit surprising to
me that we couldn't just figure out what the user actually wants based
on what's installed/running for any given combination.

I'd be on board with having pg_stat_statement's pg_init function do
something to adjust the setting, if we can figure out how to do that
in a way that's not confusing in itself. I'm not sure though that
the GUC engine offers a good way.

Both of the extensions are getting loaded via pg_stat_statements and
both can have pg_init functions which work together to come up with the
right answer, no?

That is- can't pg_stat_statements, when it's loaded, enable
compute_query_id if it's not already enabled, and then the pg_queryid
module simply disable it when it gets loaded in it's pg_init()? Telling
people who are using pg_queryid to have it loaded *after*
pg_stat_statements certainly seems reasonable to me, but if folks don't
like that then maybe have a tri-state which is 'auto', 'on', and 'off',
where pg_stat_statements would set it to 'on' if it's set to 'auto', but
not do anything if it starts as 'off'. pg_queryid would then set it to
'off' when it's loaded and it wouldn't matter if it's loaded before or
after.

Thanks,

Stephen

#26

Stephen Frost

sfrost@snowman.net

over 4 years ago

In reply to: Alvaro Herrera (#24)

Re: compute_query_id and pg_stat_statements

Greetings,

* Alvaro Herrera (alvherre@alvh.no-ip.org) wrote:

On 2021-Apr-26, Tom Lane wrote:

Stephen Frost <sfrost@snowman.net> writes:

* Magnus Hagander (magnus@hagander.net) wrote:

Thatäs why I suggested the three value one. Default to a mode where
it's automatic, which is what the majority is going to want, but have
a way to explicitly turn it on.

This is certainly fine with me too, though it seems a bit surprising to
me that we couldn't just figure out what the user actually wants based
on what's installed/running for any given combination.

I'd be on board with having pg_stat_statement's pg_init function do
something to adjust the setting, if we can figure out how to do that
in a way that's not confusing in itself. I'm not sure though that
the GUC engine offers a good way.

I think it's straightforward, if we decouple the tri-valued enum used
for guc.c purposes from a separate boolean that actually enables the
feature. GUC sets the boolean to "off" initially when it sees the enum
as "auto", and then pg_stat_statement's _PG_init modifies it during its
own startup as needed.

So the user can turn the GUC off, and then pg_stat_statement does
nothing and there is no performance drawback; or leave it "auto" and
it'll only compute query_id if pg_stat_statement is loaded; or leave it
on if they want the query_id for other purposes.

Yeah, this is more-or-less the same as what I was just proposing in an
email that crossed this one. Using a separate boolean would certainly
be fine.

Thanks,

Stephen

#27

Andres Freund

andres@anarazel.de

over 4 years ago

In reply to: Alvaro Herrera (#24)

Re: compute_query_id and pg_stat_statements

Hi,

On 2021-04-26 13:43:31 -0400, Alvaro Herrera wrote:

I think it's straightforward, if we decouple the tri-valued enum used
for guc.c purposes from a separate boolean that actually enables the
feature. GUC sets the boolean to "off" initially when it sees the enum
as "auto", and then pg_stat_statement's _PG_init modifies it during its
own startup as needed.

So the user can turn the GUC off, and then pg_stat_statement does
nothing and there is no performance drawback; or leave it "auto" and
it'll only compute query_id if pg_stat_statement is loaded; or leave it
on if they want the query_id for other purposes.

I think that's the right direction. I wonder though if we shouldn't go a
bit further. Have one guc that determines the "query id provider" (NULL
or a shared library), and one GUC that configures whether query-id is
computed (never, on-demand/auto, always). For the provider GUC load the
.so and look up a function with some well known name.

Greetings,

Andres Freund

#28

Tom Lane

tgl@sss.pgh.pa.us

over 4 years ago

In reply to: Andres Freund (#27)

Re: compute_query_id and pg_stat_statements

Andres Freund <andres@anarazel.de> writes:

I think that's the right direction. I wonder though if we shouldn't go a
bit further. Have one guc that determines the "query id provider" (NULL
or a shared library), and one GUC that configures whether query-id is
computed (never, on-demand/auto, always). For the provider GUC load the
.so and look up a function with some well known name.

That's sounding like a pretty sane design, actually. Not sure about
the shared-library-name-with-fixed-function-name detail, but certainly
it seems to be useful to separate "I need a query-id" from the details
of the ID calculation.

Rather than a GUC per se for the ID provider, maybe we could have a
function hook that defaults to pointing at the in-core computation,
and then a module wanting to override that just gets into the hook.

regards, tom lane

#29

Andres Freund

andres@anarazel.de

over 4 years ago

In reply to: Tom Lane (#28)

Re: compute_query_id and pg_stat_statements

Hi,

On 2021-04-26 14:21:00 -0400, Tom Lane wrote:

Andres Freund <andres@anarazel.de> writes:
That's sounding like a pretty sane design, actually. Not sure about
the shared-library-name-with-fixed-function-name detail, but certainly
it seems to be useful to separate "I need a query-id" from the details
of the ID calculation.

Rather than a GUC per se for the ID provider, maybe we could have a
function hook that defaults to pointing at the in-core computation,
and then a module wanting to override that just gets into the hook.

I have a preference to determining the provider via GUC instead of a
hook because it is both easier to introspect and easier to configure.

If the provider is loaded via a hook, and the shared library is loaded
via shared_preload_libraries, one can't easily just turn that off in a
single session, but needs to restart or explicitly load a different
library (that can't already be loaded).

We also don't have any way to show what's hooking into a hook.

Greetings,

Andres Freund

#30

Magnus Hagander

magnus@hagander.net

over 4 years ago

In reply to: Andres Freund (#27)

Re: compute_query_id and pg_stat_statements

On Mon, Apr 26, 2021 at 8:14 PM Andres Freund <andres@anarazel.de> wrote:

Hi,

On 2021-04-26 13:43:31 -0400, Alvaro Herrera wrote:

I think it's straightforward, if we decouple the tri-valued enum used
for guc.c purposes from a separate boolean that actually enables the
feature. GUC sets the boolean to "off" initially when it sees the enum
as "auto", and then pg_stat_statement's _PG_init modifies it during its
own startup as needed.

That's pretty much exactly my original suggestion, yes :)

So the user can turn the GUC off, and then pg_stat_statement does
nothing and there is no performance drawback; or leave it "auto" and
it'll only compute query_id if pg_stat_statement is loaded; or leave it
on if they want the query_id for other purposes.

I think that's the right direction. I wonder though if we shouldn't go a
bit further. Have one guc that determines the "query id provider" (NULL
or a shared library), and one GUC that configures whether query-id is
computed (never, on-demand/auto, always). For the provider GUC load the
.so and look up a function with some well known name.

On Mon, Apr 26, 2021 at 8:37 PM Andres Freund <andres@anarazel.de> wrote:

I have a preference to determining the provider via GUC instead of a
hook because it is both easier to introspect and easier to configure.

+1 in general. Though we could of course also have a read-only
internal GUC that would show what we ended up with, and still
configure it with shared_preload_libraries, or loaded in some other
way. In a way it'd be cleaner to "always load modules with
shared_preload_libraries", but I can certainly see the arguments in
either direction..

But whichever way it's configured, having a well exposed way to know
what it actually is would be important.

--
Magnus Hagander
Me: https://www.hagander.net/
Work: https://www.redpill-linpro.com/

#31

Julien Rouhaud

rjuju123@gmail.com

over 4 years ago

In reply to: Andres Freund (#29)

Re: compute_query_id and pg_stat_statements

On Mon, Apr 26, 2021 at 11:37:45AM -0700, Andres Freund wrote:

Hi,

On 2021-04-26 14:21:00 -0400, Tom Lane wrote:

Andres Freund <andres@anarazel.de> writes:
That's sounding like a pretty sane design, actually. Not sure about
the shared-library-name-with-fixed-function-name detail, but certainly
it seems to be useful to separate "I need a query-id" from the details
of the ID calculation.

Rather than a GUC per se for the ID provider, maybe we could have a
function hook that defaults to pointing at the in-core computation,
and then a module wanting to override that just gets into the hook.

I have a preference to determining the provider via GUC instead of a
hook because it is both easier to introspect and easier to configure.

In any case, having a different provider would greatly simplify third-party
queryid lib authors and users life. For now the core queryid is computed
before post_parse_analyze_hook, but any third party plugin would have to do it
as a post_parse_analyze_hook, so you have to make sure that the lib is at the
right position in shared_preload_libraries to have it work, eg. [1]https://github.com/rjuju/pg_queryid/blob/master/pg_queryid.c#L116-L117, depending
on how pg_stat_statements and other similar module call
prev_post_parse_analyze_hook, which is a pretty bad thing.

If the provider is loaded via a hook, and the shared library is loaded
via shared_preload_libraries, one can't easily just turn that off in a
single session, but needs to restart or explicitly load a different
library (that can't already be loaded).

On the other hand we *don't* want to dynamically change the provider.
Temporarily enabling/disabling queryid calculation is ok, but generating
different have for the same query isn't.

We also don't have any way to show what's hooking into a hook.

If we had a dedicated query_id hook, then plugins should error out if users
configured multiple plugins to calculate a query_id, so it should be easy to
know which plugin is responsible for it without knowing who hooked into the
hook.

[1]: https://github.com/rjuju/pg_queryid/blob/master/pg_queryid.c#L116-L117

#32

Michael Paquier

michael@paquier.xyz

over 4 years ago

In reply to: Julien Rouhaud (#31)

Re: compute_query_id and pg_stat_statements

On Tue, Apr 27, 2021 at 02:25:04PM +0800, Julien Rouhaud wrote:

On Mon, Apr 26, 2021 at 11:37:45AM -0700, Andres Freund wrote:

On 2021-04-26 14:21:00 -0400, Tom Lane wrote:

That's sounding like a pretty sane design, actually. Not sure about
the shared-library-name-with-fixed-function-name detail, but certainly
it seems to be useful to separate "I need a query-id" from the details
of the ID calculation.

Rather than a GUC per se for the ID provider, maybe we could have a
function hook that defaults to pointing at the in-core computation,
and then a module wanting to override that just gets into the hook.

I have a preference to determining the provider via GUC instead of a
hook because it is both easier to introspect and easier to configure.

So, this thread has died two weeks ago, and it is still an open item.
Could it be possible to move to a resolution by beta1? The consensus
I can get from the thread is that we should have a tri-value state to
track an extra "auto" for the query ID computation, as proposed by
Alvaro here:
/messages/by-id/20210426174331.GA19401@alvherre.pgsql

Unfortunately, nothing has happened to be able to do something like
that.
--
Michael

#33

Fujii Masao

masao.fujii@oss.nttdata.com

over 4 years ago

In reply to: Michael Paquier (#32)

Re: compute_query_id and pg_stat_statements

On 2021/05/11 15:04, Michael Paquier wrote:

On Tue, Apr 27, 2021 at 02:25:04PM +0800, Julien Rouhaud wrote:

On Mon, Apr 26, 2021 at 11:37:45AM -0700, Andres Freund wrote:

On 2021-04-26 14:21:00 -0400, Tom Lane wrote:

That's sounding like a pretty sane design, actually. Not sure about
the shared-library-name-with-fixed-function-name detail, but certainly
it seems to be useful to separate "I need a query-id" from the details
of the ID calculation.

Rather than a GUC per se for the ID provider, maybe we could have a
function hook that defaults to pointing at the in-core computation,
and then a module wanting to override that just gets into the hook.

I have a preference to determining the provider via GUC instead of a
hook because it is both easier to introspect and easier to configure.

So, this thread has died two weeks ago, and it is still an open item.
Could it be possible to move to a resolution by beta1? The consensus
I can get from the thread is that we should have a tri-value state to
track an extra "auto" for the query ID computation, as proposed by
Alvaro here:
/messages/by-id/20210426174331.GA19401@alvherre.pgsql

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION

#34

Julien Rouhaud

rjuju123@gmail.com

over 4 years ago

In reply to: Michael Paquier (#32)

1 attachment(s)

Re: compute_query_id and pg_stat_statements

On Tue, May 11, 2021 at 03:04:13PM +0900, Michael Paquier wrote:

On Tue, Apr 27, 2021 at 02:25:04PM +0800, Julien Rouhaud wrote:

On Mon, Apr 26, 2021 at 11:37:45AM -0700, Andres Freund wrote:

On 2021-04-26 14:21:00 -0400, Tom Lane wrote:

That's sounding like a pretty sane design, actually. Not sure about
the shared-library-name-with-fixed-function-name detail, but certainly
it seems to be useful to separate "I need a query-id" from the details
of the ID calculation.

Rather than a GUC per se for the ID provider, maybe we could have a
function hook that defaults to pointing at the in-core computation,
and then a module wanting to override that just gets into the hook.

I have a preference to determining the provider via GUC instead of a
hook because it is both easier to introspect and easier to configure.

So, this thread has died two weeks ago, and it is still an open item.
Could it be possible to move to a resolution by beta1? The consensus
I can get from the thread is that we should have a tri-value state to
track an extra "auto" for the query ID computation, as proposed by
Alvaro here:
/messages/by-id/20210426174331.GA19401@alvherre.pgsql

Unfortunately, nothing has happened to be able to do something like
that.

My understanding was that there wasn't a consensus on how to fix the problem.

Anyway, PFA a patch that implement a [off | on | auto] compute_query_id, and
provides a new queryIdWanted() function to let third party plugins inform us
that they want a query id if possible.

As it was noted somewhere in that thread, that's a hack on top on the GUC
machinery, so compute_query_id will display "on" rather than "auto" (or "auto
and enabled" or whatever) since GUC isn't designed to handle that behavior.

For the record I also tested the patch using pg_qualstats(), which can be
loaded interactively and also benefits from a query identifier. It works as
expected, as in "query idenfitier are enabled but only for the backend that
loaded pg_qualstats".

Attachments:

v1-0001-Change-compute_query_id-to-an-enum-GUC.patchtext/x-diff; charset=us-asciiDownload

From 9f1019c8eccdd63275029c2d861fd0fcac71f823 Mon Sep 17 00:00:00 2001
From: Julien Rouhaud <julien.rouhaud@free.fr>
Date: Tue, 11 May 2021 15:20:45 +0800
Subject: [PATCH v1] Change compute_query_id to an enum GUC.

The current approach that requires to explicitly enable compute_query_id on top
of configuring pg_stat_statements in shared_preload_libraries has been proven
to be at best unfriendly, so switch the GUC to an enum defaulting to "auto",
which allows third-party plugins to enable in-core query identifier calculation
if they require one.

A new queryIdWanted() function is provided to let plugins inform us that
require a query identifier, and which will enable query identifier calculation
iff it's set to "auto".

Note that if that function is called during postmaster startup (typically
during process_shared_preload_libraries()) then it will be enabled globally,
otherwise it will only be enabled in the session(s) that loaded a plugin
calling queryIdWanted().

Author: Julien Rouhaud
Discussion: https://postgr.es/m/35457b09-36f8-add3-1d07-6034fa585ca8@oss.nttdata.com
---
 .../pg_stat_statements/pg_stat_statements.c   |  6 +++
 .../pg_stat_statements.conf                   |  1 -
 doc/src/sgml/config.sgml                      |  9 ++++-
 doc/src/sgml/pgstatstatements.sgml            | 13 +++++--
 src/backend/commands/explain.c                |  2 +-
 src/backend/parser/analyze.c                  |  4 +-
 src/backend/tcop/postgres.c                   |  2 +-
 src/backend/utils/misc/guc.c                  | 38 ++++++++++++++-----
 src/backend/utils/misc/postgresql.conf.sample |  2 +-
 src/backend/utils/misc/queryjumble.c          | 15 ++++++++
 src/include/utils/guc.h                       |  1 -
 src/include/utils/queryjumble.h               | 12 ++++++
 12 files changed, 82 insertions(+), 23 deletions(-)

diff --git a/contrib/pg_stat_statements/pg_stat_statements.c b/contrib/pg_stat_statements/pg_stat_statements.c
index f42f07622e..f64ec56bf8 100644
--- a/contrib/pg_stat_statements/pg_stat_statements.c
+++ b/contrib/pg_stat_statements/pg_stat_statements.c
@@ -369,6 +369,12 @@ _PG_init(void)
 	if (!process_shared_preload_libraries_in_progress)
 		return;
 
+	/*
+	 * Informat the postmaster that we want to enable query_id calculation if
+	 * compute_query_id is set to auto.
+	 */
+	queryIdWanted();
+
 	/*
 	 * Define (or redefine) custom GUC variables.
 	 */
diff --git a/contrib/pg_stat_statements/pg_stat_statements.conf b/contrib/pg_stat_statements/pg_stat_statements.conf
index e47b26040f..13346e2807 100644
--- a/contrib/pg_stat_statements/pg_stat_statements.conf
+++ b/contrib/pg_stat_statements/pg_stat_statements.conf
@@ -1,2 +1 @@
 shared_preload_libraries = 'pg_stat_statements'
-compute_query_id = on
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 45bd1f1b7e..60d8b24f5e 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -7627,7 +7627,7 @@ COPY postgres_log FROM '/full/path/to/logfile.csv' WITH csv;
      <variablelist>
 
      <varlistentry id="guc-compute-query-id" xreflabel="compute_query_id">
-      <term><varname>compute_query_id</varname> (<type>boolean</type>)
+      <term><varname>compute_query_id</varname> (<type>enum</type>)
       <indexterm>
        <primary><varname>compute_query_id</varname> configuration parameter</primary>
       </indexterm>
@@ -7643,7 +7643,12 @@ COPY postgres_log FROM '/full/path/to/logfile.csv' WITH csv;
         identifier to be computed.  Note that an external module can
         alternatively be used if the in-core query identifier computation
         method is not acceptable.  In this case, in-core computation
-        must be disabled.  The default is <literal>off</literal>.
+        must be always disabled.
+        Valid values are <literal>off</literal> (always disabled),
+        <literal>on</literal> (always enabled) and <literal>auto</literal>,
+        which let modules such as <xref linkend="pgstatstatements"/>
+        automatically enable it.
+        The default is <literal>auto</literal>.
        </para>
        <note>
         <para>
diff --git a/doc/src/sgml/pgstatstatements.sgml b/doc/src/sgml/pgstatstatements.sgml
index bc2b6038ee..acfb134797 100644
--- a/doc/src/sgml/pgstatstatements.sgml
+++ b/doc/src/sgml/pgstatstatements.sgml
@@ -22,10 +22,15 @@
 
  <para>
   The module will not track statistics unless query
-  identifiers are calculated.  This can be done by enabling <xref
-  linkend="guc-compute-query-id"/> or using a third-party module that
-  computes its own query identifiers.  Note that all statistics tracked
-  by this module must be reset if the query identifier method is changed.
+  identifiers are calculated.  This is done by automatically when this
+  extension is configured in <literal>shared_preload_libraries</literal> and
+  <xref linkend="guc-compute-query-id"/> is set to <literal>auto</literal>
+  (which is the default value), or always if <xref
+  linkend="guc-compute-query-id"/> is set to <literal>on</literal>.
+  You must set <xref linkend="guc-compute-query-id"/> to <literal>off</literal>
+  if you want to use a third-party module that computes its own query
+      identifiers.  Note that all statistics tracked by this module must be
+      reset if the query identifier method is changed.
  </para>
 
  <para>
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index 8ab7bca866..9d384234b0 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -245,7 +245,7 @@ ExplainQuery(ParseState *pstate, ExplainStmt *stmt,
 	es->summary = (summary_set) ? es->summary : es->analyze;
 
 	query = castNode(Query, stmt->query);
-	if (compute_query_id)
+	if (compute_query_id == COMPUTE_QUERY_ID_ON)
 		jstate = JumbleQuery(query, pstate->p_sourcetext);
 
 	if (post_parse_analyze_hook)
diff --git a/src/backend/parser/analyze.c b/src/backend/parser/analyze.c
index e415bc3df0..1108f4133a 100644
--- a/src/backend/parser/analyze.c
+++ b/src/backend/parser/analyze.c
@@ -124,7 +124,7 @@ parse_analyze(RawStmt *parseTree, const char *sourceText,
 
 	query = transformTopLevelStmt(pstate, parseTree);
 
-	if (compute_query_id)
+	if (compute_query_id == COMPUTE_QUERY_ID_ON)
 		jstate = JumbleQuery(query, sourceText);
 
 	if (post_parse_analyze_hook)
@@ -163,7 +163,7 @@ parse_analyze_varparams(RawStmt *parseTree, const char *sourceText,
 	/* make sure all is well with parameter types */
 	check_variable_parameters(pstate, query);
 
-	if (compute_query_id)
+	if (compute_query_id == COMPUTE_QUERY_ID_ON)
 		jstate = JumbleQuery(query, sourceText);
 
 	if (post_parse_analyze_hook)
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index 2d6d145ecc..e4b76af557 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -704,7 +704,7 @@ pg_analyze_and_rewrite_params(RawStmt *parsetree,
 
 	query = transformTopLevelStmt(pstate, parsetree);
 
-	if (compute_query_id)
+	if (compute_query_id == COMPUTE_QUERY_ID_ON)
 		jstate = JumbleQuery(query, query_string);
 
 	if (post_parse_analyze_hook)
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index 0a180341c2..835b322c29 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -101,6 +101,7 @@
 #include "utils/plancache.h"
 #include "utils/portal.h"
 #include "utils/ps_status.h"
+#include "utils/queryjumble.h"
 #include "utils/rls.h"
 #include "utils/snapmgr.h"
 #include "utils/tzparser.h"
@@ -402,6 +403,23 @@ static const struct config_enum_entry backslash_quote_options[] = {
 	{NULL, 0, false}
 };
 
+/*
+ * Although only "on", "off", and "auto" are documented, we accept all the
+ * likely variants of "on" and "off".
+ */
+static const struct config_enum_entry compute_query_id_options[] = {
+	{"auto", COMPUTE_QUERY_ID_AUTO, false},
+	{"on", COMPUTE_QUERY_ID_ON, false},
+	{"off", COMPUTE_QUERY_ID_OFF, false},
+	{"true", COMPUTE_QUERY_ID_ON, true},
+	{"false", COMPUTE_QUERY_ID_OFF, true},
+	{"yes", COMPUTE_QUERY_ID_ON, true},
+	{"no", COMPUTE_QUERY_ID_OFF, true},
+	{"1", COMPUTE_QUERY_ID_ON, true},
+	{"0", COMPUTE_QUERY_ID_OFF, true},
+	{NULL, 0, false}
+};
+
 /*
  * Although only "on", "off", and "partition" are documented, we
  * accept all the likely variants of "on" and "off".
@@ -534,7 +552,6 @@ extern const struct config_enum_entry dynamic_shared_memory_options[];
 /*
  * GUC option variables that are exported from this module
  */
-bool		compute_query_id = false;
 bool		log_duration = false;
 bool		Debug_print_plan = false;
 bool		Debug_print_parse = false;
@@ -1441,15 +1458,6 @@ static struct config_bool ConfigureNamesBool[] =
 		true,
 		NULL, NULL, NULL
 	},
-	{
-		{"compute_query_id", PGC_SUSET, STATS_MONITORING,
-			gettext_noop("Compute query identifiers."),
-			NULL
-		},
-		&compute_query_id,
-		false,
-		NULL, NULL, NULL
-	},
 	{
 		{"log_parser_stats", PGC_SUSET, STATS_MONITORING,
 			gettext_noop("Writes parser performance statistics to the server log."),
@@ -4618,6 +4626,16 @@ static struct config_enum ConfigureNamesEnum[] =
 		NULL, NULL, NULL
 	},
 
+	{
+		{"compute_query_id", PGC_SUSET, STATS_MONITORING,
+			gettext_noop("Compute query identifiers."),
+			NULL
+		},
+		&compute_query_id,
+		COMPUTE_QUERY_ID_AUTO, compute_query_id_options,
+		NULL, NULL, NULL
+	},
+
 	{
 		{"constraint_exclusion", PGC_USERSET, QUERY_TUNING_OTHER,
 			gettext_noop("Enables the planner to use constraints to optimize queries."),
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index efde01ee56..6e36e4c2ef 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -604,7 +604,7 @@
 
 # - Monitoring -
 
-#compute_query_id = off
+#compute_query_id = auto
 #log_statement_stats = off
 #log_parser_stats = off
 #log_planner_stats = off
diff --git a/src/backend/utils/misc/queryjumble.c b/src/backend/utils/misc/queryjumble.c
index 1bb9fe20ea..4189fdfa53 100644
--- a/src/backend/utils/misc/queryjumble.c
+++ b/src/backend/utils/misc/queryjumble.c
@@ -39,6 +39,9 @@
 
 #define JUMBLE_SIZE				1024	/* query serialization buffer size */
 
+/* GUC parameters */
+int			compute_query_id = COMPUTE_QUERY_ID_AUTO;
+
 static uint64 compute_utility_query_id(const char *str, int query_location, int query_len);
 static void AppendJumble(JumbleState *jstate,
 						 const unsigned char *item, Size size);
@@ -131,6 +134,18 @@ JumbleQuery(Query *query, const char *querytext)
 	return jstate;
 }
 
+/*
+ * Enables compute_query_id if it's set to auto.
+ *
+ * Third-party plugins can use that function to inform the core that they
+ * require a query identifier to be computed.
+ */
+void queryIdWanted(void)
+{
+	if (compute_query_id == COMPUTE_QUERY_ID_AUTO)
+		compute_query_id = COMPUTE_QUERY_ID_ON;
+}
+
 /*
  * Compute a query identifier for the given utility query string.
  */
diff --git a/src/include/utils/guc.h b/src/include/utils/guc.h
index 24a5d9d3fb..a7c3a4958e 100644
--- a/src/include/utils/guc.h
+++ b/src/include/utils/guc.h
@@ -247,7 +247,6 @@ extern bool log_btree_build_stats;
 extern PGDLLIMPORT bool check_function_bodies;
 extern bool session_auth_is_superuser;
 
-extern bool compute_query_id;
 extern bool log_duration;
 extern int	log_parameter_max_length;
 extern int	log_parameter_max_length_on_error;
diff --git a/src/include/utils/queryjumble.h b/src/include/utils/queryjumble.h
index 83ba7339fa..578ed50a7e 100644
--- a/src/include/utils/queryjumble.h
+++ b/src/include/utils/queryjumble.h
@@ -52,7 +52,19 @@ typedef struct JumbleState
 	int			highest_extern_param_id;
 } JumbleState;
 
+/* Values for the compute_query_id GUC */
+typedef enum
+{
+	COMPUTE_QUERY_ID_OFF,
+	COMPUTE_QUERY_ID_ON,
+	COMPUTE_QUERY_ID_AUTO
+}			ComputeQueryIdType;
+
+/* GUC parameters */
+extern int compute_query_id;
+
 const char *CleanQuerytext(const char *query, int *location, int *len);
 JumbleState *JumbleQuery(Query *query, const char *querytext);
+void queryIdWanted(void);
 
 #endif							/* QUERYJUMBLE_H */
-- 
2.30.1

#35

Magnus Hagander

magnus@hagander.net

over 4 years ago

In reply to: Michael Paquier (#32)

Re: compute_query_id and pg_stat_statements

On Tue, May 11, 2021 at 8:04 AM Michael Paquier <michael@paquier.xyz> wrote:

On Tue, Apr 27, 2021 at 02:25:04PM +0800, Julien Rouhaud wrote:

On Mon, Apr 26, 2021 at 11:37:45AM -0700, Andres Freund wrote:

On 2021-04-26 14:21:00 -0400, Tom Lane wrote:

That's sounding like a pretty sane design, actually. Not sure about
the shared-library-name-with-fixed-function-name detail, but certainly
it seems to be useful to separate "I need a query-id" from the details
of the ID calculation.

Rather than a GUC per se for the ID provider, maybe we could have a
function hook that defaults to pointing at the in-core computation,
and then a module wanting to override that just gets into the hook.

I have a preference to determining the provider via GUC instead of a
hook because it is both easier to introspect and easier to configure.

So, this thread has died two weeks ago, and it is still an open item.
Could it be possible to move to a resolution by beta1? The consensus
I can get from the thread is that we should have a tri-value state to
track an extra "auto" for the query ID computation, as proposed by
Alvaro here:
/messages/by-id/20210426174331.GA19401@alvherre.pgsql

Technically I think that was my suggestion from earlier in that thread
that Alvaro just +1ed :)

That said, I sort of put that one aside when both Bruce and Tom
considered it "a pretty weird API" to quote Bruce. I had missed the
fact that Tom changed his mind (maybe when picking up on more of the
details).

And FTR, I still think this is the best way forward.

I think Andres also raised a good point about the ability to actually
know which one is in use.

Even if we keep the current way of *setting* the hook, I think it
might be worthwhile to expose a PGC_INTERNAL guc that shows *which*
implementation is actually in use?

--
Magnus Hagander
Me: https://www.hagander.net/
Work: https://www.redpill-linpro.com/

#36

Magnus Hagander

magnus@hagander.net

over 4 years ago

In reply to: Julien Rouhaud (#34)

Re: compute_query_id and pg_stat_statements

On Tue, May 11, 2021 at 9:35 AM Julien Rouhaud <rjuju123@gmail.com> wrote:

On Tue, May 11, 2021 at 03:04:13PM +0900, Michael Paquier wrote:

On Tue, Apr 27, 2021 at 02:25:04PM +0800, Julien Rouhaud wrote:

On Mon, Apr 26, 2021 at 11:37:45AM -0700, Andres Freund wrote:

On 2021-04-26 14:21:00 -0400, Tom Lane wrote:

That's sounding like a pretty sane design, actually. Not sure about
the shared-library-name-with-fixed-function-name detail, but certainly
it seems to be useful to separate "I need a query-id" from the details
of the ID calculation.

Rather than a GUC per se for the ID provider, maybe we could have a
function hook that defaults to pointing at the in-core computation,
and then a module wanting to override that just gets into the hook.

I have a preference to determining the provider via GUC instead of a
hook because it is both easier to introspect and easier to configure.

So, this thread has died two weeks ago, and it is still an open item.
Could it be possible to move to a resolution by beta1? The consensus
I can get from the thread is that we should have a tri-value state to
track an extra "auto" for the query ID computation, as proposed by
Alvaro here:
/messages/by-id/20210426174331.GA19401@alvherre.pgsql

Unfortunately, nothing has happened to be able to do something like
that.

My understanding was that there wasn't a consensus on how to fix the problem.

Anyway, PFA a patch that implement a [off | on | auto] compute_query_id, and
provides a new queryIdWanted() function to let third party plugins inform us
that they want a query id if possible.

30 second review -- wouldn't it be cleaner to keep a separate boolean
telling the backend "include it or not", which is set to true/false in
the guc assign hook and can then be flipped from false->true in
queryIdWanted()? (I'd suggest a more verbose name for that function
btw, something like requestQueryIdGeneration() or so).

(Again, just the 30 second review between meetings, so maybe I'm completely off)

--
Magnus Hagander
Me: https://www.hagander.net/
Work: https://www.redpill-linpro.com/

#37

Julien Rouhaud

rjuju123@gmail.com

over 4 years ago

In reply to: Magnus Hagander (#36)

Re: compute_query_id and pg_stat_statements

On Tue, May 11, 2021 at 09:43:25AM +0200, Magnus Hagander wrote:

30 second review -- wouldn't it be cleaner to keep a separate boolean
telling the backend "include it or not", which is set to true/false in
the guc assign hook and can then be flipped from false->true in
queryIdWanted()? (I'd suggest a more verbose name for that function
btw, something like requestQueryIdGeneration() or so).

(Again, just the 30 second review between meetings, so maybe I'm completely off)

It it surely would, but then that variable would need to be explicitly handled
as it wouldn't be automatically inherited on Windows and EXEC_BACKEND right?

#38

Fujii Masao

masao.fujii@oss.nttdata.com

over 4 years ago

In reply to: Julien Rouhaud (#34)

Re: compute_query_id and pg_stat_statements

On 2021/05/11 16:35, Julien Rouhaud wrote:

Anyway, PFA a patch that implement a [off | on | auto] compute_query_id, and
provides a new queryIdWanted() function to let third party plugins inform us
that they want a query id if possible.

Thanks!

As it was noted somewhere in that thread, that's a hack on top on the GUC
machinery, so compute_query_id will display "on" rather than "auto" (or "auto
and enabled" or whatever) since GUC isn't designed to handle that behavior.

Can't we work around this issue by making queryIdWanted() set another flag like query_id_wanted instead of overwriting compute_query_id? If we do this, query id computation is necessary when "compute_query_id == COMPUTE_QUERY_ID_ON || (compute_query_id == COMPUTE_QUERY_ID_AUTO && query_id_wanted)".

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION

#39

Julien Rouhaud

rjuju123@gmail.com

over 4 years ago

In reply to: Fujii Masao (#38)

Re: compute_query_id and pg_stat_statements

On Tue, May 11, 2021 at 05:41:53PM +0900, Fujii Masao wrote:

On 2021/05/11 16:35, Julien Rouhaud wrote:

Anyway, PFA a patch that implement a [off | on | auto] compute_query_id, and
provides a new queryIdWanted() function to let third party plugins inform us
that they want a query id if possible.

Thanks!

As it was noted somewhere in that thread, that's a hack on top on the GUC
machinery, so compute_query_id will display "on" rather than "auto" (or "auto
and enabled" or whatever) since GUC isn't designed to handle that behavior.

Can't we work around this issue by making queryIdWanted() set another flag like query_id_wanted instead of overwriting compute_query_id? If we do this, query id computation is necessary when "compute_query_id == COMPUTE_QUERY_ID_ON || (compute_query_id == COMPUTE_QUERY_ID_AUTO && query_id_wanted)".

That's exactly what Magnus mentioned :) It's not possible because variable
aren't inherited on Windows or EXEC_BACKEND. I didn't check but I'm
assuming that it could work if the other flag was an internal GUC that couldn't
be set by users, but then we would have some kind of internal flag that would
have to be documented as "how to check if compute_query_id" is actually enabled
or not, which doesn't seem like a good idea.

Another approach would be to add a new "auto (enabled)" option to the enum, and
prevent users from manually setting the guc to that value. It's not perfect
but maybe it would be cleaner.

Overall it seems that we don't have a clear consensus on how exactly to address
the problem, which is why I originally didn't sent a patch.

#40

Magnus Hagander

magnus@hagander.net

over 4 years ago

In reply to: Julien Rouhaud (#39)

Re: compute_query_id and pg_stat_statements

On Tue, May 11, 2021 at 10:51 AM Julien Rouhaud <rjuju123@gmail.com> wrote:

On Tue, May 11, 2021 at 05:41:53PM +0900, Fujii Masao wrote:

On 2021/05/11 16:35, Julien Rouhaud wrote:

Anyway, PFA a patch that implement a [off | on | auto] compute_query_id, and
provides a new queryIdWanted() function to let third party plugins inform us
that they want a query id if possible.

Thanks!

As it was noted somewhere in that thread, that's a hack on top on the GUC
machinery, so compute_query_id will display "on" rather than "auto" (or "auto
and enabled" or whatever) since GUC isn't designed to handle that behavior.

Can't we work around this issue by making queryIdWanted() set another flag like query_id_wanted instead of overwriting compute_query_id? If we do this, query id computation is necessary when "compute_query_id == COMPUTE_QUERY_ID_ON || (compute_query_id == COMPUTE_QUERY_ID_AUTO && query_id_wanted)".

That's exactly what Magnus mentioned :) It's not possible because variable
aren't inherited on Windows or EXEC_BACKEND. I didn't check but I'm
assuming that it could work if the other flag was an internal GUC that couldn't
be set by users, but then we would have some kind of internal flag that would
have to be documented as "how to check if compute_query_id" is actually enabled
or not, which doesn't seem like a good idea.

That doesn't fundamentally make it impossible, you just have to add it
to the list of variables being copied over, wouldn't you? See
save_backend_variables()

--
Magnus Hagander
Me: https://www.hagander.net/
Work: https://www.redpill-linpro.com/

#41

Julien Rouhaud

rjuju123@gmail.com

over 4 years ago

In reply to: Magnus Hagander (#40)

Re: compute_query_id and pg_stat_statements

On Tue, May 11, 2021 at 10:59:51AM +0200, Magnus Hagander wrote:

That doesn't fundamentally make it impossible, you just have to add it
to the list of variables being copied over, wouldn't you? See
save_backend_variables()

Yes, I agree, and that's what I meant by "explicitly handled". The thing is
that I don't know if that's the best way to go, as it doesn't solve the "is it
actually enabled" and/or "which implementation is used". At least the patch I
sent, although it's totally a hack, let you know if compute_query_id is enabled
or not. I'm fine with implementing it that way, but only if there's a
consensus.

#42

Julien Rouhaud

rjuju123@gmail.com

over 4 years ago

In reply to: Julien Rouhaud (#41)

Re: compute_query_id and pg_stat_statements

On Tue, May 11, 2021 at 05:41:06PM +0800, Julien Rouhaud wrote:

On Tue, May 11, 2021 at 10:59:51AM +0200, Magnus Hagander wrote:

That doesn't fundamentally make it impossible, you just have to add it
to the list of variables being copied over, wouldn't you? See
save_backend_variables()

Yes, I agree, and that's what I meant by "explicitly handled". The thing is
that I don't know if that's the best way to go, as it doesn't solve the "is it
actually enabled" and/or "which implementation is used". At least the patch I
sent, although it's totally a hack, let you know if compute_query_id is enabled
or not. I'm fine with implementing it that way, but only if there's a
consensus.

Actually, isn't that how e.g. wal_buffers = -1 is working? The original value
is lost and what you get is the computed value. This is a bit different here
as the value isn't always changed, and can be changed interactively but
otherwise it's the same behavior.

#43

Kyotaro Horiguchi

horikyota.ntt@gmail.com

over 4 years ago

In reply to: Julien Rouhaud (#42)

Re: compute_query_id and pg_stat_statements

At Tue, 11 May 2021 18:52:49 +0800, Julien Rouhaud <rjuju123@gmail.com> wrote in

On Tue, May 11, 2021 at 05:41:06PM +0800, Julien Rouhaud wrote:

On Tue, May 11, 2021 at 10:59:51AM +0200, Magnus Hagander wrote:

That doesn't fundamentally make it impossible, you just have to add it
to the list of variables being copied over, wouldn't you? See
save_backend_variables()

Yes, I agree, and that's what I meant by "explicitly handled". The thing is
that I don't know if that's the best way to go, as it doesn't solve the "is it
actually enabled" and/or "which implementation is used". At least the patch I
sent, although it's totally a hack, let you know if compute_query_id is enabled
or not. I'm fine with implementing it that way, but only if there's a
consensus.

Actually, isn't that how e.g. wal_buffers = -1 is working? The original value
is lost and what you get is the computed value. This is a bit different here
as the value isn't always changed, and can be changed interactively but
otherwise it's the same behavior.

If we look it in pg_settings, it shows the current value and the value
at boot-time. So I'm fine with that behavior.

However, IMHO, I doubt the necessity of "on". Assuming that we require
any module that wants query-id to call queryIdWanted() at any time
after each process startup (or it could be inherited to children), I
think only "auto" and "off" are enough for the variable. Thinking in
this line, the variable is a subset of a GUC variable to specify the
name of a query-id provider (as Andres suggested upthread), and I
think it would work better in future.

So for now I propose that we have a variable query_id_provider that
has only 'default' and 'none' as the domain. We can later expand it
so that any other query-id provider modules can be loaded without
chaning the interface.

postgresql.conf
# query_id_provider = 'default' # provider module for query-id. 'none' means
# # disabling query-id calculation.

If we want to have a direct way to know whether query-id is active or
not, it'd be good to have a read-only variable "query_id_active".

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

#44

Julien Rouhaud

rjuju123@gmail.com

over 4 years ago

In reply to: Kyotaro Horiguchi (#43)

Re: compute_query_id and pg_stat_statements

Hello Horiguchi-san,

On Wed, May 12, 2021 at 11:08:36AM +0900, Kyotaro Horiguchi wrote:

If we look it in pg_settings, it shows the current value and the value
at boot-time. So I'm fine with that behavior.

However, IMHO, I doubt the necessity of "on". Assuming that we require
any module that wants query-id to call queryIdWanted() at any time
after each process startup (or it could be inherited to children), I
think only "auto" and "off" are enough for the variable.

I don't think that this approach would cope well for people who want a queryid
without pg_stat_statements or such. Since the queryid can now be found in
pg_stat_activity, EXPLAIN output or the logs I think it's entirely reasonable
to allow users to benefit from that even if they don't install additional
module.

Thinking in
this line, the variable is a subset of a GUC variable to specify the
name of a query-id provider (as Andres suggested upthread), and I
think it would work better in future.

So for now I propose that we have a variable query_id_provider that
has only 'default' and 'none' as the domain.

I think this would be a mistake to do that, as it would mean that we don't
officially support alternative queryid provider.

We can later expand it
so that any other query-id provider modules can be loaded without
chaning the interface.

The GUC itself may not change, but third-party queryid provider would probably
need changes as the new entry point will be dedicated to compute a queryid
only, while third-party plugins may do more than that in their
post_parse_analyze_hook. And also users will have to change their
configuration to use that new interface, and additionally the module may now
have to be removed from shared_preload_libraries. Overall, it doesn't seem to
me that it would make users' life easier.

#45

Kyotaro Horiguchi

horikyota.ntt@gmail.com

over 4 years ago

In reply to: Julien Rouhaud (#44)

Re: compute_query_id and pg_stat_statements

At Wed, 12 May 2021 10:42:01 +0800, Julien Rouhaud <rjuju123@gmail.com> wrote in

Hello Horiguchi-san,

On Wed, May 12, 2021 at 11:08:36AM +0900, Kyotaro Horiguchi wrote:

If we look it in pg_settings, it shows the current value and the value
at boot-time. So I'm fine with that behavior.

However, IMHO, I doubt the necessity of "on". Assuming that we require
any module that wants query-id to call queryIdWanted() at any time
after each process startup (or it could be inherited to children), I
think only "auto" and "off" are enough for the variable.

I don't think that this approach would cope well for people who want a queryid
without pg_stat_statements or such. Since the queryid can now be found in
pg_stat_activity, EXPLAIN output or the logs I think it's entirely reasonable
to allow users to benefit from that even if they don't install additional
module.

Ah, I missed that case. And we are wanting to use pg_stat_statements
with (almost) zero-config? How about the following behavior?

Setting query_id_provider to 'none' means we don't calculate query-id
by default. However, if queryIdWante() is called, the default provider
is set up and starts calculating query id.

Setting query_id_provider to something else means the user wants
query-id calcualted using the provider. Setting 'default' is
equivalent to setting compute_query_id to 'on'.

There might be a case where a user sets query_id_provider to
non-'none' but don't want have query-id calculated, but it can be said
a kind of mis-configuration?

Thinking in
this line, the variable is a subset of a GUC variable to specify the
name of a query-id provider (as Andres suggested upthread), and I
think it would work better in future.

So for now I propose that we have a variable query_id_provider that
has only 'default' and 'none' as the domain.

I think this would be a mistake to do that, as it would mean that we don't
officially support alternative queryid provider.

Ok, if we want to support alternative providers from the first, we
need to actually write the loader code for query-id providers. It
would not be so hard?, but it might not be suitable to this stage so I
proposed that to get rid of needing such complexity for now.

(Anyway I prefer to load query-id provider as a dynamically loadable
module rather than hook-function.)

We can later expand it
so that any other query-id provider modules can be loaded without
chaning the interface.

The GUC itself may not change, but third-party queryid provider would probably
need changes as the new entry point will be dedicated to compute a queryid
only, while third-party plugins may do more than that in their
post_parse_analyze_hook. And also users will have to change their

I don't think it is not that a problem. Even if any third-party
extension is having query-id generator by itself, in most cases it
would be a copy of JumbleQuery in case of pg_stat_statement is not
loaded and now it is moved in-core as 'default' provider. What the
exntension needs to be done is just ripping out the copied generator
code. I guess...

configuration to use that new interface, and additionally the module may now
have to be removed from shared_preload_libraries. Overall, it doesn't seem to
me that it would make users' life easier.

Why the third-party module need to be removed from
shared_preload_libraries? The module can stay as a preloaded shared
library but just no longer need to have its own query-id provider
since it is provided in-core. If the extension required a specific
provider, the developer need to make it a loadable module and users
need to specify the provider module explicitly. I don't think that is
not a problem but if we wanted to make it easier, we can let users
free from that step by allowing 'auto' for query-id-provider to load
any module by the first extension.

So, for example, how about the following interface?

GUC query_id_provider:

- 'none' : query_id is not calculated, don't allow loading external
generator module.

- 'default' : use default provider and calculate query-id.

- '<provider-name>' : use the provider and calculate query-id using it.

- 'auto' : query_id is not calculated, but allow to load query-id
provider if queryIdWanted() is called.

# of course 'auto' and 'default' are inhibited as the provier name.

- core function bool queryIdWanted(char *provider_name, bool use_existing)

Allows extensions to request to load a provider if not yet, then
start calculating query-id. Returns true if the request is accepted.

provider_name :

- 'default' or '<provider-name>': requests the provider to be loaded
and start calculating query-id. Refuse the request if 'none' is
set to query_id_provider.

use_existing: Set true to allow using a provider already loaded.
Otherwise refuses the request if any other provider than
prvoder_name is already loaded.

In most cases users set query_id_provider to 'auto' and extensions
call queryIdWanted with ('default', true).

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

#46

Kyotaro Horiguchi

horikyota.ntt@gmail.com

over 4 years ago

In reply to: Kyotaro Horiguchi (#45)

Re: compute_query_id and pg_stat_statements

At Wed, 12 May 2021 14:33:35 +0900 (JST), Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote in

Ok, if we want to support alternative providers from the first, we
need to actually write the loader code for query-id providers. It
would not be so hard?, but it might not be suitable to this stage so I
proposed that to get rid of needing such complexity for now.

(Anyway I prefer to load query-id provider as a dynamically loadable
module rather than hook-function.)

...

So, for example, how about the following interface?

GUC query_id_provider:

- 'none' : query_id is not calculated, don't allow loading external
generator module.

- 'default' : use default provider and calculate query-id.

- '<provider-name>' : use the provider and calculate query-id using it.

- 'auto' : query_id is not calculated, but allow to load query-id
provider if queryIdWanted() is called.

# of course 'auto' and 'default' are inhibited as the provier name.

- core function bool queryIdWanted(char *provider_name, bool use_existing)

Allows extensions to request to load a provider if not yet, then
start calculating query-id. Returns true if the request is accepted.

provider_name :

- 'default' or '<provider-name>': requests the provider to be loaded
and start calculating query-id. Refuse the request if 'none' is
set to query_id_provider.

use_existing: Set true to allow using a provider already loaded.
Otherwise refuses the request if any other provider than
prvoder_name is already loaded.

In most cases users set query_id_provider to 'auto' and extensions
call queryIdWanted with ('default', true).

Hmm. They are in contradiction. Based on this future picture, at this
stage it can be simplified to allowing only 'default' as the provider
name.

If you want to support any other provider at this point,,, we need to
imlement the full-spec?

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

#47

Pavel Stehule

pavel.stehule@gmail.com

over 4 years ago

In reply to: Kyotaro Horiguchi (#45)

Re: compute_query_id and pg_stat_statements

Ah, I missed that case. And we are wanting to use pg_stat_statements
with (almost) zero-config? How about the following behavior?

Until now, the pg_stat_statements was zero-config. So the change is not
user friendly.

The idea so pg_stat_statements requires enabled computed_query_id is not
good. There should be dependency only on the queryid column.

Regards

Pavel

#48

Julien Rouhaud

rjuju123@gmail.com

over 4 years ago

In reply to: Kyotaro Horiguchi (#45)

Re: compute_query_id and pg_stat_statements

On Wed, May 12, 2021 at 02:33:35PM +0900, Kyotaro Horiguchi wrote:

At Wed, 12 May 2021 10:42:01 +0800, Julien Rouhaud <rjuju123@gmail.com> wrote in

I don't think that this approach would cope well for people who want a queryid
without pg_stat_statements or such. Since the queryid can now be found in
pg_stat_activity, EXPLAIN output or the logs I think it's entirely reasonable
to allow users to benefit from that even if they don't install additional
module.

Ah, I missed that case. And we are wanting to use pg_stat_statements
with (almost) zero-config? How about the following behavior?

Setting query_id_provider to 'none' means we don't calculate query-id
by default. However, if queryIdWante() is called, the default provider
is set up and starts calculating query id.

Having "none" meant "not unless someone asks for it" looks like a POLA
violation.

Setting query_id_provider to something else means the user wants
query-id calcualted using the provider. Setting 'default' is
equivalent to setting compute_query_id to 'on'.

There might be a case where a user sets query_id_provider to
non-'none' but don't want have query-id calculated, but it can be said
a kind of mis-configuration?

So if I'm understanding correctly, you're arguing for an approach different to
what Michael stated as the general consensus in [1]/messages/by-id/YJoeXcrwe1EDmqKT@paquier.xyz. I'm not saying that I
think it's a bad idea (and I actually suggested it before), but we have to
chose an approach and stick with it.

I think this would be a mistake to do that, as it would mean that we don't
officially support alternative queryid provider.

Ok, if we want to support alternative providers from the first, we
need to actually write the loader code for query-id providers. It
would not be so hard?, but it might not be suitable to this stage so I
proposed that to get rid of needing such complexity for now.

I did write a POC extension [2]https://github.com/rjuju/pg_queryid to demonstrate that moving pg_stat_statement's
queryid calculation in core doesn't mean that we're imposing it to everyone.
And yes this is critical and a must have in the initial implementation.

(Anyway I prefer to load query-id provider as a dynamically loadable
module rather than hook-function.)

I agree that having a specific API (I'm fine with a hook or a dynamically
loaded function) for that would be better, but it doesn't appear to be the
opinion of the majority.

The GUC itself may not change, but third-party queryid provider would probably
need changes as the new entry point will be dedicated to compute a queryid
only, while third-party plugins may do more than that in their
post_parse_analyze_hook. And also users will have to change their

I don't think it is not that a problem.

Did you mean "I don't think that it's a problem"? Otherwise I don't get it.

Even if any third-party
extension is having query-id generator by itself, in most cases it
would be a copy of JumbleQuery in case of pg_stat_statement is not
loaded and now it is moved in-core as 'default' provider. What the
exntension needs to be done is just ripping out the copied generator
code. I guess...

I don't fully understand, but it seems that you're arguing that the only use
case is to have something similar to pg_stat_statements (say e.g.
pg_store_plans), that always have the same queryid implementation as
pg_stat_statements. That's not the case, as there already are "clones" of
pg_stat_statements, and the main difference is an alternative queryid
implementation. So in that case what author would do is to drop everything
*except* the queryid implementation.

And if I'm not mistaken, pg_store_plans also wants a different queryid
implementation, but has to handle a secondary queryid on top of it
(https://github.com/ossc-db/pg_store_plans/blob/master/pg_store_plans.c#L843-L855).
So here again what the extension want is to get rid of pg_stat_statements (and
now core) queryid implementation.

configuration to use that new interface, and additionally the module may now
have to be removed from shared_preload_libraries. Overall, it doesn't seem to
me that it would make users' life easier.

Why the third-party module need to be removed from
shared_preload_libraries? The module can stay as a preloaded shared
library but just no longer need to have its own query-id provider
since it is provided in-core. If the extension required a specific
provider, the developer need to make it a loadable module and users
need to specify the provider module explicitly.

It's the same misunderstanding here. Basically people want to benefit from the
whole ecosystem based on a queryid (pg_stat_statements, now
pg_stat_activity.query_id and such) but with another definition of what a
queryid is. So those people will now only need to implement something like
[2]: https://github.com/rjuju/pg_queryid

[1]: /messages/by-id/YJoeXcrwe1EDmqKT@paquier.xyz
[2]: https://github.com/rjuju/pg_queryid

#49

Julien Rouhaud

rjuju123@gmail.com

over 4 years ago

In reply to: Pavel Stehule (#47)

Re: compute_query_id and pg_stat_statements

On Wed, May 12, 2021 at 07:49:13AM +0200, Pavel Stehule wrote:

Ah, I missed that case. And we are wanting to use pg_stat_statements
with (almost) zero-config? How about the following behavior?

Until now, the pg_stat_statements was zero-config. So the change is not
user friendly.

Apart from configuring shared_preload_libraries, but agreed.

The idea so pg_stat_statements requires enabled computed_query_id is not
good. There should be dependency only on the queryid column.

I agree that requiring to change compute_query_id when you already added
pg_stat_statements in shared_preload_libraries isn't good, and the patch I sent
yesterday would fix that.

#50

Pavel Stehule

pavel.stehule@gmail.com

over 4 years ago

In reply to: Julien Rouhaud (#49)

Re: compute_query_id and pg_stat_statements

st 12. 5. 2021 v 8:10 odesílatel Julien Rouhaud <rjuju123@gmail.com> napsal:

On Wed, May 12, 2021 at 07:49:13AM +0200, Pavel Stehule wrote:

Ah, I missed that case. And we are wanting to use pg_stat_statements
with (almost) zero-config? How about the following behavior?

Until now, the pg_stat_statements was zero-config. So the change is not
user friendly.

Apart from configuring shared_preload_libraries, but agreed.

The idea so pg_stat_statements requires enabled computed_query_id is not
good. There should be dependency only on the queryid column.

I agree that requiring to change compute_query_id when you already added
pg_stat_statements in shared_preload_libraries isn't good, and the patch I
sent
yesterday would fix that.

I don't like the idea of implicit force enabling any feature flag, but it
is better than current design. But it doesn't look like a robust solution.

Does it mean that if somebody disables computed_query_id, then
pg_stat_statements will not work?

Why is there the strong dependency between computed_query_id and
pg_stat_statements? Can this dependency be just optional?

Regards

Pavel

#51

Julien Rouhaud

rjuju123@gmail.com

over 4 years ago

In reply to: Pavel Stehule (#50)

Re: compute_query_id and pg_stat_statements

On Wed, May 12, 2021 at 08:58:45AM +0200, Pavel Stehule wrote:

I don't like the idea of implicit force enabling any feature flag, but it
is better than current design. But it doesn't look like a robust solution.

Does it mean that if somebody disables computed_query_id, then
pg_stat_statements will not work?

It depends, but if you mean "setting up pg_stat_statements, intentionally
disabling in-core queryid calculation and not configuring an alternative
source" then yes pg_stat_statements will not work. But I don't see any
difference from "someone reduce wal_level and complain that replication does
not work" or "someone disable fsync and complain that data got corrupted". We
provide a sensible default configuration, you can mess it up if you don't know
what you're doing.

Why is there the strong dependency between computed_query_id and
pg_stat_statements? Can this dependency be just optional?

Once again no, as it otherwise would mean that postgres unilaterally decides
that pg_stat_statements' approach to compute a query identifier is the one and
only ultimate truth and nothing else could be useful for anyone.

#52

Pavel Stehule

pavel.stehule@gmail.com

over 4 years ago

In reply to: Julien Rouhaud (#51)

Re: compute_query_id and pg_stat_statements

st 12. 5. 2021 v 9:13 odesílatel Julien Rouhaud <rjuju123@gmail.com> napsal:

On Wed, May 12, 2021 at 08:58:45AM +0200, Pavel Stehule wrote:

I don't like the idea of implicit force enabling any feature flag, but it
is better than current design. But it doesn't look like a robust

solution.

Does it mean that if somebody disables computed_query_id, then
pg_stat_statements will not work?

It depends, but if you mean "setting up pg_stat_statements, intentionally
disabling in-core queryid calculation and not configuring an alternative
source" then yes pg_stat_statements will not work. But I don't see any
difference from "someone reduce wal_level and complain that replication
does
not work" or "someone disable fsync and complain that data got
corrupted". We
provide a sensible default configuration, you can mess it up if you don't
know
what you're doing.

Why is there the strong dependency between computed_query_id and
pg_stat_statements? Can this dependency be just optional?

Once again no, as it otherwise would mean that postgres unilaterally
decides
that pg_stat_statements' approach to compute a query identifier is the one
and
only ultimate truth and nothing else could be useful for anyone.

ok. Understand.

If I understand well, then computed_query_id does not make sense for
pg_stat_statemenst, because this extension always requires it.

Cannot be better to use queryid inside pg_stat_statements every time
without dependency on computed_query_id? And computed_query_id can be used
only for EXPLAIN and for pg_stat_activity.

pg_stat_statements cannot work without a queryid, so is useless to speak
about configuration. If you use pg_stat_statements, then the queryid will
be computed every time, but the visibility will be only for
pg_stat_statements.

Or a different strategy. I understand so computed_query_id should be
active. But I dislike the empty result of pg_stat_statements when
computed_query_id is off. Is it possible to raise an exception instead of
showing an empty result?

The most correct fix from my perspective is just check in function
pg_stat_statements if query id is computed or not. If not, and there is no
data, then raise an exception with the hint "enable compute_query_id". When
there is data, then show a warning with the mentioned hint and show data.

What do you think about it?

Pavel

#53

Julien Rouhaud

rjuju123@gmail.com

over 4 years ago

In reply to: Pavel Stehule (#52)

Re: compute_query_id and pg_stat_statements

On Wed, May 12, 2021 at 09:51:26AM +0200, Pavel Stehule wrote:

If I understand well, then computed_query_id does not make sense for
pg_stat_statemenst, because this extension always requires it.

No, pg_stat_statements requires *a* queryid, not specifially *our* queryid.

Cannot be better to use queryid inside pg_stat_statements every time
without dependency on computed_query_id? And computed_query_id can be used
only for EXPLAIN and for pg_stat_activity.

No, because then you will have a discrepancy between those two. And if you
want a different queryid approach (say based on object names rather than oid so
it survives logical replication), then you also want that queryid used for
pg_stat_statements. And that what happen is that you have to fork
pg_stat_statements to only change the queryid implementation, which is one of
the thing that the patch to move the implementation to core solves.

pg_stat_statements cannot work without a queryid, so is useless to speak
about configuration. If you use pg_stat_statements, then the queryid will
be computed every time, but the visibility will be only for
pg_stat_statements.

Yes, pg_stat_statements cannot work without a queryid, but it CAN work without
core queryid.

Or a different strategy. I understand so computed_query_id should be
active. But I dislike the empty result of pg_stat_statements when
computed_query_id is off. Is it possible to raise an exception instead of
showing an empty result?

Yes, but I don't think that it's a good idea. For instance pg_stat_statements
will behave poorly if you have to regularly evict entry. For instance: any
query touching a temporary table. One way to avoid that it to avoid storing
entries that you know are very likely to be eventually evicted.

So to fix this problem, you have 2 ways to go:

1) fix your app and explicitly disable/enable pg_stat_statements around all
those queries, and hope you won't miss any

2) write your own queryid implementation to not generate a queryid in such case.

2 seems like a reasonable scenario, and if you force pg_stat_statements to
error out in that case then you would be forced to use approach 1.

#54

Pavel Stehule

pavel.stehule@gmail.com

over 4 years ago

In reply to: Julien Rouhaud (#53)

Re: compute_query_id and pg_stat_statements

st 12. 5. 2021 v 10:14 odesílatel Julien Rouhaud <rjuju123@gmail.com>
napsal:

On Wed, May 12, 2021 at 09:51:26AM +0200, Pavel Stehule wrote:

If I understand well, then computed_query_id does not make sense for
pg_stat_statemenst, because this extension always requires it.

No, pg_stat_statements requires *a* queryid, not specifially *our* queryid.

Cannot be better to use queryid inside pg_stat_statements every time
without dependency on computed_query_id? And computed_query_id can be

used

only for EXPLAIN and for pg_stat_activity.

No, because then you will have a discrepancy between those two. And if you
want a different queryid approach (say based on object names rather than
oid so
it survives logical replication), then you also want that queryid used for
pg_stat_statements. And that what happen is that you have to fork
pg_stat_statements to only change the queryid implementation, which is one
of
the thing that the patch to move the implementation to core solves.

pg_stat_statements cannot work without a queryid, so is useless to speak
about configuration. If you use pg_stat_statements, then the queryid will
be computed every time, but the visibility will be only for
pg_stat_statements.

Yes, pg_stat_statements cannot work without a queryid, but it CAN work
without
core queryid.

Or a different strategy. I understand so computed_query_id should be
active. But I dislike the empty result of pg_stat_statements when
computed_query_id is off. Is it possible to raise an exception instead of
showing an empty result?

Yes, but I don't think that it's a good idea. For instance
pg_stat_statements
will behave poorly if you have to regularly evict entry. For instance: any
query touching a temporary table. One way to avoid that it to avoid
storing
entries that you know are very likely to be eventually evicted.

So to fix this problem, you have 2 ways to go:

1) fix your app and explicitly disable/enable pg_stat_statements around all
those queries, and hope you won't miss any

2) write your own queryid implementation to not generate a queryid in such
case.

2 seems like a reasonable scenario, and if you force pg_stat_statements to
error out in that case then you would be forced to use approach 1.

My second proposal can work for your example too. pg_stat_statements have
to require any active queryid computing. And when it is not available, then
the exception should be raised.

The custom queryid can return null, and still the queryid will be computed.
Maybe the warning can be enough. Just, if somebody use pg_stat_statements
function, then enforce the check if queryid is computed (compute_query_id
is true || some hook is not null), and if not then raise a warning.

#55

Julien Rouhaud

rjuju123@gmail.com

over 4 years ago

In reply to: Pavel Stehule (#54)

Re: compute_query_id and pg_stat_statements

On Wed, May 12, 2021 at 10:57:25AM +0200, Pavel Stehule wrote:

My second proposal can work for your example too. pg_stat_statements have
to require any active queryid computing. And when it is not available, then
the exception should be raised.

The custom queryid can return null, and still the queryid will be computed.
Maybe the warning can be enough. Just, if somebody use pg_stat_statements
function, then enforce the check if queryid is computed (compute_query_id
is true || some hook is not null), and if not then raise a warning.

Ah I'm sorry I misunderstood your proposal. Yes, definitely adding a warning
or an error when executing pg_stat_statements() SRF would help, that's a great
idea!

I'll wait a bit in case someone has any objection, and if not send an updated
patch!

#56

Kyotaro Horiguchi

horikyota.ntt@gmail.com

over 4 years ago

In reply to: Julien Rouhaud (#48)

Re: compute_query_id and pg_stat_statements

At Wed, 12 May 2021 14:05:16 +0800, Julien Rouhaud <rjuju123@gmail.com> wrote in

On Wed, May 12, 2021 at 02:33:35PM +0900, Kyotaro Horiguchi wrote:

At Wed, 12 May 2021 10:42:01 +0800, Julien Rouhaud <rjuju123@gmail.com> wrote in

I don't think that this approach would cope well for people who want a queryid
without pg_stat_statements or such. Since the queryid can now be found in
pg_stat_activity, EXPLAIN output or the logs I think it's entirely reasonable
to allow users to benefit from that even if they don't install additional
module.

Ah, I missed that case. And we are wanting to use pg_stat_statements
with (almost) zero-config? How about the following behavior?

Setting query_id_provider to 'none' means we don't calculate query-id
by default. However, if queryIdWante() is called, the default provider
is set up and starts calculating query id.

Having "none" meant "not unless someone asks for it" looks like a POLA
violation.

Sorry for confusion. A different behavior for "none" is proposed later
in the mail. It is just an intermediate discussion.

Setting query_id_provider to something else means the user wants
query-id calcualted using the provider. Setting 'default' is
equivalent to setting compute_query_id to 'on'.

There might be a case where a user sets query_id_provider to
non-'none' but don't want have query-id calculated, but it can be said
a kind of mis-configuration?

So if I'm understanding correctly, you're arguing for an approach different to
what Michael stated as the general consensus in [1]. I'm not saying that I
think it's a bad idea (and I actually suggested it before), but we have to
chose an approach and stick with it.

I'm not sure how much room for change of the direction is left. So it
was just a proposal. So if the majority still thinks that it is the
way to stick to controling on/off/(auto) the in-core implement and
separately allow another module to be hooked, I don't further object
to that decision.

I think this would be a mistake to do that, as it would mean that we don't
officially support alternative queryid provider.

Ok, if we want to support alternative providers from the first, we
need to actually write the loader code for query-id providers. It
would not be so hard?, but it might not be suitable to this stage so I
proposed that to get rid of needing such complexity for now.

I did write a POC extension [2] to demonstrate that moving pg_stat_statement's
queryid calculation in core doesn't mean that we're imposing it to everyone.
And yes this is critical and a must have in the initial implementation.

Ok, understood.

(Anyway I prefer to load query-id provider as a dynamically loadable
module rather than hook-function.)

I agree that having a specific API (I'm fine with a hook or a dynamically
loaded function) for that would be better, but it doesn't appear to be the
opinion of the majority.

Ugg. Ok.

The GUC itself may not change, but third-party queryid provider would probably
need changes as the new entry point will be dedicated to compute a queryid
only, while third-party plugins may do more than that in their
post_parse_analyze_hook. And also users will have to change their

I don't think it is not that a problem.

Did you mean "I don't think that it's a problem"? Otherwise I don't get it.

Yes, you're right. Sorry for the typo.

Even if any third-party
extension is having query-id generator by itself, in most cases it
would be a copy of JumbleQuery in case of pg_stat_statement is not
loaded and now it is moved in-core as 'default' provider. What the
exntension needs to be done is just ripping out the copied generator
code. I guess...

I don't fully understand, but it seems that you're arguing that the only use
case is to have something similar to pg_stat_statements (say e.g.
pg_store_plans), that always have the same queryid implementation as
pg_stat_statements. That's not the case, as there already are "clones" of
pg_stat_statements, and the main difference is an alternative queryid
implementation. So in that case what author would do is to drop everything
*except* the queryid implementation.

And if I'm not mistaken, pg_store_plans also wants a different queryid
implementation, but has to handle a secondary queryid on top of it
(https://github.com/ossc-db/pg_store_plans/blob/master/pg_store_plans.c#L843-L855).

Yeah, the extension intended to be used joining with the
pg_stat_statements view. And the reason for the second query-id dates
back to the era when query id was not available in the
pg_stat_statements view. Now it is mere a fall-back query id when
pg_stat_statments is not active. Now that the in-core query-id is
available, I think there's no reason to keep that implement.

So here again what the extension want is to get rid of pg_stat_statements (and
now core) queryid implementation.

So the extension might be a good reason for the discussion^^;

configuration to use that new interface, and additionally the module may now
have to be removed from shared_preload_libraries. Overall, it doesn't seem to
me that it would make users' life easier.

Why the third-party module need to be removed from
shared_preload_libraries? The module can stay as a preloaded shared
library but just no longer need to have its own query-id provider
since it is provided in-core. If the extension required a specific
provider, the developer need to make it a loadable module and users
need to specify the provider module explicitly.

It's the same misunderstanding here. Basically people want to benefit from the
whole ecosystem based on a queryid (pg_stat_statements, now
pg_stat_activity.query_id and such) but with another definition of what a
queryid is. So those people will now only need to implement something like
[2], rather than forking every single extension they want to use.

Hmm. I'm not sure the [2] gives sufficient reason for leaving the
current interface. But will follow if it is sitll the consensus. (And
it seems like true.)

[1]: /messages/by-id/YJoeXcrwe1EDmqKT@paquier.xyz
[2]: https://github.com/rjuju/pg_queryid

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

#57

Kyotaro Horiguchi

horikyota.ntt@gmail.com

over 4 years ago

In reply to: Julien Rouhaud (#55)

Re: compute_query_id and pg_stat_statements

At Wed, 12 May 2021 17:30:26 +0800, Julien Rouhaud <rjuju123@gmail.com> wrote in

On Wed, May 12, 2021 at 10:57:25AM +0200, Pavel Stehule wrote:

My second proposal can work for your example too. pg_stat_statements have
to require any active queryid computing. And when it is not available, then
the exception should be raised.

The custom queryid can return null, and still the queryid will be computed.
Maybe the warning can be enough. Just, if somebody use pg_stat_statements
function, then enforce the check if queryid is computed (compute_query_id
is true || some hook is not null), and if not then raise a warning.

Ah I'm sorry I misunderstood your proposal. Yes, definitely adding a warning
or an error when executing pg_stat_statements() SRF would help, that's a great
idea!

I'll wait a bit in case someone has any objection, and if not send an updated
patch!

Isn't there a case where pg_stat_statements uses an alternative
query-id provider?

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

#58

Pavel Stehule

pavel.stehule@gmail.com

over 4 years ago

In reply to: Kyotaro Horiguchi (#57)

Re: compute_query_id and pg_stat_statements

st 12. 5. 2021 v 11:39 odesílatel Kyotaro Horiguchi <horikyota.ntt@gmail.com>
napsal:

At Wed, 12 May 2021 17:30:26 +0800, Julien Rouhaud <rjuju123@gmail.com>
wrote in

On Wed, May 12, 2021 at 10:57:25AM +0200, Pavel Stehule wrote:

My second proposal can work for your example too. pg_stat_statements

have

to require any active queryid computing. And when it is not available,

then

the exception should be raised.

The custom queryid can return null, and still the queryid will be

computed.

Maybe the warning can be enough. Just, if somebody use

pg_stat_statements

function, then enforce the check if queryid is computed

(compute_query_id

is true || some hook is not null), and if not then raise a warning.

Ah I'm sorry I misunderstood your proposal. Yes, definitely adding a

warning

or an error when executing pg_stat_statements() SRF would help, that's a

great

idea!

I'll wait a bit in case someone has any objection, and if not send an

updated

patch!

Isn't there a case where pg_stat_statements uses an alternative
query-id provider?

this check just can check if there is "any" query-id provider. In this
context is not important if it is buildin or external

Show quoted text

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

#59

Kyotaro Horiguchi

horikyota.ntt@gmail.com

over 4 years ago

In reply to: Kyotaro Horiguchi (#57)

Re: compute_query_id and pg_stat_statements

At Wed, 12 May 2021 18:39:27 +0900 (JST), Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote in

At Wed, 12 May 2021 17:30:26 +0800, Julien Rouhaud <rjuju123@gmail.com> wrote in

On Wed, May 12, 2021 at 10:57:25AM +0200, Pavel Stehule wrote:

My second proposal can work for your example too. pg_stat_statements have
to require any active queryid computing. And when it is not available, then
the exception should be raised.

The custom queryid can return null, and still the queryid will be computed.
Maybe the warning can be enough. Just, if somebody use pg_stat_statements
function, then enforce the check if queryid is computed (compute_query_id
is true || some hook is not null), and if not then raise a warning.

Ah I'm sorry I misunderstood your proposal. Yes, definitely adding a warning
or an error when executing pg_stat_statements() SRF would help, that's a great
idea!

I'll wait a bit in case someone has any objection, and if not send an updated
patch!

Isn't there a case where pg_stat_statements uses an alternative
query-id provider?

I don't object that if we allow false non-error when an extension that
uses the hooks but doesn't compute a query id.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

#60

Julien Rouhaud

rjuju123@gmail.com

over 4 years ago

In reply to: Pavel Stehule (#58)

Re: compute_query_id and pg_stat_statements

On Wed, May 12, 2021 at 11:42:12AM +0200, Pavel Stehule wrote:

st 12. 5. 2021 v 11:39 odesï¿½latel Kyotaro Horiguchi <horikyota.ntt@gmail.com>
napsal:

At Wed, 12 May 2021 17:30:26 +0800, Julien Rouhaud <rjuju123@gmail.com>
wrote in

On Wed, May 12, 2021 at 10:57:25AM +0200, Pavel Stehule wrote:

My second proposal can work for your example too. pg_stat_statements

have

to require any active queryid computing. And when it is not available,

then

the exception should be raised.

The custom queryid can return null, and still the queryid will be

computed.

Maybe the warning can be enough. Just, if somebody use

pg_stat_statements

function, then enforce the check if queryid is computed

(compute_query_id

is true || some hook is not null), and if not then raise a warning.

Ah I'm sorry I misunderstood your proposal. Yes, definitely adding a

warning

or an error when executing pg_stat_statements() SRF would help, that's a

great

idea!

I'll wait a bit in case someone has any objection, and if not send an

updated

patch!

Isn't there a case where pg_stat_statements uses an alternative
query-id provider?

this check just can check if there is "any" query-id provider. In this
context is not important if it is buildin or external

Yes, the idea is that if you execute "SELECT * FROM pg_stat_statements" or
similar, then if the executing query itself doesn't have a queryid then
it's very likely that you didn't configure compute_query_id or an alternative
query_id implementation properly. And loudly complaining seems like the right
thing to do.

#61

Julien Rouhaud

rjuju123@gmail.com

over 4 years ago

In reply to: Kyotaro Horiguchi (#56)

Re: compute_query_id and pg_stat_statements

On Wed, May 12, 2021 at 06:37:24PM +0900, Kyotaro Horiguchi wrote:

At Wed, 12 May 2021 14:05:16 +0800, Julien Rouhaud <rjuju123@gmail.com> wrote in

And if I'm not mistaken, pg_store_plans also wants a different queryid
implementation, but has to handle a secondary queryid on top of it
(https://github.com/ossc-db/pg_store_plans/blob/master/pg_store_plans.c#L843-L855).

Yeah, the extension intended to be used joining with the
pg_stat_statements view. And the reason for the second query-id dates
back to the era when query id was not available in the
pg_stat_statements view. Now it is mere a fall-back query id when
pg_stat_statments is not active. Now that the in-core query-id is
available, I think there's no reason to keep that implement.

So here again what the extension want is to get rid of pg_stat_statements (and
now core) queryid implementation.

So the extension might be a good reason for the discussion^^;

Indeed. So IIUC, what pg_store_plans wants is:

- to use its own query_id implementation
- to be able to be joined to pg_stat_statements

Is that correct?

If yes, it seems that starting with pg14, it can be easily achieved by:

- documenting to disable compute_query_id
- eventually error out at execution time if it's enabled
- don't call queryIdWanted()
- expose its query_id

It will then work just fine, and will be more efficient compared to what is
done today as only one queryid will be calculated.

Did I miss something?

#62

Julien Rouhaud

rjuju123@gmail.com

over 4 years ago

In reply to: Julien Rouhaud (#55)

1 attachment(s)

Re: compute_query_id and pg_stat_statements

On Wed, May 12, 2021 at 05:30:26PM +0800, Julien Rouhaud wrote:

On Wed, May 12, 2021 at 10:57:25AM +0200, Pavel Stehule wrote:

My second proposal can work for your example too. pg_stat_statements have
to require any active queryid computing. And when it is not available, then
the exception should be raised.

The custom queryid can return null, and still the queryid will be computed.
Maybe the warning can be enough. Just, if somebody use pg_stat_statements
function, then enforce the check if queryid is computed (compute_query_id
is true || some hook is not null), and if not then raise a warning.

Ah I'm sorry I misunderstood your proposal. Yes, definitely adding a warning
or an error when executing pg_stat_statements() SRF would help, that's a great
idea!

I'll wait a bit in case someone has any objection, and if not send an updated
patch!

Hearing no complaint, PFA a v2 implementing such a warning. Here's an
extract from the updated regression tests:

-- Check that pg_stat_statements() will complain if the configuration appears
-- to be broken.
SET compute_query_id = off;
SELECT pg_stat_statements_reset();
pg_stat_statements_reset
--------------------------

(1 row)

SELECT count(*) FROM pg_stat_statements;
WARNING: Query identifier calculation seems to be disabled
HINT: If you don't want to use a third-party module to compute query identifiers, you may want to enable compute_query_id
count
-------
0
(1 row)

I'm of course open to suggestions for some better wording.

Attachments:

v2-0001-Change-compute_query_id-to-an-enum-GUC.patchtext/x-diff; charset=us-asciiDownload

From ae0bce48f5cf1de51aec21e2b34278f780a2a04b Mon Sep 17 00:00:00 2001
From: Julien Rouhaud <julien.rouhaud@free.fr>
Date: Tue, 11 May 2021 15:20:45 +0800
Subject: [PATCH v2] Change compute_query_id to an enum GUC.

The current approach that requires to explicitly enable compute_query_id on top
of configuring pg_stat_statements in shared_preload_libraries has been proven
to be at best unfriendly, so switch the GUC to an enum defaulting to "auto",
which allows third-party plugins to enable in-core query identifier calculation
if they require one.

While at it, also change pg_stat_statements_internal() to raise a warning if
the configuration looks broken (as in the calling query doesn't have a
query_id, compute_query_id is explicitly disabled and there's nothing stored in
pg_stat_statements hash).

A new queryIdWanted() function is provided to let plugins inform us that
require a query identifier, and which will enable query identifier calculation
iff it's set to "auto".

Note that if that function is called during postmaster startup (typically
during process_shared_preload_libraries()) then it will be enabled globally,
otherwise it will only be enabled in the session(s) that loaded a plugin
calling queryIdWanted().

Author: Julien Rouhaud
Discussion: https://postgr.es/m/35457b09-36f8-add3-1d07-6034fa585ca8@oss.nttdata.com
---
 .../expected/pg_stat_statements.out           | 17 +++++++++
 .../pg_stat_statements/pg_stat_statements.c   | 21 ++++++++++
 .../pg_stat_statements.conf                   |  1 -
 .../sql/pg_stat_statements.sql                |  6 +++
 doc/src/sgml/config.sgml                      |  9 ++++-
 doc/src/sgml/pgstatstatements.sgml            | 13 +++++--
 src/backend/commands/explain.c                |  2 +-
 src/backend/parser/analyze.c                  |  4 +-
 src/backend/tcop/postgres.c                   |  2 +-
 src/backend/utils/misc/guc.c                  | 38 ++++++++++++++-----
 src/backend/utils/misc/postgresql.conf.sample |  2 +-
 src/backend/utils/misc/queryjumble.c          | 15 ++++++++
 src/include/utils/guc.h                       |  1 -
 src/include/utils/queryjumble.h               | 12 ++++++
 14 files changed, 120 insertions(+), 23 deletions(-)

diff --git a/contrib/pg_stat_statements/expected/pg_stat_statements.out b/contrib/pg_stat_statements/expected/pg_stat_statements.out
index 40b5109b55..b071ffc9d5 100644
--- a/contrib/pg_stat_statements/expected/pg_stat_statements.out
+++ b/contrib/pg_stat_statements/expected/pg_stat_statements.out
@@ -1067,4 +1067,21 @@ SELECT COUNT(*) FROM pg_stat_statements WHERE query LIKE '%SELECT GROUPING%';
      2
 (1 row)
 
+-- Check that pg_stat_statements() will complain if the configuration appears
+-- to be broken.
+SET compute_query_id = off;
+SELECT pg_stat_statements_reset();
+ pg_stat_statements_reset 
+--------------------------
+ 
+(1 row)
+
+SELECT count(*) FROM pg_stat_statements;
+WARNING:  Query identifier calculation seems to be disabled
+HINT:  If you don't want to use a third-party module to compute query identifiers, you may want to enable compute_query_id
+ count 
+-------
+     0
+(1 row)
+
 DROP EXTENSION pg_stat_statements;
diff --git a/contrib/pg_stat_statements/pg_stat_statements.c b/contrib/pg_stat_statements/pg_stat_statements.c
index f42f07622e..8f5285e09d 100644
--- a/contrib/pg_stat_statements/pg_stat_statements.c
+++ b/contrib/pg_stat_statements/pg_stat_statements.c
@@ -369,6 +369,12 @@ _PG_init(void)
 	if (!process_shared_preload_libraries_in_progress)
 		return;
 
+	/*
+	 * Informat the postmaster that we want to enable query_id calculation if
+	 * compute_query_id is set to auto.
+	 */
+	queryIdWanted();
+
 	/*
 	 * Define (or redefine) custom GUC variables.
 	 */
@@ -1616,6 +1622,21 @@ pg_stat_statements_internal(FunctionCallInfo fcinfo,
 	 */
 	LWLockAcquire(pgss->lock, LW_SHARED);
 
+	/*
+	 * If no query_id has been computed for the calling query and there is
+	 * no entries stored, then there's likely a configuration error that caller
+	 * may not be aware of so point it out.
+	 */
+	if (pgstat_get_my_query_id() == UINT64CONST(0) &&
+		compute_query_id == COMPUTE_QUERY_ID_OFF &&
+		hash_get_num_entries(pgss_hash) == 0)
+		ereport(WARNING,
+				(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+				errmsg("Query identifier calculation seems to be disabled"),
+				errhint("If you don't want to use a third-party module to"
+					" compute query identifiers, you may want to enable"
+					" compute_query_id")));
+
 	if (showtext)
 	{
 		/*
diff --git a/contrib/pg_stat_statements/pg_stat_statements.conf b/contrib/pg_stat_statements/pg_stat_statements.conf
index e47b26040f..13346e2807 100644
--- a/contrib/pg_stat_statements/pg_stat_statements.conf
+++ b/contrib/pg_stat_statements/pg_stat_statements.conf
@@ -1,2 +1 @@
 shared_preload_libraries = 'pg_stat_statements'
-compute_query_id = on
diff --git a/contrib/pg_stat_statements/sql/pg_stat_statements.sql b/contrib/pg_stat_statements/sql/pg_stat_statements.sql
index bc3b6493e6..827a8e3d18 100644
--- a/contrib/pg_stat_statements/sql/pg_stat_statements.sql
+++ b/contrib/pg_stat_statements/sql/pg_stat_statements.sql
@@ -437,4 +437,10 @@ SELECT (
 
 SELECT COUNT(*) FROM pg_stat_statements WHERE query LIKE '%SELECT GROUPING%';
 
+-- Check that pg_stat_statements() will complain if the configuration appears
+-- to be broken.
+SET compute_query_id = off;
+SELECT pg_stat_statements_reset();
+SELECT count(*) FROM pg_stat_statements;
+
 DROP EXTENSION pg_stat_statements;
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 45bd1f1b7e..60d8b24f5e 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -7627,7 +7627,7 @@ COPY postgres_log FROM '/full/path/to/logfile.csv' WITH csv;
      <variablelist>
 
      <varlistentry id="guc-compute-query-id" xreflabel="compute_query_id">
-      <term><varname>compute_query_id</varname> (<type>boolean</type>)
+      <term><varname>compute_query_id</varname> (<type>enum</type>)
       <indexterm>
        <primary><varname>compute_query_id</varname> configuration parameter</primary>
       </indexterm>
@@ -7643,7 +7643,12 @@ COPY postgres_log FROM '/full/path/to/logfile.csv' WITH csv;
         identifier to be computed.  Note that an external module can
         alternatively be used if the in-core query identifier computation
         method is not acceptable.  In this case, in-core computation
-        must be disabled.  The default is <literal>off</literal>.
+        must be always disabled.
+        Valid values are <literal>off</literal> (always disabled),
+        <literal>on</literal> (always enabled) and <literal>auto</literal>,
+        which let modules such as <xref linkend="pgstatstatements"/>
+        automatically enable it.
+        The default is <literal>auto</literal>.
        </para>
        <note>
         <para>
diff --git a/doc/src/sgml/pgstatstatements.sgml b/doc/src/sgml/pgstatstatements.sgml
index bc2b6038ee..acfb134797 100644
--- a/doc/src/sgml/pgstatstatements.sgml
+++ b/doc/src/sgml/pgstatstatements.sgml
@@ -22,10 +22,15 @@
 
  <para>
   The module will not track statistics unless query
-  identifiers are calculated.  This can be done by enabling <xref
-  linkend="guc-compute-query-id"/> or using a third-party module that
-  computes its own query identifiers.  Note that all statistics tracked
-  by this module must be reset if the query identifier method is changed.
+  identifiers are calculated.  This is done by automatically when this
+  extension is configured in <literal>shared_preload_libraries</literal> and
+  <xref linkend="guc-compute-query-id"/> is set to <literal>auto</literal>
+  (which is the default value), or always if <xref
+  linkend="guc-compute-query-id"/> is set to <literal>on</literal>.
+  You must set <xref linkend="guc-compute-query-id"/> to <literal>off</literal>
+  if you want to use a third-party module that computes its own query
+      identifiers.  Note that all statistics tracked by this module must be
+      reset if the query identifier method is changed.
  </para>
 
  <para>
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index 8ab7bca866..9d384234b0 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -245,7 +245,7 @@ ExplainQuery(ParseState *pstate, ExplainStmt *stmt,
 	es->summary = (summary_set) ? es->summary : es->analyze;
 
 	query = castNode(Query, stmt->query);
-	if (compute_query_id)
+	if (compute_query_id == COMPUTE_QUERY_ID_ON)
 		jstate = JumbleQuery(query, pstate->p_sourcetext);
 
 	if (post_parse_analyze_hook)
diff --git a/src/backend/parser/analyze.c b/src/backend/parser/analyze.c
index e415bc3df0..1108f4133a 100644
--- a/src/backend/parser/analyze.c
+++ b/src/backend/parser/analyze.c
@@ -124,7 +124,7 @@ parse_analyze(RawStmt *parseTree, const char *sourceText,
 
 	query = transformTopLevelStmt(pstate, parseTree);
 
-	if (compute_query_id)
+	if (compute_query_id == COMPUTE_QUERY_ID_ON)
 		jstate = JumbleQuery(query, sourceText);
 
 	if (post_parse_analyze_hook)
@@ -163,7 +163,7 @@ parse_analyze_varparams(RawStmt *parseTree, const char *sourceText,
 	/* make sure all is well with parameter types */
 	check_variable_parameters(pstate, query);
 
-	if (compute_query_id)
+	if (compute_query_id == COMPUTE_QUERY_ID_ON)
 		jstate = JumbleQuery(query, sourceText);
 
 	if (post_parse_analyze_hook)
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index 2d6d145ecc..e4b76af557 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -704,7 +704,7 @@ pg_analyze_and_rewrite_params(RawStmt *parsetree,
 
 	query = transformTopLevelStmt(pstate, parsetree);
 
-	if (compute_query_id)
+	if (compute_query_id == COMPUTE_QUERY_ID_ON)
 		jstate = JumbleQuery(query, query_string);
 
 	if (post_parse_analyze_hook)
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index 0a180341c2..835b322c29 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -101,6 +101,7 @@
 #include "utils/plancache.h"
 #include "utils/portal.h"
 #include "utils/ps_status.h"
+#include "utils/queryjumble.h"
 #include "utils/rls.h"
 #include "utils/snapmgr.h"
 #include "utils/tzparser.h"
@@ -402,6 +403,23 @@ static const struct config_enum_entry backslash_quote_options[] = {
 	{NULL, 0, false}
 };
 
+/*
+ * Although only "on", "off", and "auto" are documented, we accept all the
+ * likely variants of "on" and "off".
+ */
+static const struct config_enum_entry compute_query_id_options[] = {
+	{"auto", COMPUTE_QUERY_ID_AUTO, false},
+	{"on", COMPUTE_QUERY_ID_ON, false},
+	{"off", COMPUTE_QUERY_ID_OFF, false},
+	{"true", COMPUTE_QUERY_ID_ON, true},
+	{"false", COMPUTE_QUERY_ID_OFF, true},
+	{"yes", COMPUTE_QUERY_ID_ON, true},
+	{"no", COMPUTE_QUERY_ID_OFF, true},
+	{"1", COMPUTE_QUERY_ID_ON, true},
+	{"0", COMPUTE_QUERY_ID_OFF, true},
+	{NULL, 0, false}
+};
+
 /*
  * Although only "on", "off", and "partition" are documented, we
  * accept all the likely variants of "on" and "off".
@@ -534,7 +552,6 @@ extern const struct config_enum_entry dynamic_shared_memory_options[];
 /*
  * GUC option variables that are exported from this module
  */
-bool		compute_query_id = false;
 bool		log_duration = false;
 bool		Debug_print_plan = false;
 bool		Debug_print_parse = false;
@@ -1441,15 +1458,6 @@ static struct config_bool ConfigureNamesBool[] =
 		true,
 		NULL, NULL, NULL
 	},
-	{
-		{"compute_query_id", PGC_SUSET, STATS_MONITORING,
-			gettext_noop("Compute query identifiers."),
-			NULL
-		},
-		&compute_query_id,
-		false,
-		NULL, NULL, NULL
-	},
 	{
 		{"log_parser_stats", PGC_SUSET, STATS_MONITORING,
 			gettext_noop("Writes parser performance statistics to the server log."),
@@ -4618,6 +4626,16 @@ static struct config_enum ConfigureNamesEnum[] =
 		NULL, NULL, NULL
 	},
 
+	{
+		{"compute_query_id", PGC_SUSET, STATS_MONITORING,
+			gettext_noop("Compute query identifiers."),
+			NULL
+		},
+		&compute_query_id,
+		COMPUTE_QUERY_ID_AUTO, compute_query_id_options,
+		NULL, NULL, NULL
+	},
+
 	{
 		{"constraint_exclusion", PGC_USERSET, QUERY_TUNING_OTHER,
 			gettext_noop("Enables the planner to use constraints to optimize queries."),
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index efde01ee56..6e36e4c2ef 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -604,7 +604,7 @@
 
 # - Monitoring -
 
-#compute_query_id = off
+#compute_query_id = auto
 #log_statement_stats = off
 #log_parser_stats = off
 #log_planner_stats = off
diff --git a/src/backend/utils/misc/queryjumble.c b/src/backend/utils/misc/queryjumble.c
index 1bb9fe20ea..4189fdfa53 100644
--- a/src/backend/utils/misc/queryjumble.c
+++ b/src/backend/utils/misc/queryjumble.c
@@ -39,6 +39,9 @@
 
 #define JUMBLE_SIZE				1024	/* query serialization buffer size */
 
+/* GUC parameters */
+int			compute_query_id = COMPUTE_QUERY_ID_AUTO;
+
 static uint64 compute_utility_query_id(const char *str, int query_location, int query_len);
 static void AppendJumble(JumbleState *jstate,
 						 const unsigned char *item, Size size);
@@ -131,6 +134,18 @@ JumbleQuery(Query *query, const char *querytext)
 	return jstate;
 }
 
+/*
+ * Enables compute_query_id if it's set to auto.
+ *
+ * Third-party plugins can use that function to inform the core that they
+ * require a query identifier to be computed.
+ */
+void queryIdWanted(void)
+{
+	if (compute_query_id == COMPUTE_QUERY_ID_AUTO)
+		compute_query_id = COMPUTE_QUERY_ID_ON;
+}
+
 /*
  * Compute a query identifier for the given utility query string.
  */
diff --git a/src/include/utils/guc.h b/src/include/utils/guc.h
index 24a5d9d3fb..a7c3a4958e 100644
--- a/src/include/utils/guc.h
+++ b/src/include/utils/guc.h
@@ -247,7 +247,6 @@ extern bool log_btree_build_stats;
 extern PGDLLIMPORT bool check_function_bodies;
 extern bool session_auth_is_superuser;
 
-extern bool compute_query_id;
 extern bool log_duration;
 extern int	log_parameter_max_length;
 extern int	log_parameter_max_length_on_error;
diff --git a/src/include/utils/queryjumble.h b/src/include/utils/queryjumble.h
index 83ba7339fa..578ed50a7e 100644
--- a/src/include/utils/queryjumble.h
+++ b/src/include/utils/queryjumble.h
@@ -52,7 +52,19 @@ typedef struct JumbleState
 	int			highest_extern_param_id;
 } JumbleState;
 
+/* Values for the compute_query_id GUC */
+typedef enum
+{
+	COMPUTE_QUERY_ID_OFF,
+	COMPUTE_QUERY_ID_ON,
+	COMPUTE_QUERY_ID_AUTO
+}			ComputeQueryIdType;
+
+/* GUC parameters */
+extern int compute_query_id;
+
 const char *CleanQuerytext(const char *query, int *location, int *len);
 JumbleState *JumbleQuery(Query *query, const char *querytext);
+void queryIdWanted(void);
 
 #endif							/* QUERYJUMBLE_H */
-- 
2.31.1

#63

Bruce Momjian

bruce@momjian.us

over 4 years ago

In reply to: Julien Rouhaud (#60)

Re: compute_query_id and pg_stat_statements

On Wed, May 12, 2021 at 05:51:49PM +0800, Julien Rouhaud wrote:

On Wed, May 12, 2021 at 11:42:12AM +0200, Pavel Stehule wrote:

this check just can check if there is "any" query-id provider. In this
context is not important if it is buildin or external

Yes, the idea is that if you execute "SELECT * FROM pg_stat_statements" or
similar, then if the executing query itself doesn't have a queryid then
it's very likely that you didn't configure compute_query_id or an alternative
query_id implementation properly. And loudly complaining seems like the right
thing to do.

I understand the desire to make pg_stat_statements require minimal
configuration, but frankly, if the server-side variable query id API is
confusing, I think we have done more harm than good.

The problem with compute_query_id=auto is that there is no way to know
if the query id is actually enabled, unless you guess from the installed
extensions, or we add another variable to report that, and maybe another
variable to control the provier, unless we require turning
compute_query_id=off if you are using custom query id computation. What
if it is auto, and pg_stat_statments is installed, and you want to use a
custom query id computation --- what happens? As you can see, this is
all becoming very complicated.

I think we might be just as well to go with compute_query_id=on/off, and
just complain loudly from CREATE EXTENSION, or in the server logs on
server start via shared_preload_libraries, or when querying
pg_stat_statements system view. We simply say to change
compute_query_id=on or to provide a custom query id implementation.

--
Bruce Momjian <bruce@momjian.us> https://momjian.us
EDB https://enterprisedb.com

If only the physical world exists, free will is an illusion.

#64

Julien Rouhaud

rjuju123@gmail.com

over 4 years ago

In reply to: Bruce Momjian (#63)

Re: compute_query_id and pg_stat_statements

On Wed, May 12, 2021 at 08:36:18PM -0400, Bruce Momjian wrote:

On Wed, May 12, 2021 at 05:51:49PM +0800, Julien Rouhaud wrote:

On Wed, May 12, 2021 at 11:42:12AM +0200, Pavel Stehule wrote:

this check just can check if there is "any" query-id provider. In this
context is not important if it is buildin or external

Yes, the idea is that if you execute "SELECT * FROM pg_stat_statements" or
similar, then if the executing query itself doesn't have a queryid then
it's very likely that you didn't configure compute_query_id or an alternative
query_id implementation properly. And loudly complaining seems like the right
thing to do.

I understand the desire to make pg_stat_statements require minimal
configuration, but frankly, if the server-side variable query id API is
confusing, I think we have done more harm than good.

The problem with compute_query_id=auto is that there is no way to know
if the query id is actually enabled, unless you guess from the installed
extensions, or we add another variable to report that, and maybe another
variable to control the provier, unless we require turning
compute_query_id=off if you are using custom query id computation. What
if it is auto, and pg_stat_statments is installed, and you want to use a
custom query id computation --- what happens? As you can see, this is
all becoming very complicated.

Well, as implemented you can get the value of compute_query_id, and if it's
still "auto" then it's not enabled as calling queryIdWanted() would turn it to
on. I agree that it's not ideal but you have a way to know. We could document
that auto means that it's set to auto and no one asked to automatically enabled
it.

Or you can just do e.g.

SELECT query_id FROM pg_stat_activity WHERE pid = pg_backend_pid();

and see if you have a query_id or not.

If you want to use third-party modules, they you have to explicitly disable
compute_query_id. If you don't, every query execution will raise an error as
we documented that third-party modules should error out if they see that a
query_id is already generated. Such module could also explicitly check that
compute_query_id is off and also raise an error if that's not the case.

I think we might be just as well to go with compute_query_id=on/off, and
just complain loudly from CREATE EXTENSION, or in the server logs on
server start via shared_preload_libraries, or when querying
pg_stat_statements system view. We simply say to change
compute_query_id=on or to provide a custom query id implementation.

I'm not opposed to that, but it was already suggested and apparently people
didn't like that approach.

#65

Kyotaro Horiguchi

horikyota.ntt@gmail.com

over 4 years ago

In reply to: Julien Rouhaud (#61)

Re: compute_query_id and pg_stat_statements

At Wed, 12 May 2021 18:09:30 +0800, Julien Rouhaud <rjuju123@gmail.com> wrote in

On Wed, May 12, 2021 at 06:37:24PM +0900, Kyotaro Horiguchi wrote:

At Wed, 12 May 2021 14:05:16 +0800, Julien Rouhaud <rjuju123@gmail.com> wrote in

And if I'm not mistaken, pg_store_plans also wants a different queryid
implementation, but has to handle a secondary queryid on top of it
(https://github.com/ossc-db/pg_store_plans/blob/master/pg_store_plans.c#L843-L855).

Yeah, the extension intended to be used joining with the
pg_stat_statements view. And the reason for the second query-id dates
back to the era when query id was not available in the
pg_stat_statements view. Now it is mere a fall-back query id when
pg_stat_statments is not active. Now that the in-core query-id is
available, I think there's no reason to keep that implement.

So here again what the extension want is to get rid of pg_stat_statements (and
now core) queryid implementation.

So the extension might be a good reason for the discussion^^;

Indeed. So IIUC, what pg_store_plans wants is:

Ugg. Very sorry. My brain should need more oxygen, or caffeine.. The
last sentence forgetting a negation. The plugin does not need a
special query-id provider so the special provider can be removed
without problems if the core provides one.

- to use its own query_id implementation
- to be able to be joined to pg_stat_statements

Is that correct?

It is correct, but a bit short in detail.

The query_id of its own is provided because pg_stat_statements did not
expose query_id. And it has been preserved only for the case the
plugin is used without pg_stat_statements activated. Now that the
in-core query_id is available, the last reason for the special
provider has gone.

If yes, it seems that starting with pg14, it can be easily achieved by:

So, it would be a bit different.

- documenting to disable compute_query_id

documenting to *not disable* compute_query_id. That is set it to on
or auto.

- eventually error out at execution time if it's enabled

So. the extension would check if any query_id provider *is* active.

- don't call queryIdWanted()
- expose its query_id

It will then work just fine, and will be more efficient compared to what is
done today as only one queryid will be calculated.

After reading Magnus' comment nearby, I realized that my most
significant concern here is how to know any query_id provider is
active. The way of setting the hook cannot enforce notifying that
kind of things on plugins. For me implementing them as a dll looked as
one of the most promising way of enabling that without needing any
boiler-plates.

Another not-prefect (in that it needs a boiler-plate) but workable way
is letting query-id providers set some variable including GUC
explicitly as Magnus' suggested. GUC would be better in that it is
naturally observable from users.

Even though there'a possibility that a developer of a query_id
provider forgets to set it, but maybe it would be easily
noticeable. On the other hand it gives a sure means to know any
query_id provider is active.

How about adding a GUC_INTERNAL "current_query_provider" or such?

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

#66

Kyotaro Horiguchi

horikyota.ntt@gmail.com

over 4 years ago

In reply to: Bruce Momjian (#63)

Re: compute_query_id and pg_stat_statements

At Wed, 12 May 2021 20:36:18 -0400, Bruce Momjian <bruce@momjian.us> wrote in

On Wed, May 12, 2021 at 05:51:49PM +0800, Julien Rouhaud wrote:

On Wed, May 12, 2021 at 11:42:12AM +0200, Pavel Stehule wrote:

this check just can check if there is "any" query-id provider. In this
context is not important if it is buildin or external

Yes, the idea is that if you execute "SELECT * FROM pg_stat_statements" or
similar, then if the executing query itself doesn't have a queryid then
it's very likely that you didn't configure compute_query_id or an alternative
query_id implementation properly. And loudly complaining seems like the right
thing to do.

I understand the desire to make pg_stat_statements require minimal
configuration, but frankly, if the server-side variable query id API is
confusing, I think we have done more harm than good.

The problem with compute_query_id=auto is that there is no way to know
if the query id is actually enabled, unless you guess from the installed
extensions, or we add another variable to report that, and maybe another
variable to control the provier, unless we require turning
compute_query_id=off if you are using custom query id computation. What
if it is auto, and pg_stat_statments is installed, and you want to use a
custom query id computation --- what happens? As you can see, this is
all becoming very complicated.

I think we might be just as well to go with compute_query_id=on/off, and
just complain loudly from CREATE EXTENSION, or in the server logs on
server start via shared_preload_libraries, or when querying
pg_stat_statements system view. We simply say to change
compute_query_id=on or to provide a custom query id implementation.

FWIW, I personally am fine with that (ignoring details :p), that is,
leaving the whole responsibility of a sane setup to users. If we are
going to automate even a part of it, I think we need to make it
perfect at least to a certain level. The current auery_id = auto
looks like somewhat halfway, or narrow-ranged.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

#67

Bruce Momjian

bruce@momjian.us

over 4 years ago

In reply to: Julien Rouhaud (#64)

Re: compute_query_id and pg_stat_statements

On Thu, May 13, 2021 at 08:52:36AM +0800, Julien Rouhaud wrote:

On Wed, May 12, 2021 at 08:36:18PM -0400, Bruce Momjian wrote:

The problem with compute_query_id=auto is that there is no way to know
if the query id is actually enabled, unless you guess from the installed
extensions, or we add another variable to report that, and maybe another
variable to control the provier, unless we require turning
compute_query_id=off if you are using custom query id computation. What
if it is auto, and pg_stat_statments is installed, and you want to use a
custom query id computation --- what happens? As you can see, this is
all becoming very complicated.

Well, as implemented you can get the value of compute_query_id, and if it's
still "auto" then it's not enabled as calling queryIdWanted() would turn it to
on. I agree that it's not ideal but you have a way to know. We could document
that auto means that it's set to auto and no one asked to automatically enabled
it.

Wow, so the extension changes it? How do we record the "source" of that
change? Do we have other GUCs that do that?

Or you can just do e.g.

SELECT query_id FROM pg_stat_activity WHERE pid = pg_backend_pid();

and see if you have a query_id or not.

True.

If you want to use third-party modules, they you have to explicitly disable
compute_query_id. If you don't, every query execution will raise an error as
we documented that third-party modules should error out if they see that a
query_id is already generated. Such module could also explicitly check that
compute_query_id is off and also raise an error if that's not the case.

OK.

I think we might be just as well to go with compute_query_id=on/off, and
just complain loudly from CREATE EXTENSION, or in the server logs on
server start via shared_preload_libraries, or when querying
pg_stat_statements system view. We simply say to change
compute_query_id=on or to provide a custom query id implementation.

I'm not opposed to that, but it was already suggested and apparently people
didn't like that approach.

Also probably true.

--
Bruce Momjian <bruce@momjian.us> https://momjian.us
EDB https://enterprisedb.com

If only the physical world exists, free will is an illusion.

#68

Julien Rouhaud

rjuju123@gmail.com

over 4 years ago

In reply to: Kyotaro Horiguchi (#65)

Re: compute_query_id and pg_stat_statements

On Thu, May 13, 2021 at 09:59:43AM +0900, Kyotaro Horiguchi wrote:

The query_id of its own is provided because pg_stat_statements did not
expose query_id. And it has been preserved only for the case the
plugin is used without pg_stat_statements activated. Now that the
in-core query_id is available, the last reason for the special
provider has gone.

Ah I see, indeed that makes sense. However I'm assuming that pg_store_plans
also requires *a* queryid, not specifically what used to be pg_stat_statements'
one right, so it could also fallback on an alternative implementation if users
configured one? Even if that's not the case, the core query_id can still be
calculated if needed as the function is now exported.

#69

Kyotaro Horiguchi

horikyota.ntt@gmail.com

over 4 years ago

In reply to: Kyotaro Horiguchi (#65)

Re: compute_query_id and pg_stat_statements

At Thu, 13 May 2021 09:59:43 +0900 (JST), Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote in

How about adding a GUC_INTERNAL "current_query_provider" or such?

On the second thought, I wonder why we don't just call JumbleQuery in
pgss_post_parse_analyze when compute_query_id is "off".

We can think this behavior as the following.

- compute_query_id sets whether the *internal* query-id provider turn
on. If it is "off", query_id in , for example, pg_stat_activity is
not set. Even in that case it is set to some valid value if some
alternative query-id provider is active.

On the other hand pg_stat_statements looks as if providing
"alternative" query-id provider, but actually it is just calling
in-core JumbleQuery if not called yet.

@@ -830,6 +830,10 @@ pgss_post_parse_analyze(ParseState *pstate, Query *query, JumbleState *jstate)
return;
}

+	/* Call in-core JumbleQuery if it was not called in-core */
+	if (!jstate)
+		jstate = JumbleQuery(query, pstate->p_sourcetext);
+
 	/*

Any plugins that want to use its own query-id generator can WARN if
jstate is not NULL, but also can proceed ignoring the exisint jstate.

WARNING: the default query-id provier is active, turn off compute_query_id to avoid unnecessary calculation

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

#70

Julien Rouhaud

rjuju123@gmail.com

over 4 years ago

In reply to: Bruce Momjian (#67)

Re: compute_query_id and pg_stat_statements

On Wed, May 12, 2021 at 09:13:25PM -0400, Bruce Momjian wrote:

On Thu, May 13, 2021 at 08:52:36AM +0800, Julien Rouhaud wrote:

Well, as implemented you can get the value of compute_query_id, and if it's
still "auto" then it's not enabled as calling queryIdWanted() would turn it to
on. I agree that it's not ideal but you have a way to know. We could document
that auto means that it's set to auto and no one asked to automatically enabled
it.

Wow, so the extension changes it?

Yes. It seemed better to go this way rather than having a secondary read-only
GUC for that.

How do we record the "source" of that
change? Do we have other GUCs that do that?

No, we don't. But I don't know what exactly you would like to have as a
source? What if you have for instance pg_stat_statements, pg_stat_kcache,
pg_store_plans and pg_wait_sampling installed? All those extensions need a
query_id (or at least benefit from it for pg_wait_sampling), is there any value
to give a full list of all the modules that would enable compute_query_id?

I'm assuming that anyone wanting to install any of those extensions (or any
similar one) is fully aware that they aggregate metrics based on at least a
query_id. If they don't, well they probably never read any documentation since
postgres 9.2 which introduced query normalization, and I doubt that they will
be interested in having access to the information anyway.

#71

Julien Rouhaud

rjuju123@gmail.com

over 4 years ago

In reply to: Kyotaro Horiguchi (#69)

Re: compute_query_id and pg_stat_statements

On Thu, May 13, 2021 at 10:51:52AM +0900, Kyotaro Horiguchi wrote:

At Thu, 13 May 2021 09:59:43 +0900 (JST), Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote in

How about adding a GUC_INTERNAL "current_query_provider" or such?

On the second thought, I wonder why we don't just call JumbleQuery in
pgss_post_parse_analyze when compute_query_id is "off".

Because not generating a query_id for a custom query_id implementation is a
valid use case for queries that are known to lead to huge pg_stat_statements
overhead, as I mentioned in [1]/messages/by-id/20210512081445.axosz3xf7ydrhe7o@nol. For the record I implemented that in
pg_queryid (optionally don't generate query_id for queries referencing a temp
relation) yesterday evening as a POC for that approach.

[1]: /messages/by-id/20210512081445.axosz3xf7ydrhe7o@nol

#72

Kyotaro Horiguchi

horikyota.ntt@gmail.com

over 4 years ago

In reply to: Julien Rouhaud (#71)

Re: compute_query_id and pg_stat_statements

At Thu, 13 May 2021 10:02:45 +0800, Julien Rouhaud <rjuju123@gmail.com> wrote in

On Thu, May 13, 2021 at 10:51:52AM +0900, Kyotaro Horiguchi wrote:

At Thu, 13 May 2021 09:59:43 +0900 (JST), Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote in

How about adding a GUC_INTERNAL "current_query_provider" or such?

On the second thought, I wonder why we don't just call JumbleQuery in
pgss_post_parse_analyze when compute_query_id is "off".

Because not generating a query_id for a custom query_id implementation is a
valid use case for queries that are known to lead to huge pg_stat_statements
overhead, as I mentioned in [1]. For the record I implemented that in
pg_queryid (optionally don't generate query_id for queries referencing a temp
relation) yesterday evening as a POC for that approach.

Yes, I know. So I said that "if not yet called". I believe any "real"
alternative query-id provider is supposed to be hooked "before"
pg_stat_statements. (It is a kind of magic to control the order of
plugins, though..) When the alternative provider generated a query_id
(that is, it has set jstate), pg_stat_statment doesn't call the
in-core JumbleQuery and uses the givin query_id.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

#73

Kyotaro Horiguchi

horikyota.ntt@gmail.com

over 4 years ago

In reply to: Kyotaro Horiguchi (#72)

Re: compute_query_id and pg_stat_statements

At Thu, 13 May 2021 11:26:29 +0900 (JST), Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote in

At Thu, 13 May 2021 10:02:45 +0800, Julien Rouhaud <rjuju123@gmail.com> wrote in

On Thu, May 13, 2021 at 10:51:52AM +0900, Kyotaro Horiguchi wrote:

At Thu, 13 May 2021 09:59:43 +0900 (JST), Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote in

How about adding a GUC_INTERNAL "current_query_provider" or such?

On the second thought, I wonder why we don't just call JumbleQuery in
pgss_post_parse_analyze when compute_query_id is "off".

Because not generating a query_id for a custom query_id implementation is a
valid use case for queries that are known to lead to huge pg_stat_statements
overhead, as I mentioned in [1]. For the record I implemented that in
pg_queryid (optionally don't generate query_id for queries referencing a temp
relation) yesterday evening as a POC for that approach.

Yes, I know. So I said that "if not yet called". I believe any "real"
alternative query-id provider is supposed to be hooked "before"
pg_stat_statements. (It is a kind of magic to control the order of
plugins, though..) When the alternative provider generated a query_id
(that is, it has set jstate), pg_stat_statment doesn't call the
in-core JumbleQuery and uses the givin query_id.

Forgot to mention, I think that the state "query_id provider is active
but it has not assigned one to this query" can be signaled by
jstate=<non-null> and query_id = 0.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

#74

Julien Rouhaud

rjuju123@gmail.com

over 4 years ago

In reply to: Kyotaro Horiguchi (#73)

Re: compute_query_id and pg_stat_statements

On Thu, May 13, 2021 at 11:30:56AM +0900, Kyotaro Horiguchi wrote:

At Thu, 13 May 2021 11:26:29 +0900 (JST), Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote in

At Thu, 13 May 2021 10:02:45 +0800, Julien Rouhaud <rjuju123@gmail.com> wrote in

On Thu, May 13, 2021 at 10:51:52AM +0900, Kyotaro Horiguchi wrote:

At Thu, 13 May 2021 09:59:43 +0900 (JST), Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote in

How about adding a GUC_INTERNAL "current_query_provider" or such?

On the second thought, I wonder why we don't just call JumbleQuery in
pgss_post_parse_analyze when compute_query_id is "off".

Because not generating a query_id for a custom query_id implementation is a
valid use case for queries that are known to lead to huge pg_stat_statements
overhead, as I mentioned in [1]. For the record I implemented that in
pg_queryid (optionally don't generate query_id for queries referencing a temp
relation) yesterday evening as a POC for that approach.

Yes, I know. So I said that "if not yet called". I believe any "real"
alternative query-id provider is supposed to be hooked "before"
pg_stat_statements. (It is a kind of magic to control the order of
plugins, though..) When the alternative provider generated a query_id
(that is, it has set jstate), pg_stat_statment doesn't call the
in-core JumbleQuery and uses the givin query_id.

Forgot to mention, I think that the state "query_id provider is active
but it has not assigned one to this query" can be signaled by
jstate=<non-null> and query_id = 0.

I assume that you mean "third-party query_id provider" here, as the core one
will always return a non-zero query_id?

I guess it could work, but a lot of people are complaining that having
compute_query_id = [ off | on | auto ] is too confusing, so I don't see how
having "off" means "sometimes off, sometimes on" is going to be any clearer for
users.

#75

Julien Rouhaud

rjuju123@gmail.com

over 4 years ago

In reply to: Kyotaro Horiguchi (#72)

Re: compute_query_id and pg_stat_statements

On Thu, May 13, 2021 at 11:26:29AM +0900, Kyotaro Horiguchi wrote:

I believe any "real"
alternative query-id provider is supposed to be hooked "before"
pg_stat_statements. (It is a kind of magic to control the order of
plugins, though..

Indeed, you have to configure shared_preload_libraries depending on whether
each module calls the previous post_parse_analyze_hook before or after its own
processing, and that's the main reason why I think a dedicated entry point
would be better.

#76

Kyotaro Horiguchi

horikyota.ntt@gmail.com

over 4 years ago

In reply to: Julien Rouhaud (#74)

Re: compute_query_id and pg_stat_statements

At Thu, 13 May 2021 10:39:20 +0800, Julien Rouhaud <rjuju123@gmail.com> wrote in

On Thu, May 13, 2021 at 11:30:56AM +0900, Kyotaro Horiguchi wrote:

At Thu, 13 May 2021 11:26:29 +0900 (JST), Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote in

At Thu, 13 May 2021 10:02:45 +0800, Julien Rouhaud <rjuju123@gmail.com> wrote in
Yes, I know. So I said that "if not yet called". I believe any "real"
alternative query-id provider is supposed to be hooked "before"
pg_stat_statements. (It is a kind of magic to control the order of
plugins, though..) When the alternative provider generated a query_id
(that is, it has set jstate), pg_stat_statment doesn't call the
in-core JumbleQuery and uses the givin query_id.

Forgot to mention, I think that the state "query_id provider is active
but it has not assigned one to this query" can be signaled by
jstate=<non-null> and query_id = 0.

I assume that you mean "third-party query_id provider" here, as the core one
will always return a non-zero query_id?

Right.

I guess it could work, but a lot of people are complaining that having
compute_query_id = [ off | on | auto ] is too confusing, so I don't see how
having "off" means "sometimes off, sometimes on" is going to be any clearer for
users.

I don't get it. It read as "people are complaining the tristate is too
confusing, so I made it tristate"?

For the second point, so I said that the variable controls whether the
"internal" query-id pvovider turn on. It is more clearer if the name
were something like "use_internal_query_id_generator".

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

#77

Julien Rouhaud

rjuju123@gmail.com

over 4 years ago

In reply to: Kyotaro Horiguchi (#76)

Re: compute_query_id and pg_stat_statements

On Thu, May 13, 2021 at 11:49:34AM +0900, Kyotaro Horiguchi wrote:

At Thu, 13 May 2021 10:39:20 +0800, Julien Rouhaud <rjuju123@gmail.com> wrote in

On Thu, May 13, 2021 at 11:30:56AM +0900, Kyotaro Horiguchi wrote:

At Thu, 13 May 2021 11:26:29 +0900 (JST), Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote in

At Thu, 13 May 2021 10:02:45 +0800, Julien Rouhaud <rjuju123@gmail.com> wrote in
Yes, I know. So I said that "if not yet called". I believe any "real"
alternative query-id provider is supposed to be hooked "before"
pg_stat_statements. (It is a kind of magic to control the order of
plugins, though..) When the alternative provider generated a query_id
(that is, it has set jstate), pg_stat_statment doesn't call the
in-core JumbleQuery and uses the givin query_id.

Forgot to mention, I think that the state "query_id provider is active
but it has not assigned one to this query" can be signaled by
jstate=<non-null> and query_id = 0.

I assume that you mean "third-party query_id provider" here, as the core one
will always return a non-zero query_id?

Right.

I guess it could work, but a lot of people are complaining that having
compute_query_id = [ off | on | auto ] is too confusing, so I don't see how
having "off" means "sometimes off, sometimes on" is going to be any clearer for
users.

I don't get it. It read as "people are complaining the tristate is too
confusing, so I made it tristate"?

No, the consensus was for having a tristate, so I implemented it, and now
people are complaining that it's too confusing.

For the second point, so I said that the variable controls whether the
"internal" query-id pvovider turn on. It is more clearer if the name
were something like "use_internal_query_id_generator".

I don't see how it's really different. If I understand correctly, you're
suggesting that
use_internal_query_id_generator = off
can mean either

- off
- on if pg_stat_statements or similar extension is configured but no custom
query_id provider is configured, and in any case it will always be displayed
as off

with no other new GUC. Is that correct?

#78

Bruce Momjian

bruce@momjian.us

over 4 years ago

In reply to: Julien Rouhaud (#70)

Re: compute_query_id and pg_stat_statements

On Thu, May 13, 2021 at 09:57:00AM +0800, Julien Rouhaud wrote:

On Wed, May 12, 2021 at 09:13:25PM -0400, Bruce Momjian wrote:

On Thu, May 13, 2021 at 08:52:36AM +0800, Julien Rouhaud wrote:

Well, as implemented you can get the value of compute_query_id, and if it's
still "auto" then it's not enabled as calling queryIdWanted() would turn it to
on. I agree that it's not ideal but you have a way to know. We could document
that auto means that it's set to auto and no one asked to automatically enabled
it.

Wow, so the extension changes it?

Yes. It seemed better to go this way rather than having a secondary read-only
GUC for that.

How do we record the "source" of that
change? Do we have other GUCs that do that?

No, we don't. But I don't know what exactly you would like to have as a

OK.

source? What if you have for instance pg_stat_statements, pg_stat_kcache,
pg_store_plans and pg_wait_sampling installed? All those extensions need a
query_id (or at least benefit from it for pg_wait_sampling), is there any value
to give a full list of all the modules that would enable compute_query_id?

Well, we don't have any other cases where the source of the change is
this indeterminate, so I don't really have a suggestion. I think this
does highlight another case where 'auto' just isn't a good fit for our
API.

I'm assuming that anyone wanting to install any of those extensions (or any
similar one) is fully aware that they aggregate metrics based on at least a
query_id. If they don't, well they probably never read any documentation since
postgres 9.2 which introduced query normalization, and I doubt that they will
be interested in having access to the information anyway.

My point is that we are changing a setting from an extension, and if you
look in pg_setting, it will say "default"?

If the user already has to edit postgresql.conf to set
shared_preload_libraries, I still don't see why having them set
compute_query_id at the same time is a significant problem and a reason
to distort our API to do 'auto'.

--
Bruce Momjian <bruce@momjian.us> https://momjian.us
EDB https://enterprisedb.com

If only the physical world exists, free will is an illusion.

#79

Kyotaro Horiguchi

horikyota.ntt@gmail.com

over 4 years ago

In reply to: Julien Rouhaud (#75)

Re: compute_query_id and pg_stat_statements

At Thu, 13 May 2021 10:43:03 +0800, Julien Rouhaud <rjuju123@gmail.com> wrote in

On Thu, May 13, 2021 at 11:26:29AM +0900, Kyotaro Horiguchi wrote:

I believe any "real"
alternative query-id provider is supposed to be hooked "before"
pg_stat_statements. (It is a kind of magic to control the order of
plugins, though..

Indeed, you have to configure shared_preload_libraries depending on whether
each module calls the previous post_parse_analyze_hook before or after its own
processing, and that's the main reason why I think a dedicated entry point
would be better.

I see it as cleaner than the status-quo. (But still believing less
cleaner than DLL:p, since the same problem happens if two
query_id-generating modules are competing on the new hook ponit.).

You told that a special query-id provider needed to be separated to
another DLL, but a preload shared librarie is also a dll and I think
any exported function in it can be obtained via
load_external_function().

As the result, even if we take the DLL approach, still not need to
split out the query-id provider part. By the following config:

query_id_provider = 'pg_stat_statements'

the core can obtain the entrypoint of, say, "_PG_calculate_query_id"
to call it. And it can be of another module.

It seems like the only problem doing that is we don't have a means to
call per-process intializer for a preload libralies.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

#80

Julien Rouhaud

rjuju123@gmail.com

over 4 years ago

In reply to: Bruce Momjian (#78)

Re: compute_query_id and pg_stat_statements

On Wed, May 12, 2021 at 11:06:52PM -0400, Bruce Momjian wrote:

On Thu, May 13, 2021 at 09:57:00AM +0800, Julien Rouhaud wrote:

source? What if you have for instance pg_stat_statements, pg_stat_kcache,
pg_store_plans and pg_wait_sampling installed? All those extensions need a
query_id (or at least benefit from it for pg_wait_sampling), is there any value
to give a full list of all the modules that would enable compute_query_id?

Well, we don't have any other cases where the source of the change is
this indeterminate, so I don't really have a suggestion. I think this
does highlight another case where 'auto' just isn't a good fit for our
API.

It may depends on your next question

I'm assuming that anyone wanting to install any of those extensions (or any
similar one) is fully aware that they aggregate metrics based on at least a
query_id. If they don't, well they probably never read any documentation since
postgres 9.2 which introduced query normalization, and I doubt that they will
be interested in having access to the information anyway.

My point is that we are changing a setting from an extension, and if you
look in pg_setting, it will say "default"?

No, it will say "on". What the patch I sent implements is:

- compute_query_id = on means it was either explicitly set to on, or
automatically set to on if it was allowed to (so initially set to auto). It
means you know whether core query_id calculation is enabled or not, you can
know looking at the reset value it it was changed by an extension, you just
don't know which one(s).

- compute_query_id = auto means that it can be set to on, it just wasn't yet,
so it's off, and may change if you have an extension that can be dynamically
loaded and request for core query_id calculation to be enabled

- compute_query_id = off means it's off

If the user already has to edit postgresql.conf to set
shared_preload_libraries, I still don't see why having them set
compute_query_id at the same time is a significant problem and a reason
to distort our API to do 'auto'.

Looking at the arguments until now my understanding is that it's because "no
one will read the doc anyway".

#81

Julien Rouhaud

rjuju123@gmail.com

over 4 years ago

In reply to: Kyotaro Horiguchi (#79)

Re: compute_query_id and pg_stat_statements

On Thu, May 13, 2021 at 12:11:12PM +0900, Kyotaro Horiguchi wrote:

At Thu, 13 May 2021 10:43:03 +0800, Julien Rouhaud <rjuju123@gmail.com> wrote in

On Thu, May 13, 2021 at 11:26:29AM +0900, Kyotaro Horiguchi wrote:

I believe any "real"
alternative query-id provider is supposed to be hooked "before"
pg_stat_statements. (It is a kind of magic to control the order of
plugins, though..

Indeed, you have to configure shared_preload_libraries depending on whether
each module calls the previous post_parse_analyze_hook before or after its own
processing, and that's the main reason why I think a dedicated entry point
would be better.

I see it as cleaner than the status-quo. (But still believing less
cleaner than DLL:p, since the same problem happens if two
query_id-generating modules are competing on the new hook ponit.).

You told that a special query-id provider needed to be separated to
another DLL

No, I'm saying a different entry point. It can be a new hook or an explicit
function name called for a dynamically loaded function, I'm fine with both as
long as it's called before post_parse_analyze_hook.

It seems like the only problem doing that is we don't have a means to
call per-process intializer for a preload libralies.

But that's going to happen only once per backend? If it's still adding too
much overhead you could add the module in shared_preload_libraries.

#82

Bruce Momjian

bruce@momjian.us

over 4 years ago

In reply to: Julien Rouhaud (#80)

Re: compute_query_id and pg_stat_statements

On Thu, May 13, 2021 at 11:16:13AM +0800, Julien Rouhaud wrote:

On Wed, May 12, 2021 at 11:06:52PM -0400, Bruce Momjian wrote:

On Thu, May 13, 2021 at 09:57:00AM +0800, Julien Rouhaud wrote:

source? What if you have for instance pg_stat_statements, pg_stat_kcache,
pg_store_plans and pg_wait_sampling installed? All those extensions need a
query_id (or at least benefit from it for pg_wait_sampling), is there any value
to give a full list of all the modules that would enable compute_query_id?

Well, we don't have any other cases where the source of the change is
this indeterminate, so I don't really have a suggestion. I think this
does highlight another case where 'auto' just isn't a good fit for our
API.

It may depends on your next question

I'm assuming that anyone wanting to install any of those extensions (or any
similar one) is fully aware that they aggregate metrics based on at least a
query_id. If they don't, well they probably never read any documentation since
postgres 9.2 which introduced query normalization, and I doubt that they will
be interested in having access to the information anyway.

My point is that we are changing a setting from an extension, and if you
look in pg_setting, it will say "default"?

No, it will say "on". What the patch I sent implements is:

I was asking what pg_settings.source will say:

SELECT name, soource FROM pg_settings;

- compute_query_id = on means it was either explicitly set to on, or
automatically set to on if it was allowed to (so initially set to auto). It
means you know whether core query_id calculation is enabled or not, you can
know looking at the reset value it it was changed by an extension, you just
don't know which one(s).

- compute_query_id = auto means that it can be set to on, it just wasn't yet,
so it's off, and may change if you have an extension that can be dynamically
loaded and request for core query_id calculation to be enabled

So, it is 'off', but not set to 'off' in the GUC sense --- just off as
in not being computed. You can see the confusion in my just reading
that sentence.

- compute_query_id = off means it's off

If the user already has to edit postgresql.conf to set
shared_preload_libraries, I still don't see why having them set
compute_query_id at the same time is a significant problem and a reason
to distort our API to do 'auto'.

Looking at the arguments until now my understanding is that it's because "no
one will read the doc anyway".

How do they know to set shared_preload_libraries then? We change the
user API all the time, especially in GUCs, and even rename them, but for
some reason we don't think people using pg_stat_statements can be
trusted to read the release notes and change their behavior. I just
don't get it.

--
Bruce Momjian <bruce@momjian.us> https://momjian.us
EDB https://enterprisedb.com

If only the physical world exists, free will is an illusion.

#83

Julien Rouhaud

rjuju123@gmail.com

over 4 years ago

In reply to: Bruce Momjian (#82)

Re: compute_query_id and pg_stat_statements

On Wed, May 12, 2021 at 11:33:32PM -0400, Bruce Momjian wrote:

On Thu, May 13, 2021 at 11:16:13AM +0800, Julien Rouhaud wrote:

On Wed, May 12, 2021 at 11:06:52PM -0400, Bruce Momjian wrote:

On Thu, May 13, 2021 at 09:57:00AM +0800, Julien Rouhaud wrote:

source? What if you have for instance pg_stat_statements, pg_stat_kcache,
pg_store_plans and pg_wait_sampling installed? All those extensions need a
query_id (or at least benefit from it for pg_wait_sampling), is there any value
to give a full list of all the modules that would enable compute_query_id?

Well, we don't have any other cases where the source of the change is
this indeterminate, so I don't really have a suggestion. I think this
does highlight another case where 'auto' just isn't a good fit for our
API.

It may depends on your next question

I'm assuming that anyone wanting to install any of those extensions (or any
similar one) is fully aware that they aggregate metrics based on at least a
query_id. If they don't, well they probably never read any documentation since
postgres 9.2 which introduced query normalization, and I doubt that they will
be interested in having access to the information anyway.

My point is that we are changing a setting from an extension, and if you
look in pg_setting, it will say "default"?

No, it will say "on". What the patch I sent implements is:

I was asking what pg_settings.source will say:

SELECT name, soource FROM pg_settings;

Oh sorry. Here's the full output before and after a dynamic call to
queryIsWanted():

name | compute_query_id
setting | auto
unit | <NULL>
category | Statistics / Monitoring
short_desc | Compute query identifiers.
extra_desc | <NULL>
context | superuser
vartype | enum
source | default
min_val | <NULL>
max_val | <NULL>
enumvals | {auto,on,off}
boot_val | auto
reset_val | auto
sourcefile | <NULL>
sourceline | <NULL>
pending_restart | f

=# LOAD 'pg_qualstats'; /* will call queryIsWanted() */
WARNING: 01000: Without shared_preload_libraries, only current backend stats will be available.
LOAD

name | compute_query_id
setting | on
unit | <NULL>
category | Statistics / Monitoring
short_desc | Compute query identifiers.
extra_desc | <NULL>
context | superuser
vartype | enum
source | default
min_val | <NULL>
max_val | <NULL>
enumvals | {auto,on,off}
boot_val | auto
reset_val | auto
sourcefile | <NULL>
sourceline | <NULL>
pending_restart | f

So yes, it says "default" (and it's the same if the change is done during
postmaster init). It can be fixed if that's the only issue.

- compute_query_id = on means it was either explicitly set to on, or
automatically set to on if it was allowed to (so initially set to auto). It
means you know whether core query_id calculation is enabled or not, you can
know looking at the reset value it it was changed by an extension, you just
don't know which one(s).

- compute_query_id = auto means that it can be set to on, it just wasn't yet,
so it's off, and may change if you have an extension that can be dynamically
loaded and request for core query_id calculation to be enabled

So, it is 'off', but not set to 'off' in the GUC sense --- just off as
in not being computed. You can see the confusion in my just reading
that sentence.

It's technically not "off" but "not on yet", but that's probably just making it
worse :)

How do they know to set shared_preload_libraries then? We change the
user API all the time, especially in GUCs, and even rename them, but for
some reason we don't think people using pg_stat_statements can be
trusted to read the release notes and change their behavior. I just
don't get it.

I don't know what to say. So here is a summary of the complaints that I'm
aware of:

- /messages/by-id/1953aec168224b95b0c962a622bef0794da6ff40.camel@moonset.ru
That was only a couple of days after the commit just before the feature freeze,
so it may be the less relevant complaint.

- /messages/by-id/CAOxo6XJEYunL71g0yD-zRzNRRqBG0Ssw-ARygy5pGRdSjK5YLQ@mail.gmail.com
Did a git bisect to find the commit that changed the behavior and somehow
didn't notice the new setting

- this thread, with Fuji-san saying:

I'm afraid that users may easily forget to enable compute_query_id when using
pg_stat_statements (because this setting was not necessary in v13 or before)

- this thread, with Peter E. saying:

Now there is the additional burden of turning on this weird setting that
no one understands. That's a 50% increase in burden. And almost no one will
want to use a nondefault setting. pg_stat_statements is pretty popular. I
think leaving in this requirement will lead to widespread confusion and
complaints.

- this thread, with Pavel saying:

Until now, the pg_stat_statements was zero-config. So the change is not user
friendly.

So it's a mix of "it's changing something that didn't change in a long time"
and "it's adding extra footgun and/or burden as it's not doing by default what
the majority of users will want", with an overwhelming majority of people
supporting the "we don't want that extra burden".

#84

Fujii Masao

masao.fujii@oss.nttdata.com

over 4 years ago

In reply to: Julien Rouhaud (#83)

Re: compute_query_id and pg_stat_statements

On 2021/05/13 13:03, Julien Rouhaud wrote:

On Wed, May 12, 2021 at 11:33:32PM -0400, Bruce Momjian wrote:

On Thu, May 13, 2021 at 11:16:13AM +0800, Julien Rouhaud wrote:

On Wed, May 12, 2021 at 11:06:52PM -0400, Bruce Momjian wrote:

On Thu, May 13, 2021 at 09:57:00AM +0800, Julien Rouhaud wrote:

source? What if you have for instance pg_stat_statements, pg_stat_kcache,
pg_store_plans and pg_wait_sampling installed? All those extensions need a
query_id (or at least benefit from it for pg_wait_sampling), is there any value
to give a full list of all the modules that would enable compute_query_id?

Well, we don't have any other cases where the source of the change is
this indeterminate, so I don't really have a suggestion. I think this
does highlight another case where 'auto' just isn't a good fit for our
API.

It may depends on your next question

I'm assuming that anyone wanting to install any of those extensions (or any
similar one) is fully aware that they aggregate metrics based on at least a
query_id. If they don't, well they probably never read any documentation since
postgres 9.2 which introduced query normalization, and I doubt that they will
be interested in having access to the information anyway.

My point is that we are changing a setting from an extension, and if you
look in pg_setting, it will say "default"?

No, it will say "on". What the patch I sent implements is:

I was asking what pg_settings.source will say:

SELECT name, soource FROM pg_settings;

Oh sorry. Here's the full output before and after a dynamic call to
queryIsWanted():

name | compute_query_id
setting | auto
unit | <NULL>
category | Statistics / Monitoring
short_desc | Compute query identifiers.
extra_desc | <NULL>
context | superuser
vartype | enum
source | default
min_val | <NULL>
max_val | <NULL>
enumvals | {auto,on,off}
boot_val | auto
reset_val | auto
sourcefile | <NULL>
sourceline | <NULL>
pending_restart | f

=# LOAD 'pg_qualstats'; /* will call queryIsWanted() */
WARNING: 01000: Without shared_preload_libraries, only current backend stats will be available.
LOAD

name | compute_query_id
setting | on
unit | <NULL>
category | Statistics / Monitoring
short_desc | Compute query identifiers.
extra_desc | <NULL>
context | superuser
vartype | enum
source | default
min_val | <NULL>
max_val | <NULL>
enumvals | {auto,on,off}
boot_val | auto
reset_val | auto
sourcefile | <NULL>
sourceline | <NULL>
pending_restart | f

So yes, it says "default" (and it's the same if the change is done during
postmaster init). It can be fixed if that's the only issue.

I like leaving compute_query_id=auto instead of overwriting it to "on",
even when queryIsWanted() is called, as I told upthread. Then we can decide
that query id computation is necessary when compute_query_id=auto and
the special flag is enabled by queryIsWanted(). Of course as you and Magnus
discussed upthread, the issue in EXEC_BACKEND case should be fixed,
maybe by using save_backend_variables() or something, though.

If we do this, compute_query_id=auto seems to be similar to huge_pages=try.
When huge_pages=try, whether huge pages is actually used is defined depending
outside PostgreSQL (i.e, OS setting in this case). Neither pg_settings.setting nor
souce are not changed in that case.

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION

#85

Julien Rouhaud

rjuju123@gmail.com

over 4 years ago

In reply to: Fujii Masao (#84)

Re: compute_query_id and pg_stat_statements

Le jeu. 13 mai 2021 à 12:18, Fujii Masao <masao.fujii@oss.nttdata.com> a
écrit :

I like leaving compute_query_id=auto instead of overwriting it to "on",
even when queryIsWanted() is called, as I told upthread. Then we can decide
that query id computation is necessary when compute_query_id=auto and
the special flag is enabled by queryIsWanted(). Of course as you and Magnus
discussed upthread, the issue in EXEC_BACKEND case should be fixed,
maybe by using save_backend_variables() or something, though.

If we do this, compute_query_id=auto seems to be similar to huge_pages=try.
When huge_pages=try, whether huge pages is actually used is defined
depending
outside PostgreSQL (i.e, OS setting in this case). Neither
pg_settings.setting nor
souce are not changed in that case.

I'm fine with that, but a lot of people complained that it wasn't good
because you don't really know if it's actually on or not. I personally
don't think that it's an issue, because what user want is to
automagumically do what they want, not check how the magic happened, and if
they want a third party implementation then the module can error out if the
setting is on, so the burden will only be for those users, and handled by
the third party module author.

Show quoted text

#86

Kyotaro Horiguchi

horikyota.ntt@gmail.com

over 4 years ago

In reply to: Kyotaro Horiguchi (#79)

1 attachment(s)

Re: compute_query_id and pg_stat_statements

At Thu, 13 May 2021 12:11:12 +0900 (JST), Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote in

As the result, even if we take the DLL approach, still not need to
split out the query-id provider part. By the following config:

query_id_provider = 'pg_stat_statements'

the core can obtain the entrypoint of, say, "_PG_calculate_query_id"
to call it. And it can be of another module.

It seems like the only problem doing that is we don't have a means to
call per-process intializer for a preload libralies.

So this is a crude PoC of that.

pg_stat_statemnts defines its own query-id provider function in
pg_stat_statements which calls in-core DefaultJumbeQuery (end emits a
log line).

If server started with query_id_provider='auto' and pg_stat_statements
is not loaded, pg_stat_activity.query_id is null.

If query_id_provider='auto' and pg_stat_statements is loaded,
pg_stat_activity.query_id is filled in with a query id.

If query_id_provider='default' or 'pg_stat_statements' and
pg_stat_statements is not loaded, pg_stat_activity.query_id is filled
in with a query id.

If query_id_provider='none' and pg_stat_statements is loaded, it
complains as "query id provider is not available" and refuss to start.

If showing the variable, it shows the real provider name instead of
the setting in postgresql.conf.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

Attachments:

query_id_provider_poc.patch.txttext/plain; charset=us-asciiDownload

diff --git a/contrib/pg_stat_statements/pg_stat_statements.c b/contrib/pg_stat_statements/pg_stat_statements.c
index a85f962801..207c4362af 100644
--- a/contrib/pg_stat_statements/pg_stat_statements.c
+++ b/contrib/pg_stat_statements/pg_stat_statements.c
@@ -295,6 +295,7 @@ static bool pgss_save;			/* whether to save stats across shutdown */
 
 void		_PG_init(void);
 void		_PG_fini(void);
+JumbleState *_PG_calculate_query_id(Query *query, const char *querytext);
 
 PG_FUNCTION_INFO_V1(pg_stat_statements_reset);
 PG_FUNCTION_INFO_V1(pg_stat_statements_reset_1_7);
@@ -478,6 +479,13 @@ _PG_fini(void)
 	ProcessUtility_hook = prev_ProcessUtility;
 }
 
+/* Test queryid provider function */
+JumbleState *_PG_calculate_query_id(Query *query, const char *querytext)
+{
+	elog(LOG, "Called query id generatr of pg_stat_statements");
+	return DefaultJumbleQuery(query, querytext);
+}
+
 /*
  * shmem_startup hook: allocate or attach to shared memory,
  * then load any pre-existing statistics from file.
@@ -544,6 +552,11 @@ pgss_shmem_startup(void)
 	if (!IsUnderPostmaster)
 		on_shmem_exit(pgss_shmem_shutdown, (Datum) 0);
 
+	/* request my defalt provider, but allow exisint one */
+	if (!queryIdWanted("pg_stat_statements", true))
+		ereport(ERROR,
+				(errmsg ("query_id provider is not available")));
+
 	/*
 	 * Done if some other process already completed our initialization.
 	 */
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index 1202bf85a3..be00564221 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -245,8 +245,7 @@ ExplainQuery(ParseState *pstate, ExplainStmt *stmt,
 	es->summary = (summary_set) ? es->summary : es->analyze;
 
 	query = castNode(Query, stmt->query);
-	if (compute_query_id)
-		jstate = JumbleQuery(query, pstate->p_sourcetext);
+	jstate = JumbleQuery(query, pstate->p_sourcetext);
 
 	if (post_parse_analyze_hook)
 		(*post_parse_analyze_hook) (pstate, query, jstate);
diff --git a/src/backend/parser/analyze.c b/src/backend/parser/analyze.c
index 168198acd1..bdf3a5a6d1 100644
--- a/src/backend/parser/analyze.c
+++ b/src/backend/parser/analyze.c
@@ -124,8 +124,7 @@ parse_analyze(RawStmt *parseTree, const char *sourceText,
 
 	query = transformTopLevelStmt(pstate, parseTree);
 
-	if (compute_query_id)
-		jstate = JumbleQuery(query, sourceText);
+	jstate = JumbleQuery(query, sourceText);
 
 	if (post_parse_analyze_hook)
 		(*post_parse_analyze_hook) (pstate, query, jstate);
@@ -163,8 +162,7 @@ parse_analyze_varparams(RawStmt *parseTree, const char *sourceText,
 	/* make sure all is well with parameter types */
 	check_variable_parameters(pstate, query);
 
-	if (compute_query_id)
-		jstate = JumbleQuery(query, sourceText);
+	jstate = JumbleQuery(query, sourceText);
 
 	if (post_parse_analyze_hook)
 		(*post_parse_analyze_hook) (pstate, query, jstate);
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index 6200699ddd..1034dfea28 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -704,8 +704,7 @@ pg_analyze_and_rewrite_params(RawStmt *parsetree,
 
 	query = transformTopLevelStmt(pstate, parsetree);
 
-	if (compute_query_id)
-		jstate = JumbleQuery(query, query_string);
+	jstate = JumbleQuery(query, query_string);
 
 	if (post_parse_analyze_hook)
 		(*post_parse_analyze_hook) (pstate, query, jstate);
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index eb7f7181e4..70d06b825e 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -101,6 +101,7 @@
 #include "utils/plancache.h"
 #include "utils/portal.h"
 #include "utils/ps_status.h"
+#include "utils/queryjumble.h"
 #include "utils/rls.h"
 #include "utils/snapmgr.h"
 #include "utils/tzparser.h"
@@ -534,7 +535,6 @@ extern const struct config_enum_entry dynamic_shared_memory_options[];
 /*
  * GUC option variables that are exported from this module
  */
-bool		compute_query_id = false;
 bool		log_duration = false;
 bool		Debug_print_plan = false;
 bool		Debug_print_parse = false;
@@ -1441,15 +1441,6 @@ static struct config_bool ConfigureNamesBool[] =
 		true,
 		NULL, NULL, NULL
 	},
-	{
-		{"compute_query_id", PGC_SUSET, STATS_MONITORING,
-			gettext_noop("Compute query identifiers."),
-			NULL
-		},
-		&compute_query_id,
-		false,
-		NULL, NULL, NULL
-	},
 	{
 		{"log_parser_stats", PGC_SUSET, STATS_MONITORING,
 			gettext_noop("Writes parser performance statistics to the server log."),
@@ -4579,6 +4570,16 @@ static struct config_string ConfigureNamesString[] =
 		check_backtrace_functions, assign_backtrace_functions, NULL
 	},
 
+
+	{
+		{"query_id_provider", PGC_SUSET, CLIENT_CONN_PRELOAD,
+			gettext_noop("Sets the query-id provider."),
+		},
+		&query_id_provider,
+		"auto",
+		check_query_id_provider, assign_query_id_provider, NULL
+	},
+
 	/* End-of-list marker */
 	{
 		{NULL, 0, 0, NULL, NULL}, NULL, NULL, NULL, NULL, NULL
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index efde01ee56..fc31ce15c4 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -604,7 +604,7 @@
 
 # - Monitoring -
 
-#compute_query_id = off
+#query_id_provider = 'auto'
 #log_statement_stats = off
 #log_parser_stats = off
 #log_planner_stats = off
diff --git a/src/backend/utils/misc/queryjumble.c b/src/backend/utils/misc/queryjumble.c
index f004a9ce8c..709d654ea0 100644
--- a/src/backend/utils/misc/queryjumble.c
+++ b/src/backend/utils/misc/queryjumble.c
@@ -47,6 +47,96 @@ static void JumbleRangeTable(JumbleState *jstate, List *rtable);
 static void JumbleRowMarks(JumbleState *jstate, List *rowMarks);
 static void JumbleExpr(JumbleState *jstate, Node *node);
 static void RecordConstLocation(JumbleState *jstate, int location);
+static JumbleState *DummyJumbleQuery(Query *query, const char *querytext);
+
+char *query_id_provider = NULL;
+static bool lock_provider = false;
+
+typedef JumbleState *(*JumbleQueryType) (Query *query, const char *querytext);
+JumbleQueryType JumbleQuery = NULL;
+
+typedef struct QueryIdProviderExtra
+{
+	JumbleQueryType pfunc;
+} QueryIdProviderExtra;
+
+bool
+check_query_id_provider(char **newval, void **extra, GucSource source)
+{
+	QueryIdProviderExtra *param = NULL;
+
+	if (lock_provider)
+		return false;
+
+	param = (QueryIdProviderExtra *)malloc(sizeof(QueryIdProviderExtra));
+	if (param == NULL)
+		return false;
+
+	if (strcmp(*newval, "none") == 0 ||
+		strcmp(*newval, "auto") == 0)
+		param->pfunc = DummyJumbleQuery;
+	else if (strcmp(*newval, "default") == 0)
+		param->pfunc = DefaultJumbleQuery;
+	else
+		param->pfunc =
+			load_external_function(*newval, "_PG_calculate_query_id",
+								   false, NULL);
+
+	if (param->pfunc == NULL)
+	{
+		free(param);
+		param = NULL;
+		GUC_check_errdetail("failed to load query id provider");
+		return false;
+	}
+
+	*extra = (void *) param;
+	return true;
+}
+
+void
+assign_query_id_provider(const char *newval, void *extra)
+{
+	QueryIdProviderExtra *param = (QueryIdProviderExtra *)extra;
+
+	JumbleQuery = param->pfunc;
+}
+
+bool
+queryIdWanted(char *provider_name, bool use_existing)
+{
+	JumbleQueryType func;
+
+	Assert(query_id_provider != NULL);
+	Assert(JumbleQuery != NULL);
+
+	if (lock_provider || strcmp(query_id_provider, "none") == 0)
+		return false;
+
+	/* use existing provider when use_existing */
+	if (strcmp(query_id_provider, "auto") != 0 && use_existing)
+		return true;
+
+	if (strcmp(provider_name, "default") == 0)
+		func = DefaultJumbleQuery;
+	else
+		func = load_external_function(provider_name, "_PG_calculate_query_id",
+									  false, NULL);
+
+	if (func == NULL)
+		return false;
+
+	elog(LOG, "query-id provider \"%s\" loaded", provider_name);
+	JumbleQuery = func;
+
+	/* expose real provider name */
+	SetConfigOption("query_id_provider", provider_name,
+					PGC_POSTMASTER, PGC_S_OVERRIDE);
+
+	lock_provider = true;
+
+	return true;
+}
 
 /*
  * Given a possibly multi-statement source string, confine our attention to the
@@ -92,7 +182,7 @@ CleanQuerytext(const char *query, int *location, int *len)
 }
 
 JumbleState *
-JumbleQuery(Query *query, const char *querytext)
+DefaultJumbleQuery(Query *query, const char *querytext)
 {
 	JumbleState *jstate = NULL;
 
@@ -132,6 +222,12 @@ JumbleQuery(Query *query, const char *querytext)
 	return jstate;
 }
 
+static JumbleState *
+DummyJumbleQuery(Query *query, const char *querytext)
+{
+	return NULL;
+}
+
 /*
  * Compute a query identifier for the given utility query string.
  */
diff --git a/src/include/utils/guc.h b/src/include/utils/guc.h
index 24a5d9d3fb..a7c3a4958e 100644
--- a/src/include/utils/guc.h
+++ b/src/include/utils/guc.h
@@ -247,7 +247,6 @@ extern bool log_btree_build_stats;
 extern PGDLLIMPORT bool check_function_bodies;
 extern bool session_auth_is_superuser;
 
-extern bool compute_query_id;
 extern bool log_duration;
 extern int	log_parameter_max_length;
 extern int	log_parameter_max_length_on_error;
diff --git a/src/include/utils/queryjumble.h b/src/include/utils/queryjumble.h
index 83ba7339fa..682d687b79 100644
--- a/src/include/utils/queryjumble.h
+++ b/src/include/utils/queryjumble.h
@@ -15,6 +15,7 @@
 #define QUERYJUBLE_H
 
 #include "nodes/parsenodes.h"
+#include "utils/guc.h"
 
 #define JUMBLE_SIZE				1024	/* query serialization buffer size */
 
@@ -52,7 +53,12 @@ typedef struct JumbleState
 	int			highest_extern_param_id;
 } JumbleState;
 
+extern char *query_id_provider;
+extern JumbleState *(*JumbleQuery)(Query *query, const char *querytext);
+
+JumbleState *DefaultJumbleQuery(Query *query, const char *querytext);
+bool queryIdWanted(char *provider_name, bool use_existing);
+bool check_query_id_provider(char **newval, void **extra, GucSource source);
+void assign_query_id_provider(const char *newval, void *extra);
 const char *CleanQuerytext(const char *query, int *location, int *len);
-JumbleState *JumbleQuery(Query *query, const char *querytext);
-
 #endif							/* QUERYJUMBLE_H */
diff --git a/src/test/regress/expected/explain.out b/src/test/regress/expected/explain.out
index cda28098ba..16375d5596 100644
--- a/src/test/regress/expected/explain.out
+++ b/src/test/regress/expected/explain.out
@@ -477,7 +477,7 @@ select jsonb_pretty(
 (1 row)
 
 rollback;
-set compute_query_id = on;
+set compute_query_id = 'default';
 select explain_filter('explain (verbose) select * from int8_tbl i8');
                          explain_filter                         
 ----------------------------------------------------------------

#87

Kyotaro Horiguchi

horikyota.ntt@gmail.com

over 4 years ago

In reply to: Kyotaro Horiguchi (#86)

Re: compute_query_id and pg_stat_statements

At Thu, 13 May 2021 13:26:37 +0900 (JST), Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote in

At Thu, 13 May 2021 12:11:12 +0900 (JST), Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote in

As the result, even if we take the DLL approach, still not need to
split out the query-id provider part. By the following config:

query_id_provider = 'pg_stat_statements'

the core can obtain the entrypoint of, say, "_PG_calculate_query_id"
to call it. And it can be of another module.

It seems like the only problem doing that is we don't have a means to
call per-process intializer for a preload libralies.

So this is a crude PoC of that.

pg_stat_statemnts defines its own query-id provider function in
pg_stat_statements which calls in-core DefaultJumbeQuery (end emits a
log line).

If server started with query_id_provider='auto' and pg_stat_statements
is not loaded, pg_stat_activity.query_id is null.

If query_id_provider='auto' and pg_stat_statements is loaded,
pg_stat_activity.query_id is filled in with a query id.

If query_id_provider='default' or 'pg_stat_statements' and
pg_stat_statements is not loaded, pg_stat_activity.query_id is filled
in with a query id.

If query_id_provider='none' and pg_stat_statements is loaded, it
complains as "query id provider is not available" and refuss to start.

If showing the variable, it shows the real provider name instead of
the setting in postgresql.conf.

The change contains needless things that tries to handle per-backend
change case, so it would be simpler assuming we don't want on-the-fly
change of provider (and I believe so since that change surely causes
inconsistency)

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

#88

Julien Rouhaud

rjuju123@gmail.com

over 4 years ago

In reply to: Kyotaro Horiguchi (#86)

Re: compute_query_id and pg_stat_statements

Le jeu. 13 mai 2021 à 12:26, Kyotaro Horiguchi <horikyota.ntt@gmail.com> a
écrit :

At Thu, 13 May 2021 12:11:12 +0900 (JST), Kyotaro Horiguchi <
horikyota.ntt@gmail.com> wrote in

As the result, even if we take the DLL approach, still not need to
split out the query-id provider part. By the following config:

query_id_provider = 'pg_stat_statements'

the core can obtain the entrypoint of, say, "_PG_calculate_query_id"
to call it. And it can be of another module.

It seems like the only problem doing that is we don't have a means to
call per-process intializer for a preload libralies.

So this is a crude PoC of that.

pg_stat_statemnts defines its own query-id provider function in
pg_stat_statements which calls in-core DefaultJumbeQuery (end emits a
log line).

If server started with query_id_provider='auto' and pg_stat_statements
is not loaded, pg_stat_activity.query_id is null.

If query_id_provider='auto' and pg_stat_statements is loaded,
pg_stat_activity.query_id is filled in with a query id.

If query_id_provider='default' or 'pg_stat_statements' and
pg_stat_statements is not loaded, pg_stat_activity.query_id is filled
in with a query id.

If query_id_provider='none' and pg_stat_statements is loaded, it
complains as "query id provider is not available" and refuss to start.

If showing the variable, it shows the real provider name instead of
the setting in postgresql.conf.

what if you want to have some other extensions like pg_stat_kcache or
pg_store_plans that need a query_id but don't really care if
pg_stat_statements is enabled or not? should they all declare their own
wrapper too? should the system complain or silently ignore when they all
try to install their provider function?

Show quoted text

#89

Maciek Sakrejda

m.sakrejda@gmail.com

over 4 years ago

In reply to: Julien Rouhaud (#83)

Re: compute_query_id and pg_stat_statements

On Wed, May 12, 2021 at 9:03 PM Julien Rouhaud <rjuju123@gmail.com> wrote:

On Wed, May 12, 2021 at 11:33:32PM -0400, Bruce Momjian wrote:

How do they know to set shared_preload_libraries then? We change the
user API all the time, especially in GUCs, and even rename them, but for
some reason we don't think people using pg_stat_statements can be
trusted to read the release notes and change their behavior. I just
don't get it.

I don't know what to say. So here is a summary of the complaints that I'm
aware of:

- /messages/by-id/1953aec168224b95b0c962a622bef0794da6ff40.camel@moonset.ru
That was only a couple of days after the commit just before the feature freeze,
so it may be the less relevant complaint.

- /messages/by-id/CAOxo6XJEYunL71g0yD-zRzNRRqBG0Ssw-ARygy5pGRdSjK5YLQ@mail.gmail.com
Did a git bisect to find the commit that changed the behavior and somehow
didn't notice the new setting

- this thread, with Fuji-san saying:

I'm afraid that users may easily forget to enable compute_query_id when using
pg_stat_statements (because this setting was not necessary in v13 or before)

- this thread, with Peter E. saying:

Now there is the additional burden of turning on this weird setting that
no one understands. That's a 50% increase in burden. And almost no one will
want to use a nondefault setting. pg_stat_statements is pretty popular. I
think leaving in this requirement will lead to widespread confusion and
complaints.

- this thread, with Pavel saying:

Until now, the pg_stat_statements was zero-config. So the change is not user
friendly.

So it's a mix of "it's changing something that didn't change in a long time"
and "it's adding extra footgun and/or burden as it's not doing by default what
the majority of users will want", with an overwhelming majority of people
supporting the "we don't want that extra burden".

For what it's worth, I don't think the actual changing of an extra
setting is that big a burden: it's the figuring out that you need to
change it, and how you should configure it, that is the problem.
Especially since all major search engines still seem to return 9.4 (!)
documentation as the first hit for a "pg_stat_statements" search. The
common case (installing pg_stat_statements but not tweaking query id
generation) should be simple.

#90

Julien Rouhaud

rjuju123@gmail.com

over 4 years ago

In reply to: Maciek Sakrejda (#89)

Re: compute_query_id and pg_stat_statements

Le jeu. 13 mai 2021 à 12:52, Maciek Sakrejda <m.sakrejda@gmail.com> a
écrit :

For what it's worth, I don't think the actual changing of an extra
setting is that big a burden: it's the figuring out that you need to
change it, and how you should configure it, that is the problem.
Especially since all major search engines still seem to return 9.4 (!)
documentation as the first hit for a "pg_stat_statements" search. The
common case (installing pg_stat_statements but not tweaking query id
generation) should be simple.

the v2 patch I sent should address both your requirements.

Show quoted text

#91

Maciek Sakrejda

m.sakrejda@gmail.com

over 4 years ago

In reply to: Julien Rouhaud (#90)

Re: compute_query_id and pg_stat_statements

On Wed, May 12, 2021 at 9:58 PM Julien Rouhaud <rjuju123@gmail.com> wrote:

Le jeu. 13 mai 2021 à 12:52, Maciek Sakrejda <m.sakrejda@gmail.com> a écrit :

For what it's worth, I don't think the actual changing of an extra
setting is that big a burden: it's the figuring out that you need to
change it, and how you should configure it, that is the problem.
Especially since all major search engines still seem to return 9.4 (!)
documentation as the first hit for a "pg_stat_statements" search. The
common case (installing pg_stat_statements but not tweaking query id
generation) should be simple.

the v2 patch I sent should address both your requirements.

Yes, thanks--I just tried it and this is great. I just wanted to argue
against reversing course here.

#92

Julien Rouhaud

rjuju123@gmail.com

over 4 years ago

In reply to: Maciek Sakrejda (#91)

Re: compute_query_id and pg_stat_statements

On Wed, May 12, 2021 at 10:31:01PM -0700, Maciek Sakrejda wrote:

On Wed, May 12, 2021 at 9:58 PM Julien Rouhaud <rjuju123@gmail.com> wrote:

Le jeu. 13 mai 2021 ï¿½ 12:52, Maciek Sakrejda <m.sakrejda@gmail.com> a ï¿½crit :

For what it's worth, I don't think the actual changing of an extra
setting is that big a burden: it's the figuring out that you need to
change it, and how you should configure it, that is the problem.
Especially since all major search engines still seem to return 9.4 (!)
documentation as the first hit for a "pg_stat_statements" search. The
common case (installing pg_stat_statements but not tweaking query id
generation) should be simple.

the v2 patch I sent should address both your requirements.

Yes, thanks--I just tried it and this is great. I just wanted to argue
against reversing course here.

Oh ok. Good news then, thanks!

#93

Kyotaro Horiguchi

horikyota.ntt@gmail.com

over 4 years ago

In reply to: Julien Rouhaud (#88)

Re: compute_query_id and pg_stat_statements

At Thu, 13 May 2021 12:33:47 +0800, Julien Rouhaud <rjuju123@gmail.com> wrote in

Le jeu. 13 mai 2021 à 12:26, Kyotaro Horiguchi <horikyota.ntt@gmail.com> a
écrit :

At Thu, 13 May 2021 12:11:12 +0900 (JST), Kyotaro Horiguchi <
horikyota.ntt@gmail.com> wrote in
pg_stat_statemnts defines its own query-id provider function in
pg_stat_statements which calls in-core DefaultJumbeQuery (end emits a
log line).

If server started with query_id_provider='auto' and pg_stat_statements
is not loaded, pg_stat_activity.query_id is null.

If query_id_provider='auto' and pg_stat_statements is loaded,
pg_stat_activity.query_id is filled in with a query id.

If query_id_provider='default' or 'pg_stat_statements' and
pg_stat_statements is not loaded, pg_stat_activity.query_id is filled
in with a query id.

If query_id_provider='none' and pg_stat_statements is loaded, it
complains as "query id provider is not available" and refuss to start.

If showing the variable, it shows the real provider name instead of
the setting in postgresql.conf.

what if you want to have some other extensions like pg_stat_kcache or
pg_store_plans that need a query_id but don't really care if
pg_stat_statements is enabled or not? should they all declare their own

Thanks for looking it.

The addtional provider function in pg_stat_statements is just an
example to show what if it needs its own query-id provider, which is
useless in reality. In reality pg_stat_statements just calls
"queryIdWanted("default", true)" to use any query-id provider and use
the in-core one as the fallback implement, and no need to define its
own one.

Any extension can use the in-core provider and accepting any other
ones by calling queryIdWanted("default", true) then get what they want
regardless of existence of pg_stat_statements.

wrapper too? should the system complain or silently ignore when they all
try to install their provider function?

Of course if two extensions require diffrent query-id providers, they
just cannot coexist (that is, server refuses to start). It is quite
sane behavior in the standpoint of safety. I think almost all
query-id users don't insist on a specific implmentation. (So the
second parameter to queryIdWanted() could be omtted and assumed to be
true.)

reagrds.

--
Kyotaro Horiguchi
NTT Open Source Software Center

#94

Julien Rouhaud

rjuju123@gmail.com

over 4 years ago

In reply to: Kyotaro Horiguchi (#93)

Re: compute_query_id and pg_stat_statements

On Thu, May 13, 2021 at 04:15:30PM +0900, Kyotaro Horiguchi wrote:

what if you want to have some other extensions like pg_stat_kcache or
pg_store_plans that need a query_id but don't really care if
pg_stat_statements is enabled or not? should they all declare their own

Thanks for looking it.

The addtional provider function in pg_stat_statements is just an
example to show what if it needs its own query-id provider, which is
useless in reality. In reality pg_stat_statements just calls
"queryIdWanted("default", true)" to use any query-id provider and use
the in-core one as the fallback implement, and no need to define its
own one.

Any extension can use the in-core provider and accepting any other
ones by calling queryIdWanted("default", true) then get what they want
regardless of existence of pg_stat_statements.

I see, thanks for the clarification. So I looked a bit at the implementation,
mostly the new queryIdWanted() and check_query_id_provider(), it seems a bit
inconsistent.

It's not clear to me how this should be used. It seems that it's designed to
allow any plugin to activate a query_id implementation, but if a third-party
query_id provider tries to activate its own implementation it will fail if you
also want to use pg_stat_statements as both will try to activate incompatible
implementations. It seems to me that queryIdWanted() should only be used for
enabling core query_id if the configuration allows the core implementation to
be enabled, and everything else should be manually configured by users, so
there shouldn't be a provider_name.

#95

Bruce Momjian

bruce@momjian.us

over 4 years ago

In reply to: Julien Rouhaud (#83)

Re: compute_query_id and pg_stat_statements

On Thu, May 13, 2021 at 12:03:42PM +0800, Julien Rouhaud wrote:

On Wed, May 12, 2021 at 11:33:32PM -0400, Bruce Momjian wrote:
I don't know what to say. So here is a summary of the complaints that I'm
aware of:

- /messages/by-id/1953aec168224b95b0c962a622bef0794da6ff40.camel@moonset.ru
That was only a couple of days after the commit just before the feature freeze,
so it may be the less relevant complaint.

- /messages/by-id/CAOxo6XJEYunL71g0yD-zRzNRRqBG0Ssw-ARygy5pGRdSjK5YLQ@mail.gmail.com
Did a git bisect to find the commit that changed the behavior and somehow
didn't notice the new setting

- this thread, with Fuji-san saying:

I'm afraid that users may easily forget to enable compute_query_id when using
pg_stat_statements (because this setting was not necessary in v13 or before)

- this thread, with Peter E. saying:

Now there is the additional burden of turning on this weird setting that
no one understands. That's a 50% increase in burden. And almost no one will
want to use a nondefault setting. pg_stat_statements is pretty popular. I
think leaving in this requirement will lead to widespread confusion and
complaints.

- this thread, with Pavel saying:

Until now, the pg_stat_statements was zero-config. So the change is not user
friendly.

So it's a mix of "it's changing something that didn't change in a long time"
and "it's adding extra footgun and/or burden as it's not doing by default what
the majority of users will want", with an overwhelming majority of people
supporting the "we don't want that extra burden".

Well, now that we have clear warnings when it is misconfigured,
especially when querying the pg_stat_statements view, are these
complaints still valid? Also, how is modifying
shared_preload_libraries zero-config, but modifying
shared_preload_libraries and compute_query_id a huge burden?

I am personally not comfortable committing a patch to add an auto option
the way it is implemented, so another committer will need to take
ownership of this, or the entire feature can be removed.

--
Bruce Momjian <bruce@momjian.us> https://momjian.us
EDB https://enterprisedb.com

If only the physical world exists, free will is an illusion.

#96

Maciek Sakrejda

m.sakrejda@gmail.com

over 4 years ago

In reply to: Bruce Momjian (#95)

Re: compute_query_id and pg_stat_statements

On Thu, May 13, 2021 at 7:42 AM Bruce Momjian <bruce@momjian.us> wrote:

On Thu, May 13, 2021 at 12:03:42PM +0800, Julien Rouhaud wrote:

On Wed, May 12, 2021 at 11:33:32PM -0400, Bruce Momjian wrote:
I don't know what to say. So here is a summary of the complaints that I'm
aware of:

- /messages/by-id/1953aec168224b95b0c962a622bef0794da6ff40.camel@moonset.ru
That was only a couple of days after the commit just before the feature freeze,
so it may be the less relevant complaint.

- /messages/by-id/CAOxo6XJEYunL71g0yD-zRzNRRqBG0Ssw-ARygy5pGRdSjK5YLQ@mail.gmail.com
Did a git bisect to find the commit that changed the behavior and somehow
didn't notice the new setting

- this thread, with Fuji-san saying:

I'm afraid that users may easily forget to enable compute_query_id when using
pg_stat_statements (because this setting was not necessary in v13 or before)

- this thread, with Peter E. saying:

Now there is the additional burden of turning on this weird setting that
no one understands. That's a 50% increase in burden. And almost no one will
want to use a nondefault setting. pg_stat_statements is pretty popular. I
think leaving in this requirement will lead to widespread confusion and
complaints.

- this thread, with Pavel saying:

Until now, the pg_stat_statements was zero-config. So the change is not user
friendly.

So it's a mix of "it's changing something that didn't change in a long time"
and "it's adding extra footgun and/or burden as it's not doing by default what
the majority of users will want", with an overwhelming majority of people
supporting the "we don't want that extra burden".

Well, now that we have clear warnings when it is misconfigured,
especially when querying the pg_stat_statements view, are these
complaints still valid? Also, how is modifying
shared_preload_libraries zero-config, but modifying
shared_preload_libraries and compute_query_id a huge burden?

The warning makes it clear that there's something wrong, but not how
to fix it (as I noted in another message in this thread, a web search
for pg_stat_statements docs still leads to an obsolete version). I
don't think anyone is arguing that this is insurmountable for all
users, but it is additional friction, and every bit of friction makes
Postgres harder to use. Users don't read documentation, or misread
documentation, or just are not sure what the documentation or the
warning is telling them, in spite of our best efforts.

And you're right, modifying shared_preload_libraries is not
zero-config--I would love it if we could drop that requirement ;).
Still, adding another setting is clearly one more thing to get wrong.

I am personally not comfortable committing a patch to add an auto option
the way it is implemented, so another committer will need to take
ownership of this, or the entire feature can be removed.

Assuming we do want to avoid additional configuration requirements for
pg_stat_statements, is there another mechanism you feel would work
better? Or is that constraint incompatible with sane behavior for this
feature?

#97

Julien Rouhaud

rjuju123@gmail.com

over 4 years ago

In reply to: Bruce Momjian (#95)

Re: compute_query_id and pg_stat_statements

On Thu, May 13, 2021 at 10:41:43AM -0400, Bruce Momjian wrote:

Well, now that we have clear warnings when it is misconfigured,
especially when querying the pg_stat_statements view, are these
complaints still valid?

I'm personally fine with it, and I can send a new version with just the
warning when calling pg_stat_statements() or one of the view(s). Or was there
other warnings that you were referring too?

I am personally not comfortable committing a patch to add an auto option
the way it is implemented, so another committer will need to take
ownership of this, or the entire feature can be removed.

That's fair. Just to be clear, I'm assuming that you also don't like
Horigushi-san approach either?

#98

Julien Rouhaud

rjuju123@gmail.com

over 4 years ago

In reply to: Maciek Sakrejda (#96)

Re: compute_query_id and pg_stat_statements

On Thu, May 13, 2021 at 08:32:50AM -0700, Maciek Sakrejda wrote:

The warning makes it clear that there's something wrong, but not how
to fix it

I'm confused, are we talking about the new warning in v2 as suggested by Pavel?
As it should make things quite clear:

+SELECT count(*) FROM pg_stat_statements;
+WARNING:  Query identifier calculation seems to be disabled
+HINT:  If you don't want to use a third-party module to compute query identifiers, you may want to enable compute_query_id

The wording can of course be improved.

#99

Bruce Momjian

bruce@momjian.us

over 4 years ago

In reply to: Maciek Sakrejda (#96)

Re: compute_query_id and pg_stat_statements

On Thu, May 13, 2021 at 08:32:50AM -0700, Maciek Sakrejda wrote:

Well, now that we have clear warnings when it is misconfigured,
especially when querying the pg_stat_statements view, are these
complaints still valid? Also, how is modifying
shared_preload_libraries zero-config, but modifying
shared_preload_libraries and compute_query_id a huge burden?

The warning makes it clear that there's something wrong, but not how
to fix it (as I noted in another message in this thread, a web search
for pg_stat_statements docs still leads to an obsolete version). I
don't think anyone is arguing that this is insurmountable for all
users, but it is additional friction, and every bit of friction makes
Postgres harder to use. Users don't read documentation, or misread
documentation, or just are not sure what the documentation or the
warning is telling them, in spite of our best efforts.

Well, then let's have the error message tell them what is wrong and how
to fix it. My issue is that 'auto' spreads confusion around the entire
API, as you can see from the discussion in this thread.

And you're right, modifying shared_preload_libraries is not
zero-config--I would love it if we could drop that requirement ;).
Still, adding another setting is clearly one more thing to get wrong.

I am personally not comfortable committing a patch to add an auto option
the way it is implemented, so another committer will need to take
ownership of this, or the entire feature can be removed.

Assuming we do want to avoid additional configuration requirements for
pg_stat_statements, is there another mechanism you feel would work
better? Or is that constraint incompatible with sane behavior for this
feature?

I think we just need to leave it is on/off, and then help people find
the way to fix it if the misconfigure it, which I think is already been
shown to be possible.

--
Bruce Momjian <bruce@momjian.us> https://momjian.us
EDB https://enterprisedb.com

If only the physical world exists, free will is an illusion.

#100

Bruce Momjian

bruce@momjian.us

over 4 years ago

In reply to: Julien Rouhaud (#97)

Re: compute_query_id and pg_stat_statements

On Thu, May 13, 2021 at 11:35:13PM +0800, Julien Rouhaud wrote:

On Thu, May 13, 2021 at 10:41:43AM -0400, Bruce Momjian wrote:

Well, now that we have clear warnings when it is misconfigured,
especially when querying the pg_stat_statements view, are these
complaints still valid?

I'm personally fine with it, and I can send a new version with just the
warning when calling pg_stat_statements() or one of the view(s). Or was there
other warnings that you were referring too?

No, that was the big fix that made misconfiguration very clear to users
who didn't see the change before.

I am personally not comfortable committing a patch to add an auto option
the way it is implemented, so another committer will need to take
ownership of this, or the entire feature can be removed.

That's fair. Just to be clear, I'm assuming that you also don't like
Horigushi-san approach either?

Uh, anything with 'auto', I don't like. I am afraid if I commit it, I
would feel responsible for the long tail of confusion this will cause
users, which is why I was saying I would rather remove it than be
responsible for causing such confusion.

--
Bruce Momjian <bruce@momjian.us> https://momjian.us
EDB https://enterprisedb.com

If only the physical world exists, free will is an illusion.

#101

Maciek Sakrejda

m.sakrejda@gmail.com

over 4 years ago

In reply to: Julien Rouhaud (#98)

Re: compute_query_id and pg_stat_statements

On Thu, May 13, 2021 at 8:38 AM Julien Rouhaud <rjuju123@gmail.com> wrote:

On Thu, May 13, 2021 at 08:32:50AM -0700, Maciek Sakrejda wrote:

The warning makes it clear that there's something wrong, but not how
to fix it

I'm confused, are we talking about the new warning in v2 as suggested by Pavel?
As it should make things quite clear:
+SELECT count(*) FROM pg_stat_statements;
+WARNING:  Query identifier calculation seems to be disabled
+HINT:  If you don't want to use a third-party module to compute query identifiers, you may want to enable compute_query_id
The wording can of course be improved.

I meant that no warning can be as clear as things just working, but I
do have feedback on the specific message here:

* "seems to" be disabled? Is it? Any reason not to be more definitive here?
* On reading the beginning of the hint, I can see users asking
themselves, "Do I want to use a third-party module to compute query
identifiers?"
* "may want to enable"? Are there any situations where I don't want
to use a third-party module *and* I don't want to enable
compute_query_id?

So maybe something like

+SELECT count(*) FROM pg_stat_statements;
+WARNING:  Query identifier calculation is disabled
+HINT:  You must enable compute_query_id or configure a third-party module to compute query identifiers in order to use pg_stat_statements.

(I admit "configure a third-party module" is pretty vague, but I think
that suggests it's only an option to consider if you know what you're
doing.)

Also, if you're configuring this for usage with a tool like pganalyze,
and neglect to run a manual query (we guide users to do that, but they
may skip that step), the warnings may not even be visible (the Go
driver we are using does not surface client warnings). Should this be
an error instead of a warning? Is it ever useful to get an empty
result set from querying pg_stat_statements? Using an error here would
parallel the behavior of shared_preload_libraries not including
pg_stat_statements.

#102

Bruce Momjian

bruce@momjian.us

over 4 years ago

In reply to: Maciek Sakrejda (#101)

Re: compute_query_id and pg_stat_statements

On Thu, May 13, 2021 at 09:30:55AM -0700, Maciek Sakrejda wrote:

On Thu, May 13, 2021 at 8:38 AM Julien Rouhaud <rjuju123@gmail.com> wrote:
On Thu, May 13, 2021 at 08:32:50AM -0700, Maciek Sakrejda wrote:

The warning makes it clear that there's something wrong, but not how
to fix it

I'm confused, are we talking about the new warning in v2 as suggested by Pavel?
As it should make things quite clear:
+SELECT count(*) FROM pg_stat_statements;
+WARNING:  Query identifier calculation seems to be disabled
+HINT:  If you don't want to use a third-party module to compute query identifiers, you may want to enable compute_query_id
The wording can of course be improved.
I meant that no warning can be as clear as things just working, but I
do have feedback on the specific message here:

* "seems to" be disabled? Is it? Any reason not to be more definitive here?
* On reading the beginning of the hint, I can see users asking
themselves, "Do I want to use a third-party module to compute query
identifiers?"
* "may want to enable"? Are there any situations where I don't want
to use a third-party module *and* I don't want to enable
compute_query_id?

So maybe something like
+SELECT count(*) FROM pg_stat_statements;
+WARNING:  Query identifier calculation is disabled
+HINT:  You must enable compute_query_id or configure a third-party module to compute query identifiers in order to use pg_stat_statements.

Yes, I like this. The reason the old message was so vague is that
'auto', the default some people wanted, didn't issue that error, only
'off' did, so there was an assumption you wanted a custom module since
you changed the value to off. If we are going with just on/off, no
auto, the message you suggest, leading with compute_query_id, is the
right approach.

(I admit "configure a third-party module" is pretty vague, but I think
that suggests it's only an option to consider if you know what you're
doing.)

Seems fine to me.

Also, if you're configuring this for usage with a tool like pganalyze,
and neglect to run a manual query (we guide users to do that, but they
may skip that step), the warnings may not even be visible (the Go
driver we are using does not surface client warnings). Should this be
an error instead of a warning? Is it ever useful to get an empty
result set from querying pg_stat_statements? Using an error here would
parallel the behavior of shared_preload_libraries not including
pg_stat_statements.

Good question.

--
Bruce Momjian <bruce@momjian.us> https://momjian.us
EDB https://enterprisedb.com

If only the physical world exists, free will is an illusion.

#103

Andrew Dunstan

andrew@dunslane.net

over 4 years ago

In reply to: Fujii Masao (#84)

Re: compute_query_id and pg_stat_statements

On 5/13/21 12:18 AM, Fujii Masao wrote:

If we do this, compute_query_id=auto seems to be similar to
huge_pages=try.
When huge_pages=try, whether huge pages is actually used is defined
depending
outside PostgreSQL (i.e, OS setting in this case). Neither
pg_settings.setting nor
souce are not changed in that case.

Not a bad analogy, showing there's some precedent for this sort of thing.

The only thing that bugs me is that we're pretty damn late in the
process to be engaging in this amount of design.

cheers

andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com

#104

Bruce Momjian

bruce@momjian.us

over 4 years ago

In reply to: Andrew Dunstan (#103)

Re: compute_query_id and pg_stat_statements

On Thu, May 13, 2021 at 12:45:07PM -0400, Andrew Dunstan wrote:

On 5/13/21 12:18 AM, Fujii Masao wrote:

If we do this, compute_query_id=auto seems to be similar to
huge_pages=try.
When huge_pages=try, whether huge pages is actually used is defined
depending
outside PostgreSQL (i.e, OS setting in this case). Neither
pg_settings.setting nor
souce are not changed in that case.

Not a bad analogy, showing there's some precedent for this sort of thing.

The only thing that bugs me is that we're pretty damn late in the
process to be engaging in this amount of design.

The issue is that there is no external way to check what query id
computation is being used, unlike huge pages, which can be queried from
the operating system. I also agree it is late, and discussion of auto
continues to show cases where this makes later improvements more
complex.

--
Bruce Momjian <bruce@momjian.us> https://momjian.us
EDB https://enterprisedb.com

If only the physical world exists, free will is an illusion.

#105

Tom Lane

tgl@sss.pgh.pa.us

over 4 years ago

In reply to: Andrew Dunstan (#103)

Re: compute_query_id and pg_stat_statements

Andrew Dunstan <andrew@dunslane.net> writes:

The only thing that bugs me is that we're pretty damn late in the
process to be engaging in this amount of design.

Indeed. I feel that this feature was forced in before it was really
ready.

regards, tom lane

#106

Bruce Momjian

bruce@momjian.us

over 4 years ago

In reply to: Tom Lane (#105)

Re: compute_query_id and pg_stat_statements

On Thu, May 13, 2021 at 01:17:16PM -0400, Tom Lane wrote:

Andrew Dunstan <andrew@dunslane.net> writes:

The only thing that bugs me is that we're pretty damn late in the
process to be engaging in this amount of design.

Indeed. I feel that this feature was forced in before it was really
ready.

The user API has always been a challenge for this feature but I thought
we had it ironed out. What I didn't anticipate were the configuration
complaints, and if those are valid, the feature should be removed since
we can't improve it at this point, nor do I have any idea if that is
even possible without unacceptable negatives. If the configuration
complaints are invalid, what we have now is very good, I think, though
adding more warnings about misconfiguration would be wise.

--
Bruce Momjian <bruce@momjian.us> https://momjian.us
EDB https://enterprisedb.com

If only the physical world exists, free will is an illusion.

#107

Stephen Frost

sfrost@snowman.net

over 4 years ago

In reply to: Tom Lane (#105)

Re: compute_query_id and pg_stat_statements

Greetings,

* Tom Lane (tgl@sss.pgh.pa.us) wrote:

Andrew Dunstan <andrew@dunslane.net> writes:

The only thing that bugs me is that we're pretty damn late in the
process to be engaging in this amount of design.

Indeed. I feel that this feature was forced in before it was really
ready.

I'm coming around to have a similar feeling. While having an
alternative query ID might be useful, I have a hard time seeing it as
likely to be a hugely popular feature that is worth a lot of users
complaining (as we've seen already, multiple times, before even getting
to beta...) that things aren't working anymore. That we can't figure
out which libraries to load automatically based on what extensions have
been installed and therefore make everyone have to change
shared_preload_libraries isn't a good thing and requiring additional
configuration for extremely common extensions like pg_stat_statements is
making it worse.

Thanks,

Stephen

#108

Christoph Berg

myon@debian.org

over 4 years ago

In reply to: Bruce Momjian (#95)

Re: compute_query_id and pg_stat_statements

Re: Bruce Momjian

Well, now that we have clear warnings when it is misconfigured,
especially when querying the pg_stat_statements view, are these
complaints still valid? Also, how is modifying
shared_preload_libraries zero-config, but modifying
shared_preload_libraries and compute_query_id a huge burden?

It's zero-config in the sense that if you want to have
pg_stat_statements, loading that module via shared_preload_libraries
is just natural.

Having to set compute_query_id isn't natural. It's a setting with a
completely different name, and the connection of pg_stat_statements to
compute_query_id isn't obvious.

The reasoning with "we have warnings and stuff" might be ok if
pg_stat_statements were a new thing, but it has worked via
shared_preload_libraries only for the last decade, and requiring
something extra will confuse probably every single user of
pg_stat_statements out there.

Perhaps worse, note that these warnings will likely first be seen by
the end users of databases, not by the admin performing the initial
setup or upgrade, who will not be able to fix the problem themselves.

Christoph

#109

Bruce Momjian

bruce@momjian.us

over 4 years ago

In reply to: Stephen Frost (#107)

Re: compute_query_id and pg_stat_statements

On Thu, May 13, 2021 at 01:33:27PM -0400, Stephen Frost wrote:

I'm coming around to have a similar feeling. While having an
alternative query ID might be useful, I have a hard time seeing it as
likely to be a hugely popular feature that is worth a lot of users
complaining (as we've seen already, multiple times, before even getting
to beta...) that things aren't working anymore. That we can't figure
out which libraries to load automatically based on what extensions have
been installed and therefore make everyone have to change
shared_preload_libraries isn't a good thing and requiring additional
configuration for extremely common extensions like pg_stat_statements is
making it worse.

Would someone please explain what is wrong with what is in the tree
now, except that it needs additional warnings about misconfiguration?
Requiring two postgresql.conf changes instead of one doesn't seem like a
valid complaint to me, especially if the warnings are in place and the
release notes mention it.

--
Bruce Momjian <bruce@momjian.us> https://momjian.us
EDB https://enterprisedb.com

If only the physical world exists, free will is an illusion.

#110

Stephen Frost

sfrost@snowman.net

over 4 years ago

In reply to: Bruce Momjian (#109)

Re: compute_query_id and pg_stat_statements

Greetings,

* Bruce Momjian (bruce@momjian.us) wrote:

On Thu, May 13, 2021 at 01:33:27PM -0400, Stephen Frost wrote:

I'm coming around to have a similar feeling. While having an
alternative query ID might be useful, I have a hard time seeing it as
likely to be a hugely popular feature that is worth a lot of users
complaining (as we've seen already, multiple times, before even getting
to beta...) that things aren't working anymore. That we can't figure
out which libraries to load automatically based on what extensions have
been installed and therefore make everyone have to change
shared_preload_libraries isn't a good thing and requiring additional
configuration for extremely common extensions like pg_stat_statements is
making it worse.

Would someone please explain what is wrong with what is in the tree
now, except that it needs additional warnings about misconfiguration?
Requiring two postgresql.conf changes instead of one doesn't seem like a
valid complaint to me, especially if the warnings are in place and the
release notes mention it.

Will you be updating pg_upgrade to detect and throw a warning during
check in the event that it discovers a broken config?

If not, then I don't think you're correct in arguing that this need for
additional configuration isn't a valid complaint.

Thanks,

Stephen

#111

Bruce Momjian

bruce@momjian.us

over 4 years ago

In reply to: Stephen Frost (#110)

Re: compute_query_id and pg_stat_statements

On Thu, May 13, 2021 at 01:51:07PM -0400, Stephen Frost wrote:

Greetings,

* Bruce Momjian (bruce@momjian.us) wrote:

On Thu, May 13, 2021 at 01:33:27PM -0400, Stephen Frost wrote:

I'm coming around to have a similar feeling. While having an
alternative query ID might be useful, I have a hard time seeing it as
likely to be a hugely popular feature that is worth a lot of users
complaining (as we've seen already, multiple times, before even getting
to beta...) that things aren't working anymore. That we can't figure
out which libraries to load automatically based on what extensions have
been installed and therefore make everyone have to change
shared_preload_libraries isn't a good thing and requiring additional
configuration for extremely common extensions like pg_stat_statements is
making it worse.

Would someone please explain what is wrong with what is in the tree
now, except that it needs additional warnings about misconfiguration?
Requiring two postgresql.conf changes instead of one doesn't seem like a
valid complaint to me, especially if the warnings are in place and the
release notes mention it.

Will you be updating pg_upgrade to detect and throw a warning during
check in the event that it discovers a broken config?

Uh, how does this relate to pg_upgrade? Are you saying someone
misconfigures the new system with pg_stat_statements but not query id?
The server would still start and upgrade, no? How is this different
from any other GUC we rename? I am not following much of the logic in
this discussion, frankly.

--
Bruce Momjian <bruce@momjian.us> https://momjian.us
EDB https://enterprisedb.com

If only the physical world exists, free will is an illusion.

#112

Bruce Momjian

bruce@momjian.us

over 4 years ago

In reply to: Christoph Berg (#108)

Re: compute_query_id and pg_stat_statements

On Thu, May 13, 2021 at 07:39:45PM +0200, Christoph Berg wrote:

Re: Bruce Momjian

Well, now that we have clear warnings when it is misconfigured,
especially when querying the pg_stat_statements view, are these
complaints still valid? Also, how is modifying
shared_preload_libraries zero-config, but modifying
shared_preload_libraries and compute_query_id a huge burden?

It's zero-config in the sense that if you want to have
pg_stat_statements, loading that module via shared_preload_libraries
is just natural.

Having to set compute_query_id isn't natural. It's a setting with a
completely different name, and the connection of pg_stat_statements to
compute_query_id isn't obvious.

The reasoning with "we have warnings and stuff" might be ok if
pg_stat_statements were a new thing, but it has worked via
shared_preload_libraries only for the last decade, and requiring
something extra will confuse probably every single user of
pg_stat_statements out there.

Perhaps worse, note that these warnings will likely first be seen by
the end users of databases, not by the admin performing the initial
setup or upgrade, who will not be able to fix the problem themselves.

Well, but doing this extra configuration, the query id shows up in a lot
more places. I can't imagine how this could be done cleanly without
requiring extra configuration, unless the query_id computation was
cheaper to compute and we could enable it by default.

--
Bruce Momjian <bruce@momjian.us> https://momjian.us
EDB https://enterprisedb.com

If only the physical world exists, free will is an illusion.

#113

Stephen Frost

sfrost@snowman.net

over 4 years ago

In reply to: Bruce Momjian (#112)

Re: compute_query_id and pg_stat_statements

Greetings,

* Bruce Momjian (bruce@momjian.us) wrote:

On Thu, May 13, 2021 at 07:39:45PM +0200, Christoph Berg wrote:

Re: Bruce Momjian

Well, now that we have clear warnings when it is misconfigured,
especially when querying the pg_stat_statements view, are these
complaints still valid? Also, how is modifying
shared_preload_libraries zero-config, but modifying
shared_preload_libraries and compute_query_id a huge burden?

It's zero-config in the sense that if you want to have
pg_stat_statements, loading that module via shared_preload_libraries
is just natural.

Not sure about natural but it's certainly what folks have at least
become used to. We should be working to eliminate it though.

Having to set compute_query_id isn't natural. It's a setting with a
completely different name, and the connection of pg_stat_statements to
compute_query_id isn't obvious.

The reasoning with "we have warnings and stuff" might be ok if
pg_stat_statements were a new thing, but it has worked via
shared_preload_libraries only for the last decade, and requiring
something extra will confuse probably every single user of
pg_stat_statements out there.

As we keep seeing, over and over. The ongoing comments claiming that
it's "just" a minor additional configuration tweak fall pretty flat when
you consider the number of times it's already been brought up, and who
it has been brought up by.

Perhaps worse, note that these warnings will likely first be seen by
the end users of databases, not by the admin performing the initial
setup or upgrade, who will not be able to fix the problem themselves.

I don't think this is appreciated anywhere near well enough. This takes
existing perfectly working configurations and actively breaks them in a
manner that isn't obvious and isn't something that an admin would have
any idea about until after they've upgraded and then started trying to
query the view. That's pretty terrible.

Well, but doing this extra configuration, the query id shows up in a lot
more places. I can't imagine how this could be done cleanly without
requiring extra configuration, unless the query_id computation was
cheaper to compute and we could enable it by default.

There's a ridiculously simple option here which is: drop the idea that
we support an extension redefining the query id and then just make it
on/off with the default to be 'on'. If people actually have a problem
with it being on and they don't want to use pg_stat_statements then they
can turn it off. This won't break any existing configs that are out
there in the field and avoids the complexity of having some kind of
'auto' config. I do agree with the general idea of wanting to be
extensible but I'm not convinced that, in this particular case, it's
worth all of this. I'm somewhat convinced that having a way to disable
the query id is useful in limited cases and if people want a way to do
that, then we can give that to them in a straightfoward way that doens't
break things.

Thanks,

Stephen

#114

Tom Lane

tgl@sss.pgh.pa.us

over 4 years ago

In reply to: Stephen Frost (#113)

Re: compute_query_id and pg_stat_statements

Stephen Frost <sfrost@snowman.net> writes:

There's a ridiculously simple option here which is: drop the idea that
we support an extension redefining the query id and then just make it
on/off with the default to be 'on'.

I do not think that defaulting it to 'on' is acceptable unless you can
show that the added overhead is negligible. IIUC the measurements that
have been done show the opposite.

Maybe we should revert this thing pending somebody doing the work to
make a version of queryid labeling that actually is negligibly cheap.
It certainly seems like that could be done; one more traversal of the
parse tree can't be that expensive in itself. I suspect that the
performance problem is with the particular hashing mechanism that
was used, which looks mighty ad-hoc anyway.

regards, tom lane

#115

Stephen Frost

sfrost@snowman.net

over 4 years ago

In reply to: Tom Lane (#114)

Re: compute_query_id and pg_stat_statements

Greetings,

* Tom Lane (tgl@sss.pgh.pa.us) wrote:

Stephen Frost <sfrost@snowman.net> writes:

There's a ridiculously simple option here which is: drop the idea that
we support an extension redefining the query id and then just make it
on/off with the default to be 'on'.

I do not think that defaulting it to 'on' is acceptable unless you can
show that the added overhead is negligible. IIUC the measurements that
have been done show the opposite.

Ah, right, it had only been done before when pg_stat_statements was
loaded.. In which case, it seems like we should:

a) go back to that

b) if someone wants an alternative query ID, tell them to add it to
pg_stat_statements and make it configurable *there*

c) Have pg_stat_statements provide another function/view/etc that folks
can use to get a queryid for an ongoing query ..?

d) Maybe come up with a way for extensions, generically, to make a value
available to log_line_prefix? That could be pretty interesting.

Or just accept that this is a bit hokey with the 'auto' approach. I get
Bruce has concerns about it but I'm not convinced that it's actually all
that bad to go with that.

Thanks,

Stephen

#116

Alvaro Herrera

alvherre@alvh.no-ip.org

over 4 years ago

In reply to: Stephen Frost (#115)

Re: compute_query_id and pg_stat_statements

On 2021-May-13, Stephen Frost wrote:

Or just accept that this is a bit hokey with the 'auto' approach. I get
Bruce has concerns about it but I'm not convinced that it's actually all
that bad to go with that.

Yeah, I think the alleged confusion there is overstated.

I'm happy to act as committer for that if he wants to step away from it.
I'm already used to being lapidated at every corner anyway.

--
ï¿½lvaro Herrera Valdivia, Chile
"E pur si muove" (Galileo Galilei)

#117

Bruce Momjian

bruce@momjian.us

over 4 years ago

In reply to: Tom Lane (#114)

Re: compute_query_id and pg_stat_statements

On Thu, May 13, 2021 at 02:29:09PM -0400, Tom Lane wrote:

Stephen Frost <sfrost@snowman.net> writes:

There's a ridiculously simple option here which is: drop the idea that
we support an extension redefining the query id and then just make it
on/off with the default to be 'on'.

I do not think that defaulting it to 'on' is acceptable unless you can
show that the added overhead is negligible. IIUC the measurements that
have been done show the opposite.

Agreed.

Maybe we should revert this thing pending somebody doing the work to
make a version of queryid labeling that actually is negligibly cheap.
It certainly seems like that could be done; one more traversal of the
parse tree can't be that expensive in itself. I suspect that the
performance problem is with the particular hashing mechanism that
was used, which looks mighty ad-hoc anyway.

I was surprised it was ~2%.

--
Bruce Momjian <bruce@momjian.us> https://momjian.us
EDB https://enterprisedb.com

If only the physical world exists, free will is an illusion.

#118

Bruce Momjian

bruce@momjian.us

over 4 years ago

In reply to: Alvaro Herrera (#116)

Re: compute_query_id and pg_stat_statements

On Thu, May 13, 2021 at 03:04:30PM -0400, ï¿½lvaro Herrera wrote:

On 2021-May-13, Stephen Frost wrote:

Or just accept that this is a bit hokey with the 'auto' approach. I get
Bruce has concerns about it but I'm not convinced that it's actually all
that bad to go with that.

Yeah, I think the alleged confusion there is overstated.

I'm happy to act as committer for that if he wants to step away from it.
I'm already used to being lapidated at every corner anyway.

OK, feel free to take ownership of it, thanks.

--
Bruce Momjian <bruce@momjian.us> https://momjian.us
EDB https://enterprisedb.com

If only the physical world exists, free will is an illusion.

#119

Andrew Dunstan

andrew@dunslane.net

over 4 years ago

In reply to: Alvaro Herrera (#116)

Re: compute_query_id and pg_stat_statements

On 5/13/21 3:04 PM, Alvaro Herrera wrote:

On 2021-May-13, Stephen Frost wrote:

Or just accept that this is a bit hokey with the 'auto' approach. I get
Bruce has concerns about it but I'm not convinced that it's actually all
that bad to go with that.

Yeah, I think the alleged confusion there is overstated.

I'm happy to act as committer for that if he wants to step away from it.
I'm already used to being lapidated at every corner anyway.

Many thanks Alvaro, among other things for teaching me a new word.

cheers

(delapidated) andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com

#120

Bruce Momjian

bruce@momjian.us

over 4 years ago

In reply to: Andrew Dunstan (#119)

Re: compute_query_id and pg_stat_statements

On Thu, May 13, 2021 at 07:19:11PM -0400, Andrew Dunstan wrote:

On 5/13/21 3:04 PM, Alvaro Herrera wrote:

On 2021-May-13, Stephen Frost wrote:

Or just accept that this is a bit hokey with the 'auto' approach. I get
Bruce has concerns about it but I'm not convinced that it's actually all
that bad to go with that.

Yeah, I think the alleged confusion there is overstated.

I'm happy to act as committer for that if he wants to step away from it.
I'm already used to being lapidated at every corner anyway.

Many thanks Alvaro, among other things for teaching me a new word.

(delapidated) andrew

Yes, I had to look it up too. :-)

--
Bruce Momjian <bruce@momjian.us> https://momjian.us
EDB https://enterprisedb.com

If only the physical world exists, free will is an illusion.

#121

Michael Paquier

michael@paquier.xyz

over 4 years ago

In reply to: Andrew Dunstan (#119)

Re: compute_query_id and pg_stat_statements

On Thu, May 13, 2021 at 07:19:11PM -0400, Andrew Dunstan wrote:

On 5/13/21 3:04 PM, Alvaro Herrera wrote:

I'm happy to act as committer for that if he wants to step away from it.
I'm already used to being lapidated at every corner anyway.

Many thanks Alvaro, among other things for teaching me a new word.

+1.  Thanks, Alvaro.
--
Michael

#122

Alvaro Herrera

alvherre@alvh.no-ip.org

over 4 years ago

In reply to: Julien Rouhaud (#62)

1 attachment(s)

Re: compute_query_id and pg_stat_statements

Here's a first attempt at what was suggested. If you say "auto" it
remains auto in SHOW, but it gets enabled if a module asks for it.

Not final yet, but I thought I'd throw it out for early commentary ...

--
ï¿½lvaro Herrera Valdivia, Chile

Attachments:

Change-compute_query_id-to-an-enum-GUC-v3.patchtext/x-diff; charset=us-asciiDownload

diff --git a/contrib/pg_stat_statements/expected/pg_stat_statements.out b/contrib/pg_stat_statements/expected/pg_stat_statements.out
index 40b5109b55..f7ce01539f 100644
--- a/contrib/pg_stat_statements/expected/pg_stat_statements.out
+++ b/contrib/pg_stat_statements/expected/pg_stat_statements.out
@@ -1067,4 +1067,20 @@ SELECT COUNT(*) FROM pg_stat_statements WHERE query LIKE '%SELECT GROUPING%';
      2
 (1 row)
 
+-- Check that pg_stat_statements() will complain if the configuration appears
+-- to be broken.
+SET compute_query_id = off;
+SELECT pg_stat_statements_reset();
+ pg_stat_statements_reset 
+--------------------------
+ 
+(1 row)
+
+SELECT count(*) FROM pg_stat_statements;
+WARNING:  query identifier calculation is disabled
+ count 
+-------
+     0
+(1 row)
+
 DROP EXTENSION pg_stat_statements;
diff --git a/contrib/pg_stat_statements/pg_stat_statements.c b/contrib/pg_stat_statements/pg_stat_statements.c
index a85f962801..fab684a649 100644
--- a/contrib/pg_stat_statements/pg_stat_statements.c
+++ b/contrib/pg_stat_statements/pg_stat_statements.c
@@ -369,6 +369,12 @@ _PG_init(void)
 	if (!process_shared_preload_libraries_in_progress)
 		return;
 
+	/*
+	 * Inform the postmaster that we want to enable query_id calculation if
+	 * compute_query_id is set to auto.
+	 */
+	EnableQueryId();
+
 	/*
 	 * Define (or redefine) custom GUC variables.
 	 */
@@ -1617,6 +1623,18 @@ pg_stat_statements_internal(FunctionCallInfo fcinfo,
 	 */
 	LWLockAcquire(pgss->lock, LW_SHARED);
 
+	/*
+	 * If no query_id has been computed for the calling query and there is
+	 * no entries stored, then there's likely a configuration error that caller
+	 * may not be aware of so point it out.
+	 */
+	if (pgstat_get_my_query_id() == UINT64CONST(0) &&
+		compute_query_id == COMPUTE_QUERY_ID_OFF &&
+		hash_get_num_entries(pgss_hash) == 0)
+		ereport(WARNING,
+				errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+				errmsg("query identifier calculation is disabled"));
+
 	if (showtext)
 	{
 		/*
diff --git a/contrib/pg_stat_statements/pg_stat_statements.conf b/contrib/pg_stat_statements/pg_stat_statements.conf
index e47b26040f..13346e2807 100644
--- a/contrib/pg_stat_statements/pg_stat_statements.conf
+++ b/contrib/pg_stat_statements/pg_stat_statements.conf
@@ -1,2 +1 @@
 shared_preload_libraries = 'pg_stat_statements'
-compute_query_id = on
diff --git a/contrib/pg_stat_statements/sql/pg_stat_statements.sql b/contrib/pg_stat_statements/sql/pg_stat_statements.sql
index bc3b6493e6..827a8e3d18 100644
--- a/contrib/pg_stat_statements/sql/pg_stat_statements.sql
+++ b/contrib/pg_stat_statements/sql/pg_stat_statements.sql
@@ -437,4 +437,10 @@ SELECT (
 
 SELECT COUNT(*) FROM pg_stat_statements WHERE query LIKE '%SELECT GROUPING%';
 
+-- Check that pg_stat_statements() will complain if the configuration appears
+-- to be broken.
+SET compute_query_id = off;
+SELECT pg_stat_statements_reset();
+SELECT count(*) FROM pg_stat_statements;
+
 DROP EXTENSION pg_stat_statements;
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 45bd1f1b7e..60d8b24f5e 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -7627,7 +7627,7 @@ COPY postgres_log FROM '/full/path/to/logfile.csv' WITH csv;
      <variablelist>
 
      <varlistentry id="guc-compute-query-id" xreflabel="compute_query_id">
-      <term><varname>compute_query_id</varname> (<type>boolean</type>)
+      <term><varname>compute_query_id</varname> (<type>enum</type>)
       <indexterm>
        <primary><varname>compute_query_id</varname> configuration parameter</primary>
       </indexterm>
@@ -7643,7 +7643,12 @@ COPY postgres_log FROM '/full/path/to/logfile.csv' WITH csv;
         identifier to be computed.  Note that an external module can
         alternatively be used if the in-core query identifier computation
         method is not acceptable.  In this case, in-core computation
-        must be disabled.  The default is <literal>off</literal>.
+        must be always disabled.
+        Valid values are <literal>off</literal> (always disabled),
+        <literal>on</literal> (always enabled) and <literal>auto</literal>,
+        which let modules such as <xref linkend="pgstatstatements"/>
+        automatically enable it.
+        The default is <literal>auto</literal>.
        </para>
        <note>
         <para>
diff --git a/doc/src/sgml/pgstatstatements.sgml b/doc/src/sgml/pgstatstatements.sgml
index bc2b6038ee..acfb134797 100644
--- a/doc/src/sgml/pgstatstatements.sgml
+++ b/doc/src/sgml/pgstatstatements.sgml
@@ -22,10 +22,15 @@
 
  <para>
   The module will not track statistics unless query
-  identifiers are calculated.  This can be done by enabling <xref
-  linkend="guc-compute-query-id"/> or using a third-party module that
-  computes its own query identifiers.  Note that all statistics tracked
-  by this module must be reset if the query identifier method is changed.
+  identifiers are calculated.  This is done by automatically when this
+  extension is configured in <literal>shared_preload_libraries</literal> and
+  <xref linkend="guc-compute-query-id"/> is set to <literal>auto</literal>
+  (which is the default value), or always if <xref
+  linkend="guc-compute-query-id"/> is set to <literal>on</literal>.
+  You must set <xref linkend="guc-compute-query-id"/> to <literal>off</literal>
+  if you want to use a third-party module that computes its own query
+      identifiers.  Note that all statistics tracked by this module must be
+      reset if the query identifier method is changed.
  </para>
 
  <para>
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index 1202bf85a3..9a60865d19 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -245,7 +245,7 @@ ExplainQuery(ParseState *pstate, ExplainStmt *stmt,
 	es->summary = (summary_set) ? es->summary : es->analyze;
 
 	query = castNode(Query, stmt->query);
-	if (compute_query_id)
+	if (IsQueryIdEnabled())
 		jstate = JumbleQuery(query, pstate->p_sourcetext);
 
 	if (post_parse_analyze_hook)
diff --git a/src/backend/parser/analyze.c b/src/backend/parser/analyze.c
index 168198acd1..201b88d1ad 100644
--- a/src/backend/parser/analyze.c
+++ b/src/backend/parser/analyze.c
@@ -124,7 +124,7 @@ parse_analyze(RawStmt *parseTree, const char *sourceText,
 
 	query = transformTopLevelStmt(pstate, parseTree);
 
-	if (compute_query_id)
+	if (IsQueryIdEnabled())
 		jstate = JumbleQuery(query, sourceText);
 
 	if (post_parse_analyze_hook)
@@ -163,7 +163,7 @@ parse_analyze_varparams(RawStmt *parseTree, const char *sourceText,
 	/* make sure all is well with parameter types */
 	check_variable_parameters(pstate, query);
 
-	if (compute_query_id)
+	if (IsQueryIdEnabled())
 		jstate = JumbleQuery(query, sourceText);
 
 	if (post_parse_analyze_hook)
diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index 6833f0f7f2..9ca1095f47 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -521,6 +521,7 @@ typedef struct
 	pg_time_t	first_syslogger_file_time;
 	bool		redirection_done;
 	bool		IsBinaryUpgrade;
+	bool		auto_query_id_enabled;
 	int			max_safe_fds;
 	int			MaxBackends;
 #ifdef WIN32
@@ -6168,6 +6169,7 @@ save_backend_variables(BackendParameters *param, Port *port,
 
 	param->redirection_done = redirection_done;
 	param->IsBinaryUpgrade = IsBinaryUpgrade;
+	param->auto_query_id_enabled = auto_query_id_enabled;
 	param->max_safe_fds = max_safe_fds;
 
 	param->MaxBackends = MaxBackends;
@@ -6401,6 +6403,7 @@ restore_backend_variables(BackendParameters *param, Port *port)
 
 	redirection_done = param->redirection_done;
 	IsBinaryUpgrade = param->IsBinaryUpgrade;
+	auto_query_id_enabled = param->auto_query_id_enabled;
 	max_safe_fds = param->max_safe_fds;
 
 	MaxBackends = param->MaxBackends;
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index 6200699ddd..f9c6ca80f1 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -704,7 +704,7 @@ pg_analyze_and_rewrite_params(RawStmt *parsetree,
 
 	query = transformTopLevelStmt(pstate, parsetree);
 
-	if (compute_query_id)
+	if (IsQueryIdEnabled())
 		jstate = JumbleQuery(query, query_string);
 
 	if (post_parse_analyze_hook)
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index eb7f7181e4..4d88982a75 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -101,6 +101,7 @@
 #include "utils/plancache.h"
 #include "utils/portal.h"
 #include "utils/ps_status.h"
+#include "utils/queryjumble.h"
 #include "utils/rls.h"
 #include "utils/snapmgr.h"
 #include "utils/tzparser.h"
@@ -402,6 +403,23 @@ static const struct config_enum_entry backslash_quote_options[] = {
 	{NULL, 0, false}
 };
 
+/*
+ * Although only "on", "off", and "auto" are documented, we accept all the
+ * likely variants of "on" and "off".
+ */
+static const struct config_enum_entry compute_query_id_options[] = {
+	{"auto", COMPUTE_QUERY_ID_AUTO, false},
+	{"on", COMPUTE_QUERY_ID_ON, false},
+	{"off", COMPUTE_QUERY_ID_OFF, false},
+	{"true", COMPUTE_QUERY_ID_ON, true},
+	{"false", COMPUTE_QUERY_ID_OFF, true},
+	{"yes", COMPUTE_QUERY_ID_ON, true},
+	{"no", COMPUTE_QUERY_ID_OFF, true},
+	{"1", COMPUTE_QUERY_ID_ON, true},
+	{"0", COMPUTE_QUERY_ID_OFF, true},
+	{NULL, 0, false}
+};
+
 /*
  * Although only "on", "off", and "partition" are documented, we
  * accept all the likely variants of "on" and "off".
@@ -534,7 +552,6 @@ extern const struct config_enum_entry dynamic_shared_memory_options[];
 /*
  * GUC option variables that are exported from this module
  */
-bool		compute_query_id = false;
 bool		log_duration = false;
 bool		Debug_print_plan = false;
 bool		Debug_print_parse = false;
@@ -1441,15 +1458,6 @@ static struct config_bool ConfigureNamesBool[] =
 		true,
 		NULL, NULL, NULL
 	},
-	{
-		{"compute_query_id", PGC_SUSET, STATS_MONITORING,
-			gettext_noop("Compute query identifiers."),
-			NULL
-		},
-		&compute_query_id,
-		false,
-		NULL, NULL, NULL
-	},
 	{
 		{"log_parser_stats", PGC_SUSET, STATS_MONITORING,
 			gettext_noop("Writes parser performance statistics to the server log."),
@@ -4619,6 +4627,16 @@ static struct config_enum ConfigureNamesEnum[] =
 		NULL, NULL, NULL
 	},
 
+	{
+		{"compute_query_id", PGC_SUSET, STATS_MONITORING,
+			gettext_noop("Compute query identifiers."),
+			NULL
+		},
+		&compute_query_id,
+		COMPUTE_QUERY_ID_AUTO, compute_query_id_options,
+		NULL, NULL, NULL
+	},
+
 	{
 		{"constraint_exclusion", PGC_USERSET, QUERY_TUNING_OTHER,
 			gettext_noop("Enables the planner to use constraints to optimize queries."),
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index efde01ee56..6e36e4c2ef 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -604,7 +604,7 @@
 
 # - Monitoring -
 
-#compute_query_id = off
+#compute_query_id = auto
 #log_statement_stats = off
 #log_parser_stats = off
 #log_planner_stats = off
diff --git a/src/backend/utils/misc/queryjumble.c b/src/backend/utils/misc/queryjumble.c
index f004a9ce8c..c4683d9ed2 100644
--- a/src/backend/utils/misc/queryjumble.c
+++ b/src/backend/utils/misc/queryjumble.c
@@ -39,6 +39,12 @@
 
 #define JUMBLE_SIZE				1024	/* query serialization buffer size */
 
+/* GUC parameters */
+int			compute_query_id = COMPUTE_QUERY_ID_AUTO;
+
+/* True when a module requests query IDs and they're set auto */
+bool		auto_query_id_enabled = false;
+
 static uint64 compute_utility_query_id(const char *str, int query_location, int query_len);
 static void AppendJumble(JumbleState *jstate,
 						 const unsigned char *item, Size size);
@@ -132,6 +138,33 @@ JumbleQuery(Query *query, const char *querytext)
 	return jstate;
 }
 
+/*
+ * Enables query identifier computation if the GUC is set to auto.
+ *
+ * Third-party plugins can use this function to inform core that they require
+ * a query identifier to be computed.
+ */
+void
+EnableQueryId(void)
+{
+	if (compute_query_id == COMPUTE_QUERY_ID_AUTO)
+		auto_query_id_enabled = true;
+}
+
+/*
+ * Returns whether query identifier computation has been enabled, either
+ * directly in the GUC or by a module when the setting is 'auto'.
+ */
+bool
+IsQueryIdEnabled(void)
+{
+	if (compute_query_id == COMPUTE_QUERY_ID_OFF)
+		return false;
+	if (compute_query_id == COMPUTE_QUERY_ID_ON)
+		return true;
+	return auto_query_id_enabled;
+}
+
 /*
  * Compute a query identifier for the given utility query string.
  */
diff --git a/src/include/utils/guc.h b/src/include/utils/guc.h
index 24a5d9d3fb..a7c3a4958e 100644
--- a/src/include/utils/guc.h
+++ b/src/include/utils/guc.h
@@ -247,7 +247,6 @@ extern bool log_btree_build_stats;
 extern PGDLLIMPORT bool check_function_bodies;
 extern bool session_auth_is_superuser;
 
-extern bool compute_query_id;
 extern bool log_duration;
 extern int	log_parameter_max_length;
 extern int	log_parameter_max_length_on_error;
diff --git a/src/include/utils/queryjumble.h b/src/include/utils/queryjumble.h
index 83ba7339fa..60389f15b4 100644
--- a/src/include/utils/queryjumble.h
+++ b/src/include/utils/queryjumble.h
@@ -52,7 +52,23 @@ typedef struct JumbleState
 	int			highest_extern_param_id;
 } JumbleState;
 
-const char *CleanQuerytext(const char *query, int *location, int *len);
-JumbleState *JumbleQuery(Query *query, const char *querytext);
+/* Values for the compute_query_id GUC */
+typedef enum
+{
+	COMPUTE_QUERY_ID_OFF,
+	COMPUTE_QUERY_ID_ON,
+	COMPUTE_QUERY_ID_AUTO
+} ComputeQueryIdType;
+
+/* GUC parameters */
+extern int compute_query_id;
+
+/* Not a GUC: set true if any modules turn "auto" into "on" */
+extern bool auto_query_id_enabled;
+
+extern const char *CleanQuerytext(const char *query, int *location, int *len);
+extern JumbleState *JumbleQuery(Query *query, const char *querytext);
+extern void EnableQueryId(void);
+extern bool IsQueryIdEnabled(void);
 
 #endif							/* QUERYJUMBLE_H */

#123

Bruce Momjian

bruce@momjian.us

over 4 years ago

In reply to: Alvaro Herrera (#122)

Re: compute_query_id and pg_stat_statements

On Thu, May 13, 2021 at 08:04:37PM -0400, ï¿½lvaro Herrera wrote:

Here's a first attempt at what was suggested. If you say "auto" it
remains auto in SHOW, but it gets enabled if a module asks for it.

Not final yet, but I thought I'd throw it out for early commentary ...

I certainly like this idea better than having the extension change the
output of the GUC.

--
Bruce Momjian <bruce@momjian.us> https://momjian.us
EDB https://enterprisedb.com

If only the physical world exists, free will is an illusion.

#124

Julien Rouhaud

rjuju123@gmail.com

over 4 years ago

In reply to: Bruce Momjian (#117)

Re: compute_query_id and pg_stat_statements

On Fri, May 14, 2021 at 3:12 AM Bruce Momjian <bruce@momjian.us> wrote:

Maybe we should revert this thing pending somebody doing the work to
make a version of queryid labeling that actually is negligibly cheap.
It certainly seems like that could be done; one more traversal of the
parse tree can't be that expensive in itself. I suspect that the
performance problem is with the particular hashing mechanism that
was used, which looks mighty ad-hoc anyway.

I was surprised it was ~2%.

Just to be clear, the 2% was a worst case scenario, ie. a very fast
read-only query on small data returning a single row. As soon as you
get something more realistic / expensive the overhead goes away. For
reference here is the detail:
/messages/by-id/CAOBaU_ZVmGPfKTwZ6cM_qdzaF2E1gMkrLDMwwLy4Z1JxQ6=CZg@mail.gmail.com

#125

Julien Rouhaud

rjuju123@gmail.com

over 4 years ago

In reply to: Bruce Momjian (#123)

Re: compute_query_id and pg_stat_statements

On Fri, May 14, 2021 at 8:13 AM Bruce Momjian <bruce@momjian.us> wrote:

On Thu, May 13, 2021 at 08:04:37PM -0400, Álvaro Herrera wrote:

Here's a first attempt at what was suggested. If you say "auto" it
remains auto in SHOW, but it gets enabled if a module asks for it.

Not final yet, but I thought I'd throw it out for early commentary ...

I certainly like this idea better than having the extension change the
output of the GUC.

Oh, I didn't understand that it was the major blocker. I'm fine with it too.

#126

Bruce Momjian

bruce@momjian.us

over 4 years ago

In reply to: Julien Rouhaud (#125)

Re: compute_query_id and pg_stat_statements

On Fri, May 14, 2021 at 09:40:15AM +0800, Julien Rouhaud wrote:

On Fri, May 14, 2021 at 8:13 AM Bruce Momjian <bruce@momjian.us> wrote:

On Thu, May 13, 2021 at 08:04:37PM -0400, ï¿½lvaro Herrera wrote:

Here's a first attempt at what was suggested. If you say "auto" it
remains auto in SHOW, but it gets enabled if a module asks for it.

Not final yet, but I thought I'd throw it out for early commentary ...

I certainly like this idea better than having the extension change the
output of the GUC.

Oh, I didn't understand that it was the major blocker. I'm fine with it too.

I think if we keep the output as 'auto', and document that you check
pg_stat_activity for a hash to see if it is enabled, that gets us pretty
far.

--
Bruce Momjian <bruce@momjian.us> https://momjian.us
EDB https://enterprisedb.com

If only the physical world exists, free will is an illusion.

#127

Tom Lane

tgl@sss.pgh.pa.us

over 4 years ago

In reply to: Julien Rouhaud (#124)

Re: compute_query_id and pg_stat_statements

Julien Rouhaud <rjuju123@gmail.com> writes:

On Fri, May 14, 2021 at 3:12 AM Bruce Momjian <bruce@momjian.us> wrote:

I was surprised it was ~2%.

Just to be clear, the 2% was a worst case scenario, ie. a very fast
read-only query on small data returning a single row. As soon as you
get something more realistic / expensive the overhead goes away.

Of course, for plenty of people that IS the realistic scenario that
they care about max performance for.

regards, tom lane

#128

Julien Rouhaud

rjuju123@gmail.com

over 4 years ago

In reply to: Tom Lane (#127)

Re: compute_query_id and pg_stat_statements

On Thu, May 13, 2021 at 09:47:02PM -0400, Tom Lane wrote:

Julien Rouhaud <rjuju123@gmail.com> writes:

On Fri, May 14, 2021 at 3:12 AM Bruce Momjian <bruce@momjian.us> wrote:

I was surprised it was ~2%.

Just to be clear, the 2% was a worst case scenario, ie. a very fast
read-only query on small data returning a single row. As soon as you
get something more realistic / expensive the overhead goes away.

Of course, for plenty of people that IS the realistic scenario that
they care about max performance for.

I'm not arguing that the scenario is unrealistic. I'm arguing that retrieving
the first row of a join between pg_class and pg_attribute on an otherwise
vanilla database may not be the most representative workload, especially when
you take into account that it was done on hardware that still took 3 ms to do
that.

Unfortunately my laptop is pretty old and has already proven multiple time to
give unreliable benchmark results, so I'm not confident at all that those 2%
are even real outside of my machine.

#129

Fujii Masao

masao.fujii@oss.nttdata.com

over 4 years ago

In reply to: Alvaro Herrera (#122)

Re: compute_query_id and pg_stat_statements

On 2021/05/14 9:04, Alvaro Herrera wrote:

Here's a first attempt at what was suggested. If you say "auto" it
remains auto in SHOW, but it gets enabled if a module asks for it.

Not final yet, but I thought I'd throw it out for early commentary ...

Many thanks! The patch basically looks good to me.

+void
+EnableQueryId(void)
+{
+	if (compute_query_id == COMPUTE_QUERY_ID_AUTO)
+		auto_query_id_enabled = true;

Shouldn't EnableQueryId() enable auto_query_id_enabled whatever compute_query_id is?
Otherwise, for example, the following scenario can happen and it's a bit strange.

1. The server starts up with shared_preload_libraries=pg_stat_statements and compute_query_id=on
2. compute_query_id is set to auto and the configuration file is reloaded
Then, even though compute_query_id is auto and pg_stat_statements is loaded,
query ids are not computed and no queries are tracked by pg_stat_statements.

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION

#130

Julien Rouhaud

rjuju123@gmail.com

over 4 years ago

In reply to: Fujii Masao (#129)

Re: compute_query_id and pg_stat_statements

On Fri, May 14, 2021 at 12:20:00PM +0900, Fujii Masao wrote:

On 2021/05/14 9:04, Alvaro Herrera wrote:

Here's a first attempt at what was suggested. If you say "auto" it
remains auto in SHOW, but it gets enabled if a module asks for it.

Not final yet, but I thought I'd throw it out for early commentary ...

Many thanks! The patch basically looks good to me.
+void
+EnableQueryId(void)
+{
+	if (compute_query_id == COMPUTE_QUERY_ID_AUTO)
+		auto_query_id_enabled = true;
Shouldn't EnableQueryId() enable auto_query_id_enabled whatever compute_query_id is?
Otherwise, for example, the following scenario can happen and it's a bit strange.

1. The server starts up with shared_preload_libraries=pg_stat_statements and compute_query_id=on
2. compute_query_id is set to auto and the configuration file is reloaded
Then, even though compute_query_id is auto and pg_stat_statements is loaded,
query ids are not computed and no queries are tracked by pg_stat_statements.

+1. Note that if you switch from compute_query_id = off + custom
query_id + pg_stat_statements to compute_query_id = auto then thing will
immediately break (as we instruct third-party plugins authors to error out in
that case), which is way better than breaking at the next random service
restart.

#131

Tom Lane

tgl@sss.pgh.pa.us

over 4 years ago

In reply to: Tom Lane (#114)

2 attachment(s)

Re: compute_query_id and pg_stat_statements

I wrote:

Maybe we should revert this thing pending somebody doing the work to
make a version of queryid labeling that actually is negligibly cheap.
It certainly seems like that could be done; one more traversal of the
parse tree can't be that expensive in itself. I suspect that the
performance problem is with the particular hashing mechanism that
was used, which looks mighty ad-hoc anyway.

To put a little bit of meat on that idea, I experimented with jacking
up the "jumble" calculation and driving some other implementations
underneath.

I thought that Julien's "worst case" scenario was pretty far from
worst case, since it involved a join which a lot of simple queries
don't. I tested this scenario instead:

$ cat naive.sql
SELECT * FROM pg_class c ORDER BY oid DESC LIMIT 1;
$ pgbench -n -f naive.sql -T 60 postgres

which is still complicated enough that there's work for the
query fingerprinter to do, but not so much for planning and
execution.

I confirm that on HEAD, there's a noticeable TPS penalty from
turning on compute_query_id: about 3.2% on my machine.

The first patch attached replaces the "jumble" calculation
with two CRC32s (two so that we still get 64 bits out at
the end). I see 2.7% penalty with this version. Now,
I'm using an Intel machine with
#define USE_SSE42_CRC32C_WITH_RUNTIME_CHECK 1
so on machines without any hardware CRC support, this'd
likely be a loss. But it still proves the point that the
existing implementation is just not very speedy.

I then tried a really dumb xor'ing implementation, and
that gives me a slowdown of 2.2%. This could undoubtedly
be improved on further, say by unrolling the loop or
processing multiple bytes at once. One problem with it
is that I suspect it will tend to concentrate the entropy
into the third/fourth and seventh/eighth bytes of the
accumulator, since so many of the fields being jumbled
are 4-byte or 8-byte fields with most of the entropy in
their low-order bits. Probably that could be improved
with a bit more thought -- say, an extra bump of the
nextbyte pointer after each field.

Anyway, I think that what we have here is quite an inferior
implementation, and we can do better with some more thought.

regards, tom lane

Attachments:

query-id-with-crc.patchtext/x-diff; charset=us-ascii; name=query-id-with-crc.patchDownload

diff --git a/src/backend/utils/misc/queryjumble.c b/src/backend/utils/misc/queryjumble.c
index f004a9ce8c..74ed555ed7 100644
--- a/src/backend/utils/misc/queryjumble.c
+++ b/src/backend/utils/misc/queryjumble.c
@@ -41,7 +41,7 @@
 
 static uint64 compute_utility_query_id(const char *str, int query_location, int query_len);
 static void AppendJumble(JumbleState *jstate,
-						 const unsigned char *item, Size size);
+						 const void *item, Size size);
 static void JumbleQueryInternal(JumbleState *jstate, Query *query);
 static void JumbleRangeTable(JumbleState *jstate, List *rtable);
 static void JumbleRowMarks(JumbleState *jstate, List *rowMarks);
@@ -106,9 +106,11 @@ JumbleQuery(Query *query, const char *querytext)
 	{
 		jstate = (JumbleState *) palloc(sizeof(JumbleState));
 
-		/* Set up workspace for query jumbling */
-		jstate->jumble = (unsigned char *) palloc(JUMBLE_SIZE);
-		jstate->jumble_len = 0;
+		/* Initialize CRCs for query jumbling */
+		INIT_CRC32C(jstate->crc0);
+		INIT_CRC32C(jstate->crc1);
+		jstate->whichcrc = 0;
+		/* Initialize state for tracking where constants are */
 		jstate->clocations_buf_size = 32;
 		jstate->clocations = (LocationLen *)
 			palloc(jstate->clocations_buf_size * sizeof(LocationLen));
@@ -117,9 +119,11 @@ JumbleQuery(Query *query, const char *querytext)
 
 		/* Compute query ID and mark the Query node with it */
 		JumbleQueryInternal(jstate, query);
-		query->queryId = DatumGetUInt64(hash_any_extended(jstate->jumble,
-														  jstate->jumble_len,
-														  0));
+
+		FIN_CRC32C(jstate->crc0);
+		FIN_CRC32C(jstate->crc1);
+		query->queryId = (((uint64) jstate->crc0) << 32) |
+			((uint64) jstate->crc1);
 
 		/*
 		 * If we are unlucky enough to get a hash of zero, use 1 instead, to
@@ -165,36 +169,18 @@ compute_utility_query_id(const char *query_text, int query_location, int query_l
  * the current jumble.
  */
 static void
-AppendJumble(JumbleState *jstate, const unsigned char *item, Size size)
+AppendJumble(JumbleState *jstate, const void *item, Size size)
 {
-	unsigned char *jumble = jstate->jumble;
-	Size		jumble_len = jstate->jumble_len;
-
-	/*
-	 * Whenever the jumble buffer is full, we hash the current contents and
-	 * reset the buffer to contain just that hash value, thus relying on the
-	 * hash to summarize everything so far.
-	 */
-	while (size > 0)
+	if (jstate->whichcrc)
 	{
-		Size		part_size;
-
-		if (jumble_len >= JUMBLE_SIZE)
-		{
-			uint64		start_hash;
-
-			start_hash = DatumGetUInt64(hash_any_extended(jumble,
-														  JUMBLE_SIZE, 0));
-			memcpy(jumble, &start_hash, sizeof(start_hash));
-			jumble_len = sizeof(start_hash);
-		}
-		part_size = Min(size, JUMBLE_SIZE - jumble_len);
-		memcpy(jumble + jumble_len, item, part_size);
-		jumble_len += part_size;
-		item += part_size;
-		size -= part_size;
+		COMP_CRC32C(jstate->crc1, item, size);
+		jstate->whichcrc = 0;
+	}
+	else
+	{
+		COMP_CRC32C(jstate->crc0, item, size);
+		jstate->whichcrc = 1;
 	}
-	jstate->jumble_len = jumble_len;
 }
 
 /*
@@ -202,9 +188,9 @@ AppendJumble(JumbleState *jstate, const unsigned char *item, Size size)
  * of individual local variable elements.
  */
 #define APP_JUMB(item) \
-	AppendJumble(jstate, (const unsigned char *) &(item), sizeof(item))
+	AppendJumble(jstate, (const void *) &(item), sizeof(item))
 #define APP_JUMB_STRING(str) \
-	AppendJumble(jstate, (const unsigned char *) (str), strlen(str) + 1)
+	AppendJumble(jstate, (const void *) (str), strlen(str) + 1)
 
 /*
  * JumbleQueryInternal: Selectively serialize the query tree, appending
diff --git a/src/include/utils/queryjumble.h b/src/include/utils/queryjumble.h
index 83ba7339fa..3a09c555fa 100644
--- a/src/include/utils/queryjumble.h
+++ b/src/include/utils/queryjumble.h
@@ -15,8 +15,7 @@
 #define QUERYJUBLE_H
 
 #include "nodes/parsenodes.h"
-
-#define JUMBLE_SIZE				1024	/* query serialization buffer size */
+#include "port/pg_crc32c.h"
 
 /*
  * Struct for tracking locations/lengths of constants during normalization
@@ -33,11 +32,13 @@ typedef struct LocationLen
  */
 typedef struct JumbleState
 {
-	/* Jumble of current query tree */
-	unsigned char *jumble;
-
-	/* Number of bytes used in jumble[] */
-	Size		jumble_len;
+	/*
+	 * Since we'd like a 64-bit-wide query ID, but we're using 32-bit CRC
+	 * technology, we combine two 32-bit CRCs.
+	 */
+	pg_crc32c	crc0;			/* some fields go into here ... */
+	pg_crc32c	crc1;			/* ... and others into here */
+	int			whichcrc;		/* 0 or 1, says which CRC accum to use next */
 
 	/* Array of locations of constants that should be removed */
 	LocationLen *clocations;

query-id-with-xor.patchtext/x-diff; charset=us-ascii; name=query-id-with-xor.patchDownload

diff --git a/src/backend/utils/misc/queryjumble.c b/src/backend/utils/misc/queryjumble.c
index f004a9ce8c..225940d40c 100644
--- a/src/backend/utils/misc/queryjumble.c
+++ b/src/backend/utils/misc/queryjumble.c
@@ -106,9 +106,10 @@ JumbleQuery(Query *query, const char *querytext)
 	{
 		jstate = (JumbleState *) palloc(sizeof(JumbleState));
 
-		/* Set up workspace for query jumbling */
-		jstate->jumble = (unsigned char *) palloc(JUMBLE_SIZE);
-		jstate->jumble_len = 0;
+		/* Initialize state for query jumbling */
+		jstate->j.id = 0;
+		jstate->nextbyte = 0;
+		/* Initialize state for tracking where constants are */
 		jstate->clocations_buf_size = 32;
 		jstate->clocations = (LocationLen *)
 			palloc(jstate->clocations_buf_size * sizeof(LocationLen));
@@ -117,9 +118,7 @@ JumbleQuery(Query *query, const char *querytext)
 
 		/* Compute query ID and mark the Query node with it */
 		JumbleQueryInternal(jstate, query);
-		query->queryId = DatumGetUInt64(hash_any_extended(jstate->jumble,
-														  jstate->jumble_len,
-														  0));
+		query->queryId = jstate->j.id;
 
 		/*
 		 * If we are unlucky enough to get a hash of zero, use 1 instead, to
@@ -167,34 +166,17 @@ compute_utility_query_id(const char *query_text, int query_location, int query_l
 static void
 AppendJumble(JumbleState *jstate, const unsigned char *item, Size size)
 {
-	unsigned char *jumble = jstate->jumble;
-	Size		jumble_len = jstate->jumble_len;
+	int			nextbyte = jstate->nextbyte;
 
-	/*
-	 * Whenever the jumble buffer is full, we hash the current contents and
-	 * reset the buffer to contain just that hash value, thus relying on the
-	 * hash to summarize everything so far.
-	 */
 	while (size > 0)
 	{
-		Size		part_size;
-
-		if (jumble_len >= JUMBLE_SIZE)
-		{
-			uint64		start_hash;
-
-			start_hash = DatumGetUInt64(hash_any_extended(jumble,
-														  JUMBLE_SIZE, 0));
-			memcpy(jumble, &start_hash, sizeof(start_hash));
-			jumble_len = sizeof(start_hash);
-		}
-		part_size = Min(size, JUMBLE_SIZE - jumble_len);
-		memcpy(jumble + jumble_len, item, part_size);
-		jumble_len += part_size;
-		item += part_size;
-		size -= part_size;
+		jstate->j.bytes[nextbyte] ^= *item++;
+		nextbyte++;
+		if (nextbyte >= sizeof(uint64))
+			nextbyte = 0;
+		size--;
 	}
-	jstate->jumble_len = jumble_len;
+	jstate->nextbyte = nextbyte;
 }
 
 /*
diff --git a/src/include/utils/queryjumble.h b/src/include/utils/queryjumble.h
index 83ba7339fa..d01712af4d 100644
--- a/src/include/utils/queryjumble.h
+++ b/src/include/utils/queryjumble.h
@@ -16,8 +16,6 @@
 
 #include "nodes/parsenodes.h"
 
-#define JUMBLE_SIZE				1024	/* query serialization buffer size */
-
 /*
  * Struct for tracking locations/lengths of constants during normalization
  */
@@ -33,11 +31,13 @@ typedef struct LocationLen
  */
 typedef struct JumbleState
 {
-	/* Jumble of current query tree */
-	unsigned char *jumble;
-
-	/* Number of bytes used in jumble[] */
-	Size		jumble_len;
+	/* Working state for accumulating a 64-bit query ID */
+	union
+	{
+		unsigned char bytes[sizeof(uint64)];
+		uint64		id;
+	}			j;
+	int			nextbyte;		/* index of next byte to update in j.bytes[] */
 
 	/* Array of locations of constants that should be removed */
 	LocationLen *clocations;

#132

Bruce Momjian

bruce@momjian.us

over 4 years ago

In reply to: Tom Lane (#131)

Re: compute_query_id and pg_stat_statements

On Fri, May 14, 2021 at 12:26:23AM -0400, Tom Lane wrote:

I then tried a really dumb xor'ing implementation, and
that gives me a slowdown of 2.2%. This could undoubtedly
be improved on further, say by unrolling the loop or
processing multiple bytes at once. One problem with it
is that I suspect it will tend to concentrate the entropy
into the third/fourth and seventh/eighth bytes of the
accumulator, since so many of the fields being jumbled
are 4-byte or 8-byte fields with most of the entropy in
their low-order bits. Probably that could be improved
with a bit more thought -- say, an extra bump of the
nextbyte pointer after each field.

Anyway, I think that what we have here is quite an inferior
implementation, and we can do better with some more thought.

Considering what even a simple query has to do, I am still baffled why
such a computation takes ~2%, though it obviously does since you have
confirmed that figure.

--
Bruce Momjian <bruce@momjian.us> https://momjian.us
EDB https://enterprisedb.com

If only the physical world exists, free will is an illusion.

#133

Bruce Momjian

bruce@momjian.us

over 4 years ago

In reply to: Bruce Momjian (#126)

Re: compute_query_id and pg_stat_statements

On Thu, May 13, 2021 at 09:41:42PM -0400, Bruce Momjian wrote:

On Fri, May 14, 2021 at 09:40:15AM +0800, Julien Rouhaud wrote:

On Fri, May 14, 2021 at 8:13 AM Bruce Momjian <bruce@momjian.us> wrote:

On Thu, May 13, 2021 at 08:04:37PM -0400, ï¿½lvaro Herrera wrote:

Here's a first attempt at what was suggested. If you say "auto" it
remains auto in SHOW, but it gets enabled if a module asks for it.

Not final yet, but I thought I'd throw it out for early commentary ...

I certainly like this idea better than having the extension change the
output of the GUC.

Oh, I didn't understand that it was the major blocker. I'm fine with it too.

I think if we keep the output as 'auto', and document that you check
pg_stat_activity for a hash to see if it is enabled, that gets us pretty
far.

I think keeping the output as 'auto', and documenting that this query
must be run to determine if the query id is being computed:

SELECT query_id
FROM pg_stat_activity
WHERE pid = pg_backend_pid();

is the right approach.

--
Bruce Momjian <bruce@momjian.us> https://momjian.us
EDB https://enterprisedb.com

If only the physical world exists, free will is an illusion.

#134

Bruce Momjian

bruce@momjian.us

over 4 years ago

In reply to: Bruce Momjian (#133)

Re: compute_query_id and pg_stat_statements

On Fri, May 14, 2021 at 08:35:14AM -0400, Bruce Momjian wrote:

On Thu, May 13, 2021 at 09:41:42PM -0400, Bruce Momjian wrote:

I think if we keep the output as 'auto', and document that you check
pg_stat_activity for a hash to see if it is enabled, that gets us pretty
far.

I think keeping the output as 'auto', and documenting that this query
must be run to determine if the query id is being computed:

SELECT query_id
FROM pg_stat_activity
WHERE pid = pg_backend_pid();

is the right approach.

Actually, we talked about huge_pages = try needing to use OS commands to
see if huge pages are being used, so requiring an SQL query to see if
query id is being computed seems reasonable.

--
Bruce Momjian <bruce@momjian.us> https://momjian.us
EDB https://enterprisedb.com

If only the physical world exists, free will is an illusion.

#135

Julien Rouhaud

rjuju123@gmail.com

over 4 years ago

In reply to: Bruce Momjian (#134)

Re: compute_query_id and pg_stat_statements

On Fri, May 14, 2021 at 08:57:41AM -0400, Bruce Momjian wrote:

On Fri, May 14, 2021 at 08:35:14AM -0400, Bruce Momjian wrote:

On Thu, May 13, 2021 at 09:41:42PM -0400, Bruce Momjian wrote:

I think if we keep the output as 'auto', and document that you check
pg_stat_activity for a hash to see if it is enabled, that gets us pretty
far.

I think keeping the output as 'auto', and documenting that this query
must be run to determine if the query id is being computed:

SELECT query_id
FROM pg_stat_activity
WHERE pid = pg_backend_pid();

is the right approach.

Actually, we talked about huge_pages = try needing to use OS commands to
see if huge pages are being used, so requiring an SQL query to see if
query id is being computed seems reasonable.

I totally agree.

#136

Andrew Dunstan

andrew@dunslane.net

over 4 years ago

In reply to: Bruce Momjian (#133)

Re: compute_query_id and pg_stat_statements

Sent from my iPhone

On May 14, 2021, at 8:35 AM, Bruce Momjian <bruce@momjian.us> wrote:

On Thu, May 13, 2021 at 09:41:42PM -0400, Bruce Momjian wrote:

On Fri, May 14, 2021 at 09:40:15AM +0800, Julien Rouhaud wrote:
On Fri, May 14, 2021 at 8:13 AM Bruce Momjian <bruce@momjian.us> wrote:

On Thu, May 13, 2021 at 08:04:37PM -0400, Álvaro Herrera wrote:

Here's a first attempt at what was suggested. If you say "auto" it
remains auto in SHOW, but it gets enabled if a module asks for it.

Not final yet, but I thought I'd throw it out for early commentary ...

I certainly like this idea better than having the extension change the
output of the GUC.

Oh, I didn't understand that it was the major blocker. I'm fine with it too.

I think if we keep the output as 'auto', and document that you check
pg_stat_activity for a hash to see if it is enabled, that gets us pretty
far.

I think keeping the output as 'auto', and documenting that this query
must be run to determine if the query id is being computed:

SELECT query_id
FROM pg_stat_activity
WHERE pid = pg_backend_pid();

is the right approach.

I’d rather we added a specific function. This is not really obvious.

Cheers

Andrew

#137

Julien Rouhaud

rjuju123@gmail.com

over 4 years ago

In reply to: Andrew Dunstan (#136)

Re: compute_query_id and pg_stat_statements

On Fri, May 14, 2021 at 12:04:05PM -0400, Andrew Dunstan wrote:

On May 14, 2021, at 8:35 AM, Bruce Momjian <bruce@momjian.us> wrote:

On Thu, May 13, 2021 at 09:41:42PM -0400, Bruce Momjian wrote:

On Fri, May 14, 2021 at 09:40:15AM +0800, Julien Rouhaud wrote:
On Fri, May 14, 2021 at 8:13 AM Bruce Momjian <bruce@momjian.us> wrote:

On Thu, May 13, 2021 at 08:04:37PM -0400, Álvaro Herrera wrote:

Here's a first attempt at what was suggested. If you say "auto" it
remains auto in SHOW, but it gets enabled if a module asks for it.

Not final yet, but I thought I'd throw it out for early commentary ...

I certainly like this idea better than having the extension change the
output of the GUC.

Oh, I didn't understand that it was the major blocker. I'm fine with it too.

I think if we keep the output as 'auto', and document that you check
pg_stat_activity for a hash to see if it is enabled, that gets us pretty
far.

I think keeping the output as 'auto', and documenting that this query
must be run to determine if the query id is being computed:

SELECT query_id
FROM pg_stat_activity
WHERE pid = pg_backend_pid();

is the right approach.

I’d rather we added a specific function. This is not really obvious.

We could, but I don't know how much this will be used in practice. The only
way someone would try to know if "auto" means that query_id are computed is if
she has an extension like pg_stat_statements, and she will probably just check
that anyway, and will get a warning if query_id are *not* computed.

That being said no objection to an SQL wrapper around a query like it.

#138

Bruce Momjian

bruce@momjian.us

over 4 years ago

In reply to: Andrew Dunstan (#136)

Re: compute_query_id and pg_stat_statements

On Fri, May 14, 2021 at 12:04:05PM -0400, Andrew Dunstan wrote:

On May 14, 2021, at 8:35 AM, Bruce Momjian <bruce@momjian.us> wrote:

On Thu, May 13, 2021 at 09:41:42PM -0400, Bruce Momjian wrote:
I think keeping the output as 'auto', and documenting that this query
must be run to determine if the query id is being computed:

SELECT query_id
FROM pg_stat_activity
WHERE pid = pg_backend_pid();

is the right approach.

I’d rather we added a specific function. This is not really obvious.

Well, we can document this query, add a function, or add a read-only
GUC. I am not sure how we decide which one to use.

--
Bruce Momjian <bruce@momjian.us> https://momjian.us
EDB https://enterprisedb.com

If only the physical world exists, free will is an illusion.

#139

Pavel Stehule

pavel.stehule@gmail.com

over 4 years ago

In reply to: Bruce Momjian (#138)

Re: compute_query_id and pg_stat_statements

pá 14. 5. 2021 v 18:21 odesílatel Bruce Momjian <bruce@momjian.us> napsal:

On Fri, May 14, 2021 at 12:04:05PM -0400, Andrew Dunstan wrote:

On May 14, 2021, at 8:35 AM, Bruce Momjian <bruce@momjian.us> wrote:

On Thu, May 13, 2021 at 09:41:42PM -0400, Bruce Momjian wrote:
I think keeping the output as 'auto', and documenting that this query
must be run to determine if the query id is being computed:

SELECT query_id
FROM pg_stat_activity
WHERE pid = pg_backend_pid();

is the right approach.

I’d rather we added a specific function. This is not really obvious.

Well, we can document this query, add a function, or add a read-only
GUC. I am not sure how we decide which one to use.

I though and I prefer read only GUC

It is easy to write "show all"

Pavel

Show quoted text

--
Bruce Momjian <bruce@momjian.us> https://momjian.us
EDB https://enterprisedb.com

If only the physical world exists, free will is an illusion.

#140

Bruce Momjian

bruce@momjian.us

over 4 years ago

In reply to: Bruce Momjian (#138)

Re: compute_query_id and pg_stat_statements

On Fri, May 14, 2021 at 12:21:23PM -0400, Bruce Momjian wrote:

On Fri, May 14, 2021 at 12:04:05PM -0400, Andrew Dunstan wrote:

On May 14, 2021, at 8:35 AM, Bruce Momjian <bruce@momjian.us> wrote:

On Thu, May 13, 2021 at 09:41:42PM -0400, Bruce Momjian wrote:
I think keeping the output as 'auto', and documenting that this query
must be run to determine if the query id is being computed:

SELECT query_id
FROM pg_stat_activity
WHERE pid = pg_backend_pid();

is the right approach.

I’d rather we added a specific function. This is not really obvious.

Well, we can document this query, add a function, or add a read-only
GUC. I am not sure how we decide which one to use.

I wonder if we should go with an SQL query now (no new API needed) and
then add a GUC once we decide on how extensions can register that they
are generating the query id, so the GUC can report the generating
source, not just a boolean. The core server can also register as the
source.

--
Bruce Momjian <bruce@momjian.us> https://momjian.us
EDB https://enterprisedb.com

If only the physical world exists, free will is an illusion.

#141

Pavel Stehule

pavel.stehule@gmail.com

over 4 years ago

In reply to: Bruce Momjian (#140)

Re: compute_query_id and pg_stat_statements

pá 14. 5. 2021 v 19:28 odesílatel Bruce Momjian <bruce@momjian.us> napsal:

On Fri, May 14, 2021 at 12:21:23PM -0400, Bruce Momjian wrote:

On Fri, May 14, 2021 at 12:04:05PM -0400, Andrew Dunstan wrote:

On May 14, 2021, at 8:35 AM, Bruce Momjian <bruce@momjian.us> wrote:

On Thu, May 13, 2021 at 09:41:42PM -0400, Bruce Momjian wrote:
I think keeping the output as 'auto', and documenting that this query
must be run to determine if the query id is being computed:

SELECT query_id
FROM pg_stat_activity
WHERE pid = pg_backend_pid();

is the right approach.

I’d rather we added a specific function. This is not really obvious.

Well, we can document this query, add a function, or add a read-only
GUC. I am not sure how we decide which one to use.

I wonder if we should go with an SQL query now (no new API needed) and
then add a GUC once we decide on how extensions can register that they
are generating the query id, so the GUC can report the generating
source, not just a boolean. The core server can also register as the
source.

I have no problem with it. This is an internal feature and can be enhanced
(fixed) in time without problems.

Pavel

Show quoted text

--
Bruce Momjian <bruce@momjian.us> https://momjian.us
EDB https://enterprisedb.com

If only the physical world exists, free will is an illusion.

#142

Alvaro Herrera

alvherre@alvh.no-ip.org

over 4 years ago

In reply to: Julien Rouhaud (#130)

1 attachment(s)

Re: compute_query_id and pg_stat_statements

On 2021-May-14, Julien Rouhaud wrote:

On Fri, May 14, 2021 at 12:20:00PM +0900, Fujii Masao wrote:

+void
+EnableQueryId(void)
+{
+	if (compute_query_id == COMPUTE_QUERY_ID_AUTO)
+		auto_query_id_enabled = true;
Shouldn't EnableQueryId() enable auto_query_id_enabled whatever compute_query_id is?
Otherwise, for example, the following scenario can happen and it's a bit strange.

1. The server starts up with shared_preload_libraries=pg_stat_statements and compute_query_id=on
2. compute_query_id is set to auto and the configuration file is reloaded
Then, even though compute_query_id is auto and pg_stat_statements is loaded,
query ids are not computed and no queries are tracked by pg_stat_statements.
+1.

That makes sense. Done in this version.

I took out the new WARNING in pg_stat_statements. It's not clear to me
that that's terribly useful (it stops working as soon as you have one
query in the pg_stat_statements stash and later disable everything).
Maybe there is an useful warning to add, but I think it's independent of
changing the GUC behabior.

I also made IsQueryIdEnabled() a static inline in queryjumble.h, to
avoid a function call at every site where we need that. Also did some
light doc editing.

I think I should aim at pushing this tomorrow morning.

Note that if you switch from compute_query_id = off + custom
query_id + pg_stat_statements to compute_query_id = auto then thing will
immediately break (as we instruct third-party plugins authors to error out in
that case), which is way better than breaking at the next random service
restart.

Hmm, ok. I tested pg_queryid and that behavior of throwing an error
seems quite unhelpful -- it basically makes every single query fail if
you set things wrong. I think that instruction is bogus and should be
reconsidered. Maybe pg_queryid could use a custom Boolean GUC that
tells it to overwrite the core query_id if that is enabled, or to just
silently do nothing in that case.

While thinking about this patch it occurred to that an useful gadget
might be to let pg_stat_statements be loaded always, but with
compute_query_id=false; so it's never active; except if you
ALTER {DATABASE, USER} foo SET (compute_query_id=on);
so that it's possible to enable it selectively. I think this doesn't
currently achieve anything because pgss_store is always called
regardless of query ID being active (so you'd always have at least one
function call as performance penalty, only that the work would be for
naught), but that seems a simple change to make. I didn't look closely
to see what other things would need patched.

--
ï¿½lvaro Herrera 39ï¿½49'30"S 73ï¿½17'W
Maybe there's lots of data loss but the records of data loss are also lost.
(Lincoln Yeoh)

Attachments:

v4-0001-Allow-compute_query_id-set-to-auto.patchtext/x-diff; charset=us-asciiDownload

From acd29794dbba5e9828d78fd348621374a73c6d7a Mon Sep 17 00:00:00 2001
From: Alvaro Herrera <alvherre@alvh.no-ip.org>
Date: Fri, 14 May 2021 16:55:18 -0400
Subject: [PATCH v4] Allow compute_query_id set to 'auto'

---
 .../pg_stat_statements/pg_stat_statements.c   |  6 +++
 .../pg_stat_statements.conf                   |  1 -
 doc/src/sgml/config.sgml                      |  9 ++++-
 doc/src/sgml/pgstatstatements.sgml            | 14 +++----
 src/backend/commands/explain.c                |  2 +-
 src/backend/parser/analyze.c                  |  4 +-
 src/backend/postmaster/postmaster.c           |  3 ++
 src/backend/tcop/postgres.c                   |  2 +-
 src/backend/utils/misc/guc.c                  | 38 ++++++++++++++-----
 src/backend/utils/misc/postgresql.conf.sample |  2 +-
 src/backend/utils/misc/queryjumble.c          | 23 +++++++++++
 src/include/utils/guc.h                       |  1 -
 src/include/utils/queryjumble.h               | 33 +++++++++++++++-
 13 files changed, 108 insertions(+), 30 deletions(-)

diff --git a/contrib/pg_stat_statements/pg_stat_statements.c b/contrib/pg_stat_statements/pg_stat_statements.c
index a85f962801..09433c8c96 100644
--- a/contrib/pg_stat_statements/pg_stat_statements.c
+++ b/contrib/pg_stat_statements/pg_stat_statements.c
@@ -369,6 +369,12 @@ _PG_init(void)
 	if (!process_shared_preload_libraries_in_progress)
 		return;
 
+	/*
+	 * Inform the postmaster that we want to enable query_id calculation if
+	 * compute_query_id is set to auto.
+	 */
+	EnableQueryId();
+
 	/*
 	 * Define (or redefine) custom GUC variables.
 	 */
diff --git a/contrib/pg_stat_statements/pg_stat_statements.conf b/contrib/pg_stat_statements/pg_stat_statements.conf
index e47b26040f..13346e2807 100644
--- a/contrib/pg_stat_statements/pg_stat_statements.conf
+++ b/contrib/pg_stat_statements/pg_stat_statements.conf
@@ -1,2 +1 @@
 shared_preload_libraries = 'pg_stat_statements'
-compute_query_id = on
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 45bd1f1b7e..60d8b24f5e 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -7627,7 +7627,7 @@ COPY postgres_log FROM '/full/path/to/logfile.csv' WITH csv;
      <variablelist>
 
      <varlistentry id="guc-compute-query-id" xreflabel="compute_query_id">
-      <term><varname>compute_query_id</varname> (<type>boolean</type>)
+      <term><varname>compute_query_id</varname> (<type>enum</type>)
       <indexterm>
        <primary><varname>compute_query_id</varname> configuration parameter</primary>
       </indexterm>
@@ -7643,7 +7643,12 @@ COPY postgres_log FROM '/full/path/to/logfile.csv' WITH csv;
         identifier to be computed.  Note that an external module can
         alternatively be used if the in-core query identifier computation
         method is not acceptable.  In this case, in-core computation
-        must be disabled.  The default is <literal>off</literal>.
+        must be always disabled.
+        Valid values are <literal>off</literal> (always disabled),
+        <literal>on</literal> (always enabled) and <literal>auto</literal>,
+        which let modules such as <xref linkend="pgstatstatements"/>
+        automatically enable it.
+        The default is <literal>auto</literal>.
        </para>
        <note>
         <para>
diff --git a/doc/src/sgml/pgstatstatements.sgml b/doc/src/sgml/pgstatstatements.sgml
index bc2b6038ee..aa332d8cc2 100644
--- a/doc/src/sgml/pgstatstatements.sgml
+++ b/doc/src/sgml/pgstatstatements.sgml
@@ -18,18 +18,14 @@
   <xref linkend="guc-shared-preload-libraries"/> in
   <filename>postgresql.conf</filename>, because it requires additional shared memory.
   This means that a server restart is needed to add or remove the module.
+  In addition, query identifier calculation must be enabled in order for the
+  module to be active, which is done automatically if <xref linkend="guc-compute-query-id"/>
+  is set to <literal>auto</literal> or <literal>on</literal>, or any third-party
+  module that calculates query identifiers is loaded.
  </para>
 
  <para>
-  The module will not track statistics unless query
-  identifiers are calculated.  This can be done by enabling <xref
-  linkend="guc-compute-query-id"/> or using a third-party module that
-  computes its own query identifiers.  Note that all statistics tracked
-  by this module must be reset if the query identifier method is changed.
- </para>
-
- <para>
-   When <filename>pg_stat_statements</filename> is loaded, it tracks
+   When <filename>pg_stat_statements</filename> is active, it tracks
    statistics across all databases of the server.  To access and manipulate
    these statistics, the module provides views
    <structname>pg_stat_statements</structname> and
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index 1202bf85a3..9a60865d19 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -245,7 +245,7 @@ ExplainQuery(ParseState *pstate, ExplainStmt *stmt,
 	es->summary = (summary_set) ? es->summary : es->analyze;
 
 	query = castNode(Query, stmt->query);
-	if (compute_query_id)
+	if (IsQueryIdEnabled())
 		jstate = JumbleQuery(query, pstate->p_sourcetext);
 
 	if (post_parse_analyze_hook)
diff --git a/src/backend/parser/analyze.c b/src/backend/parser/analyze.c
index 168198acd1..201b88d1ad 100644
--- a/src/backend/parser/analyze.c
+++ b/src/backend/parser/analyze.c
@@ -124,7 +124,7 @@ parse_analyze(RawStmt *parseTree, const char *sourceText,
 
 	query = transformTopLevelStmt(pstate, parseTree);
 
-	if (compute_query_id)
+	if (IsQueryIdEnabled())
 		jstate = JumbleQuery(query, sourceText);
 
 	if (post_parse_analyze_hook)
@@ -163,7 +163,7 @@ parse_analyze_varparams(RawStmt *parseTree, const char *sourceText,
 	/* make sure all is well with parameter types */
 	check_variable_parameters(pstate, query);
 
-	if (compute_query_id)
+	if (IsQueryIdEnabled())
 		jstate = JumbleQuery(query, sourceText);
 
 	if (post_parse_analyze_hook)
diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index 6833f0f7f2..9ca1095f47 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -521,6 +521,7 @@ typedef struct
 	pg_time_t	first_syslogger_file_time;
 	bool		redirection_done;
 	bool		IsBinaryUpgrade;
+	bool		auto_query_id_enabled;
 	int			max_safe_fds;
 	int			MaxBackends;
 #ifdef WIN32
@@ -6168,6 +6169,7 @@ save_backend_variables(BackendParameters *param, Port *port,
 
 	param->redirection_done = redirection_done;
 	param->IsBinaryUpgrade = IsBinaryUpgrade;
+	param->auto_query_id_enabled = auto_query_id_enabled;
 	param->max_safe_fds = max_safe_fds;
 
 	param->MaxBackends = MaxBackends;
@@ -6401,6 +6403,7 @@ restore_backend_variables(BackendParameters *param, Port *port)
 
 	redirection_done = param->redirection_done;
 	IsBinaryUpgrade = param->IsBinaryUpgrade;
+	auto_query_id_enabled = param->auto_query_id_enabled;
 	max_safe_fds = param->max_safe_fds;
 
 	MaxBackends = param->MaxBackends;
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index dd2ade7bb6..8cea10c901 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -704,7 +704,7 @@ pg_analyze_and_rewrite_params(RawStmt *parsetree,
 
 	query = transformTopLevelStmt(pstate, parsetree);
 
-	if (compute_query_id)
+	if (IsQueryIdEnabled())
 		jstate = JumbleQuery(query, query_string);
 
 	if (post_parse_analyze_hook)
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index eb7f7181e4..ee731044b6 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -101,6 +101,7 @@
 #include "utils/plancache.h"
 #include "utils/portal.h"
 #include "utils/ps_status.h"
+#include "utils/queryjumble.h"
 #include "utils/rls.h"
 #include "utils/snapmgr.h"
 #include "utils/tzparser.h"
@@ -402,6 +403,23 @@ static const struct config_enum_entry backslash_quote_options[] = {
 	{NULL, 0, false}
 };
 
+/*
+ * Although only "on", "off", and "auto" are documented, we accept
+ * all the likely variants of "on" and "off".
+ */
+static const struct config_enum_entry compute_query_id_options[] = {
+	{"auto", COMPUTE_QUERY_ID_AUTO, false},
+	{"on", COMPUTE_QUERY_ID_ON, false},
+	{"off", COMPUTE_QUERY_ID_OFF, false},
+	{"true", COMPUTE_QUERY_ID_ON, true},
+	{"false", COMPUTE_QUERY_ID_OFF, true},
+	{"yes", COMPUTE_QUERY_ID_ON, true},
+	{"no", COMPUTE_QUERY_ID_OFF, true},
+	{"1", COMPUTE_QUERY_ID_ON, true},
+	{"0", COMPUTE_QUERY_ID_OFF, true},
+	{NULL, 0, false}
+};
+
 /*
  * Although only "on", "off", and "partition" are documented, we
  * accept all the likely variants of "on" and "off".
@@ -534,7 +552,6 @@ extern const struct config_enum_entry dynamic_shared_memory_options[];
 /*
  * GUC option variables that are exported from this module
  */
-bool		compute_query_id = false;
 bool		log_duration = false;
 bool		Debug_print_plan = false;
 bool		Debug_print_parse = false;
@@ -1441,15 +1458,6 @@ static struct config_bool ConfigureNamesBool[] =
 		true,
 		NULL, NULL, NULL
 	},
-	{
-		{"compute_query_id", PGC_SUSET, STATS_MONITORING,
-			gettext_noop("Compute query identifiers."),
-			NULL
-		},
-		&compute_query_id,
-		false,
-		NULL, NULL, NULL
-	},
 	{
 		{"log_parser_stats", PGC_SUSET, STATS_MONITORING,
 			gettext_noop("Writes parser performance statistics to the server log."),
@@ -4619,6 +4627,16 @@ static struct config_enum ConfigureNamesEnum[] =
 		NULL, NULL, NULL
 	},
 
+	{
+		{"compute_query_id", PGC_SUSET, STATS_MONITORING,
+			gettext_noop("Compute query identifiers."),
+			NULL
+		},
+		&compute_query_id,
+		COMPUTE_QUERY_ID_AUTO, compute_query_id_options,
+		NULL, NULL, NULL
+	},
+
 	{
 		{"constraint_exclusion", PGC_USERSET, QUERY_TUNING_OTHER,
 			gettext_noop("Enables the planner to use constraints to optimize queries."),
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index efde01ee56..6e36e4c2ef 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -604,7 +604,7 @@
 
 # - Monitoring -
 
-#compute_query_id = off
+#compute_query_id = auto
 #log_statement_stats = off
 #log_parser_stats = off
 #log_planner_stats = off
diff --git a/src/backend/utils/misc/queryjumble.c b/src/backend/utils/misc/queryjumble.c
index f004a9ce8c..5b442debf0 100644
--- a/src/backend/utils/misc/queryjumble.c
+++ b/src/backend/utils/misc/queryjumble.c
@@ -39,6 +39,12 @@
 
 #define JUMBLE_SIZE				1024	/* query serialization buffer size */
 
+/* GUC parameters */
+int			compute_query_id = COMPUTE_QUERY_ID_AUTO;
+
+/* True when a module requests query IDs and they're set auto */
+bool		query_id_enabled = false;
+
 static uint64 compute_utility_query_id(const char *str, int query_location, int query_len);
 static void AppendJumble(JumbleState *jstate,
 						 const unsigned char *item, Size size);
@@ -91,6 +97,10 @@ CleanQuerytext(const char *query, int *location, int *len)
 	return query;
 }
 
+/*
+ * This should only be called if IsQueryIdEnabled()
+ * return true.
+ */
 JumbleState *
 JumbleQuery(Query *query, const char *querytext)
 {
@@ -132,6 +142,19 @@ JumbleQuery(Query *query, const char *querytext)
 	return jstate;
 }
 
+/*
+ * Enables query identifier computation.
+ *
+ * Third-party plugins can use this function to inform core that they require
+ * a query identifier to be computed.
+ */
+void
+EnableQueryId(void)
+{
+	if (compute_query_id != COMPUTE_QUERY_ID_OFF)
+		query_id_enabled = true;
+}
+
 /*
  * Compute a query identifier for the given utility query string.
  */
diff --git a/src/include/utils/guc.h b/src/include/utils/guc.h
index 24a5d9d3fb..a7c3a4958e 100644
--- a/src/include/utils/guc.h
+++ b/src/include/utils/guc.h
@@ -247,7 +247,6 @@ extern bool log_btree_build_stats;
 extern PGDLLIMPORT bool check_function_bodies;
 extern bool session_auth_is_superuser;
 
-extern bool compute_query_id;
 extern bool log_duration;
 extern int	log_parameter_max_length;
 extern int	log_parameter_max_length_on_error;
diff --git a/src/include/utils/queryjumble.h b/src/include/utils/queryjumble.h
index 83ba7339fa..15f9e71f93 100644
--- a/src/include/utils/queryjumble.h
+++ b/src/include/utils/queryjumble.h
@@ -52,7 +52,36 @@ typedef struct JumbleState
 	int			highest_extern_param_id;
 } JumbleState;
 
-const char *CleanQuerytext(const char *query, int *location, int *len);
-JumbleState *JumbleQuery(Query *query, const char *querytext);
+/* Values for the compute_query_id GUC */
+typedef enum
+{
+	COMPUTE_QUERY_ID_OFF,
+	COMPUTE_QUERY_ID_ON,
+	COMPUTE_QUERY_ID_AUTO
+} ComputeQueryIdType;
+
+/* GUC parameters */
+extern int compute_query_id;
+
+
+extern const char *CleanQuerytext(const char *query, int *location, int *len);
+extern JumbleState *JumbleQuery(Query *query, const char *querytext);
+extern void EnableQueryId(void);
+
+extern bool query_id_enabled;
+
+/*
+ * Returns whether query identifier computation has been enabled, either
+ * directly in the GUC or by a module when the setting is 'auto'.
+ */
+static inline bool
+IsQueryIdEnabled(void)
+{
+	if (compute_query_id == COMPUTE_QUERY_ID_OFF)
+		return false;
+	if (compute_query_id == COMPUTE_QUERY_ID_ON)
+		return true;
+	return query_id_enabled;
+}
 
 #endif							/* QUERYJUMBLE_H */
-- 
2.20.1

#143

Justin Pryzby

pryzby@telsasoft.com

over 4 years ago

In reply to: Alvaro Herrera (#142)

Re: compute_query_id and pg_stat_statements

On Fri, May 14, 2021 at 07:50:13PM -0400, Alvaro Herrera wrote:

+++ b/doc/src/sgml/config.sgml
@@ -7643,7 +7643,12 @@ COPY postgres_log FROM '/full/path/to/logfile.csv' WITH csv;
identifier to be computed.  Note that an external module can
alternatively be used if the in-core query identifier computation
method is not acceptable.  In this case, in-core computation
-        must be disabled.  The default is <literal>off</literal>.
+        must be always disabled.
+        Valid values are <literal>off</literal> (always disabled),
+        <literal>on</literal> (always enabled) and <literal>auto</literal>,
+        which let modules such as <xref linkend="pgstatstatements"/>
+        automatically enable it.
+        The default is <literal>auto</literal>.

which lets

+/* True when a module requests query IDs and they're set auto */
+bool		query_id_enabled = false;

Does "they're" mean the GUC compute_query_id ?

+/*
+ * This should only be called if IsQueryIdEnabled()
+ * return true.
+ */
JumbleState *
JumbleQuery(Query *query, const char *querytext)

Should it Assert() that ?

Maybe you should update this too ?
doc/src/sgml/release-14.sgml

#144

Julien Rouhaud

rjuju123@gmail.com

over 4 years ago

In reply to: Alvaro Herrera (#142)

Re: compute_query_id and pg_stat_statements

On Sat, May 15, 2021 at 7:50 AM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:

I took out the new WARNING in pg_stat_statements. It's not clear to me
that that's terribly useful (it stops working as soon as you have one
query in the pg_stat_statements stash and later disable everything).

If no query_id is calculated and you have entries in
pg_stat_statements, it means someone deliberately deactivated
compute_query_id. In that case it's clear that they know the GUC
exists, so there's no much point in warning them that they deactivated
it I think.

Maybe there is an useful warning to add, but I think it's independent of
changing the GUC behabior.

I'm fine with it.

Note that if you switch from compute_query_id = off + custom
query_id + pg_stat_statements to compute_query_id = auto then thing will
immediately break (as we instruct third-party plugins authors to error out in
that case), which is way better than breaking at the next random service
restart.

Hmm, ok. I tested pg_queryid and that behavior of throwing an error
seems quite unhelpful -- it basically makes every single query fail if
you set things wrong. I think that instruction is bogus and should be
reconsidered. Maybe pg_queryid could use a custom Boolean GUC that
tells it to overwrite the core query_id if that is enabled, or to just
silently do nothing in that case.

Unless I'm missing something, if we remove that instruction it means
that we encourage users to dynamically change the query_id source
without any safeguard, which will in the majority of case result in
unwanted behavior, going from duplicated entries, poor performance in
pg_stat_statements if that leads to more evictions, or even totally
bogus metrics if that leads to hash collision.

While thinking about this patch it occurred to that an useful gadget
might be to let pg_stat_statements be loaded always, but with
compute_query_id=false; so it's never active; except if you
ALTER {DATABASE, USER} foo SET (compute_query_id=on);
so that it's possible to enable it selectively. I think this doesn't
currently achieve anything because pgss_store is always called
regardless of query ID being active (so you'd always have at least one
function call as performance penalty, only that the work would be for
naught), but that seems a simple change to make. I didn't look closely
to see what other things would need patched.

Couldn't it already be achieved with ALTER [ DATABASE | USER ] foo SET
pg_stat_statements.track = [ none | top | all ] ?

#145

Bruce Momjian

bruce@momjian.us

over 4 years ago

In reply to: Julien Rouhaud (#144)

Re: compute_query_id and pg_stat_statements

On Sat, May 15, 2021 at 04:09:32PM +0800, Julien Rouhaud wrote:

While thinking about this patch it occurred to that an useful gadget
might be to let pg_stat_statements be loaded always, but with
compute_query_id=false; so it's never active; except if you
ALTER {DATABASE, USER} foo SET (compute_query_id=on);
so that it's possible to enable it selectively. I think this doesn't
currently achieve anything because pgss_store is always called
regardless of query ID being active (so you'd always have at least one
function call as performance penalty, only that the work would be for
naught), but that seems a simple change to make. I didn't look closely
to see what other things would need patched.

Couldn't it already be achieved with ALTER [ DATABASE | USER ] foo SET
pg_stat_statements.track = [ none | top | all ] ?

I am no longer the committer in charge of this feature, but I would like
to remind the group that beta1 will be wrapped on Monday, and it is hard
to change non-read-only GUCs after beta since the settings are embedded
in postgresql.conf. There is also a release notes item that probably
will need to be adjusted.

--
Bruce Momjian <bruce@momjian.us> https://momjian.us
EDB https://enterprisedb.com

If only the physical world exists, free will is an illusion.

#146

Julien Rouhaud

rjuju123@gmail.com

over 4 years ago

In reply to: Bruce Momjian (#145)

Re: compute_query_id and pg_stat_statements

On Sat, May 15, 2021 at 10:00 PM Bruce Momjian <bruce@momjian.us> wrote:

I am no longer the committer in charge of this feature, but I would like
to remind the group that beta1 will be wrapped on Monday, and it is hard
to change non-read-only GUCs after beta since the settings are embedded
in postgresql.conf. There is also a release notes item that probably
will need to be adjusted.

It seems that everyone agrees on the definition of compute_query_id in
Álvaro's v4 patch (module Justin's comments) so this could be
committed before the beta1. If the safeguards for custom query_id or
GUC misconfiguration have to be tweaked it shouldn't impact the GUC in
any way.

#147

Alvaro Herrera

alvherre@alvh.no-ip.org

over 4 years ago

In reply to: Julien Rouhaud (#146)

Re: compute_query_id and pg_stat_statements

On 2021-May-16, Julien Rouhaud wrote:

On Sat, May 15, 2021 at 10:00 PM Bruce Momjian <bruce@momjian.us> wrote:

I am no longer the committer in charge of this feature, but I would like
to remind the group that beta1 will be wrapped on Monday, and it is hard
to change non-read-only GUCs after beta since the settings are embedded
in postgresql.conf. There is also a release notes item that probably
will need to be adjusted.

It seems that everyone agrees on the definition of compute_query_id in
ï¿½lvaro's v4 patch (module Justin's comments) so this could be
committed before the beta1. If the safeguards for custom query_id or
GUC misconfiguration have to be tweaked it shouldn't impact the GUC in
any way.

Pushed after adding the fixes from Justin. Note I didn't include the
WARNING in pg_stat_statements when this is disabled; if anybody wants to
argue for that, let's add it separately.

I commented out the release notes para that is now wrong. What remains
is this:

Move query hash computation from pg_stat_statements to the core server (Julien Rouhaud)

We could perhaps add something like

Extension pg_stat_statements continues to work without requiring any
configuration changes.

but that seems a bit pointless. Or maybe

Extension pg_stat_statements automatically enables query identifier
computation if compute_query_id is set to auto. Third-party modules
to compute query identifiers can be installed and used if this is set
to off.

I wonder why the initial line says "query hash" instead of "query
identifier". Do we want to say "hash" everywhere? Why didn't we name
the GUC "compute_query_hash" in that case?

Anyway, let me remind you that it is pretty common to require initdb
during the beta period.

--
ï¿½lvaro Herrera Valdivia, Chile

#148

Bruce Momjian

bruce@momjian.us

over 4 years ago

In reply to: Alvaro Herrera (#147)

Re: compute_query_id and pg_stat_statements

On Sat, May 15, 2021 at 02:21:59PM -0400, ï¿½lvaro Herrera wrote:

I commented out the release notes para that is now wrong. What remains
is this:

Move query hash computation from pg_stat_statements to the core server (Julien Rouhaud)

We could perhaps add something like

Extension pg_stat_statements continues to work without requiring any
configuration changes.

but that seems a bit pointless. Or maybe

Extension pg_stat_statements automatically enables query identifier
computation if compute_query_id is set to auto. Third-party modules
to compute query identifiers can be installed and used if this is set
to off.

OK, new text is:

<listitem>

<para>
Move query hash computation from pg_stat_statements to the core
server (Julien Rouhaud)
</para>

<para>
The new server variable compute_query_id's default of 'auto' will
automatically enable query id computation when this extension
is loaded.
</para>
</listitem>

I also added Alvaro as an author of the compute_query_id item.

I wonder why the initial line says "query hash" instead of "query
identifier". Do we want to say "hash" everywhere? Why didn't we name
the GUC "compute_query_hash" in that case?

It is queryid (no underscore) in pg_stat_statements, which was a whole
different discussion. ;-)

Anyway, let me remind you that it is pretty common to require initdb
during the beta period.

True.

--
Bruce Momjian <bruce@momjian.us> https://momjian.us
EDB https://enterprisedb.com

If only the physical world exists, free will is an illusion.

#149

Alvaro Herrera

alvherre@alvh.no-ip.org

over 4 years ago

In reply to: Bruce Momjian (#148)

Re: compute_query_id and pg_stat_statements

On 2021-May-15, Bruce Momjian wrote:

On Sat, May 15, 2021 at 02:21:59PM -0400, ï¿½lvaro Herrera wrote:

I wonder why the initial line says "query hash" instead of "query
identifier". Do we want to say "hash" everywhere? Why didn't we name
the GUC "compute_query_hash" in that case?

It is queryid (no underscore) in pg_stat_statements, which was a whole
different discussion. ;-)

Yeah, I realize that, but I wonder if we shouldn't use the term "query
identifier" instead of "query hash" in that paragraph.

I also added Alvaro as an author of the compute_query_id item.

I've been wondering if I should ask to stick my name in other features I
helped get committed -- specifically the PQtrace() item and autovacuum
for partitioned tables. I'll go comment in the release notes thread.

--
ï¿½lvaro Herrera Valdivia, Chile
"Digital and video cameras have this adjustment and film cameras don't for the
same reason dogs and cats lick themselves: because they can." (Ken Rockwell)

#150

Bruce Momjian

bruce@momjian.us

over 4 years ago

In reply to: Bruce Momjian (#148)

Re: compute_query_id and pg_stat_statements

On Sat, May 15, 2021 at 05:32:58PM -0400, Bruce Momjian wrote:

OK, new text is:

<listitem>


<para>
Move query hash computation from pg_stat_statements to the core
server (Julien Rouhaud)
</para>

<para>
The new server variable compute_query_id's default of 'auto' will
automatically enable query id computation when this extension
is loaded.
</para>
</listitem>

I also added Alvaro as an author of the compute_query_id item.

--------------------------------------------------------------

Based on the commit message, adding Alvaro was incorrect, so I will
revert this change.

--
Bruce Momjian <bruce@momjian.us> https://momjian.us
EDB https://enterprisedb.com

If only the physical world exists, free will is an illusion.

#151

Alvaro Herrera

alvherre@alvh.no-ip.org

over 4 years ago

In reply to: Bruce Momjian (#150)

Re: compute_query_id and pg_stat_statements

On 2021-May-15, Bruce Momjian wrote:

On Sat, May 15, 2021 at 05:32:58PM -0400, Bruce Momjian wrote:

I also added Alvaro as an author of the compute_query_id item.

--------------------------------------------------------------

Based on the commit message, adding Alvaro was incorrect, so I will
revert this change.

Agreed. My work on this one was janitorial.

--
ï¿½lvaro Herrera 39ï¿½49'30"S 73ï¿½17'W
"Hay que recordar que la existencia en el cosmos, y particularmente la
elaboraciï¿½n de civilizaciones dentro de ï¿½l no son, por desgracia,
nada idï¿½licas" (Ijon Tichy)

#152

Julien Rouhaud

rjuju123@gmail.com

over 4 years ago

In reply to: Alvaro Herrera (#151)

Re: compute_query_id and pg_stat_statements

Le dim. 16 mai 2021 à 11:23, Alvaro Herrera <alvherre@alvh.no-ip.org> a
écrit :

On 2021-May-15, Bruce Momjian wrote:

On Sat, May 15, 2021 at 05:32:58PM -0400, Bruce Momjian wrote:

I also added Alvaro as an author of the compute_query_id item.

--------------------------------------------------------------

Based on the commit message, adding Alvaro was incorrect, so I will
revert this change.

Agreed. My work on this one was janitorial.

Thanks a lot Alvaro and Bruce!

Show quoted text

#153

Bruce Momjian

bruce@momjian.us

over 4 years ago

In reply to: Alvaro Herrera (#149)

Re: compute_query_id and pg_stat_statements

On Sat, May 15, 2021 at 07:01:25PM -0400, ï¿½lvaro Herrera wrote:

On 2021-May-15, Bruce Momjian wrote:

On Sat, May 15, 2021 at 02:21:59PM -0400, ï¿½lvaro Herrera wrote:

I wonder why the initial line says "query hash" instead of "query
identifier". Do we want to say "hash" everywhere? Why didn't we name
the GUC "compute_query_hash" in that case?

It is queryid (no underscore) in pg_stat_statements, which was a whole
different discussion. ;-)

Yeah, I realize that, but I wonder if we shouldn't use the term "query
identifier" instead of "query hash" in that paragraph.

Yes, of course, you are right --- updated text:

<listitem>

<para>
If server variable compute_query_id is enabled, display the query
id in pg_stat_activity, EXPLAIN VERBOSE, csvlog, and optionally
in log_line_prefix (Julien Rouhaud)
</para>

<para>
A query id computed by an extension will also be displayed.
</para>
</listitem>

I also added Alvaro as an author of the compute_query_id item.

I've been wondering if I should ask to stick my name in other features I
helped get committed -- specifically the PQtrace() item and autovacuum
for partitioned tables. I'll go comment in the release notes thread.

Yes, done.

--
Bruce Momjian <bruce@momjian.us> https://momjian.us
EDB https://enterprisedb.com

If only the physical world exists, free will is an illusion.

#154

Bruce Momjian

bruce@momjian.us

over 4 years ago

In reply to: Alvaro Herrera (#151)

Re: compute_query_id and pg_stat_statements

On Sat, May 15, 2021 at 11:23:25PM -0400, ï¿½lvaro Herrera wrote:

On 2021-May-15, Bruce Momjian wrote:

On Sat, May 15, 2021 at 05:32:58PM -0400, Bruce Momjian wrote:

I also added Alvaro as an author of the compute_query_id item.

--------------------------------------------------------------

Based on the commit message, adding Alvaro was incorrect, so I will
revert this change.

Agreed. My work on this one was janitorial.

OK, removed, thanks.

--
Bruce Momjian <bruce@momjian.us> https://momjian.us
EDB https://enterprisedb.com

If only the physical world exists, free will is an illusion.

#155

Bruce Momjian

bruce@momjian.us

over 4 years ago

In reply to: Julien Rouhaud (#152)

Re: compute_query_id and pg_stat_statements

On Sun, May 16, 2021 at 08:39:33PM +0800, Julien Rouhaud wrote:

Le dim. 16 mai 2021 ï¿½ 11:23, Alvaro Herrera <alvherre@alvh.no-ip.org> a ï¿½critï¿½:

On 2021-May-15, Bruce Momjian wrote:

On Sat, May 15, 2021 at 05:32:58PM -0400, Bruce Momjian wrote:

I also added Alvaro as an author of the compute_query_id item.

ï¿½ ï¿½--------------------------------------------------------------

Based on the commit message, adding Alvaro was incorrect, so I will
revert this change.

Agreed.ï¿½ My work on this one was janitorial.

Thanks a lot Alvaro and Bruce!ï¿½

We are going to get to the goal line, one way or the other! ;-)

--
Bruce Momjian <bruce@momjian.us> https://momjian.us
EDB https://enterprisedb.com

If only the physical world exists, free will is an illusion.

#156

Andrew Dunstan

andrew@dunslane.net

over 4 years ago

In reply to: Bruce Momjian (#155)

Re: compute_query_id and pg_stat_statements

On 5/16/21 11:13 PM, Bruce Momjian wrote:

On Sun, May 16, 2021 at 08:39:33PM +0800, Julien Rouhaud wrote:

Le dim. 16 mai 2021 à 11:23, Alvaro Herrera <alvherre@alvh.no-ip.org> a écrit :

On 2021-May-15, Bruce Momjian wrote:

On Sat, May 15, 2021 at 05:32:58PM -0400, Bruce Momjian wrote:

I also added Alvaro as an author of the compute_query_id item.

--------------------------------------------------------------

Based on the commit message, adding Alvaro was incorrect, so I will
revert this change.

Agreed. My work on this one was janitorial.

Thanks a lot Alvaro and Bruce!

We are going to get to the goal line, one way or the other! ;-)

I've discussed this with Alvaro. He's not planning to do anything more
regarding this and I think we can close the open item.

cheers

andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com

#157

Bruce Momjian

bruce@momjian.us

over 4 years ago

In reply to: Andrew Dunstan (#156)

Re: compute_query_id and pg_stat_statements

On Fri, May 21, 2021 at 02:19:13PM -0400, Andrew Dunstan wrote:

On 5/16/21 11:13 PM, Bruce Momjian wrote:

On Sun, May 16, 2021 at 08:39:33PM +0800, Julien Rouhaud wrote:

Le dim. 16 mai 2021 ï¿½ 11:23, Alvaro Herrera <alvherre@alvh.no-ip.org> a ï¿½critï¿½:

On 2021-May-15, Bruce Momjian wrote:

On Sat, May 15, 2021 at 05:32:58PM -0400, Bruce Momjian wrote:

I also added Alvaro as an author of the compute_query_id item.

ï¿½ ï¿½--------------------------------------------------------------

Based on the commit message, adding Alvaro was incorrect, so I will
revert this change.

Agreed.ï¿½ My work on this one was janitorial.

Thanks a lot Alvaro and Bruce!ï¿½

We are going to get to the goal line, one way or the other! ;-)

I've discussed this with Alvaro. He's not planning to do anything more
regarding this and I think we can close the open item.

Works for me.

--
Bruce Momjian <bruce@momjian.us> https://momjian.us
EDB https://enterprisedb.com

If only the physical world exists, free will is an illusion.

#158

Julien Rouhaud

rjuju123@gmail.com

over 4 years ago

In reply to: Bruce Momjian (#157)

Re: compute_query_id and pg_stat_statements

Le sam. 22 mai 2021 à 02:27, Bruce Momjian <bruce@momjian.us> a écrit :

On Fri, May 21, 2021 at 02:19:13PM -0400, Andrew Dunstan wrote:

On 5/16/21 11:13 PM, Bruce Momjian wrote:

On Sun, May 16, 2021 at 08:39:33PM +0800, Julien Rouhaud wrote:

Le dim. 16 mai 2021 à 11:23, Alvaro Herrera <alvherre@alvh.no-ip.org>

a écrit :

On 2021-May-15, Bruce Momjian wrote:

On Sat, May 15, 2021 at 05:32:58PM -0400, Bruce Momjian wrote:

I also added Alvaro as an author of the compute_query_id item.

--------------------------------------------------------------

Based on the commit message, adding Alvaro was incorrect, so I

will

revert this change.

Agreed. My work on this one was janitorial.

Thanks a lot Alvaro and Bruce!

We are going to get to the goal line, one way or the other! ;-)

I've discussed this with Alvaro. He's not planning to do anything more
regarding this and I think we can close the open item.

Works for me.

works for me too.

Show quoted text