Problems around compute_query_id
Hi all,
Starting a new thread as the one that has introduced compute_query_id
is already long enough.
Fujii-san has reported on Twitter that enabling the computation of
query IDs does not work properly with log_statement as the query ID is
calculated at parse analyze time and the query is logged before that.
As far as I can see, that's really a problem as any queries logged are
combined with a query ID of 0, and log parsers would not really be
able to use that, even if the information provided by for example
log_duration gives the computed query ID prevent in pg_stat_activity.
While making the feature run on some test server, I have noticed that
%Q would log some garbage query ID for autovacuum workers that's kept
around. That looks wrong.
Thoughts?
--
Michael
On Mon, Apr 12, 2021 at 03:12:40PM +0900, Michael Paquier wrote:
Hi all,
Starting a new thread as the one that has introduced compute_query_id
is already long enough.Fujii-san has reported on Twitter that enabling the computation of
query IDs does not work properly with log_statement as the query ID is
calculated at parse analyze time and the query is logged before that.
As far as I can see, that's really a problem as any queries logged are
combined with a query ID of 0, and log parsers would not really be
able to use that, even if the information provided by for example
log_duration gives the computed query ID prevent in pg_stat_activity.
I don't see any way around that. The goal of log_statements is to log all
syntactically valid queries, including otherwise invalid queries. For
instance, it would log
SELECT notacolumn;
and there's no way to compute a queryid in that case. I think that
log_statements already causes some issues with log parsers. At least pgbadger
recommends to avoid using that:
"Do not enable log_statement as its log format will not be parsed by pgBadger."
I think we should simply document that %Q is not compatible with
log_statements.
While making the feature run on some test server, I have noticed that
%Q would log some garbage query ID for autovacuum workers that's kept
around. That looks wrong.
I've not been able to reproduce it, do you have some hint on how to do it?
Maybe setting a zero queryid at the beginning of AutoVacWorkerMain() could fix
the problem?
Hi,
On Mon, Apr 12, 2021 at 02:56:59PM +0800, Julien Rouhaud wrote:
On Mon, Apr 12, 2021 at 03:12:40PM +0900, Michael Paquier wrote:
Fujii-san has reported on Twitter that enabling the computation of
query IDs does not work properly with log_statement as the query ID is
calculated at parse analyze time and the query is logged before that.
As far as I can see, that's really a problem as any queries logged are
combined with a query ID of 0, and log parsers would not really be
able to use that, even if the information provided by for example
log_duration gives the computed query ID prevent in pg_stat_activity.I don't see any way around that. The goal of log_statements is to log all
syntactically valid queries, including otherwise invalid queries. For
instance, it would logSELECT notacolumn;
and there's no way to compute a queryid in that case. I think that
log_statements already causes some issues with log parsers. At least pgbadger
recommends to avoid using that:"Do not enable log_statement as its log format will not be parsed by pgBadger."
I think we should simply document that %Q is not compatible with
log_statements.
What about log_statement_sample_rate ? Does compute_query_id have the
same problem with that?
Michael
--
Michael Banck
Projektleiter / Senior Berater
Tel.: +49 2166 9901-171
Fax: +49 2166 9901-100
Email: michael.banck@credativ.de
credativ GmbH, HRB M�nchengladbach 12080
USt-ID-Nummer: DE204566209
Trompeterallee 108, 41189 M�nchengladbach
Gesch�ftsf�hrung: Dr. Michael Meskes, Sascha Heuer
Unser Umgang mit personenbezogenen Daten unterliegt
folgenden Bestimmungen: https://www.credativ.de/datenschutz
On Mon, Apr 12, 2021 at 09:20:07AM +0200, Michael Banck wrote:
What about log_statement_sample_rate ? Does compute_query_id have the
same problem with that?
No, log_statement_sample_rate samples log_min_duration_statements, not
log_statements so it works as expected.
On Mon, Apr 12, 2021 at 03:26:33PM +0800, Julien Rouhaud wrote:
On Mon, Apr 12, 2021 at 09:20:07AM +0200, Michael Banck wrote:
What about log_statement_sample_rate ? Does compute_query_id have the
same problem with that?No, log_statement_sample_rate samples log_min_duration_statements, not
log_statements so it works as expected.
While on that topic, it's probably worth mentioning that log_duration is now
way more useful if you have the queryid in you log_line_prefix. It avoids to
log the full query text while still being able to know what was the underlying
normalized query by dumping the content of pg_stat_statements.
On Mon, Apr 12, 2021 at 02:56:59PM +0800, Julien Rouhaud wrote:
I think we should simply document that %Q is not compatible with
log_statements.
Hearing no objection I documented that limitation.
While making the feature run on some test server, I have noticed that
%Q would log some garbage query ID for autovacuum workers that's kept
around. That looks wrong.I've not been able to reproduce it, do you have some hint on how to do it?
Maybe setting a zero queryid at the beginning of AutoVacWorkerMain() could fix
the problem?
It turns out that the problem was simply that some process can inherit a
PgBackendStatus for which a previous backend reported a queryid. For processes
like autovacuum process, they will never report a new identifier so they
reported the previous one. Resetting the field like the other ones in
pgstat_bestart() fixes the problem for autovacuum and any similar process.
Attachments:
v1-0001-Additional-fixes-for-compute_query_id.patchtext/x-diff; charset=us-asciiDownload+11-1
On Thu, Apr 15, 2021 at 03:43:59PM +0800, Julien Rouhaud wrote:
On Mon, Apr 12, 2021 at 02:56:59PM +0800, Julien Rouhaud wrote:
I think we should simply document that %Q is not compatible with
log_statements.Hearing no objection I documented that limitation.
While making the feature run on some test server, I have noticed that
%Q would log some garbage query ID for autovacuum workers that's kept
around. That looks wrong.I've not been able to reproduce it, do you have some hint on how to do it?
Maybe setting a zero queryid at the beginning of AutoVacWorkerMain() could fix
the problem?It turns out that the problem was simply that some process can inherit a
PgBackendStatus for which a previous backend reported a queryid. For processes
like autovacuum process, they will never report a new identifier so they
reported the previous one. Resetting the field like the other ones in
pgstat_bestart() fixes the problem for autovacuum and any similar process.
I slightly adjusted the patch and applied it. Thanks. I think this
closes all the open issues around query_id. :-)
--
Bruce Momjian <bruce@momjian.us> https://momjian.us
EDB https://enterprisedb.com
If only the physical world exists, free will is an illusion.
Attachments:
master.difftext/x-diff; charset=us-asciiDownload+11-0
On Tue, Apr 20, 2021 at 12:59:10PM -0400, Bruce Momjian wrote:
On Thu, Apr 15, 2021 at 03:43:59PM +0800, Julien Rouhaud wrote:
On Mon, Apr 12, 2021 at 02:56:59PM +0800, Julien Rouhaud wrote:
I think we should simply document that %Q is not compatible with
log_statements.Hearing no objection I documented that limitation.
While making the feature run on some test server, I have noticed that
%Q would log some garbage query ID for autovacuum workers that's kept
around. That looks wrong.I've not been able to reproduce it, do you have some hint on how to do it?
Maybe setting a zero queryid at the beginning of AutoVacWorkerMain() could fix
the problem?It turns out that the problem was simply that some process can inherit a
PgBackendStatus for which a previous backend reported a queryid. For processes
like autovacuum process, they will never report a new identifier so they
reported the previous one. Resetting the field like the other ones in
pgstat_bestart() fixes the problem for autovacuum and any similar process.I slightly adjusted the patch and applied it. Thanks. I think this
closes all the open issues around query_id. :-)
Thanks a lot Bruce! There was also [1]/messages/by-id/CAJcOf-fXyb2QiDbwftD813UF70w-+BsK-03bFp1GrijXU9GQYQ@mail.gmail.com, but Michael already committed the
proposed fix, so I also think that all open issues for query_id are not taken
care of!
[1]: /messages/by-id/CAJcOf-fXyb2QiDbwftD813UF70w-+BsK-03bFp1GrijXU9GQYQ@mail.gmail.com