Problems around compute_query_id

Started by Michael Paquierabout 5 years ago8 messageshackers
Jump to latest
#1Michael Paquier
michael@paquier.xyz

Hi all,

Starting a new thread as the one that has introduced compute_query_id
is already long enough.

Fujii-san has reported on Twitter that enabling the computation of
query IDs does not work properly with log_statement as the query ID is
calculated at parse analyze time and the query is logged before that.
As far as I can see, that's really a problem as any queries logged are
combined with a query ID of 0, and log parsers would not really be
able to use that, even if the information provided by for example
log_duration gives the computed query ID prevent in pg_stat_activity.

While making the feature run on some test server, I have noticed that
%Q would log some garbage query ID for autovacuum workers that's kept
around. That looks wrong.

Thoughts?
--
Michael

#2Julien Rouhaud
rjuju123@gmail.com
In reply to: Michael Paquier (#1)
Re: Problems around compute_query_id

On Mon, Apr 12, 2021 at 03:12:40PM +0900, Michael Paquier wrote:

Hi all,

Starting a new thread as the one that has introduced compute_query_id
is already long enough.

Fujii-san has reported on Twitter that enabling the computation of
query IDs does not work properly with log_statement as the query ID is
calculated at parse analyze time and the query is logged before that.
As far as I can see, that's really a problem as any queries logged are
combined with a query ID of 0, and log parsers would not really be
able to use that, even if the information provided by for example
log_duration gives the computed query ID prevent in pg_stat_activity.

I don't see any way around that. The goal of log_statements is to log all
syntactically valid queries, including otherwise invalid queries. For
instance, it would log

SELECT notacolumn;

and there's no way to compute a queryid in that case. I think that
log_statements already causes some issues with log parsers. At least pgbadger
recommends to avoid using that:

"Do not enable log_statement as its log format will not be parsed by pgBadger."

I think we should simply document that %Q is not compatible with
log_statements.

While making the feature run on some test server, I have noticed that
%Q would log some garbage query ID for autovacuum workers that's kept
around. That looks wrong.

I've not been able to reproduce it, do you have some hint on how to do it?

Maybe setting a zero queryid at the beginning of AutoVacWorkerMain() could fix
the problem?

#3Michael Banck
michael.banck@credativ.de
In reply to: Julien Rouhaud (#2)
Re: Problems around compute_query_id

Hi,

On Mon, Apr 12, 2021 at 02:56:59PM +0800, Julien Rouhaud wrote:

On Mon, Apr 12, 2021 at 03:12:40PM +0900, Michael Paquier wrote:

Fujii-san has reported on Twitter that enabling the computation of
query IDs does not work properly with log_statement as the query ID is
calculated at parse analyze time and the query is logged before that.
As far as I can see, that's really a problem as any queries logged are
combined with a query ID of 0, and log parsers would not really be
able to use that, even if the information provided by for example
log_duration gives the computed query ID prevent in pg_stat_activity.

I don't see any way around that. The goal of log_statements is to log all
syntactically valid queries, including otherwise invalid queries. For
instance, it would log

SELECT notacolumn;

and there's no way to compute a queryid in that case. I think that
log_statements already causes some issues with log parsers. At least pgbadger
recommends to avoid using that:

"Do not enable log_statement as its log format will not be parsed by pgBadger."

I think we should simply document that %Q is not compatible with
log_statements.

What about log_statement_sample_rate ? Does compute_query_id have the
same problem with that?

Michael

--
Michael Banck
Projektleiter / Senior Berater
Tel.: +49 2166 9901-171
Fax: +49 2166 9901-100
Email: michael.banck@credativ.de

credativ GmbH, HRB M�nchengladbach 12080
USt-ID-Nummer: DE204566209
Trompeterallee 108, 41189 M�nchengladbach
Gesch�ftsf�hrung: Dr. Michael Meskes, Sascha Heuer

Unser Umgang mit personenbezogenen Daten unterliegt
folgenden Bestimmungen: https://www.credativ.de/datenschutz

#4Julien Rouhaud
rjuju123@gmail.com
In reply to: Michael Banck (#3)
Re: Problems around compute_query_id

On Mon, Apr 12, 2021 at 09:20:07AM +0200, Michael Banck wrote:

What about log_statement_sample_rate ? Does compute_query_id have the
same problem with that?

No, log_statement_sample_rate samples log_min_duration_statements, not
log_statements so it works as expected.

#5Julien Rouhaud
rjuju123@gmail.com
In reply to: Julien Rouhaud (#4)
Re: Problems around compute_query_id

On Mon, Apr 12, 2021 at 03:26:33PM +0800, Julien Rouhaud wrote:

On Mon, Apr 12, 2021 at 09:20:07AM +0200, Michael Banck wrote:

What about log_statement_sample_rate ? Does compute_query_id have the
same problem with that?

No, log_statement_sample_rate samples log_min_duration_statements, not
log_statements so it works as expected.

While on that topic, it's probably worth mentioning that log_duration is now
way more useful if you have the queryid in you log_line_prefix. It avoids to
log the full query text while still being able to know what was the underlying
normalized query by dumping the content of pg_stat_statements.

#6Julien Rouhaud
rjuju123@gmail.com
In reply to: Julien Rouhaud (#2)
Re: Problems around compute_query_id

On Mon, Apr 12, 2021 at 02:56:59PM +0800, Julien Rouhaud wrote:

I think we should simply document that %Q is not compatible with
log_statements.

Hearing no objection I documented that limitation.

While making the feature run on some test server, I have noticed that
%Q would log some garbage query ID for autovacuum workers that's kept
around. That looks wrong.

I've not been able to reproduce it, do you have some hint on how to do it?

Maybe setting a zero queryid at the beginning of AutoVacWorkerMain() could fix
the problem?

It turns out that the problem was simply that some process can inherit a
PgBackendStatus for which a previous backend reported a queryid. For processes
like autovacuum process, they will never report a new identifier so they
reported the previous one. Resetting the field like the other ones in
pgstat_bestart() fixes the problem for autovacuum and any similar process.

Attachments:

v1-0001-Additional-fixes-for-compute_query_id.patchtext/x-diff; charset=us-asciiDownload+11-1
#7Bruce Momjian
bruce@momjian.us
In reply to: Julien Rouhaud (#6)
Re: Problems around compute_query_id

On Thu, Apr 15, 2021 at 03:43:59PM +0800, Julien Rouhaud wrote:

On Mon, Apr 12, 2021 at 02:56:59PM +0800, Julien Rouhaud wrote:

I think we should simply document that %Q is not compatible with
log_statements.

Hearing no objection I documented that limitation.

While making the feature run on some test server, I have noticed that
%Q would log some garbage query ID for autovacuum workers that's kept
around. That looks wrong.

I've not been able to reproduce it, do you have some hint on how to do it?

Maybe setting a zero queryid at the beginning of AutoVacWorkerMain() could fix
the problem?

It turns out that the problem was simply that some process can inherit a
PgBackendStatus for which a previous backend reported a queryid. For processes
like autovacuum process, they will never report a new identifier so they
reported the previous one. Resetting the field like the other ones in
pgstat_bestart() fixes the problem for autovacuum and any similar process.

I slightly adjusted the patch and applied it. Thanks. I think this
closes all the open issues around query_id. :-)

--
Bruce Momjian <bruce@momjian.us> https://momjian.us
EDB https://enterprisedb.com

If only the physical world exists, free will is an illusion.

Attachments:

master.difftext/x-diff; charset=us-asciiDownload+11-0
#8Julien Rouhaud
rjuju123@gmail.com
In reply to: Bruce Momjian (#7)
Re: Problems around compute_query_id

On Tue, Apr 20, 2021 at 12:59:10PM -0400, Bruce Momjian wrote:

On Thu, Apr 15, 2021 at 03:43:59PM +0800, Julien Rouhaud wrote:

On Mon, Apr 12, 2021 at 02:56:59PM +0800, Julien Rouhaud wrote:

I think we should simply document that %Q is not compatible with
log_statements.

Hearing no objection I documented that limitation.

While making the feature run on some test server, I have noticed that
%Q would log some garbage query ID for autovacuum workers that's kept
around. That looks wrong.

I've not been able to reproduce it, do you have some hint on how to do it?

Maybe setting a zero queryid at the beginning of AutoVacWorkerMain() could fix
the problem?

It turns out that the problem was simply that some process can inherit a
PgBackendStatus for which a previous backend reported a queryid. For processes
like autovacuum process, they will never report a new identifier so they
reported the previous one. Resetting the field like the other ones in
pgstat_bestart() fixes the problem for autovacuum and any similar process.

I slightly adjusted the patch and applied it. Thanks. I think this
closes all the open issues around query_id. :-)

Thanks a lot Bruce! There was also [1]/messages/by-id/CAJcOf-fXyb2QiDbwftD813UF70w-+BsK-03bFp1GrijXU9GQYQ@mail.gmail.com, but Michael already committed the
proposed fix, so I also think that all open issues for query_id are not taken
care of!

[1]: /messages/by-id/CAJcOf-fXyb2QiDbwftD813UF70w-+BsK-03bFp1GrijXU9GQYQ@mail.gmail.com