Re: RFC: Logging plan of the running query
Hi!
I'm here to answer your questions about contrib/pg_query_state.
I only took a quick look at pg_query_state, I have some questions.
pg_query_state seems to be using shm_mq to expose the plan information, but
there was a discussion that this kind of architecture would be tricky
to do properly [1].
Does pg_query_state handle the difficulties listed in that discussion?
[1] /messages/by-id/9a50371e15e741e295accabc72a41df1@oss.nttdata.com
I doubt that it was the right link.
But on the topic I will say that the extension really does use shared memory;
interaction is implemented by sending and receiving messages. This
architecture provides the required reliability and convenience.
It seems the caller of pg_query_state() has to wait until the
target process pushes the plan information into shared memory; can this
lead to deadlock situations?
I came up with this question because, when trying to make a view for the
memory contexts of other backends, we encountered deadlock situations.
In the end, we gave up on the view design and adopted sending a signal and
logging. Discussion at the following URL:
/messages/by-id/9a50371e15e741e295accabc72a41df1@oss.nttdata.com
Before extracting information about another process, we check its state.
Information is only retrieved from a process that is willing to provide it.
Otherwise, we receive an error message saying that query execution
statistics and process status cannot be obtained. There is also a check
for extracting your own status. This is even verified in tests.
Thanks for your attention.
Just in case, I am ready to discuss this topic in more detail.
About overhead:
I haven't measured it yet, but I believe that the overhead for backends
on which pg_log_current_plan() is not called would be slight, since the
patch just adds the logic for saving the QueryDesc in ExecutorRun().
The overhead for backends on which pg_log_current_plan() is called might
not be slight, but since the target process is assumed to be dealing with a
long-running query whose plan the user wants to know, the overhead
would be worth the cost.
I think it would be useful to have a couple of examples with different
numbers of rows, compared against runs without this functionality.
Hope this helps.
--
Ekaterina Sokolova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
This patch no longer applies on top of HEAD, please submit a rebased version.
--
Daniel Gustafsson https://vmware.com/
On 2021-11-02 20:32, Ekaterina Sokolova wrote:
Thanks for your response!
Hi!
I'm here to answer your questions about contrib/pg_query_state.
I only took a quick look at pg_query_state, I have some questions.
pg_query_state seems using shm_mq to expose the plan information, but
there was a discussion that this kind of architecture would be tricky
to do properly [1].
Does pg_query_state handle difficulties listed on the discussion? [1]
/messages/by-id/9a50371e15e741e295accabc72a41df1@oss.nttdata.com
I doubt that it was the right link.
Sorry for confusing you; here is the link.
/messages/by-id/CA+TgmobkpFV0UB67kzXuD36--OFHwz1bs=L_6PZbD4nxKqUQMw@mail.gmail.com
But on the topic I will say that extension really use shared memory,
interaction is implemented by sending / receiving messages. This
architecture provides the required reliability and convenience.
As described in the link, using shared memory for this kind of work
would need DSM, and it would also be necessary to exchange information
between the requestor and the responder.
For example, when I looked a little at the pg_query_state code, it
looks like the size of the queue is fixed at QUEUE_SIZE, and I wonder
how plans that exceed QUEUE_SIZE are handled.
It seems the caller of the pg_query_state() has to wait until the
target process pushes the plan information into shared memory, can it
lead to deadlock situations?
I came up with this question because when trying to make a view for
memory contexts of other backends, we encountered deadlock
situations.
After all, we gave up view design and adopted sending signal and
logging. Discussion at the following URL.
/messages/by-id/9a50371e15e741e295accabc72a41df1@oss.nttdata.com
Before extracting information about side process we check its state.
Information will only be retrieved for a process willing to provide
it. Otherwise, we will receive an error message about impossibility of
getting query execution statistics + process status. Also checking
fact of extracting your own status exists. This is even verified in
tests.
Thanks for your attention.
Just in case, I am ready to discuss this topic in more detail.
I imagined the following procedure.
Does it cause deadlock in pg_query_state?
- session1
BEGIN; TRUNCATE t;
- session2
BEGIN; TRUNCATE t; -- wait
- session1
SELECT * FROM pg_query_state(<pid of session>); -- wait and deadlocked?
About overhead:
I haven't measured it yet, but I believe that the overhead for
backends
which are not called pg_log_current_plan() would be slight since the
patch just adds the logic for saving QueryDesc on ExecutorRun().
The overhead for backends which is called pg_log_current_plan() might
not slight, but since the target process are assumed dealing with
long-running query and the user want to know its plan, its overhead
would be worth the cost.
I think it would be useful for us to have couple of examples with a
different number of rows compared to using without this functionality.
Do you have any expectation that the number of rows would affect the
performance of this functionality?
This patch adds some code to ExecutorRun(), but I thought the number of
rows would not have an impact on performance.
--
Regards,
--
Atsushi Torikoshi
NTT DATA CORPORATION
Hi!
You forgot my last fix to build correctly on Mac. I have added it.
About our discussion of pg_query_state:
torikoshia wrote 2021-11-04 15:49:
I doubt that it was the right link.
Sorry for make you confused, here is the link.
/messages/by-id/CA+TgmobkpFV0UB67kzXuD36--OFHwz1bs=L_6PZbD4nxKqUQMw@mail.gmail.com
Thank you. I'll see it soon.
I imagined the following procedure.
Does it cause deadlock in pg_query_state?
- session1
BEGIN; TRUNCATE t;
- session2
BEGIN; TRUNCATE t; -- wait
- session1
SELECT * FROM pg_query_state(<pid of session>); -- wait and deadlocked?
As far as I know, pg_query_state uses non-blocking reads and writes. I have
written a few tests trying to deadlock it (on version 14), but all finished
correctly.
Have a nice day. Please feel free to contact me if you need any further
information.
--
Ekaterina Sokolova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Attachments:
v13-0002-log-running-query-plan.patch (text/x-diff, +355 -58)
On 2021-11-17 22:44, Ekaterina Sokolova wrote:
Hi!
You forgot my last fix to build correctly on Mac. I have added it.
Thanks for the notification!
Since the patch could no longer be applied to HEAD, I also updated it.
About our discussion of pg_query_state:
torikoshia wrote 2021-11-04 15:49:
I doubt that it was the right link.
Sorry for make you confused, here is the link.
/messages/by-id/CA+TgmobkpFV0UB67kzXuD36--OFHwz1bs=L_6PZbD4nxKqUQMw@mail.gmail.com
Thank you. I'll see it soon.
I imagined the following procedure.
Does it cause deadlock in pg_query_state?
- session1
BEGIN; TRUNCATE t;
- session2
BEGIN; TRUNCATE t; -- wait
- session1
SELECT * FROM pg_query_state(<pid of session>); -- wait and deadlocked?
As I know, pg_query_state use non-blocking read and write. I have
wrote few tests trying to deadlock it (on 14 version), but all
finished correctly.
Have a nice day. Please feel free to contact me if you need any
further information.
Thanks for your information and help!
--
Regards,
--
Atsushi Torikoshi
NTT DATA CORPORATION
Attachments:
v14-0001-log-running-query-plan.patch (text/x-diff, +355 -58)
On 2021-11-26 12:39, torikoshia wrote:
Since the patch could not be applied to the HEAD anymore, I also
updated it.
Updated the patch to fix a compiler warning about the format on Windows.
--
Regards,
--
Atsushi Torikoshi
NTT DATA CORPORATION
Attachments:
v15-0001-log-running-query-plan.patch (text/x-diff, +355 -58)
On 2022-01-07 14:30, torikoshia wrote:
Updated the patch for fixing compiler warning about the format on
windows.
I got another compiler warning, updated the patch again.
--
Regards,
--
Atsushi Torikoshi
NTT DATA CORPORATION
Attachments:
v16-0001-log-running-query-plan.patch (text/x-diff, +355 -58)
On 2022/01/07 20:58, torikoshia wrote:
On 2022-01-07 14:30, torikoshia wrote:
Updated the patch for fixing compiler warning about the format on windows.
I got another compiler warning, updated the patch again.
Thanks for updating the patch!
I ran the following query every 0.1s using psql's \watch command from three different sessions while the make installcheck tests were running.
SELECT pg_log_query_plan(pid) FROM pg_stat_activity;
And then I got the segmentation fault as follows.
2022-01-07 21:40:32 JST [postmaster] LOG: server process (PID 51017) was terminated by signal 11: Segmentation fault: 11
2022-01-07 21:40:32 JST [postmaster] DETAIL: Failed process was running: select description, (test_conv(inbytes, 'utf8', 'utf8')).* from utf8_verification_inputs;
2022-01-07 21:40:32 JST [postmaster] LOG: terminating any other active server processes
2022-01-07 21:40:32 JST [postmaster] LOG: all server processes terminated; reinitializing
The backtrace I got from the core file was:
(lldb) target create --core "/cores/core.51017"
Core file '/cores/core.51017' (x86_64) was loaded.
(lldb) bt
* thread #1, stop reason = signal SIGSTOP
* frame #0: 0x00007fff20484552 libsystem_platform.dylib`_platform_strlen + 18
frame #1: 0x000000011076e36c postgres`dopr(target=0x00007ffedfd1b1d8, format="\n", args=0x00007ffedfd1b450) at snprintf.c:444:20
frame #2: 0x000000011076e133 postgres`pg_vsnprintf(str="Query Text: \x7f\x7f\x7f\x7f..." (long run of repeated \x7f bytes trimmed), count=1024, fmt="%s: %s\n", args=0x00007ffedfd1b450) at snprintf.c:195:2
frame #3: 0x000000011075e51d postgres`pvsnprintf(buf="Query Text: \x7f\x7f\x7f\x7f..." (long run of repeated \x7f bytes trimmed), len=1024, fmt="%s: %s\n", args=0x00007ffedfd1b450) at psprintf.c:110:13
frame #4: 0x000000011076052d postgres`appendStringInfoVA(str=0x00007fe56804bec0, fmt="%s: %s\n", args=0x00007ffedfd1b450) at stringinfo.c:149:13
frame #5: 0x0000000110760449 postgres`appendStringInfo(str=0x00007fe56804bec0, fmt="%s: %s\n") at stringinfo.c:103:12
frame #6: 0x000000011016f6cf postgres`ExplainProperty(qlabel="Query Text", unit=0x0000000000000000, value="", numeric=false, es=0x00007fe56804be28) at explain.c:4390:5
frame #7: 0x000000011016eace postgres`ExplainPropertyText(qlabel="Query Text", value="", es=0x00007fe56804be28) at explain.c:4435:2
frame #8: 0x000000011016ea97 postgres`ExplainQueryText(es=0x00007fe56804be28, queryDesc=0x00007fe568022320) at explain.c:969:3
frame #9: 0x000000011016faeb postgres`ProcessLogCurrentPlanInterrupt at explain.c:5009:2
frame #10: 0x00000001104dda84 postgres`ProcessInterrupts at postgres.c:3373:3
frame #11: 0x0000000110c81c57 plpgsql.so`exec_stmts(estate=0x00007ffedfd1bab8, stmts=0x00007fe568050e00) at pl_exec.c:1994:3
frame #12: 0x0000000110c80a78 plpgsql.so`exec_stmt_block(estate=0x00007ffedfd1bab8, block=0x00007fe56801c570) at pl_exec.c:1777:9
frame #13: 0x0000000110c81c91 plpgsql.so`exec_stmts(estate=0x00007ffedfd1bab8, stmts=0x00007fe56801c5c8) at pl_exec.c:1999:10
frame #14: 0x0000000110c80f21 plpgsql.so`exec_stmt_block(estate=0x00007ffedfd1bab8, block=0x00007fe56801c678) at pl_exec.c:1926:8
frame #15: 0x0000000110c7ed92 plpgsql.so`exec_toplevel_block(estate=0x00007ffedfd1bab8, block=0x00007fe56801c678) at pl_exec.c:1617:7
frame #16: 0x0000000110c7d3e4 plpgsql.so`plpgsql_exec_function(func=0x00007fe56a00aa20, fcinfo=0x00007fe56803c888, simple_eval_estate=0x0000000000000000, simple_eval_resowner=0x0000000000000000, procedure_resowner=0x0000000000000000, atomic=true) at pl_exec.c:611:7
frame #17: 0x0000000110c9bf12 plpgsql.so`plpgsql_call_handler(fcinfo=0x00007fe56803c888) at pl_handler.c:277:13
frame #18: 0x0000000110228b06 postgres`ExecInterpExpr(state=0x00007fe56803bf88, econtext=0x00007fe56803ba70, isnull=0x00007ffedfd1bf7f) at execExprInterp.c:725:8
frame #19: 0x000000011024858b postgres`ExecEvalExprSwitchContext(state=0x00007fe56803bf88, econtext=0x00007fe56803ba70, isNull=0x00007ffedfd1bf7f) at executor.h:339:13
frame #20: 0x00000001102482aa postgres`ExecProject(projInfo=0x00007fe56803bf80) at executor.h:373:9
frame #21: 0x0000000110247e2f postgres`ExecScan(node=0x00007fe56803b958, accessMtd=(postgres`SeqNext at nodeSeqscan.c:51), recheckMtd=(postgres`SeqRecheck at nodeSeqscan.c:90)) at execScan.c:238:12
frame #22: 0x000000011028c575 postgres`ExecSeqScan(pstate=0x00007fe56803b958) at nodeSeqscan.c:112:9
frame #23: 0x000000011023bbe2 postgres`ExecProcNode(node=0x00007fe56803b958) at executor.h:257:9
frame #24: 0x0000000110237471 postgres`ExecutePlan(estate=0x00007fe56803b720, planstate=0x00007fe56803b958, use_parallel_mode=false, operation=CMD_SELECT, sendTuples=true, numberTuples=0, direction=ForwardScanDirection, dest=0x00007fe568046ac8, execute_once=true) at execMain.c:1561:10
frame #25: 0x0000000110237320 postgres`standard_ExecutorRun(queryDesc=0x00007fe568037320, direction=ForwardScanDirection, count=0, execute_once=true) at execMain.c:371:3
frame #26: 0x00000001102370b8 postgres`ExecutorRun(queryDesc=0x00007fe568037320, direction=ForwardScanDirection, count=0, execute_once=true) at execMain.c:313:3
frame #27: 0x00000001104e6bdb postgres`PortalRunSelect(portal=0x00007fe567825920, forward=true, count=0, dest=0x00007fe568046ac8) at pquery.c:921:4
frame #28: 0x00000001104e65ba postgres`PortalRun(portal=0x00007fe567825920, count=9223372036854775807, isTopLevel=true, run_once=true, dest=0x00007fe568046ac8, altdest=0x00007fe568046ac8, qc=0x00007ffedfd1c350) at pquery.c:765:18
frame #29: 0x00000001104e1861 postgres`exec_simple_query(query_string="select description, (test_conv(inbytes, 'utf8', 'utf8')).* from utf8_verification_inputs;") at postgres.c:1216:10
frame #30: 0x00000001104e099c postgres`PostgresMain(dbname="regression", username="postgres") at postgres.c:4505:7
frame #31: 0x00000001103ee2a2 postgres`BackendRun(port=0x00007fe567704080) at postmaster.c:4594:2
frame #32: 0x00000001103ed868 postgres`BackendStartup(port=0x00007fe567704080) at postmaster.c:4322:3
frame #33: 0x00000001103ec64c postgres`ServerLoop at postmaster.c:1802:7
frame #34: 0x00000001103e9e29 postgres`PostmasterMain(argc=3, argv=0x00007fe567405e70) at postmaster.c:1474:11
frame #35: 0x00000001102c8b69 postgres`main(argc=3, argv=0x00007fe567405e70) at main.c:198:3
frame #36: 0x00007fff2045cf3d libdyld.dylib`start + 1
Regards,
--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION
Hi,
On Fri, Jan 07, 2022 at 09:54:31PM +0900, Fujii Masao wrote:
On 2022/01/07 20:58, torikoshia wrote:
On 2022-01-07 14:30, torikoshia wrote:
Updated the patch for fixing compiler warning about the format on windows.
I got another compiler warning, updated the patch again.
Thanks for updating the patch!
I ran the following query every 0.1s by using \watch psql command from three different sessions while make installcheck test was running.
SELECT pg_log_query_plan(pid) FROM pg_stat_activity;
And then I got the segmentation fault as follows.
2022-01-07 21:40:32 JST [postmaster] LOG: server process (PID 51017) was terminated by signal 11: Segmentation fault: 11
2022-01-07 21:40:32 JST [postmaster] DETAIL: Failed process was running: select description, (test_conv(inbytes, 'utf8', 'utf8')).* from utf8_verification_inputs;
2022-01-07 21:40:32 JST [postmaster] LOG: terminating any other active server processes
2022-01-07 21:40:32 JST [postmaster] LOG: all server processes terminated; reinitializing
The backtrace I got from the core file was:
(lldb) target create --core "/cores/core.51017"
Core file '/cores/core.51017' (x86_64) was loaded.
(lldb) bt
* thread #1, stop reason = signal SIGSTOP
* frame #0: 0x00007fff20484552 libsystem_platform.dylib`_platform_strlen + 18
frame #1: 0x000000011076e36c postgres`dopr(target=0x00007ffedfd1b1d8, format="\n", args=0x00007ffedfd1b450) at snprintf.c:444:20
frame #2: 0x000000011076e133 postgres`pg_vsnprintf(str="Query Text: \x7f\x7f\x7f\x7f..." (long run of repeated \x7f bytes trimmed), count=1024, fmt="%s: %s\n", args=0x00007ffedfd1b450) at snprintf.c:195:2
On top of the visibly use-after-pfree memory issue, the patch doesn't apply
anymore:
http://cfbot.cputube.org/patch_36_3142.log
=== Applying patches on top of PostgreSQL commit ID a18b6d2dc288dfa6e7905ede1d4462edd6a8af47 ===
=== applying patch ./v16-0001-log-running-query-plan.patch
[...]
patching file src/test/regress/expected/misc_functions.out
Hunk #1 FAILED at 134.
Hunk #2 succeeded at 163 (offset 7 lines).
Hunk #3 succeeded at 180 (offset 7 lines).
1 out of 3 hunks FAILED -- saving rejects to file src/test/regress/expected/misc_functions.out.rej
patching file src/test/regress/sql/misc_functions.sql
Hunk #1 FAILED at 31.
1 out of 1 hunk FAILED -- saving rejects to file src/test/regress/sql/misc_functions.sql.rej
I'm switching the patch to Waiting on Author.
On 2022-01-14 15:38, Julien Rouhaud wrote:
Hi,
On Fri, Jan 07, 2022 at 09:54:31PM +0900, Fujii Masao wrote:
I ran the following query every 0.1s by using \watch psql command from
three different sessions while make installcheck test was running.
SELECT pg_log_query_plan(pid) FROM pg_stat_activity;
And then I got the segmentation fault as follows.
2022-01-07 21:40:32 JST [postmaster] LOG: server process (PID 51017)
was terminated by signal 11: Segmentation fault: 11
2022-01-07 21:40:32 JST [postmaster] DETAIL: Failed process was
running: select description, (test_conv(inbytes, 'utf8', 'utf8')).*
from utf8_verification_inputs;
2022-01-07 21:40:32 JST [postmaster] LOG: terminating any other
active server processes
2022-01-07 21:40:32 JST [postmaster] LOG: all server processes
terminated; reinitializing
The backtrace I got from the core file was:
(lldb) target create --core "/cores/core.51017"
Core file '/cores/core.51017' (x86_64) was loaded.
(lldb) bt
* thread #1, stop reason = signal SIGSTOP
* frame #0: 0x00007fff20484552
libsystem_platform.dylib`_platform_strlen + 18
frame #1: 0x000000011076e36c
postgres`dopr(target=0x00007ffedfd1b1d8, format="\n",
args=0x00007ffedfd1b450) at snprintf.c:444:20
frame #2: 0x000000011076e133 postgres`pg_vsnprintf(str="Query Text: \x7f\x7f\x7f\x7f (long run of repeated \x7f bytes trimmed)"...,
count=1024, fmt="%s: %s\n", args=0x00007ffedfd1b450) at
snprintf.c:195:2
Thanks for testing!
I could reproduce the situation and confirmed almost the same backtrace.
The cause seems to be that the previous patch didn't do cleanup when
transactions were aborted, which led to segmentation faults when
referencing members of ActiveQueryDesc after the transaction was
aborted.
The attached patch adds cleanups when transactions and subtransactions
are aborted.
BTW, the latest patch for pg_log_backtrace() moved the code duplicated
among the pg_log_*() functions into a new function,
CheckPostgresProcessId, which can be used by pg_log_backtrace [1].
It seems better to use it for pg_log_query_plan() after that patch is
merged.
[1]: /messages/by-id/20211115194251.GP17618@telsasoft.com
On top of the visible use-after-pfree memory issue, the patch doesn't
apply
Thanks for letting me know!
As mentioned above, I updated the patch.
Regards,
--
Atsushi Torikoshi
NTT DATA CORPORATION
Attachments:
v17-0001-log-running-query-plan.patch (text/x-diff, +312 -23)
On 2022/01/24 14:33, torikoshia wrote:
As mentioned above, I updated the patch.
Thanks for updating the patch!
Here are another review comments:
+LOG: plan of the query running on backend with PID 17793 is:
This seems not the same as what actually logged.
+ ereport(WARNING,
+ (errmsg("PID %d is not a PostgreSQL server process", pid)));
Like commit 7fa945b857 changed, this warning message should be "PID %d is not a PostgreSQL backend process"?
+ if (SendProcSignal(pid, PROCSIG_LOG_CURRENT_PLAN, InvalidBackendId) < 0)
proc->backendId should be specified instead of InvalidBackendId, to speed up the processing in SendProcSignal()?
+ PROCSIG_LOG_CURRENT_PLAN, /* ask backend to log plan of the current query */
+volatile sig_atomic_t LogCurrentPlanPending = false;
+extern void HandleLogCurrentPlanInterrupt(void);
+extern void ProcessLogCurrentPlanInterrupt(void);
Isn't it better to use the names that are more consistent with the function name, i.e., pg_log_query_plan? For example, PROCSIG_LOG_QUERY_PLAN instead of PROCSIG_LOG_CURRENT_PLAN?
+ ereport(LOG_SERVER_ONLY,
+ errmsg("backend with PID %d is not running a query",
+ MyProcPid));
errhidestmt(true) and errhidecontext(true) need to be added, don't they? Otherwise, for example, if pg_log_query_plan() is executed after debug_query_string is set but before ActiveQueryDesc is set, STATEMENT message would be output even though the message saying "not running a query" is output. Which seems confusing.
+ hash_seq_init(&status, GetLockMethodLocalHash());
+ while ((locallock = (LOCALLOCK *) hash_seq_search(&status)) != NULL)
+ {
+ if (LOCALLOCK_LOCKTAG(*locallock) == LOCKTAG_PAGE)
Why did you use the search for local lock hash instead of IsPageLockHeld flag variable, to check whether a page lock is held or not? Because there is the corner case where the interrupt is processed after the local lock is registered into the hash but before IsPageLockHeld is enabled?
There is the case where the request to log a query plan is skipped even while the target backend is running a query. If this happens, users can just retry pg_log_query_plan(). These things should be documented?
+ ereport(LOG_SERVER_ONLY,
+ errmsg("backend with PID %d is holding a page lock. Try again",
+ MyProcPid));
It seems more proper to output this message in DETAIL or HINT, instead. So how about something like the following messages?
LOG: could not log the query plan
DETAIL: query plan cannot be logged while page level lock is being held
HINT: Try pg_log_query_plan() after a few ....
Regards,
--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION
On 2022-01-27 20:18, Fujii Masao wrote:
Here are another review comments:
Thanks for reviewing!
+LOG: plan of the query running on backend with PID 17793 is:
This seems not the same as what actually logged.
Modified.
+ ereport(WARNING,
+ (errmsg("PID %d is not a PostgreSQL server process", pid)));
Like commit 7fa945b857 changed, this warning message should be "PID %d
is not a PostgreSQL backend process"?
Modified.
+ if (SendProcSignal(pid, PROCSIG_LOG_CURRENT_PLAN, InvalidBackendId) < 0)
proc->backendId should be specified instead of InvalidBackendId, to
speed up the processing in SendProcSignal()?
Agreed. Modified.
+ PROCSIG_LOG_CURRENT_PLAN, /* ask backend to log plan of the current query */
+volatile sig_atomic_t LogCurrentPlanPending = false;
+extern void HandleLogCurrentPlanInterrupt(void);
+extern void ProcessLogCurrentPlanInterrupt(void);
Isn't it better to use the names that are more consistent with the
function name, i.e., pg_log_query_plan? For example,
PROCSIG_LOG_QUERY_PLAN instead of PROCSIG_LOG_CURRENT_PLAN?
Agreed.
I removed 'current' from the variable and function names and used
'query' instead.
+ ereport(LOG_SERVER_ONLY,
+ errmsg("backend with PID %d is not running a query",
+ MyProcPid));
errhidestmt(true) and errhidecontext(true) need to be added, don't
they? Otherwise, for example, if pg_log_query_plan() is executed after
debug_query_string is set but before ActiveQueryDesc is set, STATEMENT
message would be output even though the message saying "not running a
query" is output. Which seems confusing.
Agreed. Added errhidestmt(true) and errhidecontext(true).
+ hash_seq_init(&status, GetLockMethodLocalHash());
+ while ((locallock = (LOCALLOCK *) hash_seq_search(&status)) != NULL)
+ {
+ if (LOCALLOCK_LOCKTAG(*locallock) == LOCKTAG_PAGE)
Why did you use the search for local lock hash instead of the
IsPageLockHeld flag variable, to check whether a page lock is held or
not? Because there is the corner case where the interrupt is processed
after the local lock is registered into the hash but before
IsPageLockHeld is enabled?
As far as I read CheckAndSetLockHeld(), IsPageLockHeld can be used only
when USE_ASSERT_CHECKING is enabled.
Since removing USE_ASSERT_CHECKING from CheckAndSetLockHeld() would
impact performance on every grant and removal of a local lock, I used
the search over the local lock hash.
There is the case where the request to log a query plan is skipped
even while the target backend is running a query. If this happens,
users can just retry pg_log_query_plan(). These things should be
documented?
Agreed.
Added following:
+ Note that there is the case where the request to log a query
+ plan is skipped even while the target backend is running a
+ query due to lock conflict avoidance.
+ If this happens, users can just retry pg_log_query_plan().
|
+ ereport(LOG_SERVER_ONLY,
+ errmsg("backend with PID %d is holding a page lock. Try again",
+ MyProcPid));
It seems more proper to output this message in DETAIL or HINT,
instead. So how about something like the following messages?
LOG: could not log the query plan
DETAIL: query plan cannot be logged while page level lock is being held
HINT: Try pg_log_query_plan() after a few ....
Agreed.
I felt the HINT message 'after a few ...' was difficult to phrase, and
wrote it as below.
| HINT: Retrying pg_log_query_plan() might succeed since the lock
| duration of page level locks is usually short
What do you think?
--
Regards,
--
Atsushi Torikoshi
NTT DATA CORPORATION
Attachments:
v18-0001-log-running-query-plan.patch (text/x-diff, +317 -23)
On 2022/01/28 17:45, torikoshia wrote:
There is the case where the request to log a query plan is skipped
even while the target backend is running a query. If this happens,
users can just retry pg_log_query_plan(). These things should be
documented?
Agreed.
Added following:
+ Note that there is the case where the request to log a query
+ plan is skipped even while the target backend is running a
+ query due to lock conflict avoidance.
+ If this happens, users can just retry pg_log_query_plan().
This may cause users to misunderstand that pg_log_query_plan() fails while the target backend is holding *any* locks? Isn't it better to mention "page-level locks", instead? So how about the following?
--------------------------
Note that the request to log the query plan will be ignored if it's received during a short period while the target backend is holding a page-level lock.
--------------------------
+ ereport(LOG_SERVER_ONLY,
+ errmsg("backend with PID %d is holding a page lock. Try again",
+ MyProcPid));
It seems more proper to output this message in DETAIL or HINT,
instead. So how about something like the following messages?
LOG: could not log the query plan
DETAIL: query plan cannot be logged while page level lock is being held
HINT: Try pg_log_query_plan() after a few ....
Agreed.
I felt the HINT message 'after a few ...' is difficult to describe, and
wrote as below.
| HINT: Retrying pg_log_query_plan() might succeed since the lock
| duration of page level locks are usually short
How do you think?
Or we don't need HINT message?
+ errmsg("could not log the query plan"),
+ errdetail("query plan cannot be logged while page level lock is being held"),
In detail message, the first word of sentences should be capitalized. How about "Cannot log the query plan while holding page-level lock.", instead?
Thanks for updating the patch! Here are some review comments.
+ <row>
+ <entry role="func_table_entry"><para role="func_signature">
+ <indexterm>
+ <primary>pg_log_query_plan</primary>
+ </indexterm>
This entry is placed before one for pg_log_backend_memory_contexts(). But it should be *after* that since those entries seem to be placed in alphabetical order in the table?
+ Requests to log the plan of the query currently running on the
+ backend with specified process ID along with the untruncated
+ query string.
Other descriptions about logging of query string seem not to mention something like "untruncated query string". For example, auto_explain, log_statement, etc. Why do we need to mention "along with the untruncated query string" specially for pg_log_query_plan()?
+ Note that nested statements (statements executed inside a function) are not
+ considered for logging. Only the plan of the most deeply nested query is logged.
Now the plan of even nested statement can be logged. So this description needs to be updated?
@@ -440,6 +450,7 @@ standard_ExecutorFinish(QueryDesc *queryDesc)
MemoryContextSwitchTo(oldcontext);
+ ActiveQueryDesc = NULL;
ActiveQueryDesc seems unnecessary. Why does ActiveQueryDesc need to be reset to NULL in standard_ExecutorFinish()?
Currently even during ProcessLogQueryPlanInterrupt(), pg_log_query_plan() can be called and another ProcessLogQueryPlanInterrupt() can be executed. So repeated re-entrances to ProcessLogQueryPlanInterrupt() could cause a "stack depth limit exceeded" error. To avoid this, shouldn't we make ProcessLogQueryPlanInterrupt() do nothing and return immediately, if it's called during another ProcessLogQueryPlanInterrupt()?
pg_log_backend_memory_contexts() also might have the same issue.
Regards,
--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION
At Tue, 1 Feb 2022 01:51:11 +0900, Fujii Masao <masao.fujii@oss.nttdata.com> wrote in
On 2022/01/28 17:45, torikoshia wrote:
(full quote trimmed)
Currently even during ProcessLogQueryPlanInterrupt(),
pg_log_query_plan() can be called and another
ProcessLogQueryPlanInterrupt() can be executed. So repeated
re-entrances to ProcessLogQueryPlanInterrupt() could cause a "stack
depth limit exceeded" error. To avoid this, shouldn't we make
ProcessLogQueryPlanInterrupt() do nothing and return immediately, if
it's called during another ProcessLogQueryPlanInterrupt()?
pg_log_backend_memory_contexts() also might have the same issue.
Good catch.
By the way, I'm anxious about the following part and I'd like to
remove it.
+ * Ensure no page lock is held on this process.
It seems to me what is wrong is ginInsertCleanup(), not this feature.
As I read the comment for the assertion, I don't believe we want to
allow CFI while holding a page lock. And AFAICS the function is the
only place doing that. (It is the only user of LockPage()...)
This is the assertion.
lock.c: 902
/*
* We don't acquire any other heavyweight lock while holding the page lock
* except for relation extension.
*/
Assert(!IsPageLockHeld ||
(locktag->locktag_type == LOCKTAG_RELATION_EXTEND));
It is added by a recent commit 72e78d831ab5550c39f2dcc7cc5d44c406ec3dc2.
* Similar to relation extension, page locks are also held for a short
* duration, so imposing such a restriction won't hurt.
I don't believe a path involving vacuum_delay_point() calls is
short-duration'ed.
/messages/by-id/CAH2-WznzCPUKnOV+re30_rHLCkqQWm2JTSVjTCAei9LySTc2pw@mail.gmail.com
One thing that really bothers me about commit e2c79e14 is that
LockPage() is called, not LockBuffer(). GIN had no LockPage() calls
before that commit, and is now the only code in the entire system that
calls LockPage()/ConditionalLockPage() (the hash am no longer uses
page heavyweight locks following recent work there).
I agree to the discussion. Can't we use other mechanism here to get
rid of the Lockpage()?
regards.
--
Kyotaro Horiguchi
NTT Open Source Software Center
On 2022/02/01 17:27, Kyotaro Horiguchi wrote:
* Similar to relation extension, page locks are also held for a short
* duration, so imposing such a restriction won't hurt.
I don't believe a path involving vacuum_delay_point() calls is
short-duration'ed.
Yes.
One thing that really bothers me about commit e2c79e14 is that
LockPage() is called, not LockBuffer(). GIN had no LockPage() calls
before that commit, and is now the only code in the entire system that
calls LockPage()/ConditionalLockPage() (the hash am no longer uses
page heavyweight locks following recent work there).
I agree with the discussion. Can't we use another mechanism here to
get rid of the LockPage()?
I have no good idea for that yet, but I agree it's better to get rid of page level lock.
Regards,
--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION
At Tue, 1 Feb 2022 23:11:03 +0900, Fujii Masao <masao.fujii@oss.nttdata.com> wrote in
One thing that really bothers me about commit e2c79e14 is that
LockPage() is called, not LockBuffer(). GIN had no LockPage() calls
before that commit, and is now the only code in the entire system that
calls LockPage()/ConditionalLockPage() (the hash am no longer uses
page heavyweight locks following recent work there).
I agree with the discussion. Can't we use another mechanism here to
get rid of the LockPage()?
I have no good idea for that yet, but I agree it's better to get rid
of the page level lock.
It's my turn?
The page lock is used to hold-off simultaneous cleanups on the same
index. ShareUpdateExclusive lock on the index relation works that
way. In that path it seems like we are always holding a RowExclusive
lock, so it seems to me we can use ShareUpdateExclusive for our
purpose.
There might be a false-blocking case when another backend is holding a
conflicting lock on the index. Those are Share, ShareRowExclusive,
Exclusive and AccessExclusive. The last three cases don't seem worth
discussing. I'm not sure about the Share and ShareRowExclusive cases.
AFAICS a Share lock is taken on an index in ATExecReplicaIdentity, and
there is no use of a ShareRowExclusive lock on an index. It's no use
discussing explicit locking.
So aren't we able to use ShareUpdateExclusive lock for that?
In the attached patch, ginInsertCleanup has an extra check for such
stronger locks not being held. At least "make check" doesn't cause
the extra assertion to fire.
regards.
--
Kyotaro Horiguchi
NTT Open Source Software Center
Attachments:
PoC_get_rid_of_PageLock_from_ginInsertCleanup.txt (text/plain, +14 -5)
At Wed, 02 Feb 2022 16:49:57 +0900 (JST), Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote in
So aren't we able to use ShareUpdateExclusive lock for that?
In the attached patch, ginInsertCleanup has an extra check for such
stronger locks not being held. At least "make check" doesn't cause
the extra assertion to fire.
Actually, the discussion is a bit dubious. What we really need to
check is whether such locks are not held on an index *elsewhere*.
regards.
--
Kyotaro Horiguchi
NTT Open Source Software Center
2022-02-01 01:51, Fujii Masao wrote:
Thanks for reviewing and suggestions!
+ Note that there is the case where the request to log a
query
+ plan is skipped even while the target backend is running a
+ query due to lock conflict avoidance.
+ If this happens, users can just retry pg_log_query_plan().
This may cause users to misunderstand that pg_log_query_plan() fails
while the target backend is holding *any* locks? Isn't it better to
mention "page-level locks", instead? So how about the following?
--------------------------
Note that the request to log the query plan will be ignored if it's
received during a short period while the target backend is holding a
page-level lock.
--------------------------
Agreed.
+ ereport(LOG_SERVER_ONLY,
+ errmsg("backend with PID %d is holding a page lock. Try again",
+ MyProcPid));
It seems more proper to output this message in DETAIL or HINT,
instead. So how about something like the following messages?
LOG: could not log the query plan
DETAIL: query plan cannot be logged while page level lock is being held
HINT: Try pg_log_query_plan() after a few ....
Agreed.
I felt the HINT message 'after a few ...' is difficult to describe,
and wrote as below.
| HINT: Retrying pg_log_query_plan() might succeed since the lock
| duration of page level locks are usually short
How do you think?
Or we don't need HINT message?
Removed the HINT message.
+ errmsg("could not log the query plan"),
+ errdetail("query plan cannot be logged while page level lock is being held"),
In detail message, the first word of sentences should be capitalized.
How about "Cannot log the query plan while holding page-level lock.",
instead?
Agreed.
Thanks for updating the patch! Here are some review comments.
+ <row>
+ <entry role="func_table_entry"><para role="func_signature">
+ <indexterm>
+ <primary>pg_log_query_plan</primary>
+ </indexterm>
This entry is placed before one for pg_log_backend_memory_contexts().
But it should be *after* that since those entries seem to be placed in
alphabetical order in the table?
Modified it.
+ Requests to log the plan of the query currently running on the
+ backend with specified process ID along with the untruncated
+ query string.
Other descriptions about logging of query string seem not to mention
something like "untruncated query string". For example, auto_explain,
log_statement, etc. Why do we need to mention "along with the
untruncated query string" specially for pg_log_query_plan()?
Modified it as below:
  Requests to log the plan of the query currently running on the
- backend with specified process ID along with the untruncated
- query string.
- They will be logged at <literal>LOG</literal> message level and
+ backend with specified process ID.
+ It will be logged at <literal>LOG</literal> message level and
+ Note that nested statements (statements executed inside a function) are not
+ considered for logging. Only the plan of the most deeply nested query is logged.
Now the plan of even nested statement can be logged. So this
description needs to be updated?
Modified it as below:
- Note that nested statements (statements executed inside a function) are not
- considered for logging. Only the plan of the most deeply nested query is logged.
+ Note that when the statements are executed inside a function, only the
+ plan of the most deeply nested query is logged.
@@ -440,6 +450,7 @@ standard_ExecutorFinish(QueryDesc *queryDesc)
MemoryContextSwitchTo(oldcontext);
+ ActiveQueryDesc = NULL;
ActiveQueryDesc seems unnecessary. Why does ActiveQueryDesc need to be
reset to NULL in standard_ExecutorFinish()?
ActiveQueryDesc should not be reset in standard_ExecutorFinish().
Removed it.
Currently even during ProcessLogQueryPlanInterrupt(),
pg_log_query_plan() can be call and another
ProcessLogQueryPlanInterrupt() can be executed. So repeatable
re-entrances to ProcessLogQueryPlanInterrupt() could cause "stack
depth limit exceeded" error. To avoid this, shouldn't we make
ProcessLogQueryPlanInterrupt() do nothing and return immediately, if
it's called during another ProcessLogQueryPlanInterrupt()?
pg_log_backend_memory_contexts() also might have the same issue.
As you pointed out offlist, the issue could occur even when both
pg_log_backend_memory_contexts and pg_log_query_plan are called and it
may be better to make another patch.
You also pointed out offlist that it would be necessary to call
LockErrorCleanup() if ProcessLogQueryPlanInterrupt() can incur ERROR.
AFAICS it can call ereport(ERROR), i.e., palloc0() in NewExplainState(),
so I added PG_TRY/CATCH and made it call LockErrorCleanup() when an
ERROR occurs.
On 2022-02-01 17:27, Kyotaro Horiguchi wrote:
Thanks for reviewing Horiguchi-san!
By the way, I'm anxious about the following part and I'd like to
remove it.
I also think it would be nice if it's possible.
+ * Ensure no page lock is held on this process.
It seems to me what is wrong is ginInsertCleanup(), not this feature.
Actually, the discussion is a bit dubious. What we really need to
check is whether such locks are not held on an index *elsewhere*.
Since I'm not sure how long it will take to discuss this point, the
attached patch is based on the current HEAD at this time.
I also think it may be better to discuss it on another thread.
--
Regards,
--
Atsushi Torikoshi
NTT DATA CORPORATION
Attachments:
v19-0001-log-running-query-plan.patch (text/x-diff, +322 -23)
On Wed, Feb 2, 2022 at 7:59 AM torikoshia <torikoshia@oss.nttdata.com> wrote:
2022-02-01 01:51, Fujii Masao wrote:
<snip>
+ Note that nested statements (statements executed inside a function) are not
+ considered for logging. Only the plan of the most deeply nested query is logged.
Now the plan of even nested statement can be logged. So this
description needs to be updated?
Modified it as below:
- Note that nested statements (statements executed inside a function) are not
- considered for logging. Only the plan of the most deeply nested query is logged.
+ Note that when the statements are executed inside a function, only the
+ plan of the most deeply nested query is logged.
Minor nit, but I think the "the" is superfluous, i.e.:
"Note that when statements are executed inside a function,
only the plan of the most deeply nested query is logged"
<snip>
On 2022-02-01 17:27, Kyotaro Horiguchi wrote:
Thanks for reviewing Horiguchi-san!
By the way, I'm anxious about the following part and I'd like to
remove it.
I also think it would be nice if it's possible.
+ * Ensure no page lock is held on this process.
It seems to me what is wrong is ginInsertCleanup(), not this feature.
Actually, the discussion is a bit dubious. What we really need to
check is whether such locks are not held on an index *elsewhere*.
Since I'm not sure how long it will take to discuss this point, the
attached patch is based on the current HEAD at this time.
I also think it may be better to discuss it on another thread.
While I agree on the above points, IMHO I don't believe it should be a
show-stopper for adding this functionality to v15, but we have a few
more commitments before we get to that point.
Robert Treat
https://xzilla.net
On 2022/02/02 21:59, torikoshia wrote:
This may cause users to misunderstand that pg_log_query_plan() fails
while the target backend is holding *any* locks? Isn't it better to
mention "page-level locks", instead? So how about the following?
--------------------------
Note that the request to log the query plan will be ignored if it's
received during a short period while the target backend is holding a
page-level lock.
--------------------------
Agreed.
On second thought, this note is confusing rather than helpful? Because the users don't know when and what operation needs page-level lock. So now I'm thinking it's better to remove this note.
As you pointed out offlist, the issue could occur even when both pg_log_backend_memory_contexts and pg_log_query_plan are called and it may be better to make another patch.
OK.
You also pointed out offlist that it would be necessary to call LockErrorCleanup() if ProcessLogQueryPlanInterrupt() can incur ERROR.
AFAICS it can call ereport(ERROR), i.e., palloc0() in NewExplainState(), so I added PG_TRY/CATCH and make it call LockErrorCleanup() when ERROR occurs.
As we discussed off-list, this treatment might not be necessary. Because when ERROR or FATAL error happens, AbortTransaction() is called and LockErrorCleanup() is also called inside there.
Since I'm not sure how long it will take to discuss this point, the attached patch is based on the current HEAD at this time.
Thanks for updating the patch!
@@ -5048,6 +5055,12 @@ AbortSubTransaction(void)
*/
PG_SETMASK(&UnBlockSig);
+ /*
+ * When ActiveQueryDesc is referenced after abort, some of its elements
+ * are freed. To avoid accessing them, reset ActiveQueryDesc here.
+ */
+ ActiveQueryDesc = NULL;
AbortSubTransaction() should reset ActiveQueryDesc to save_ActiveQueryDesc that ExecutorRun() set, instead of NULL? Otherwise ActiveQueryDesc of top-level statement will be unavailable after subtransaction is aborted in the nested statements.
For example, no plan is logged while the following "select pg_sleep(test())" is running, because the exception inside test() function aborts the subtransaction and resets ActiveQueryDesc to NULL.
create or replace function test () returns int as $$
begin
perform 1/0;
exception when others then
return 30;
end;
$$ language plpgsql;
select pg_sleep(test());
-CREATE ROLE regress_log_memory;
+CREATE ROLE regress_log;
Isn't this name too generic? How about regress_log_function, for example?
Regards,
--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION