pgsql: Add TAP tests for timeouts

Started by Alexander Korotkovover 2 years ago11 messagescomitters
Jump to latest
#1Alexander Korotkov
aekorotkov@gmail.com

Add TAP tests for timeouts

This commit adds new tests to verify that transaction_timeout,
idle_session_timeout, and idle_in_transaction_session_timeout work as expected.
We introduce new injection points in before throwing a timeout FATAL error
and check these injection points are reached.

Discussion: /messages/by-id/CAAhFRxiQsRs2Eq5kCo9nXE3HTugsAAJdSQSmxncivebAxdmBjQ@mail.gmail.com
Author: Andrey Borodin
Reviewed-by: Alexander Korotkov

Branch
------
master

Details
-------
https://git.postgresql.org/pg/commitdiff/eeefd4280f6e5167d70efabb89586b7d38922d95

Modified Files
--------------
src/backend/tcop/postgres.c | 10 +++
src/test/modules/test_misc/Makefile | 4 +
src/test/modules/test_misc/meson.build | 4 +
src/test/modules/test_misc/t/005_timeouts.pl | 129 +++++++++++++++++++++++++++
4 files changed, 147 insertions(+)

#2Kyotaro Horiguchi
horikyota.ntt@gmail.com
In reply to: Alexander Korotkov (#1)
Re: pgsql: Add TAP tests for timeouts

Hello.

At Thu, 14 Mar 2024 11:25:42 +0000, Alexander Korotkov <akorotkov@postgresql.org> wrote in

Add TAP tests for timeouts

This commit adds new tests to verify that transaction_timeout,
idle_session_timeout, and idle_in_transaction_session_timeout work as expected.
We introduce new injection points in before throwing a timeout FATAL error
and check these injection points are reached.

Discussion: /messages/by-id/CAAhFRxiQsRs2Eq5kCo9nXE3HTugsAAJdSQSmxncivebAxdmBjQ@mail.gmail.com
Author: Andrey Borodin
Reviewed-by: Alexander Korotkov

In 005_timeouts.pl, I found the following comment.

# If we send \q with $psql_session->quit it can get to pump already closed.
# So \q is in initial script, here we only finish IPC::Run.
$psql_session->{run}->finish;

I'm not sure if "it can get to pump already closed." makes sense. I
guess that it means "the command can get to be pumped (or "can be
sent") to the session already closed" or something similar?

# 2. Test of the sidle in transaction timeout

s/sidle/idle/ ?

# Wait until the backend is in the timeout injection point.

I'm not sure, but it seems that "is in" meant "passes" or something like?

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

#3Michael Paquier
michael@paquier.xyz
In reply to: Kyotaro Horiguchi (#2)
Re: pgsql: Add TAP tests for timeouts

On Fri, Mar 15, 2024 at 10:42:35AM +0900, Kyotaro Horiguchi wrote:

In 005_timeouts.pl, I found the following comment.

Note also that the test is not stable, one of my machines with
injection points enabled has complained twice in its last three runs:
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=hachi&amp;dt=2024-03-14%2015%3A05%3A04
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=hachi&amp;dt=2024-03-15%2003%3A21%3A15
--
Michael

#4Andrey Borodin
amborodin@acm.org
In reply to: Michael Paquier (#3)
Re: pgsql: Add TAP tests for timeouts

Kyotaro, thanks for your corrections. I agree that wordings should be improved. But let’s deal with failures first.

On 15 Mar 2024, at 10:28, Michael Paquier <michael@paquier.xyz> wrote:

On Fri, Mar 15, 2024 at 10:42:35AM +0900, Kyotaro Horiguchi wrote:

In 005_timeouts.pl, I found the following comment.

Note also that the test is not stable, one of my machines with
injection points enabled has complained twice in its last three runs:
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=hachi&amp;dt=2024-03-14%2015%3A05%3A04
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=hachi&amp;dt=2024-03-15%2003%3A21%3A15

The failure is

t/005_timeouts.pl (Wstat: 65280 Tests: 0 Failed: 0)
Non-zero exit status: 255
Parse errors: No plan found in TAP output
Files=5, Tests=68, 185 wallclock secs ( 0.03 usr 0.00 sys + 0.86 cusr 0.80 csys = 1.69 CPU)
Result: FAIL

The failure seems to be Perl-related.
As far as I can see I’ve done everything akin to 041_checkpoint_at_promote.pl. On batta this test pass, but hachi seems to be unhappy with this test.
And hachi sometimes pass this test too [0]https://buildfarm.postgresql.org/cgi-bin/show_stage_log.pl?nm=hachi&amp;dt=2024-03-14%2022%3A02%3A41&amp;stg=module-test_misc-check.

I’ll look more on this. Do I understand right that we have only 2 buildfarm members with injection points?

Best regards, Andrey Borodin.

[0]: https://buildfarm.postgresql.org/cgi-bin/show_stage_log.pl?nm=hachi&amp;dt=2024-03-14%2022%3A02%3A41&amp;stg=module-test_misc-check

#5Andrey Borodin
amborodin@acm.org
In reply to: Andrey Borodin (#4)
Re: pgsql: Add TAP tests for timeouts

On 15 Mar 2024, at 11:20, Andrey M. Borodin <x4mmm@yandex-team.ru> wrote:

The failure seems to be Perl-related.

It’s not Perl-related. This is how wait_for_log() times out.

timed out waiting for match: terminating connection due to transaction timeout at t/005_timeouts.pl line 58.

But it did not even wait for a 100ms…

2024-03-15 03:32:26.492 UTC [1405044:4] 005_timeouts.pl LOG: statement: SELECT injection_points_wakeup('transaction-timeout’);
///// here we start to wait.
2024-03-15 03:32:26.492 UTC [1405044:5] 005_timeouts.pl LOG: disconnection: session time: 0:00:00.002 user=admin database=postgres host=[local]
2024-03-15 03:35:26.623 UTC [1405009:4] LOG: received immediate shutdown request

Best regards, Andrey Borodin.

#6Michael Paquier
michael@paquier.xyz
In reply to: Andrey Borodin (#4)
Re: pgsql: Add TAP tests for timeouts

On Fri, Mar 15, 2024 at 11:20:49AM +0500, Andrey M. Borodin wrote:

And hachi sometimes pass this test too [0].

I’ll look more on this. Do I understand right that we have only 2 buildfarm members with injection points?

I have added --enable-injection-points to three of them: hachi,
gokiburi and batta. The switch is also enabled in the CI by defaultm
where you may be able to capture a bit more information than the
buildfarm.
--
Michael

#7Alexander Korotkov
aekorotkov@gmail.com
In reply to: Michael Paquier (#6)
Re: pgsql: Add TAP tests for timeouts

On Fri, Mar 15, 2024 at 9:44 AM Michael Paquier <michael@paquier.xyz> wrote:

On Fri, Mar 15, 2024 at 11:20:49AM +0500, Andrey M. Borodin wrote:

And hachi sometimes pass this test too [0].

I’ll look more on this. Do I understand right that we have only 2

buildfarm members with injection points?

I have added --enable-injection-points to three of them: hachi,
gokiburi and batta. The switch is also enabled in the CI by defaultm
where you may be able to capture a bit more information than the
buildfarm.

I managed to reproduce the issue locally by setting the following extra
config via TEMP_CONFIG:

log_line_prefix = '%m [%p:%l] %q%a '
log_connections = 'true'
log_disconnections = 'true'
log_checkpoints = 'true'
log_statement = 'all'
wal_compression = 'on'
debug_parallel_query = regress
wal_compression = lz4
default_toast_compression = lz4

The backtrace is as follows. We loop in the interrupt checking, given that
the injection point handler checks for interrupts internally. I propose to
unset the timeout flag before the injection point (see the attached patch).

* thread #1, name = 'postgres', stop reason = signal SIGSTOP
* frame #0: 0x00007f3756f289e7 libc.so.6`epoll_wait(epfd=9,
events=0x0000556a302cabc8, maxevents=1, timeout=-1) at epoll_wait.c:30:10
frame #1: 0x0000556a2fc41aab postgres`WaitEventSetWait at latch.c:1570:7
frame #2: 0x0000556a2fc41e95 postgres`WaitLatch(latch=<unavailable>,
wakeEvents=<unavailable>, timeout=<unavailable>, wait_event_info=117440513)
at latch.c:538:6
frame #3: 0x0000556a2fc4f611
postgres`ConditionVariableTimedSleep(cv=0x00007f3757b0e224, timeout=-1,
wait_event_info=117440513) at condition_variable.c:163:10
frame #4: 0x00007f3757b10628 injection_points.so`injection_wait at
injection_points.c:153:3
frame #5: 0x0000556a2fdd4f54 postgres`InjectionPointRun at
injection_point.c:313:2
frame #6: 0x0000556a2fc6a8b3 postgres`ProcessInterrupts.part.0 at
postgres.c:3430:4
frame #7: 0x0000556a2fc4f70b
postgres`ConditionVariableTimedSleep(cv=0x00007f3757b0e224, timeout=-1,
wait_event_info=117440513) at condition_variable.c:196:3
frame #8: 0x00007f3757b10628 injection_points.so`injection_wait at
injection_points.c:153:3
frame #9: 0x0000556a2fdd4f54 postgres`InjectionPointRun at
injection_point.c:313:2
frame #10: 0x0000556a2fc6a8b3 postgres`ProcessInterrupts.part.0 at
postgres.c:3430:4
frame #11: 0x0000556a2fc4f70b
postgres`ConditionVariableTimedSleep(cv=0x00007f3757b0e224, timeout=-1,
wait_event_info=117440513) at condition_variable.c:196:3
frame #12: 0x00007f3757b10628 injection_points.so`injection_wait at
injection_points.c:153:3
frame #13: 0x0000556a2fdd4f54 postgres`InjectionPointRun at
injection_point.c:313:2
frame #14: 0x0000556a2fc6a8b3 postgres`ProcessInterrupts.part.0 at
postgres.c:3430:4
frame #15: 0x0000556a2fabe63d postgres`ExecGather at nodeGather.c:315:3
frame #16: 0x0000556a2fabe574 postgres`ExecGather at nodeGather.c:269:10
frame #17: 0x0000556a2fabe549
postgres`ExecGather(pstate=0x0000556a303d8138) at nodeGather.c:222:9
frame #18: 0x0000556a2faa21da postgres`standard_ExecutorRun at
executor.h:274:9
frame #19: 0x0000556a2fc6f3cf postgres`PortalRunSelect at pquery.c:924:4
frame #20: 0x0000556a2fc70b7e postgres`PortalRun at pquery.c:768:18
frame #21: 0x0000556a2fc6c996 postgres`exec_simple_query at
postgres.c:1274:10
frame #22: 0x0000556a2fc6e378 postgres`PostgresMain at postgres.c:4682:7
frame #23: 0x0000556a2fbcefa0 postgres`ServerLoop.isra.0 at
postmaster.c:4452:2
frame #24: 0x0000556a2fbcff8b postgres`PostmasterMain at
postmaster.c:1478:11
frame #25: 0x0000556a2f8cb5a7 postgres`main(argc=4,
argv=0x0000556a302ca340) at main.c:197:3
frame #26: 0x00007f3756e28150
libc.so.6`__libc_start_call_main(main=(postgres`main at main.c:59:1),
argc=4, argv=0x00007ffffc17e678) at libc_start_call_main.h:58:16
frame #27: 0x00007f3756e28209
libc.so.6`__libc_start_main_impl(main=(postgres`main at main.c:59:1),
argc=4, argv=0x00007ffffc17e678, init=<unavailable>, fini=<unavailable>,
rtld_fini=<unavailable>, stack_end=0x00007ffffc17e668) at libc-start.c:360:3
frame #28: 0x0000556a2f8cbb75 postgres`_start + 37

------
Regards,
Alexander Korotkov

Attachments:

transaction_timeout_patch_fix.patchapplication/octet-stream; name=transaction_timeout_patch_fix.patchDownload+5-7
#8Andrey Borodin
amborodin@acm.org
In reply to: Alexander Korotkov (#7)
Re: pgsql: Add TAP tests for timeouts

On 15 Mar 2024, at 15:44, Alexander Korotkov <aekorotkov@gmail.com> wrote:

We loop in the interrupt checking, given that the injection point handler checks for interrupts internally. I propose to unset the timeout flag before the injection point (see the attached patch).

Oh, cool.
As far as I understand, this is only necessary for the test with injection point.
So, maybe unset it only when injection points are enabled? Something like this is already used in GIN.

#ifdef USE_INJECTION_POINTS
....
#endif

Best regards, Andrey Borodin.

#9Alexander Korotkov
aekorotkov@gmail.com
In reply to: Andrey Borodin (#8)
Re: pgsql: Add TAP tests for timeouts

On Fri, Mar 15, 2024 at 1:27 PM Andrey M. Borodin <x4mmm@yandex-team.ru> wrote:

On 15 Mar 2024, at 15:44, Alexander Korotkov <aekorotkov@gmail.com> wrote:

We loop in the interrupt checking, given that the injection point handler checks for interrupts internally. I propose to unset the timeout flag before the injection point (see the attached patch).

Oh, cool.
As far as I understand, this is only necessary for the test with injection point.
So, maybe unset it only when injection points are enabled? Something like this is already used in GIN.

#ifdef USE_INJECTION_POINTS
....
#endif

Maybe, but do you see any negative side effects of the unconditionally
unset of flags? If not, I would prefer to keep the code simple.

------
Regards,
Alexander Korotkov

#10Andrey Borodin
amborodin@acm.org
In reply to: Alexander Korotkov (#9)
Re: pgsql: Add TAP tests for timeouts

On 15 Mar 2024, at 16:30, Alexander Korotkov <aekorotkov@gmail.com> wrote:

Maybe, but do you see any negative side effects of the unconditionally
unset of flags?

Nope, just expressed possible option.

If not, I would prefer to keep the code simple.

IMO that's fine. Let's incorporate wording improvement from Kyotaro to the same patch? PFA draft for this.

Maybe it worth to wait a little to see if some other failures will pop up?

Best regards, Andrey Borodin.

Attachments:

v2-0001-Fix-race-condition-in-transaction-timeout-TAP-tes.patchapplication/octet-stream; name=v2-0001-Fix-race-condition-in-transaction-timeout-TAP-tes.patch; x-unix-mode=0644Download+11-14
#11Alexander Korotkov
aekorotkov@gmail.com
In reply to: Andrey Borodin (#10)
Re: pgsql: Add TAP tests for timeouts

On Fri, Mar 15, 2024 at 2:25 PM Andrey M. Borodin <x4mmm@yandex-team.ru> wrote:

On 15 Mar 2024, at 16:30, Alexander Korotkov <aekorotkov@gmail.com> wrote:

Maybe, but do you see any negative side effects of the unconditionally
unset of flags?

Nope, just expressed possible option.

If not, I would prefer to keep the code simple.

IMO that's fine. Let's incorporate wording improvement from Kyotaro to the same patch? PFA draft for this.

Thank you, I've pushed this, but split into two commits for bugfix and
for improved wordings.

Maybe it worth to wait a little to see if some other failures will pop up?

I prefer to push now because we're quite confident this fix will
improve the situation. It should increase the chances of catching
other bugs if any, because buildfarm which was failing on this bug
will go ahead and have a chance to spot more.

------
Regards,
Alexander Korotkov