PostgreSQL debug log doesn't record the whole procedure (from receiving the request to sending the response)
Hi,
We recently encountered an issue where a PostgreSQL ODBC client occasionally (once or twice per 24 hours) does not receive a response from the PostgreSQL server in time (the client sets a 5-second timeout). Both PostgreSQL and its client are deployed in VMs. It took us days to debug what caused the timeout. We enabled the PostgreSQL server debug log with the configuration below.
log_min_messages = debug5
log_min_error_statement = debug5
log_line_prefix = '[%p-%c-%l] %m <%h %i> '
log_error_verbosity = verbose
log_statement = 'all'
logging_collector = on
log_destination = 'stderr'
log_directory = '/tmp'
log_filename = 'postgresql-%Y-%m-%d_%H%M%S.log'
log_duration = on
log_autovacuum_min_duration = 2
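As a sanity check, the values actually in effect can be read back from any live session; a minimal SQL sketch:

-- Confirm the logging settings currently in effect
SELECT name, setting
FROM pg_settings
WHERE name IN ('log_min_messages', 'log_min_error_statement',
               'log_statement', 'log_duration', 'logging_collector');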
The client log indicates it received the response 10 seconds after sending the request:
Nov 13 22:25:58 CFPU-0 rnwman[2701607]: RU:OMU-0|PNAME:QNRNWMAN|LPID:2701607|PID:0x50A 0x11 0x0|TIME:2021-11-14 00:25:58:050678|TEXT:WQTST before call db:1|DATA:
Nov 13 22:25:58 CFPU-0 raktor[2701622]: RU:OMU-0|PNAME:QNRAKTOR|LPID:2701622|PID:0x4EE 0x31 0x0|TIME:2021-11-14 00:26:03:054718|TEXT:RAV:set_wcel_state_r-db response timeout|DATA:
Nov 13 22:26:08 CFPU-0 rnwman[2701607]: RU:OMU-0|PNAME:QNRNWMAN|LPID:2701607|PID:0x50A 0x11 0x0|TIME:2021-11-14 00:26:08:012336|TEXT:RAV Exit: set_wcel_state__r:= |DATA:
Nov 13 22:26:08 CFPU-0 rnwman[2701607]: RU:OMU-0|PNAME:QNRNWMAN|LPID:2701607|PID:0x50A 0x11 0x0|TIME:2021-11-14 00:26:08:012501|TEXT:WQTST after call db:1|DATA:
Meanwhile, the PostgreSQL log recorded for the same procedure indicates it executed quickly:
[2701833-618d1b70.293a09-173237] 2021-11-13 22:25:58.050 GMT <169.254.0.21 idle> DEBUG: 00000: StartTransaction(1) name: unnamed; blockState: DEFAULT; state: INPROGRESS, xid/subid/cid: 0/1/0
[2701833-618d1b70.293a09-173238] 2021-11-13 22:25:58.050 GMT <169.254.0.21 idle> LOCATION: ShowTransactionStateRec, xact.c:5333
[2701833-618d1b70.293a09-173239] 2021-11-13 22:25:58.050 GMT <169.254.0.21 idle> STATEMENT: DEALLOCATE "_PLAN0x7f2ec021ab60"
[2701833-618d1b70.293a09-173240] 2021-11-13 22:25:58.050 GMT <169.254.0.21 idle> LOG: 00000: statement: DEALLOCATE "_PLAN0x7f2ec021ab60"
[2701833-618d1b70.293a09-173241] 2021-11-13 22:25:58.050 GMT <169.254.0.21 idle> LOCATION: exec_simple_query, postgres.c:1042
[2701833-618d1b70.293a09-173242] 2021-11-13 22:25:58.050 GMT <169.254.0.21 DEALLOCATE> DEBUG: 00000: CommitTransaction(1) name: unnamed; blockState: STARTED; state: INPROGRESS, xid/subid/cid: 0/1/0
[2701833-618d1b70.293a09-173243] 2021-11-13 22:25:58.050 GMT <169.254.0.21 DEALLOCATE> LOCATION: ShowTransactionStateRec, xact.c:5333
[2701833-618d1b70.293a09-173244] 2021-11-13 22:25:58.050 GMT <169.254.0.21 DEALLOCATE> STATEMENT: DEALLOCATE "_PLAN0x7f2ec021ab60"
[2701833-618d1b70.293a09-173245] 2021-11-13 22:25:58.050 GMT <169.254.0.21 DEALLOCATE> LOG: 00000: duration: 0.098 ms
[2701833-618d1b70.293a09-173246] 2021-11-13 22:25:58.050 GMT <169.254.0.21 DEALLOCATE> LOCATION: exec_simple_query, postgres.c:1287
[2701833-618d1b70.293a09-173247] 2021-11-13 22:25:58.050 GMT <169.254.0.21 PARSE> DEBUG: 00000: parse _PLAN0x7f2ec021ab60: UPDATE act_wcel SET wcel_state=$1 WHERE wbts_id=74 and lcr_id=7402
[2701833-618d1b70.293a09-173248] 2021-11-13 22:25:58.050 GMT <169.254.0.21 PARSE> LOCATION: exec_parse_message, postgres.c:1342
[2701833-618d1b70.293a09-173249] 2021-11-13 22:25:58.050 GMT <169.254.0.21 PARSE> STATEMENT: UPDATE act_wcel SET wcel_state=$1 WHERE wbts_id=74 and lcr_id=7402
[2701833-618d1b70.293a09-173250] 2021-11-13 22:25:58.051 GMT <169.254.0.21 PARSE> DEBUG: 00000: StartTransaction(1) name: unnamed; blockState: DEFAULT; state: INPROGRESS, xid/subid/cid: 0/1/0
[2701833-618d1b70.293a09-173251] 2021-11-13 22:25:58.051 GMT <169.254.0.21 PARSE> LOCATION: ShowTransactionStateRec, xact.c:5333
[2701833-618d1b70.293a09-173252] 2021-11-13 22:25:58.051 GMT <169.254.0.21 PARSE> STATEMENT: UPDATE act_wcel SET wcel_state=$1 WHERE wbts_id=74 and lcr_id=7402
[2701833-618d1b70.293a09-173253] 2021-11-13 22:25:58.051 GMT <169.254.0.21 PARSE> LOG: 00000: duration: 0.089 ms
[2701833-618d1b70.293a09-173254] 2021-11-13 22:25:58.051 GMT <169.254.0.21 PARSE> LOCATION: exec_parse_message, postgres.c:1550
[2701833-618d1b70.293a09-173255] 2021-11-13 22:25:58.051 GMT <169.254.0.21 PARSE> DEBUG: 00000: CommitTransaction(1) name: unnamed; blockState: STARTED; state: INPROGRESS, xid/subid/cid: 0/1/0
[2701833-618d1b70.293a09-173256] 2021-11-13 22:25:58.051 GMT <169.254.0.21 PARSE> LOCATION: ShowTransactionStateRec, xact.c:5333
[2701833-618d1b70.293a09-173257] 2021-11-13 22:25:58.051 GMT <169.254.0.21 idle> DEBUG: 00000: StartTransaction(1) name: unnamed; blockState: DEFAULT; state: INPROGRESS, xid/subid/cid: 0/1/0
[2701833-618d1b70.293a09-173258] 2021-11-13 22:25:58.051 GMT <169.254.0.21 idle> LOCATION: ShowTransactionStateRec, xact.c:5333
[2701833-618d1b70.293a09-173259] 2021-11-13 22:25:58.051 GMT <169.254.0.21 idle> DEBUG: 00000: CommitTransaction(1) name: unnamed; blockState: STARTED; state: INPROGRESS, xid/subid/cid: 0/1/0
[2701833-618d1b70.293a09-173260] 2021-11-13 22:25:58.051 GMT <169.254.0.21 idle> LOCATION: ShowTransactionStateRec, xact.c:5333
[2701833-618d1b70.293a09-173261] 2021-11-13 22:25:58.051 GMT <169.254.0.21 idle> DEBUG: 00000: bind <unnamed> to _PLAN0x7f2ec021ab60
[2701833-618d1b70.293a09-173262] 2021-11-13 22:25:58.051 GMT <169.254.0.21 idle> LOCATION: exec_bind_message, postgres.c:1600
[2701833-618d1b70.293a09-173263] 2021-11-13 22:25:58.051 GMT <169.254.0.21 BIND> DEBUG: 00000: StartTransaction(1) name: unnamed; blockState: DEFAULT; state: INPROGRESS, xid/subid/cid: 0/1/0
[2701833-618d1b70.293a09-173264] 2021-11-13 22:25:58.051 GMT <169.254.0.21 BIND> LOCATION: ShowTransactionStateRec, xact.c:5333
[2701833-618d1b70.293a09-173265] 2021-11-13 22:25:58.051 GMT <169.254.0.21 BIND> STATEMENT: UPDATE act_wcel SET wcel_state=$1 WHERE wbts_id=74 and lcr_id=7402
[2701833-618d1b70.293a09-173266] 2021-11-13 22:25:58.051 GMT <169.254.0.21 BIND> LOG: 00000: duration: 0.107 ms
[2701833-618d1b70.293a09-173267] 2021-11-13 22:25:58.051 GMT <169.254.0.21 BIND> LOCATION: exec_bind_message, postgres.c:1917
[2701833-618d1b70.293a09-173268] 2021-11-13 22:25:58.051 GMT <169.254.0.21 UPDATE> LOG: 00000: execute _PLAN0x7f2ec021ab60: UPDATE act_wcel SET wcel_state=$1 WHERE wbts_id=74 and lcr_id=7402
[2701833-618d1b70.293a09-173269] 2021-11-13 22:25:58.051 GMT <169.254.0.21 UPDATE> DETAIL: parameters: $1 = '0'
[2701833-618d1b70.293a09-173270] 2021-11-13 22:25:58.051 GMT <169.254.0.21 UPDATE> LOCATION: exec_execute_message, postgres.c:2055
[2701833-618d1b70.293a09-173271] 2021-11-13 22:25:58.051 GMT <169.254.0.21 UPDATE> LOG: 00000: duration: 0.080 ms
[2701833-618d1b70.293a09-173272] 2021-11-13 22:25:58.051 GMT <169.254.0.21 UPDATE> LOCATION: exec_execute_message, postgres.c:2138
[2701833-618d1b70.293a09-173273] 2021-11-13 22:25:58.051 GMT <169.254.0.21 UPDATE> DEBUG: 00000: CommitTransaction(1) name: unnamed; blockState: STARTED; state: INPROGRESS, xid/subid/cid: 23280/1/1
[2701833-618d1b70.293a09-173274] 2021-11-13 22:25:58.051 GMT <169.254.0.21 UPDATE> LOCATION: ShowTransactionStateRec, xact.c:5333
Finally, tcpdump indicates the delay happened at the PostgreSQL server.
[tcpdump capture screenshot omitted]
This mail is not about investigating why the PostgreSQL server delayed sending the response; we suspect a backend I/O performance issue.
This mail is to ask why the PostgreSQL debug log doesn't capture the response-message delay at all, which can be misleading during troubleshooting. It looks to me like the debug log doesn't record the whole procedure. Are there developer options that would cover the missing part which we didn't enable?
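For example, the developer options can be enumerated from a session; a minimal sketch (assuming pg_settings groups them under the 'Developer Options' category, as recent versions do):

-- List developer-level settings and their current values
SELECT name, setting, short_desc
FROM pg_settings
WHERE category = 'Developer Options';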
Best Regards
It always takes longer than you expect,
even when you take into account
----------------------------------------------------------------------------------------------
Yan-Jack Chen (陈雁)
Tel: +8613957141340
Addr: No.567 XinCheng Rd, Binjiang District, Hangzhou, China, 310053
"Chen, Yan-Jack (NSB - CN/Hangzhou)" <yan-jack.chen@nokia-sbell.com> writes:
We recently encountered an issue where a PostgreSQL ODBC client occasionally (once or twice per 24 hours) does not receive a response from the PostgreSQL server in time (the client sets a 5-second timeout). Both PostgreSQL and its client are deployed in VMs. It took us days to debug what caused the timeout. We enabled the PostgreSQL server debug log with the configuration below.
We can see from this that the server spent 10 seconds in
CommitTransaction, so the question is what took so long.
I'd wonder first about end-of-transaction triggers (have you got foreign
keys on that table? maybe an event trigger?), and then second about
delays in writing/fsyncing WAL (what's the underlying storage? do you
have synchronous_commit replication turned on?).
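For instance, both can be checked from SQL; a quick sketch, taking act_wcel from the posted log:

-- Any triggers on the table, including internal FK-enforcement triggers?
SELECT tgname, tgisinternal
FROM pg_trigger
WHERE tgrelid = 'act_wcel'::regclass;

-- Commit/WAL-sync behavior currently in effect
SHOW synchronous_commit;
SHOW synchronous_standby_names;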
This mail is to ask why the PostgreSQL debug log doesn't capture the response-message delay at all, which can be misleading during troubleshooting. It looks to me like the debug log doesn't record the whole procedure. Are there developer options that would cover the missing part which we didn't enable?
[ shrug... ] Sure, we could put an elog(DEBUG) after every line of code
in the server, and then high-level debugging logs would be even more
impossibly voluminous than they are now. I'd say the existing logging
gave you plenty of clue where to look.
regards, tom lane
"Tom Lane" <tgl@sss.pgh.pa.us> writes:
We can see from this that the server spent 10 seconds in CommitTransaction, so the question is what took so long.
I'd wonder first about end-of-transaction triggers (have you got foreign keys on that table? maybe an event trigger?), and then second about delays in writing/fsyncing WAL (what's the underlying storage? do you have synchronous_commit replication turned on?).
We share your second suspicion: delays in fsyncing WAL. The backend storage is a Ceph cluster, and we saw significantly increased latency on it during that time. Yes, synchronous_commit is on.
fsync | on | Forces synchronization of updates to disk.
synchronous_commit | on | Sets the current transaction's synchronization level.
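For what it's worth, on PostgreSQL 14 or later the cumulative WAL-sync latency can also be read back from the server; a sketch, assuming track_wal_io_timing = on (without it the time column stays zero):

-- Cumulative WAL fsync count and total sync time (milliseconds)
SELECT wal_sync, wal_sync_time
FROM pg_stat_wal;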
[ shrug... ] Sure, we could put an elog(DEBUG) after every line of code in the server, and then high-level debugging logs would be even more impossibly voluminous than they are now. I'd say the existing logging gave you plenty of clue where to look.
Actually, there is no need to write a log entry for every line of code. The last debug entry we can see is for CommitTransaction with state INPROGRESS, but there is no entry showing when CommitTransaction reaches state DONE/COMPLETED. That kind of entry would be valuable: it would let us tell from the server debug log alone whether the delay comes from the server or the client, without having to capture and analyze the TCP traffic.
[2701833-618d1b70.293a09-173273] 2021-11-13 22:25:58.051 GMT <169.254.0.21 UPDATE> DEBUG: 00000: CommitTransaction(1) name: unnamed; blockState: STARTED; state: INPROGRESS, xid/subid/cid: 23280/1/1
[2701833-618d1b70.293a09-173274] 2021-11-13 22:25:58.051 GMT <169.254.0.21 UPDATE> LOCATION: ShowTransactionStateRec, xact.c:5333
Best Regards
It always takes longer than you expect,
even when you take into account
----------------------------------------------------------------------------------------------
Yan-Jack Chen (陈雁)
Tel: +8613957141340
Addr: No.567 XinCheng Rd, Binjiang District, Hangzhou, China, 310053
-----Original Message-----
From: Tom Lane <tgl@sss.pgh.pa.us>
Sent: 15 November 2021 23:09
To: Chen, Yan-Jack (NSB - CN/Hangzhou) <yan-jack.chen@nokia-sbell.com>
Cc: pgsql-general@postgresql.org
Subject: Re: PostgreSQL debug log doesn't record the whole procedure (from receiving the request to sending the response)
"Chen, Yan-Jack (NSB - CN/Hangzhou)" <yan-jack.chen@nokia-sbell.com> writes:
We recently encounter one issue about PostgreSQL ODBC client doesn¡¯t receive response from PostgreSQL server in time (client set 5 seconds timeout) occasionally (1 or 2 times per 24 hours). Both PostgreSQL and its client are deployed in VM against. It took us days to debug where cause the timeout. We enable PostgreSQL server debug log via below configuration.
We can see from this that the server spent 10 seconds in CommitTransaction, so the question is what took so long.
I'd wonder first about end-of-transaction triggers (have you got foreign keys on that table? maybe an event trigger?), and then second about delays in writing/fsyncing WAL (what's the underlying storage? do you have synchronous_commit replication turned on?).
This mail is to ask why PostgreSQL debug log doesn¡¯t really include the response message delay which may cause misleading why troubleshooting. It looks to me the debug log doesn¡¯t record the whole procedure. If there are some developer options include the missing part but we didn¡¯t enable?
[ shrug... ] Sure, we could put an elog(DEBUG) after every line of code in the server, and then high-level debugging logs would be even more impossibly voluminous than they are now. I'd say the existing logging gave you plenty of clue where to look.
regards, tom lane