AIX and EAGAIN on open()
Hello,
a customer running PG on AIX [1] is occasionally seeing "Resource
temporarily unavailable" (EAGAIN) returned by open() calls:
[1]: We have PostgreSQL 11.13 on powerpc-ibm-aix7.2.5.0, compiled by /opt/IBM/xlc/13.1.0/bin/xlc, 64-bit
2022-05-19 03:28:13 CEST:127.0.0.1(63265):x@x:[64029168]: ERROR: could not open file "base/16401/935915821_fsm": Resource temporarily unavailable
2022-05-19 03:28:13 CEST:127.0.0.1(63265):x@x:[64029168]: CONTEXT: SQL statement "INSERT INTO s[...]"
PL/pgSQL function s...() line 12 at SQL statement
2022-05-19 03:28:13 CEST:127.0.0.1(63265):x@x:[64029168]: STATEMENT: PREPARE ... AS insert into ...
2022-04-16 01:45:31 CEST:127.0.0.1(58946):x@x:[20906970]: ERROR: could not access status of transaction 0
2022-04-16 01:45:31 CEST:127.0.0.1(58946):x@x:[20906970]: DETAIL: Could not open file "pg_subtrans/6158": Resource temporarily unavailable.
2022-04-16 01:45:31 CEST:127.0.0.1(58946):x@x:[20906970]: STATEMENT: PREPARE ... AS update ...
2020-12-01 09:24:30 CET:127.0.0.1(59898):x@x:[6227520]: ERROR: could not access status of transaction 0
2020-12-01 09:24:30 CET:127.0.0.1(59898):x@x:[6227520]: DETAIL: Could not open file "pg_subtrans/AC9E": Resource temporarily unavailable.
2020-12-01 09:24:30 CET:127.0.0.1(59898):x@x:[6227520]: STATEMENT: PREPARE ... AS DELETE FROM ....
open() should not return EAGAIN as per POSIX [2], and the AIX
documentation says it would only return EAGAIN if O_TRUNC is used [3],
but as far as I can tell, PG does not use that flag.
[2]: https://pubs.opengroup.org/onlinepubs/9699919799/functions/open.html#tag_16_357_05
[3]: https://www.ibm.com/docs/en/aix/7.2?topic=o-open-openat-openx-openxat-open64-open64at-open64x-open64xat-creat-creat64-subroutine
IBM's reply to the issue back in December 2020 was this:
The man page / infocenter document is not intended as an exhaustive
list of all possible error codes returned and their circumstances.
"Resource temporarily unavailable" may also be returned for
O_NSHARE, O_RSHARE with O_NONBLOCK.
Afaict, PG does not use these flags either.
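For reference, checking a flags word against the flags named so far takes only a few lines of C. This is a rough sketch, not anything taken from the PostgreSQL source: the helper name risky_open_flags is made up for illustration, and O_NSHARE and O_RSHARE are AIX-specific, so they are compiled in only where the system headers define them.

```c
#include <assert.h>
#include <fcntl.h>
#include <string.h>

/*
 * Hypothetical helper (not a PostgreSQL function): write the names of the
 * flags that the AIX documentation and IBM's reply associate with EAGAIN
 * into buf, separated by '|'.  Returns how many of them are set in `flags`.
 */
int
risky_open_flags(int flags, char *buf, size_t buflen)
{
    int found = 0;

    assert(buf != NULL && buflen > 0);
    buf[0] = '\0';

/* Append the flag's name to buf if all of its bits are set in `flags`. */
#define CHECK_FLAG(f) \
    do { \
        if ((flags & (f)) == (f)) { \
            if (found > 0) \
                strncat(buf, "|", buflen - strlen(buf) - 1); \
            strncat(buf, #f, buflen - strlen(buf) - 1); \
            found++; \
        } \
    } while (0)

    CHECK_FLAG(O_TRUNC);
    CHECK_FLAG(O_NONBLOCK);
#ifdef O_NSHARE                 /* AIX only */
    CHECK_FLAG(O_NSHARE);
#endif
#ifdef O_RSHARE                 /* AIX only */
    CHECK_FLAG(O_RSHARE);
#endif
#undef CHECK_FLAG

    return found;
}
```

Fed the flags value captured from a trace, a return value of 0 would mean none of the documented EAGAIN triggers is present in the call.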
We also ruled out that the system is using any anti-virus or similar
tooling that would intercept IO traffic.
Does anything of that ring a bell for someone? Is that an AIX bug, a
PG bug, or something else?
Christoph
--
Senior Consultant, Tel.: +49 2166 9901 187
credativ GmbH, HRB Mönchengladbach 12080, VAT ID: DE204566209
Trompeterallee 108, 41189 Mönchengladbach
Management: Dr. Michael Meskes, Geoff Richardson, Peter Lilley
Our handling of personal data is subject to the following
terms: https://www.credativ.de/datenschutz
On Mon, Jun 20, 2022 at 9:53 PM Christoph Berg
<christoph.berg@credativ.de> wrote:
IBM's reply to the issue back in December 2020 was this:
The man page / infocenter document is not intended as an exhaustive
list of all possible error codes returned and their circumstances.
"Resource temporarily unavailable" may also be returned for
O_NSHARE, O_RSHARE with O_NONBLOCK.
Afaict, PG does not use these flags either.
We also ruled out that the system is using any anti-virus or similar
tooling that would intercept IO traffic.
Does anything of that ring a bell for someone? Is that an AIX bug, a
PG bug, or something else?

No clue here. Anything unusual about the file system (NFS etc)? Can
you truss/strace the system calls, to sanity check the flags arriving
into open(), and see if there's any unexpected other activity around
open() calls that might be coming from something you're linked
against?
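If tracing is not an option on the box, a thin wrapper around open() that logs the flags and errno on failure would give a rough equivalent. The following is only a sketch under that assumption: open_logged is a hypothetical name, not something that exists in PostgreSQL, and it writes to stderr rather than to any real logging facility.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>

/*
 * Hypothetical wrapper (illustration only): call open() and, if it fails,
 * report the path, the exact flags that were passed, and the resulting
 * errno.  Returns whatever open() returned; errno is preserved for the
 * caller.
 */
int
open_logged(const char *path, int flags, mode_t mode)
{
    int fd;

    assert(path != NULL);
    fd = open(path, flags, mode);

    if (fd < 0)
    {
        int save_errno = errno;     /* fprintf() may clobber errno */

        fprintf(stderr, "open(\"%s\", flags=0x%x) failed: %s%s\n",
                path, (unsigned int) flags, strerror(save_errno),
                save_errno == EAGAIN ? " (EAGAIN)" : "");
        errno = save_errno;
    }
    return fd;
}
```

The logged flags value shows directly whether O_TRUNC, O_NONBLOCK, or one of the AIX sharing flags is reaching the call.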
Re: Thomas Munro
Does anything of that ring a bell for someone? Is that an AIX bug, a
PG bug, or something else?
No clue here. Anything unusual about the file system (NFS etc)? Can
you truss/strace the system calls, to sanity check the flags arriving
into open(), and see if there's any unexpected other activity around
open() calls that might be coming from something you're linked
against?
Hi,
it's local storage, 16Gb SAN, Unity 500 storage, all data is on SSD
disks, and file system is JFS2 (mount options are rw,log=INLINE).
Good point about the flags, but we don't have access to the servers,
so not sure if it will be possible to retrieve strace information.
I'll try asking.
Thanks,
Christoph