PANIC: could not write to log file {} at offset {}, length {}: Invalid argument
Hi all,
We are running PostgreSQL v9.5.19 on Windows Server 2012 R2 with 16GB RAM.
Lately, postgres has started to crash (3 times so far, about once a month),
and before each crash I found this message in the Event Log:
PANIC: could not write to log file {} at offset {}, length {}: Invalid
argument
(so I assume the two are related).
Our configuration is attached.
Any ideas about what is the problem? or anything else I need to check?
Thanks in advance,
Shani Israeli - Software Developer
+972 54 6689920
sisraeli@illusivenetworks.com
www.illusivenetworks.com
On 4 November 2020 11:24:03 CET, Shani Israeli <sisraeli@illusivenetworks.com> wrote:
> Any ideas about what is the problem? or anything else I need to check?
wild guess: Antivirus Software?
--
2ndQuadrant - The PostgreSQL Support Company
I could not read your config file, but:
What is the size of the Postgres log file?
Do you have a log rotation policy on it?
Perhaps your Postgres log level is too high, or your connections are
generating a lot of errors that need investigating.
Dave
On 11/4/20 2:24 AM, Shani Israeli wrote:
> Any ideas about what is the problem? or anything else I need to check?
Any time I see seemingly random crashes involving file corruption on
Windows I think anti-virus software. Has someone turned an AV program
loose on this machine?
--
Adrian Klaver
adrian.klaver@aklaver.com
On Wed, Nov 04, 2020 at 01:24:46PM +0100, Andreas Kretschmer wrote:
>> Any ideas about what is the problem? or anything else I need to check?
> wild guess: Antivirus Software?
Perhaps not. To bring more context in here, PostgreSQL opens any
files on WIN32 with shared writes and reads allowed to have an
equivalent of what we do on all *nix platforms. Note here that the
problem comes from a WAL segment write, which is done after the file
handle is opened in shared mode. As long as the fd is correctly
opened, any attempt for an antivirus software to open a file with an
exclusive write would be blocked, no?
--
Michael
Michael Paquier <michael@paquier.xyz> writes:
> On Wed, Nov 04, 2020 at 01:24:46PM +0100, Andreas Kretschmer wrote:
>> wild guess: Antivirus Software?
> Perhaps not. To bring more context in here, PostgreSQL opens any
> files on WIN32 with shared writes and reads allowed to have an
> equivalent of what we do on all *nix platforms. Note here that the
> problem comes from a WAL segment write, which is done after the file
> handle is opened in shared mode. As long as the fd is correctly
> opened, any attempt for an antivirus software to open a file with an
> exclusive write would be blocked, no?
The only hard data point we've got here is the "Invalid argument"
string, which should mean EINVAL, although I'm not entirely sure
where that string is determined in a Windows build. So it seems
like there are two possibilities:
* The actual underlying Windows error code is one of the ones
  that win32error.c maps to EINVAL:
      ERROR_INVALID_FUNCTION, EINVAL
      ERROR_INVALID_ACCESS, EINVAL
      ERROR_INVALID_DATA, EINVAL
      ERROR_INVALID_PARAMETER, EINVAL
      ERROR_INVALID_HANDLE, EINVAL
      ERROR_NEGATIVE_SEEK, EINVAL
* The actual underlying Windows error code is something that
  win32error.c doesn't know, which would cause _dosmaperr() to
  return EINVAL.
The latter case would result in a LOG message "unrecognized win32 error
code", so it would be good to know if any of those are showing up in
the postmaster log.
Seems like maybe it wasn't a great idea for _dosmaperr's fallback
errno to be something that is also a real error code.
regards, tom lane
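To make the two possibilities above concrete, here is a minimal C sketch of a table lookup with an EINVAL fallback. It is not the code from win32error.c: the helper name sketch_map_error, the recognized flag, and the six-entry table are assumptions for illustration (the real table is much longer), and the numeric constants are the winerror.h values defined locally so that the sketch compiles without windows.h. The code 995 (ERROR_OPERATION_ABORTED) appears only as an example of a value that is absent from this sketch's table.

```c
#include <errno.h>
#include <stdio.h>

/* winerror.h values, defined locally so this compiles on any platform. */
#define ERROR_INVALID_FUNCTION    1UL
#define ERROR_INVALID_HANDLE      6UL
#define ERROR_INVALID_ACCESS     12UL
#define ERROR_INVALID_DATA       13UL
#define ERROR_INVALID_PARAMETER  87UL
#define ERROR_NEGATIVE_SEEK     131UL
#define ERROR_OPERATION_ABORTED 995UL	/* example of a code NOT in the table */

typedef unsigned long win_error_t;	/* stand-in for DWORD (assumption) */

/* Illustrative subset: only the entries quoted in the message above. */
static const struct
{
	win_error_t winerr;
	int			doserr;
}			sketch_map[] =
{
	{ERROR_INVALID_FUNCTION, EINVAL},
	{ERROR_INVALID_ACCESS, EINVAL},
	{ERROR_INVALID_DATA, EINVAL},
	{ERROR_INVALID_PARAMETER, EINVAL},
	{ERROR_INVALID_HANDLE, EINVAL},
	{ERROR_NEGATIVE_SEEK, EINVAL}
};

/*
 * Hypothetical helper: returns the errno value for a raw Windows code.
 * *recognized is set to 1 when the code was found in the table and to 0
 * when the EINVAL fallback was used.
 */
static int
sketch_map_error(win_error_t e, int *recognized)
{
	size_t		i;

	for (i = 0; i < sizeof(sketch_map) / sizeof(sketch_map[0]); i++)
	{
		if (sketch_map[i].winerr == e)
		{
			*recognized = 1;
			return sketch_map[i].doserr;
		}
	}

	/* Fallback path: the only trace of the raw code is this log line. */
	*recognized = 0;
	fprintf(stderr, "unrecognized win32 error code: %lu\n", e);
	return EINVAL;
}
```

In this sketch, sketch_map_error(ERROR_INVALID_PARAMETER, &rec) and sketch_map_error(ERROR_OPERATION_ABORTED, &rec) both return EINVAL. Only rec and the line written to stderr tell the two cases apart, which is why the presence or absence of that message in the postmaster log matters.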
On Thu, Nov 5, 2020 at 3:12 AM Michael Paquier <michael@paquier.xyz> wrote:
> On Wed, Nov 04, 2020 at 01:24:46PM +0100, Andreas Kretschmer wrote:
>>> Any ideas about what is the problem? or anything else I need to check?
>> wild guess: Antivirus Software?
> Perhaps not. To bring more context in here, PostgreSQL opens any
> files on WIN32 with shared writes and reads allowed to have an
> equivalent of what we do on all *nix platforms. Note here that the
> problem comes from a WAL segment write, which is done after the file
> handle is opened in shared mode. As long as the fd is correctly
> opened, any attempt for an antivirus software to open a file with an
> exclusive write would be blocked, no?
The problem with AVs generally doesn't come from them opening files in
non-share mode (I've, surprisingly enough, seen backup software that
causes that problem for example). It might happen on scheduled scans
for example, but the bigger problem with AV software has always been
their filter driver software which intercepts both the open/close and
the read/write calls an application makes and "does its magic" on
them before handing the actual call up to the operating system. It's
completely independent of how the file is opened.
--
Magnus Hagander
Me: https://www.hagander.net/
Work: https://www.redpill-linpro.com/
On Thu, Nov 05, 2020 at 10:21:40AM +0100, Magnus Hagander wrote:
> The problem with AVs generally doesn't come from them opening files in
> non-share mode (I've, surprisingly enough, seen backup software that
> causes that problem for example). It might happen on scheduled scans
> for example, but the bigger problem with AV software has always been
> their filter driver software which intercepts both the open/close and
> the read/write calls an application makes and "does its magic" on
> them before handing the actual call up to the operating system. It's
> completely independent of how the file is opened.
This one is a bit new to me. I certainly saw my share of stat() or
open() calls failing on ENOPERM because of file handles taken
exclusively by external scanners around here or even with
customer-related issues, and I did not expect that such dark magic
could be involved in a write. It would indeed not be surprising to
see a PANIC depending on what gets done.
--
Michael
On Wed, Nov 04, 2020 at 10:23:04PM -0500, Tom Lane wrote:
> The latter case would result in a LOG message "unrecognized win32 error
> code", so it would be good to know if any of those are showing up in
> the postmaster log.
Yeah. Not sure which one it could be here:
https://docs.microsoft.com/en-us/windows/win32/api/fileapi/nf-fileapi-writefile
One possibility could also be ERROR_OPERATION_ABORTED, which is not in
the mapping table. So that would map to EINVAL.
> Seems like maybe it wasn't a great idea for _dosmaperr's fallback
> errno to be something that is also a real error code.
You could say that for any fallback errno as long as you don't know if
there's a LOG to show that a DWORD does not map with the table of
win32error.c, no?
(I got to wonder whether it would be worth the complexity to show more
information when using _dosmaperr() for WIN32 on stuff like
elog(ERROR, "%m"), just a wild thought).
--
Michael
Michael Paquier <michael@paquier.xyz> writes:
> (I got to wonder whether it would be worth the complexity to show more
> information when using _dosmaperr() for WIN32 on stuff like
> elog(ERROR, "%m"), just a wild thought).
Maybe. It's been in the back of my mind for a long time that the
_dosmaperr() mapping may be confusing us in some of these hard-to-explain
trouble reports. It'd be great if we could see the original Windows error
code too. Not quite sure how to mechanize that, though. Places where we
do stuff like save-and-restore errno across some other operation would break
any easy solution.
regards, tom lane
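The save-and-restore concern above can be sketched in a few lines of C. Everything here is hypothetical: sketch_last_win_error is an imagined side channel holding the raw Windows code, sketch_set_error stands in for a mapper that records it, and the two failing operations are simulated rather than real I/O. The raw codes 995 (ERROR_OPERATION_ABORTED, falling back to EINVAL) and 5 (ERROR_ACCESS_DENIED, mapped to EACCES) are chosen only as examples.

```c
#include <errno.h>

typedef unsigned long win_error_t;	/* stand-in for DWORD (assumption) */

/* Hypothetical side channel: the raw Windows code most recently mapped. */
static win_error_t sketch_last_win_error = 0;

/* Hypothetical mapper stand-in: remembers the raw code and sets errno. */
static void
sketch_set_error(win_error_t raw, int mapped_errno)
{
	sketch_last_win_error = raw;
	errno = mapped_errno;
}

/* Simulated failed write: raw code 995 ends up as EINVAL. */
static void
sketch_failing_write(void)
{
	sketch_set_error(995UL, EINVAL);
}

/* Simulated unrelated cleanup that also fails: raw code 5 becomes EACCES. */
static void
sketch_failing_cleanup(void)
{
	sketch_set_error(5UL, EACCES);
}

/*
 * The usual idiom: save errno, do some other operation, restore errno so
 * that the caller reports the original failure.  Only errno is restored;
 * the side channel keeps whatever the cleanup step left behind.
 */
static void
sketch_write_then_cleanup(void)
{
	int			save_errno;

	sketch_failing_write();
	save_errno = errno;
	sketch_failing_cleanup();
	errno = save_errno;
}
```

After sketch_write_then_cleanup() returns, errno is EINVAL again, but sketch_last_win_error holds 5 instead of 995, so a message that printed both would pair the write's errno with the cleanup's Windows code. Any real solution would have to carry the raw code through every such save-and-restore site, which is the difficulty described above.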