BUG #16927: Postgres can't access WAL files
The following bug has been logged on the website:
Bug reference: 16927
Logged by: Ярослав Пашинский
Email address: yarik97.6@gmail.com
PostgreSQL version: 13.2
Operating system: Windows Server 2019
Description:
I recently upgraded four PostgreSQL clusters from version 9.6 to 13.2, and
now the errors shown below appear in the log file at a high frequency.
Because of these errors I can't set up a replication slot, and I am worried
about whether the data in the DB cluster is safe.
I have already tried the following to get rid of the error:
1) enabled exclusions in GPO for Microsoft Defender (cluster directory and
postgres process);
2) checked the system with the sfc tool;
3) checked user and system permissions on the cluster directory.
None of this helped.
Here are some lines from the log file:
2021-03-15 15:44:54.674 EET [3992] LOG: could not rename file
"pg_wal/000000010000086E000000C6": No such file or directory
2021-03-15 15:48:38.884 EET [3992] LOG: could not rename file
"pg_wal/000000010000086D000000E7": Permission denied
2021-03-15 15:48:49.813 EET [3992] LOG: could not rename file
"pg_wal/000000010000086D000000FD": Permission denied
2021-03-15 15:49:00.642 EET [3992] LOG: could not rename file
"pg_wal/000000010000086E00000033": Permission denied
Best regards, Yaroslav.
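The segment names in the log lines above follow the usual WAL naming scheme of 24 hexadecimal digits: 8 for the timeline, then two 8-digit halves of the segment position. A minimal Python sketch for decoding them is shown here; the function names are hypothetical, and the factor of 256 assumes the default 16MB WAL segment size.

```python
from typing import NamedTuple


class WalSegment(NamedTuple):
    timeline: int  # first 8 hex digits
    log: int       # middle 8 hex digits (high part of the position)
    seg: int       # last 8 hex digits (low part of the position)


def parse_wal_name(name: str) -> WalSegment:
    """Split a 24-character WAL segment file name into its three fields."""
    if len(name) != 24:
        raise ValueError(f"not a WAL segment name: {name!r}")
    return WalSegment(
        timeline=int(name[0:8], 16),
        log=int(name[8:16], 16),
        seg=int(name[16:24], 16),
    )


def segment_number(ws: WalSegment, segs_per_log: int = 256) -> int:
    """Linear segment number; 256 per 'log' assumes 16MB segments."""
    return ws.log * segs_per_log + ws.seg


if __name__ == "__main__":
    # Names taken from the log excerpt in the report.
    names = [
        "000000010000086E000000C6",
        "000000010000086D000000E7",
        "000000010000086D000000FD",
        "000000010000086E00000033",
    ]
    for n in sorted(names, key=lambda x: segment_number(parse_wal_name(x))):
        print(n, parse_wal_name(n))
```

Sorting by the decoded number makes it easier to see whether the failing segments are old ones being recycled or recent ones.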
On Mon, Mar 15, 2021 at 02:19:57PM +0000, PG Bug reporting form wrote:
> Recently, I had updated 4 clusters postgreSQL from version 9.6 to 13.2, and
> now have issue with errors that you can see below from log file. There is
> high frequency of how this error gets occurred.
> With such type of errors I can't set up replication slot and I worry about
> is it safe for data in DB cluster.
> I already took some actions to get rid of this error: 1) enabled exclusions
> in GPO for Microsoft Defender (clusters directory and postgres process)
> 2) checked system with sfc tool.
> 3) checked user & system rights for cluster directory.
> But nothings from this works.
> Here are some lines from log file.
> 2021-03-15 15:44:54.674 EET [3992] LOG: could not rename file
> "pg_wal/000000010000086E000000C6": No such file or directory
> 2021-03-15 15:48:38.884 EET [3992] LOG: could not rename file
> "pg_wal/000000010000086D000000E7": Permission denied
> 2021-03-15 15:48:49.813 EET [3992] LOG: could not rename file
> "pg_wal/000000010000086D000000FD": Permission denied
> 2021-03-15 15:49:00.642 EET [3992] LOG: could not rename file
> "pg_wal/000000010000086E00000033": Permission denied
There have been multiple reports of this issue for 13, though we have
not been able to determine if this was coming from Postgres or if a
recent Windows update is causing that. Here, you basically say that
all stable versions of PostgreSQL are seeing this problem, which is
new to me. Particularly, 9.6.20 did not have any issues but 9.6.21
is showing this problem, right? Or did you update from an even older
version, say 9.6.19 or 9.6.18?
Are all those instances involved with streaming replication? What
exactly is the version of your Windows server? d726e44f catches my
eye here, looking at the diffs between both versions.
--
Michael
On Tue, Mar 16, 2021 at 10:20:20AM +0900, Michael Paquier wrote:
> Are all those instances involved with streaming replication? What is
> exactly the version of your Windows server? d726e44f catches my
> eyes here, looking at the diffs between both versions..
And... While stressing my Windows box with a pgbench this morning, I
have been able to reproduce the problem on HEAD after 20 minutes of
run, and my box is just an old VM image provided by Microsoft that has
no fancy scanner running concurrently as far as I know. No
replication involved here, just a standalone deployment. I'll look at
what I have.
--
Michael
(Please keep pgsql-bugs in CC of this thread)
On Tue, Mar 16, 2021 at 10:16:29AM +0200, Ярослав Пашинский wrote:
> 2021-03-15 00:08:20.563 EET [9936] LOG: could not rename temporary
> statistics file "pg_stat_tmp/global.tmp" to "pg_stat_tmp/global.stat":
> Permission denied
This pattern is new.
> I got this error also after updating from 9.6 to 13.2
Oh, sorry, I did not understand your last message as you used the word
"updating", which sounded like you ran a set of minor updates for
multiple servers and saw the same issue across all those versions.
But what you mean is that you did one major upgrade, from 9.6 to 13.
> So, the solution could be to rollback to 9.6 and then update to 12.X
> version?
There has been a collection of reports lately with 13.X misbehaving
when it comes to the recycling of WAL segments:
/messages/by-id/3861ff1e-0923-7838-e826-094cc9bef737@hot.ee
/messages/by-id/16874-c3eecd319e36a2bf@postgresql.org
/messages/by-id/095ccf8d-7f58-d928-427c-b17ace23cae6@burgess.co.nz
Your report is the 4th one of its kind, and based on the data
collected up to now, 12.X or older major versions do not see the
issue. I have begun a thread about the problem on -hackers, that's
too much to be a coincidence, and more than one version of Windows
sees the problem:
/messages/by-id/YFBcRbnBiPdGZvfW@paquier.xyz
--
Michael
Hi,
On Tue, Mar 16, 2021 at 12:40:15PM +0200, Ярослав Пашинский wrote:
> On prod server with windows server 2019 I used .zip downloaded from
> https://www.enterprisedb.com/download-postgresql-binaries and unpacked it
> to the default folder (Program Files/PostgreSQL/13). But the previous
> version (9.6) was installed via .exe installer.
> On developer server with Windows server 2016 I used an installer to install
> postgres 13, but as I said before there are the same issues on both servers.
> Are you talking about building patch from scratch? If yes, I'll try it.
Yes, that's the idea, and it is an experience by itself to compile
the Postgres source code on Windows :)
We'd need to check two things:
1) The code compiled from the source code of 13.2 is still able to
reproduce the issue.
2) Once the patch attached is applied on top of 13.2, check if the
problem goes away or not.
I am still running some tests on my own environments, but that's much
harder to hit for me, visibly, and the error code path complaining is
not the same. If I may ask, what are the contents of pg_wal on the
instances where the errors happen? Do you have some files suffixed
with ".deleted" around, or anything named like xlogtemp.N, where N is
an integer for a PID?
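For anyone wanting to answer this question on their own instance, a small Python sketch along these lines could list the two kinds of leftover files mentioned above. The helper name is hypothetical and the directory path is whatever your data directory's pg_wal is; only the two name patterns come from the message.

```python
import os
import re

# "xlogtemp.N" where N is an integer (a PID), as described in the message.
XLOGTEMP_RE = re.compile(r"^xlogtemp\.\d+$")


def find_leftovers(pg_wal_dir):
    """Return (deleted, xlogtemp): names ending in '.deleted' and
    names of the form 'xlogtemp.N' found directly in pg_wal_dir."""
    deleted, xlogtemp = [], []
    for name in sorted(os.listdir(pg_wal_dir)):
        if name.endswith(".deleted"):
            deleted.append(name)
        elif XLOGTEMP_RE.match(name):
            xlogtemp.append(name)
    return deleted, xlogtemp
```

Usage would be something like `find_leftovers(r"D:\path\to\data\pg_wal")`, with the path adjusted to the cluster in question.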
By the way, could you hit "reply-all" for the emails or attach
properly in CC pgsql-bugs so as everybody can see the discussion
happening here?
--
Michael
Attachments:
0001-Revert-Remove-HAVE_WORKING_LINK.patch (text/x-diff; charset=us-ascii, +28 -11)
On Wed, Mar 17, 2021 at 07:30:00AM +0900, Michael Paquier wrote:
> I am still running some tests on my own environments, but that's much
> harder to hit for me, visibly, and the error code path complaining is
> not the same. If I may ask, what are the contents of pg_wal on the
> instances where the errors happen. Do you have some files suffixed
> with ".deleted" around, or anything named like xlogtemp.N, where N is
> an integer for a PID?
Another thing I could do here is to share links to download all the
binaries built for 13.2 and 13.2 + a patch, that you could drop into
your own servers to test what I am suspecting causes the issue.
Depending on your server policies, perhaps that's not acceptable,
though. Just let me know which one you'd prefer.
--
Michael
I am able to run compiled binaries + patch on my servers, that's not a
problem. Or I could also compile them myself if you could tell me briefly
how to do that, because it's a really useful skill :)
I attached 2 files: files_list.txt, the contents of the pg_wal directory,
and file_option.png, the properties of one WAL file that postgres
can't access. The strange thing is that even as domain admin or local admin
I can't see the permission properties for this file.
On Wed, Mar 17, 2021 at 10:34:05AM +0200, Ярослав Пашинский wrote:
> I am able to run compiled binaries + patch on my servers, that not a
> problem. Or I could also compile if you could tell me briefly how to do
> that because it's real useful skill :)
There is some documentation to do that with Visual Studio:
https://www.postgresql.org/docs/devel/install-windows.html
In my case, I just use a command prompt to launch those commands and
do the work. I can send you links to download custom builds, of
course. My guess is that these should be able to work on your host,
as Windows is good in terms of backward-compatibility.
> I attached 2 files: files_list.txt - that content of pg_wal directory;
> second file file_option.png - properties of one wal file that postgres
> can't access. Strange thing is even with domain admin or local admin I
> can't see rights properties for this file.
Thanks. The .deleted files come from RemoveXlogFile() where a file
gets removed. This means that a rename before doing an unlink()
fails. What we are looking for here is what is holding those files
back.
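The rename-then-unlink sequence described above can be illustrated with a short Python sketch. This is not PostgreSQL's RemoveXlogFile(); the function name and return strings are made up for illustration, and only the ".deleted" suffix and the order of the two steps come from the message.

```python
import os


def remove_segment(path: str) -> str:
    """Rename path to path + '.deleted', then unlink the renamed file.

    Returns a short description of which step, if any, failed.
    """
    renamed = path + ".deleted"
    try:
        os.rename(path, renamed)
    except OSError as exc:
        # Corresponds to the "could not rename file" case in the logs.
        return f"rename failed: {exc.strerror}"
    try:
        os.unlink(renamed)
    except OSError as exc:
        # The '.deleted' file stays behind if something still holds it.
        return f"unlink failed, {renamed} left behind: {exc.strerror}"
    return "removed"
```

On a system where another process holds the file open without sharing, either step can fail, which is why the question of what is holding the files matters.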
--
Michael
Okay, I'll try to build from source today following the instructions that
you sent me. Another question: how do I apply the patch, or will you send
me links to a build with the patch applied?
By the way, unfortunately, yesterday another 3 postgres instances started
"complaining" in their logs about access to WAL files, which is why I'm
interested in fixing this ASAP :)
On Thu, Mar 18, 2021 at 10:21:53AM +0200, Ярослав Пашинский wrote:
> Okay, I'll try to build from source today following the instructions that
> you sent to me. And another question is how to apply patch or you will send
> me links with build + patch?
The "patch" command would be enough. Please note that I have
generated some builds of 13.2 unpatched and 13.2 patched that you
could directly reuse, so that may make your life easier. I'll send
you the links in a couple of minutes in a separate email.
> By the way, unfortunately, yesterday another 3 postgres instances started
> "complaining" about access to wal files in logs, so that why I'm
> interesting in fixing this ASAP :)
:(
--
Michael
The strange thing is that one server works fine on unpatched binaries while
the second one requires the patched version to get rid of the pg_wal access
error.
UPD: just now on the developer server I got an error: "2021-03-18
14:11:39.444 EET [892] LOG: could not rename file
"pg_wal/00000001000009130000006F": Permission denied"
I will switch to the patched binaries and report back later.
P.S.: Sorry that I didn't include pgsql-bugs in my replies; is it OK now?
On Thu, Mar 18, 2021 at 02:16:21PM +0200, Ярослав Пашинский wrote:
> Thu, 18 Mar 2021 at 13:15, Michael Paquier <michael@paquier.xyz>:
>> On Thu, Mar 18, 2021 at 12:44:29PM +0200, Ярослав Пашинский wrote:
>>> So, I started test on my Windows server that we using for replica on
>>> instance, which I copied from master. The binarys was unpatched, that
>>> you sent me here. The system is: windows server 2016, os build 14393.4283.
>>> To emulate load I used pgbench with such parameters -t 10000 -c 50 -j 20.
>>> After couple of running test in log file I found almost same errors:
>>> "2021-03-18 11:27:14.322 EET [3748] LOG: could not rename temporary
>>> statistics file "pg_stat_tmp/global.tmp" to "pg_stat_tmp/global.stat":
>>> Permission denied
>>> 2021-03-18 11:27:18.928 EET [692] LOG: using stale statistics instead of
>>> current ones because stats collector is not responding"
>>> ...and
>>> "2021-03-18 11:48:49.630 EET [6476] LOG: could not rename file
>>> "pg_wal/00000001000000650000008F": Permission denied"
>>> So I decided to switch to patched binaries and sometimes get only this
>>> one error:
>>> "2021-03-18 12:27:14.571 EET [4840] LOG: could not rename temporary
>>> statistics file "pg_stat_tmp/global.tmp" to "pg_stat_tmp/global.stat":
>>> Permission denied
>>> 2021-03-18 12:27:19.178 EET [7556] LOG: using stale statistics instead of
>>> current ones because stats collector is not responding"
>>> Which is not very critical, so it's ok.
>> Okay, so it looks like a very good news to me. With the patched
>> binaries you are not seeing the renaming problem with the WAL files
>> anymore.
>>> On other hand, on developer server (Windows Server 2016 (version 1607, OS
>>> build 14393.4225)) with real load and unpatched binaries now I got no
>>> errors about pg_wal and gets only twice this error:
>>> "2021-03-18 12:07:26.153 EET [2956] LOG: could not rename temporary
>>> statistics file "pg_stat_tmp/global.tmp" to "pg_stat_tmp/global.stat":
>>> Permission denied"
>>> So, keep testing. That's strange for now. I am thinking about changing
>>> binaries on prod server, but it will be possible on Saturday.
>> Yes, I think that it would be good to do more tests, as it may be
>> possible that what you are seeing does not repeat. What you are
>> reporting is encouraging though. Thanks!
>> By the way, it is very important to report that to the community
>> mailing lists. Could you add pgsql-bugs when replying please?
> The strange thing is why one server works fine on unpatched binaries while
> the second one requires a patched version to get rid of pg_wal access
> error.
> UPD: just right now on developer server I got an error: "2021-03-18
> 14:11:39.444 EET [892] LOG: could not rename file
> "pg_wal/00000001000009130000006F": Permission denied"
> Will switch to patched binaries and tell you later.
The issue seems to depend on timing and the load your cluster is
facing, so that is not surprising to hear that this does not show up
100% of the time. I am actually glad to hear that you have not seen
the issue anymore with the patched builds, while the unpatched builds
have shown the problem at least once. It would be a problem if the
patched builds begin to complain about the renaming of the WAL
segments though as we would have to consider a different theory.
> P.S: Sorry, that I didn't include in reply psql-bugs, is it ok right now?
That's fine. Thanks :)
I have added to this email the last things we discussed, for
transparency.
--
Michael
Hi,
On 2021-03-18 14:16:21 +0200, Ярослав Пашинский wrote:
> The strange thing is why one server works fine on unpatched binaries while
> the second one requires a patched version to get rid of pg_wal access
> error.
> UPD: just right now on developer server I got an error: "2021-03-18
> 14:11:39.444 EET [892] LOG: could not rename file
> "pg_wal/00000001000009130000006F": Permission denied"
> Will switch to patched binaries and tell you later.
Could you use
https://docs.microsoft.com/en-us/sysinternals/downloads/findlinks on one
of the files that can't be renamed? Or even better, on all the WAL
files?
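Where FindLinks is not at hand, the hard-link count of each file can also be read from Python. The sketch below is a rough stand-in, not a replacement for FindLinks: it only reports the link count via `os.stat().st_nlink` (assuming the platform and filesystem populate that field) and does not list the other link names. The function name is hypothetical.

```python
import os
import stat


def link_counts(directory):
    """Return (counts, errors) for regular files directly in `directory`.

    counts maps file name -> number of hard links;
    errors maps file name -> OS error text when stat() fails.
    """
    counts, errors = {}, {}
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        try:
            st = os.stat(path)
        except OSError as exc:
            errors[name] = exc.strerror
            continue
        if stat.S_ISREG(st.st_mode):
            counts[name] = st.st_nlink
    return counts, errors
```

A count above 1 for a WAL segment would mean another directory entry still points at the same file; entries in `errors` are the files that could not even be inspected.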
Greetings,
Andres Freund
Yes, at the moment I have 4 clusters on the developer server and 1 cluster
that I am testing on my own, and there have been no errors related to
access to WAL files for about the last 17-20 hours. I will run my prod
clusters as well and will let you know. If I don't send you any new message,
the problem is gone on the prod server too. Thanks in advance!
P.S.: the error "2021-03-18 12:22:26.096 EET [4840] LOG: could not rename
temporary statistics file "pg_stat_tmp/global.tmp" to
"pg_stat_tmp/global.stat": Permission denied" still occurs; as far as I know
it's not very
P.P.S.: will this patch be available for everyone to download?
On Fri, Mar 19, 2021 at 10:08:36AM +0200, Ярослав Пашинский wrote:
> Yes, for this moment I have 4 clusters on developer server and 1 cluster
> that I'm testing by my own and there is no error connected with
> access to WAL files for last about 17-20 hours. I'll will run my prod
> clusters and will also tell you. If I won't send you any new message - the
> problem is gone also on prod server. Thanks in advance!
Cool. So, it really looks like we have found the issue based on what
you are saying here, and that we had better consider as a first step a
revert of aaaef7a on HEAD and REL_13_STABLE.
So, what do others think? Would people agree to revert aaaef7a for
now?
> P.S: the error "2021-03-18 12:22:26.096 EET [4840] LOG: could not rename
> temporary statistics file "pg_stat_tmp/global.tmp" to
> "pg_stat_tmp/global.stat": Permission denied" still be, as I know it's not
> very
This one is in a different code path.
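Since the two kinds of rename failure come from different code paths, it can help to count them separately when going through a log. A minimal Python sketch follows; the regular expression is written against the message texts quoted in this thread only, and the category names are made up.

```python
import re
from collections import Counter

# Matches both message forms quoted in this thread.
RENAME_RE = re.compile(
    r'could not rename (temporary statistics )?file "([^"]+)"'
)


def classify_rename_errors(log_text: str) -> Counter:
    """Count rename failures by kind: 'wal', 'stats', or 'other'."""
    counts = Counter()
    # Log lines pasted into emails are wrapped, so collapse whitespace first.
    flat = " ".join(log_text.split())
    for match in RENAME_RE.finditer(flat):
        if match.group(1):
            counts["stats"] += 1
        elif match.group(2).startswith("pg_wal/"):
            counts["wal"] += 1
        else:
            counts["other"] += 1
    return counts
```

Comparing the 'wal' count before and after switching binaries gives a slightly firmer basis than scanning the log by eye.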
> P.S.S: will be this patch available to download for everyone?
Well, if a different committer or myself is able to get a patch
committed, it will be available to everyone once 13.3 gets released.
This would happen in May based on the existing roadmap:
https://www.postgresql.org/developer/roadmap/
How much did you test the unpatched builds by the way?
--
Michael
Hello, sure. I just found a recent log entry for one WAL file, and here is
the output of the FindLinks program:
".\FindLinks64.exe "D:\DB_update\PSQL_4\pg_wal\000000010000429E00000077"
Findlinks v1.1 - Locate file hard links
Copyright (C) 2011-2016 Mark Russinovich
Sysinternals - www.sysinternals.com
Error opening d:\db_update\psql_4\pg_wal\000000010000429e00000077:
Access is denied. "
P.S.: I ran the program with admin rights.
"and that we had better consider as a first step a
revert of aaaef7a on HEAD and REL_13_STABLE.
So, what do others think? Would people agree to revert aaaef7a for
now?"
I didn't quite understand you here: does "aaaef7a" stand for the patch name?
" How much did you test the unpatched builds by the way? "
On the cluster where the load was generated by pgbench I hit the WAL file
access error after about 1 hour, but on the developer server with real load
(usually up to 100 connections) the error occurred after 2.5-3 hours.
On Fri, Mar 19, 2021 at 11:04:14AM +0200, Ярослав Пашинский wrote:
> I didn't quite understand you here, is " aaaef7a " stands for patch name?
There was a typo in one of my previous messages. What I was referring
to is aaa3aedd. That's a commit of the Postgres code tree, if you are
not familiar with git, here is a link to the code change:
https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=aaa3aedd
> " How much did you test the unpatched builds by the way? "
> On cluster where load was caused by pgbench I met wal file access error
> after about 1 hour, but on developer server with real load (usually up to
> 100 connections) the error occurred after 2.5-3 hours.
OK, thanks. My environments are not that sensitive to the issue,
unfortunately.
--
Michael
I don't know what the appearance of this issue depends on (honestly, I
tried to find out before mailing pgsql-bugs). For example, yesterday the log
file was full of these messages, but today I had to wait for the error to
come back in order to check the WAL file links with the FindLinks program.
At least I'm confident that this issue can break replication and that it is
not limited to one machine. Anyway, I'm happy that your patch seems to be
the key to solving this problem.
Michael Paquier <michael@paquier.xyz> writes:
> There was a typo in one of my previous messages. What I was referring
> to is aaa3aedd.
Ah, I was just about to ask what the heck aaaef7a referred to.
Given the evidence that there's a problem, I agree with reverting
that. I'd suggest keeping the cosmetic rename of the function,
but we have to put back the Windows-doesn't-HAVE_WORKING_LINK logic.
Grepping in the v12 branch, I find a second use of HAVE_WORKING_LINK
in contrib/pg_standby. But that seems to be in a non-WIN32 code path,
so I don't think putting that back is necessary.
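As a rough illustration of the kind of logic being discussed, the Python sketch below shows a "link then unlink, or else plain rename" choice when moving a file into place. It is not PostgreSQL source: the function name and the `have_working_link` flag are hypothetical stand-ins, and it assumes that HAVE_WORKING_LINK is a compile-time switch selecting between those two strategies.

```python
import os


def install_file(src: str, dst: str, have_working_link: bool) -> None:
    """Move src to dst using one of two strategies.

    have_working_link=True : hard-link src to dst, then remove src.
                             os.link() raises FileExistsError if dst
                             already exists, so nothing is overwritten.
    have_working_link=False: plain rename. On POSIX this silently
                             replaces an existing dst; on Windows
                             os.rename() fails if dst exists.
    """
    if have_working_link:
        os.link(src, dst)
        os.unlink(src)
    else:
        os.rename(src, dst)
```

Under that assumption, reverting the change would put Windows back on the plain-rename branch, while platforms with a dependable link() keep the link-based one.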
regards, tom lane