Permission Denied Error on pg_xlog/RECOVERYXLOG file

Started by Nonamealmost 10 years ago12 messageshackersgeneral
Jump to latest
#1Noname
Jeffrey.Marshall@usitc.gov
hackersgeneral

Hi Folks,

We reported a bug last month (#14119) related to receiving a permission denied error on the pg_xlog/RECOVERYXLOG file when performing a database recovery (PostgreSQL versions 9.2.16 and 9.2.17). The bug report was for version 9.2.16 but we also are having the same problem with 9.2.17. We were just curious if anyone else running 9.2.16 or 9.2.17 had encountered this issue.

The link to the bug report is below, but much of the information is included here as well.

When performing a vanilla database restore using either the 9.2.16 or 9.2.17 executables (i.e. just restoring the database files from a 'tar' backup and reading the WAL files created during the 'tar' backup - no specific PIT given in recovery.conf) the database server will abort with a permission denied error on the pg_xlog/RECOVERYXLOG file. The error occurred restoring both backups that were made under the current version (9.2.16 and 9.2.17) as well as backups made under prior versions (9.2.15 at least). The exact same restore process/backup files can be used to successfully restore the database using the 9.2.15 executables, but fail when using either 9.2.16 or 9.2.17 with the permission denied error.

ENVIRONMENT: PostgreSQL 9.2.16

[postgres-pt(at)postXX pg_xlog]$ psql --version
psql (PostgreSQL) 9.2.16

[postgres-pt(at)postXX pg_xlog]$ psql
psql (9.2.16)
Type "help" for help.

admin_db=# select version();
version
----------------------------------------------------------------------------------------------------------------
PostgreSQL 9.2.16 on x86_64-unknown-linux-gnu, compiled by gcc (GCC) 4.4.7
20120313 (Red Hat 4.4.7-16), 64-bit
(1 row)

OPERATING SYSTEM INFORMATION:

The database server is running on a Redhat Linux host (Red Hat Enterprise Linux Server release 6.8 (Santiago)) -

[postgres-pt(at)postXX pg_xlog]$ uname -a
Linux 2.6.32-573.22.1.el6.x86_64 #1 SMP Thu Mar 17 03:23:39 EDT 2016 x86_64 x86_64 x86_64 GNU/Linux

The Postgres executables were downloaded/installed using rpm/yum (yum update postgresql92-server) and not compiled locally (same download/install process as prior versions) - no additional modules/no modifications have been made.

STEPS TAKEN TO PRODUCE ERROR:

The restores being performed are not to a specific point in time (i.e. restore the database as of 4/18/16 09:59:17); the restores are just loading the database as of the completion of the last full backup.

Perform a database backup:

select pg_start_backup('backup_2016_04_19');

tar and compress the tablespace directories (2 tablespace directories in this case) and the cluster directory
save any WAL files created during the backup

select pg_stop_backup();

Restore the database:

Remove tablespace directories
Remove cluster directory
Uncompress/extract the tablespace directories
Uncompress/extract the cluster directory
Remove files from pg_xlog directory
Create a recovery.conf file that points to the directory with the WAL archive files created during the backup

restart the server using the 9.2.16 executables

pg_ctl -D /pgdata/pgsrv_cluster_data/postXX_rest_92 start

Message from database error log:

2016-04-27 17:02:05 EDT 572128cd.1811 [1-1] user=,db=,remote= LOG: 00000:
database system was interrupted; last known up at 2016-04-19 14:18:03 EDT
2016-04-27 17:02:05 EDT 572128cd.1811 [2-1] user=,db=,remote= LOCATION:
StartupXLOG, xlog.c:6372
2016-04-27 17:02:05 EDT 572128cd.1811 [3-1] user=,db=,remote= LOG: 00000:
starting archive recovery
2016-04-27 17:02:05 EDT 572128cd.1811 [4-1] user=,db=,remote= LOCATION:
StartupXLOG, xlog.c:6465
2016-04-27 17:02:06 EDT 572128cd.1811 [5-1] user=,db=,remote= LOG: 00000:
restored log file "0000000100000006000000FE" from archive
2016-04-27 17:02:06 EDT 572128cd.1811 [6-1] user=,db=,remote= LOCATION:
RestoreArchivedFile, xlog.c:3267
2016-04-27 17:02:06 EDT 572128cd.1811 [7-1] user=,db=,remote= FATAL: 42501:
could not open file "pg_xlog/RECOVERYXLOG": Permission denied
2016-04-27 17:02:06 EDT 572128cd.1811 [8-1] user=,db=,remote= LOCATION:
fsync_fname_ext, fd.c:2654
2016-04-27 17:02:06 EDT 572128cc.180f [3-1] user=,db=,remote= LOG: 00000:
startup process (PID 6161) exited with exit code 1
2016-04-27 17:02:06 EDT 572128cc.180f [4-1] user=,db=,remote= LOCATION:
LogChildExit, postmaster.c:3008
2016-04-27 17:02:06 EDT 572128cc.180f [5-1] user=,db=,remote= LOG: 00000:
aborting startup due to startup process failure
2016-04-27 17:02:06 EDT 572128cc.180f [6-1] user=,db=,remote= LOCATION:
reaper, postmaster.c:2461

Thanks,
Jeff Marshall

Link to bug report: /messages/by-id/20160428132828.19110.9264@wrigleys.postgresql.org

#2Tom Lane
tgl@sss.pgh.pa.us
In reply to: Noname (#1)
hackersgeneral
Re: [GENERAL] Permission Denied Error on pg_xlog/RECOVERYXLOG file

[ redirecting to -hackers ]

<Jeffrey.Marshall@usitc.gov> writes:

When performing a vanilla database restore using either the 9.2.16 or 9.2.17 executables (i.e. just restoring the database files from a 'tar' backup and reading the WAL files created during the 'tar' backup - no specific PIT given in recovery.conf) the database server will abort with a permission denied error on the pg_xlog/RECOVERYXLOG file. The error occurred restoring both backups that were made under the current version (9.2.16 and 9.2.17) as well as backups made under prior versions (9.2.15 at least). The exact same restore process/backup files can be used to successfully restore the database using the 9.2.15 executables, but fail when using either 9.2.16 or 9.2.17 with the permission denied error.

There were not very many changes between 9.2.15 and 9.2.16. Between that
and the location of the error:

2016-04-27 17:02:06 EDT 572128cd.1811 [7-1] user=,db=,remote= FATAL: 42501:
could not open file "pg_xlog/RECOVERYXLOG": Permission denied
2016-04-27 17:02:06 EDT 572128cd.1811 [8-1] user=,db=,remote= LOCATION:
fsync_fname_ext, fd.c:2654

I feel pretty confident in blaming this on the "durable_rename" patch.

The proximate cause of this might just be that the "ignore_perm" exception
is only for EACCES and not EPERM (why?). In general, though, it seems to
me that the durable_rename patch has introduced a whole lot of new failure
conditions that were not there before, for IMO very little reason.
I think we would be better off fixing those functions so that there is
*no* case other than failure of the rename() or link() call itself that
will be treated as a hard error. Blowing up completely is not an
improvement over not fsyncing.

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#3Andres Freund
andres@anarazel.de
In reply to: Tom Lane (#2)
hackersgeneral
Re: [GENERAL] Permission Denied Error on pg_xlog/RECOVERYXLOG file

On 2016-05-26 12:44:51 -0400, Tom Lane wrote:

2016-04-27 17:02:06 EDT 572128cd.1811 [7-1] user=,db=,remote= FATAL: 42501:
could not open file "pg_xlog/RECOVERYXLOG": Permission denied

So, what's the permission of RECOVERYXLOG at that point? It's pretty
weird that directly after running reason_command it's not readable. Are
you doing something involving sudo or such in restore_command?

The proximate cause of this might just be that the "ignore_perm" exception
is only for EACCES and not EPERM (why?).

I essentially just copied your logic from d8179b001ae574da00c8 ff.

In general, though, it seems to
me that the durable_rename patch has introduced a whole lot of new failure
conditions that were not there before, for IMO very little reason.

Uh. Like provably loosing data after crashes?

I think we would be better off fixing those functions so that there is
*no* case other than failure of the rename() or link() call itself that
will be treated as a hard error. Blowing up completely is not an
improvement over not fsyncing.

I'm not convinced of that. Hiding unexpected issues for longer, just to
continue kind-of-operating, can make the impact of problems a lot worse,
and it makes it very hard to actually learn about the issues.

Andres

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#4Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andres Freund (#3)
hackersgeneral
Re: [GENERAL] Permission Denied Error on pg_xlog/RECOVERYXLOG file

Andres Freund <andres@anarazel.de> writes:

On 2016-05-26 12:44:51 -0400, Tom Lane wrote:
2016-04-27 17:02:06 EDT 572128cd.1811 [7-1] user=,db=,remote= FATAL: 42501:
could not open file "pg_xlog/RECOVERYXLOG": Permission denied

So, what's the permission of RECOVERYXLOG at that point? It's pretty
weird that directly after running reason_command it's not readable.

s/not readable/not writable/. I doubt that it's a good idea for that
code to think that it can fail hard on non-writable files.

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#5Andres Freund
andres@anarazel.de
In reply to: Tom Lane (#4)
hackersgeneral
Re: [GENERAL] Permission Denied Error on pg_xlog/RECOVERYXLOG file

On 2016-05-27 20:54:43 -0400, Tom Lane wrote:

Andres Freund <andres@anarazel.de> writes:

On 2016-05-26 12:44:51 -0400, Tom Lane wrote:
2016-04-27 17:02:06 EDT 572128cd.1811 [7-1] user=,db=,remote= FATAL: 42501:
could not open file "pg_xlog/RECOVERYXLOG": Permission denied

So, what's the permission of RECOVERYXLOG at that point? It's pretty
weird that directly after running reason_command it's not readable.

s/not readable/not writable/. I doubt that it's a good idea for that
code to think that it can fail hard on non-writable files.

But we actually sometimes write to files we've recovered; if they're the
end of the WAL after archive recovery and/or promotion. If a
restore_command restores files in a non-writable way it's buggy; I don't
see why it's worthwhile to work around that.

Andres

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#6Michael Paquier
michael@paquier.xyz
In reply to: Andres Freund (#5)
hackersgeneral
Re: [GENERAL] Permission Denied Error on pg_xlog/RECOVERYXLOG file

On Sat, May 28, 2016 at 9:57 AM, Andres Freund <andres@anarazel.de> wrote:

On 2016-05-27 20:54:43 -0400, Tom Lane wrote:

Andres Freund <andres@anarazel.de> writes:

On 2016-05-26 12:44:51 -0400, Tom Lane wrote:
2016-04-27 17:02:06 EDT 572128cd.1811 [7-1] user=,db=,remote= FATAL: 42501:
could not open file "pg_xlog/RECOVERYXLOG": Permission denied

So, what's the permission of RECOVERYXLOG at that point? It's pretty
weird that directly after running reason_command it's not readable.

s/not readable/not writable/. I doubt that it's a good idea for that
code to think that it can fail hard on non-writable files.

But we actually sometimes write to files we've recovered; if they're the
end of the WAL after archive recovery and/or promotion. If a
restore_command restores files in a non-writable way it's buggy; I don't
see why it's worthwhile to work around that.

Not exactly, startup process does not write directly to the files of
pg_xlog while in recovery. Even with the current code, the first file
that needs to be writable is the first WAL segment of the new
timeline, which is made as a copy of the last partial segment of the
old timeline.

Anyway, I agree with Andres here. We had definitely better be sure
that durable_rename does fsync correctly the old and new entries when
doing a rename meaning that failing on EPERM is adapted, or we may
lose the rename in case of a crash, and data would be lost if the
rename was not seen as effective. And personally I am more worrying
about data losses than strangely designed restore_command scripts,
which is likely using sudo.
--
Michael

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#7Andres Freund
andres@anarazel.de
In reply to: Michael Paquier (#6)
hackersgeneral
Re: [GENERAL] Permission Denied Error on pg_xlog/RECOVERYXLOG file

On 2016-05-30 12:51:17 +0900, Michael Paquier wrote:

On Sat, May 28, 2016 at 9:57 AM, Andres Freund <andres@anarazel.de> wrote:

On 2016-05-27 20:54:43 -0400, Tom Lane wrote:

Andres Freund <andres@anarazel.de> writes:

On 2016-05-26 12:44:51 -0400, Tom Lane wrote:
2016-04-27 17:02:06 EDT 572128cd.1811 [7-1] user=,db=,remote= FATAL: 42501:
could not open file "pg_xlog/RECOVERYXLOG": Permission denied

So, what's the permission of RECOVERYXLOG at that point? It's pretty
weird that directly after running reason_command it's not readable.

s/not readable/not writable/. I doubt that it's a good idea for that
code to think that it can fail hard on non-writable files.

But we actually sometimes write to files we've recovered; if they're the
end of the WAL after archive recovery and/or promotion. If a
restore_command restores files in a non-writable way it's buggy; I don't
see why it's worthwhile to work around that.

Not exactly, startup process does not write directly to the files of
pg_xlog while in recovery.

It does at the end of crash recovery. And the wal receiver does so at
the end of archive recovery (which can repeatedly be reached).

- Andres

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#8Noname
Jeffrey.Marshall@usitc.gov
In reply to: Andres Freund (#7)
hackersgeneral
Re: [GENERAL] Permission Denied Error on pg_xlog/RECOVERYXLOG file

Hi Folks!

The permissions on the RECOVERYXLOG file at the time of the error are 0400:
-r--------. 1 postgres postgres 16777216 May 31 09:51 RECOVERYXLOG

I sent that info to Tom earlier this afternoon (still learning the posting protocols - sorry) - his response is below:

From Tom:

Ah, that confirms the diagnosis --- the new fsync code is assuming that all files it might need to fsync are writable by the postgres user.

What would be useful to know is how it got to be that way rather than
0600 (-rw-------). Seems like it must be a property of your archive restore script, but we don't have the details. Maybe you could show the script?

Also, I don't recall whether you said exactly what platform you're on, but that's probably important here as well.

regards, tom lane

OPERATING SYSTEM INFORMATION:

The database server is running on a Redhat Linux host (Red Hat Enterprise Linux Server release 6.8 (Santiago)) -

[postgres-pt(at)postXX pg_xlog]$ uname -a
Linux 2.6.32-573.22.1.el6.x86_64 #1 SMP Thu Mar 17 03:23:39 EDT 2016 x86_64 x86_64 x86_64 GNU/Linux

The user account that creates the backup files is the same account that is used for performing the restores - and at no time are any elevated privileges used. The script we use for restoring our databases is not overly complex - a quick summary is below:

Save config files
Remove tablespace directories
Remove cluster directory
untar tablespace backup files
untar cluster backup file
copy back the config files

At this point in the script the pg_xlog directory is cleaned out - there are no files in the pg_xlog directory when the restore starts. The database server is then started.

Below are the contents of the pg_xlog directory from the unsuccessful restore attempts:

[postgres@postZZ postXX]$ pwd
/pgdata/pgsrv_xlog_data/postXX
[postgres@postZZ postXX]$ ls -lR
.:
total 4
drwx------ 2 postgres postgres 4096 May 31 14:58 archive_status

./archive_status:
total 0

[postgres@postZZ postXX]$ pg_ctl -D /pgdata/pgsrv_cluster_data/postXX_92 start
pg_ctl: another server might be running; trying to start server anyway
server starting
2016-05-31 14:59:18 EDT 574ddf06.7d10 [1-1] user=,db=,remote= LOG: loaded library "pg_stat_statements"

[postgres@postZZ postXX]$ ls -lR
.:
total 16388
-r-------- 1 postgres postgres 16777216 May 31 14:59 RECOVERYXLOG
drwx------ 2 postgres postgres 4096 May 31 14:58 archive_status

./archive_status:
total 0

Even if the permissions on the RECOVERYXLOG file are change (0777 in this case) and the file is left in the pg_xlog directory, the postgres process will recreate the file as 0400:

[postgres@postZZ postXX]$ chmod 777 RECOVERYXLOG
[postgres@postZZ postXX]$ ls -lR
.:
total 16388
-rwxrwxrwx 1 postgres postgres 16777216 May 31 14:59 RECOVERYXLOG
drwx------ 2 postgres postgres 4096 May 31 14:58 archive_status

./archive_status:
total 0

[postgres@postZZ postXX]$ pg_ctl -D /pgdata/pgsrv_cluster_data/postXX_92 start
server starting
2016-05-31 14:59:46 EDT 574ddf21.7d25 [1-1] user=,db=,remote= LOG: loaded library "pg_stat_statements"

[postgres@postZZ postXX]$ ls -lR
.:
total 16388
-r-------- 1 postgres postgres 16777216 May 31 14:59 RECOVERYXLOG
drwx------ 2 postgres postgres 4096 May 31 14:58 archive_status

./archive_status:
total 0

Interestingly enough, in the 9.2.15 version (successful restore), the file is also created 0400:

[postgres@postZZ postXX]$ pwd
/pgdata/pgsrv_xlog_data/postXX

[postgres@postZZ postXX]$ ls -lR
.:
total 4
drwx------ 2 postgres postgres 4096 May 31 14:58 archive_status

./archive_status:
total 0

[postgres@postZZ postXX]$ pg_ctl -D /pgdata/pgsrv_cluster_data/postXX_92 start
server starting
2016-05-31 15:36:27 EDT 574de7bb.7eff [1-1] user=,db=,remote= LOG: loaded library "pg_stat_statements"

[postgres@postZZ postXX]$ ls -lR
.:
total 65544
-r-------- 1 postgres postgres 16777216 May 31 14:46 000000010000000500000022
-rw------- 1 postgres postgres 56 May 31 14:46 00000002.history
-rw------- 1 postgres postgres 16777216 May 31 14:46 000000020000000500000022
-rw------- 1 postgres postgres 16777216 May 31 14:46 000000020000000500000023
-rw------- 1 postgres postgres 16777216 May 31 14:46 000000020000000500000024
drwx------ 2 postgres postgres 4096 May 31 14:46 archive_status

./archive_status:
total 0
-rw------- 1 postgres postgres 0 May 31 14:46 000000010000000500000022.done
-rw------- 1 postgres postgres 0 May 31 14:46 00000002.history.done
-rw------- 1 postgres postgres 0 May 31 14:46 000000020000000500000023.done

Some additional info. We are in the initial stages of upgrading from 9.2 to 9.5. I upgraded one of our database servers to 9.5.1, then backed up the upgraded database with our same backup process. Using both the 9.5.2 executables (where the fsync/rename patch was introduced) and the 9.5.3 executables I was able to successfully restore the 9.5.1 backup (same host/user account which produces an error under 9.2.16).

Below are various strace outputs that were captured performing the database restores. The exact same source files are used in both of the 9.2.X restores. I have 'grepped' out the rows for the RECOVERYXLOG file. Going through the output, the open modes are different (0444 versus 0600) between 9.2 and 9.5

9.2
open("pg_xlog/RECOVERYXLOG", O_WRONLY|O_CREAT|O_EXCL, 0444) = 7

9.5
open("pg_xlog/RECOVERYXLOG", O_WRONLY|O_CREAT|O_EXCL, 0600) = 4

9.2.15 backup/9.2.15 restore - successful

29262 stat("pg_xlog/RECOVERYXLOG", 0x7ffd91fc90b0) = -1 ENOENT (No such file or directory)
29263 execve("/bin/cp", ["cp", "/pgdata/postXX_walarchive/00"..., "pg_xlog/RECOVERYXLOG"], [/* 60 vars */]) = 0
29263 stat("pg_xlog/RECOVERYXLOG", 0x7ffd8697d490) = -1 ENOENT (No such file or directory)
29263 stat("pg_xlog/RECOVERYXLOG", 0x7ffd8697d2b0) = -1 ENOENT (No such file or directory)
29263 open("pg_xlog/RECOVERYXLOG", O_WRONLY|O_CREAT|O_EXCL, 0444) = 7
29262 stat("pg_xlog/RECOVERYXLOG", {st_mode=S_IFREG|0400, st_size=16777216, ...}) = 0
29262 rename("pg_xlog/RECOVERYXLOG", "pg_xlog/000000010000000500000022" <unfinished ...>
29262 stat("pg_xlog/RECOVERYXLOG", 0x7ffd91fc90e0) = -1 ENOENT (No such file or directory)
29266 execve("/bin/cp", ["cp", "/pgdata/postXX_walarchive/00"..., "pg_xlog/RECOVERYXLOG"], [/* 60 vars */]) = 0
29266 stat("pg_xlog/RECOVERYXLOG", 0x7fff9ad125e0) = -1 ENOENT (No such file or directory)
29262 stat("pg_xlog/RECOVERYXLOG", 0x7ffd91fc90e0) = -1 ENOENT (No such file or directory)
29267 execve("/bin/cp", ["cp", "/pgdata/postXX_walarchive/00"..., "pg_xlog/RECOVERYXLOG"], [/* 60 vars */]) = 0
29267 stat("pg_xlog/RECOVERYXLOG", 0x7fffca6281b0) = -1 ENOENT (No such file or directory)
29267 stat("pg_xlog/RECOVERYXLOG", 0x7fffca627fd0) = -1 ENOENT (No such file or directory)
29267 open("pg_xlog/RECOVERYXLOG", O_WRONLY|O_CREAT|O_EXCL, 0444) = 13
29262 stat("pg_xlog/RECOVERYXLOG", {st_mode=S_IFREG|0400, st_size=16777216, ...}) = 0
29262 rename("pg_xlog/RECOVERYXLOG", "pg_xlog/000000010000000500000022") = 0
29262 unlink("pg_xlog/RECOVERYXLOG") = -1 ENOENT (No such file or directory)

9.2.15 backup/9.2.16 restore - NOT successful

30153 stat("pg_xlog/RECOVERYXLOG", 0x7fff732488c0) = -1 ENOENT (No such file or directory)
30154 execve("/bin/cp", ["cp", "/pgdata/postXX_walarchive/00"..., "pg_xlog/RECOVERYXLOG"], [/* 60 vars */]) = 0
30154 stat("pg_xlog/RECOVERYXLOG", 0x7ffd3c7a96b0) = -1 ENOENT (No such file or directory)
30154 stat("pg_xlog/RECOVERYXLOG", 0x7ffd3c7a94d0) = -1 ENOENT (No such file or directory)
30154 open("pg_xlog/RECOVERYXLOG", O_WRONLY|O_CREAT|O_EXCL, 0444) = 7
30153 stat("pg_xlog/RECOVERYXLOG", {st_mode=S_IFREG|0400, st_size=16777216, ...}) = 0
30153 open("pg_xlog/RECOVERYXLOG", O_RDWR) = -1 EACCES (Permission denied)

9.5.1 backup/9.5.2 restore - successful

24615 stat("pg_xlog/RECOVERYXLOG", 0x7fff3e2f8e60) = -1 ENOENT (No such file or directory)
24619 execve("/bin/cp", ["cp", "/pgdata/postXX_95_orig_wal/0"..., "pg_xlog/RECOVERYXLOG"], [/* 60 vars */]) = 0
24619 stat("pg_xlog/RECOVERYXLOG", 0x7fffb2383ea0) = -1 ENOENT (No such file or directory)
24619 stat("pg_xlog/RECOVERYXLOG", 0x7fffb2383cc0) = -1 ENOENT (No such file or directory)
24619 open("pg_xlog/RECOVERYXLOG", O_WRONLY|O_CREAT|O_EXCL, 0600) = 4
24615 stat("pg_xlog/RECOVERYXLOG", {st_mode=S_IFREG|0600, st_size=16777216, ...}) = 0
24615 open("pg_xlog/RECOVERYXLOG", O_RDWR) = 3
24615 rename("pg_xlog/RECOVERYXLOG", "pg_xlog/00000001000000490000009A") = 0
24615 stat("pg_xlog/RECOVERYXLOG", 0x7fff3e2f8e80) = -1 ENOENT (No such file or directory)
24622 execve("/bin/cp", ["cp", "/pgdata/postXX_95_orig_wal/0"..., "pg_xlog/RECOVERYXLOG"], [/* 60 vars */]) = 0
24622 stat("pg_xlog/RECOVERYXLOG", 0x7fffb1f62350) = -1 ENOENT (No such file or directory)
24615 stat("pg_xlog/RECOVERYXLOG", 0x7fff3e2f8e80) = -1 ENOENT (No such file or directory)
24623 execve("/bin/cp", ["cp", "/pgdata/postXX_95_orig_wal/0"..., "pg_xlog/RECOVERYXLOG"], [/* 60 vars */]) = 0
24623 stat("pg_xlog/RECOVERYXLOG", 0x7ffd4a373c40) = -1 ENOENT (No such file or directory)
24623 stat("pg_xlog/RECOVERYXLOG", 0x7ffd4a373a60) = -1 ENOENT (No such file or directory)
24623 open("pg_xlog/RECOVERYXLOG", O_WRONLY|O_CREAT|O_EXCL, 0600) = 7
24615 stat("pg_xlog/RECOVERYXLOG", {st_mode=S_IFREG|0600, st_size=16777216, ...}) = 0
24615 open("pg_xlog/RECOVERYXLOG", O_RDWR) = 3
24615 rename("pg_xlog/RECOVERYXLOG", "pg_xlog/00000001000000490000009A") = 0
24615 unlink("pg_xlog/RECOVERYXLOG") = -1 ENOENT (No such file or directory)

Jeff Marshall
U.S. International Trade Commission

-----Original Message-----
From: Andres Freund [mailto:andres@anarazel.de]
Sent: Monday, May 30, 2016 5:14 PM
To: Michael Paquier
Cc: Tom Lane; Marshall, Jeffrey; PostgreSQL mailing lists
Subject: Re: [HACKERS] [GENERAL] Permission Denied Error on pg_xlog/RECOVERYXLOG file

On 2016-05-30 12:51:17 +0900, Michael Paquier wrote:

On Sat, May 28, 2016 at 9:57 AM, Andres Freund <andres@anarazel.de> wrote:

On 2016-05-27 20:54:43 -0400, Tom Lane wrote:

Andres Freund <andres@anarazel.de> writes:

On 2016-05-26 12:44:51 -0400, Tom Lane wrote:
2016-04-27 17:02:06 EDT 572128cd.1811 [7-1] user=,db=,remote= FATAL: 42501:
could not open file "pg_xlog/RECOVERYXLOG": Permission denied

So, what's the permission of RECOVERYXLOG at that point? It's
pretty weird that directly after running reason_command it's not readable.

s/not readable/not writable/. I doubt that it's a good idea for
that code to think that it can fail hard on non-writable files.

But we actually sometimes write to files we've recovered; if they're
the end of the WAL after archive recovery and/or promotion. If a
restore_command restores files in a non-writable way it's buggy; I
don't see why it's worthwhile to work around that.

Not exactly, startup process does not write directly to the files of
pg_xlog while in recovery.

It does at the end of crash recovery. And the wal receiver does so at the end of archive recovery (which can repeatedly be reached).

- Andres

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#9Robert Haas
robertmhaas@gmail.com
In reply to: Andres Freund (#3)
hackersgeneral
Re: [GENERAL] Permission Denied Error on pg_xlog/RECOVERYXLOG file

On Fri, May 27, 2016 at 8:44 PM, Andres Freund <andres@anarazel.de> wrote:

I'm not convinced of that. Hiding unexpected issues for longer, just to
continue kind-of-operating, can make the impact of problems a lot worse,
and it makes it very hard to actually learn about the issues.

So if we made this a WARNING rather than an ERROR, it wouldn't hiding
the issue, but it would be less likely to break things that worked
before. No?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#10Andres Freund
andres@anarazel.de
In reply to: Robert Haas (#9)
hackersgeneral
Re: [GENERAL] Permission Denied Error on pg_xlog/RECOVERYXLOG file

On 2016-06-03 14:00:00 -0400, Robert Haas wrote:

On Fri, May 27, 2016 at 8:44 PM, Andres Freund <andres@anarazel.de> wrote:

I'm not convinced of that. Hiding unexpected issues for longer, just to
continue kind-of-operating, can make the impact of problems a lot worse,
and it makes it very hard to actually learn about the issues.

So if we made this a WARNING rather than an ERROR, it wouldn't hiding
the issue, but it would be less likely to break things that worked
before. No?

Except that we're then accepting the (proven!) potential for data
loss. We're talking about a single report of an restore_command setting
odd permissions. Which can easily be fixed. And the permission setting
already has downsides, e.g. for the switch between archive and streaming
recovery (which would fail).

Andres

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#11Robert Haas
robertmhaas@gmail.com
In reply to: Andres Freund (#10)
hackersgeneral
Re: [GENERAL] Permission Denied Error on pg_xlog/RECOVERYXLOG file

On Fri, Jun 3, 2016 at 2:12 PM, Andres Freund <andres@anarazel.de> wrote:

On 2016-06-03 14:00:00 -0400, Robert Haas wrote:

On Fri, May 27, 2016 at 8:44 PM, Andres Freund <andres@anarazel.de> wrote:

I'm not convinced of that. Hiding unexpected issues for longer, just to
continue kind-of-operating, can make the impact of problems a lot worse,
and it makes it very hard to actually learn about the issues.

So if we made this a WARNING rather than an ERROR, it wouldn't hiding
the issue, but it would be less likely to break things that worked
before. No?

Except that we're then accepting the (proven!) potential for data
loss. We're talking about a single report of an restore_command setting
odd permissions. Which can easily be fixed.

Well, I think that having restore_command start failing after a minor
release update can cause data loss, too. Or even an outage.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#12David Steele
david@pgmasters.net
In reply to: Robert Haas (#11)
hackersgeneral
Re: [GENERAL] Permission Denied Error on pg_xlog/RECOVERYXLOG file

On 6/3/16 4:13 PM, Robert Haas wrote:

On Fri, Jun 3, 2016 at 2:12 PM, Andres Freund <andres@anarazel.de> wrote:

On 2016-06-03 14:00:00 -0400, Robert Haas wrote:

On Fri, May 27, 2016 at 8:44 PM, Andres Freund <andres@anarazel.de> wrote:

I'm not convinced of that. Hiding unexpected issues for longer, just to
continue kind-of-operating, can make the impact of problems a lot worse,
and it makes it very hard to actually learn about the issues.

So if we made this a WARNING rather than an ERROR, it wouldn't hiding
the issue, but it would be less likely to break things that worked
before. No?

Except that we're then accepting the (proven!) potential for data
loss. We're talking about a single report of an restore_command setting
odd permissions. Which can easily be fixed.

Well, I think that having restore_command start failing after a minor
release update can cause data loss, too. Or even an outage.

I'm mostly with Andres on this but you do make a good point, Robert.

Andres, what if on EPERM you set write privs on the file and retry?
Maybe only back patch that change and for 9.6 expect restore_command
scripts to set sane permissions.

--
-David
david@pgmasters.net

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers