pg_basebackup failed to read a file

Started by Mike Cardwell · over 7 years ago · 7 messages · general
#1 Mike Cardwell
mike.cardwell@hardenize.com

Hi,

I was just setting up streaming replication for the first time. I ran
pg_basebackup on the slave. It copied 1.5TB of data. Then it errored
out with:

```
1498215035/1498215035 kB (100%), 1/1 tablespace
pg_basebackup: could not get write-ahead log end position from server:
ERROR:  could not open file "./postgresql.conf~": Permission denied
pg_basebackup: removing data directory "/var/lib/pgsql/10/data"
bash-4.2$
```

Now, I know what this error means. There was a root owned file at
"/var/lib/pgsql/10/data/postgresql.conf~" which contained an old
version of our postgres config and was not readable by the postgres
user. I'll delete this file and try again. In the meantime, though, I
feel it would be useful for pg_basebackup to check that it has read
access to all of the existing files in the source directory before it
begins its copy. I'd like to submit this as a feature request, but I'm
struggling to find out how to do that. So here I am... Can anyone point
me in the right direction?
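Until something like this exists in pg_basebackup itself, the check can be approximated from the outside. A minimal sketch in Python, run as the postgres user before starting the backup (the PGDATA path is this thread's example; substitute your own):

```python
import os

def unreadable_entries(root):
    """Walk a data directory and collect every entry the current user
    cannot read. Any path returned here would have made pg_basebackup
    fail partway through the copy."""
    problems = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            if not os.access(path, os.R_OK):
                problems.append(path)
    return problems

for path in unreadable_entries("/var/lib/pgsql/10/data"):
    print("not readable:", path)
```

Note that os.access checks the invoking user's permissions, so this has to run as the same user that will run pg_basebackup.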

Regards,

Mike

#2 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Mike Cardwell (#1)
Re: pg_basebackup failed to read a file

Mike Cardwell <mike.cardwell@hardenize.com> writes:

> pg_basebackup: could not get write-ahead log end position from server:
> ERROR:  could not open file "./postgresql.conf~": Permission denied
>
> Now, I know what this error means. There was a root owned file at
> "/var/lib/pgsql/10/data/postgresql.conf~" which contained an old
> version of our postgres config and was not readable by the postgres
> user. I'll delete this file and try again. However, in the meantime I
> feel like it would be useful for pg_basebackup to check that it has
> read access to all of the existing files in the source directory at the
> start, before it begins its copy.

That seems like a pretty expensive thing to do, if there are lots of
files ... and you'd still end up failing, so it's not moving the ball
very far.

More generally, this seems closely related to bug #14999 [1],
which concerned pg_rewind's behavior in the face of unexpected file
permissions within the data directory. We ended up not doing anything
about that except documenting it, which I wasn't very satisfied with,
but the costs of doing better seemed to exceed the benefits.

It'd be nice to have a more coherent theory about what needs to be copied
or not, and not fail on files that could simply be ignored. Up to now
we've resisted having any centrally defined knowledge of what can be
inside a PG data directory, but maybe that bullet needs to be bitten.
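For illustration, the "centrally defined knowledge" could be as small as a pattern list consulted before failing on an unreadable file. A hypothetical sketch (the pattern set is invented for this example, not PostgreSQL's actual exclusion list):

```python
import fnmatch

# Illustrative guess at a "safe to skip" list; these patterns are
# assumptions for the sketch, not the server's real exclude list.
IGNORABLE_PATTERNS = [
    "postmaster.pid",
    "postmaster.opts",
    "pg_internal.init",
    "*~",      # editor backup files such as postgresql.conf~
    "*.swp",   # editor swap files
]

def can_ignore(name):
    """True if the file matches a pattern a base backup could skip
    instead of failing on."""
    return any(fnmatch.fnmatch(name, pat) for pat in IGNORABLE_PATTERNS)
```

With a list like this, an unreadable `postgresql.conf~` would be skipped with a warning rather than aborting 1.5TB into the copy.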

regards, tom lane

[1]: /messages/by-id/20180104200633.17004.16377@wrigleys.postgresql.org

#3 Ron
ronljohnsonjr@gmail.com
In reply to: Tom Lane (#2)
Re: pg_basebackup failed to read a file

On 08/14/2018 11:14 AM, Tom Lane wrote:

> Mike Cardwell <mike.cardwell@hardenize.com> writes:
>
>> pg_basebackup: could not get write-ahead log end position from server:
>> ERROR:  could not open file "./postgresql.conf~": Permission denied
>> Now, I know what this error means. There was a root owned file at
>> "/var/lib/pgsql/10/data/postgresql.conf~" which contained an old
>> version of our postgres config and was not readable by the postgres
>> user. I'll delete this file and try again. However, in the meantime I
>> feel like it would be useful for pg_basebackup to check that it has
>> read access to all of the existing files in the source directory at the
>> start, before it begins its copy.
>
> That seems like a pretty expensive thing to do, if there are lots of
> files ... and you'd still end up failing, so it's not moving the ball
> very far.

Why is checking a bunch of file permissions anywhere close to being as
expensive as transferring 1.5TB over a WAN link?

--
Angular momentum makes the world go 'round.

#4 Dimitri Maziuk
dmaziuk@bmrb.wisc.edu
In reply to: Ron (#3)
Re: pg_basebackup failed to read a file

On 08/14/2018 12:14 PM, Ron wrote:

> Why is checking a bunch of file permissions anywhere close to being as
> expensive as transferring 1.5TB over a WAN link?

Normally it shouldn't be, but I recently had postgres create ~13M .snap
files, and just the opendir() took longer than anyone would care to
wait... so it can be just as expensive.

One could just as easily ask why create mode 600 files in places where
they don't belong.

--
Dimitri Maziuk
Programmer/sysadmin
BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu

#5 Stephen Frost
sfrost@snowman.net
In reply to: Ron (#3)
Re: pg_basebackup failed to read a file

Greetings,

* Ron (ronljohnsonjr@gmail.com) wrote:
> On 08/14/2018 11:14 AM, Tom Lane wrote:
>> Mike Cardwell <mike.cardwell@hardenize.com> writes:
>>> pg_basebackup: could not get write-ahead log end position from server:
>>> ERROR:  could not open file "./postgresql.conf~": Permission denied
>>> Now, I know what this error means. There was a root owned file at
>>> "/var/lib/pgsql/10/data/postgresql.conf~" which contained an old
>>> version of our postgres config and was not readable by the postgres
>>> user. I'll delete this file and try again. However, in the meantime I
>>> feel like it would be useful for pg_basebackup to check that it has
>>> read access to all of the existing files in the source directory at the
>>> start, before it begins its copy.
>>
>> That seems like a pretty expensive thing to do, if there are lots of
>> files ... and you'd still end up failing, so it's not moving the ball
>> very far.
>
> Why is checking a bunch of file permissions anywhere close to being as
> expensive as transferring 1.5TB over a WAN link?

One could argue that the cost would be borne by everyone who uses
pg_basebackup, not just those users who are transferring 1.5TB over a
WAN link.

That said, pgbackrest always builds a full manifest by scanning all of
the directories, tablespaces, files, etc, and I can't recall anyone ever
complaining about it. Certainly, failing fast would be better than
failing after a lot of time has been spent.
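A rough sketch of that kind of manifest pass, failing fast before any data moves (an illustration only, not pgbackrest's actual manifest format):

```python
import os
import stat

def build_manifest(root):
    """Stat and access-check every file before any data is copied, so a
    permission problem surfaces up front instead of mid-transfer."""
    manifest = {}
    for dirpath, _, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            st = os.stat(path)
            if not os.access(path, os.R_OK):
                raise PermissionError(f"cannot read {path}")
            manifest[path] = (st.st_size, stat.filemode(st.st_mode))
    return manifest
```

The manifest then doubles as a checklist for the copy itself: every file to transfer, its size, and its mode are known before the first byte goes over the wire.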

Thanks!

Stephen

#6 Joshua D. Drake
jd@commandprompt.com
In reply to: Tom Lane (#2)
Re: pg_basebackup failed to read a file

On 08/14/2018 09:14 AM, Tom Lane wrote:

> It'd be nice to have a more coherent theory about what needs to be copied
> or not, and not fail on files that could simply be ignored. Up to now
> we've resisted having any centrally defined knowledge of what can be
> inside a PG data directory, but maybe that bullet needs to be bitten.

This is not the first time, nor even the second, that this issue has
arisen. A coherent, or at least semi-coherent, theory seems within
reach. Granted, we can't reasonably know what is going on under base/,
but at the top level of PGDATA we know *exactly* which files should and
should not be there.

JD

--
Command Prompt, Inc. || http://the.postgres.company/ || @cmdpromptinc
*** A fault and talent of mine is to tell it exactly how it is. ***
PostgreSQL centered full stack support, consulting and development.
Advocate: @amplifypostgres || Learn: https://postgresconf.org
***** Unless otherwise stated, opinions are my own. *****

#7 Michael Paquier
michael@paquier.xyz
In reply to: Tom Lane (#2)
Re: pg_basebackup failed to read a file

On Tue, Aug 14, 2018 at 12:14:59PM -0400, Tom Lane wrote:

> That seems like a pretty expensive thing to do, if there are lots of
> files ... and you'd still end up failing, so it's not moving the ball
> very far.

Yeah, I would think that with many small relations it is going to have a
measurable performance impact if we scan the whole data directory a
second time.

> More generally, this seems closely related to bug #14999 [1]
> which concerned pg_rewind's behavior in the face of unexpected file
> permissions within the data directory. We ended up not doing anything
> about that except documenting it, which I wasn't very satisfied with,
> but the costs of doing better seemed to exceed the benefits.

Please read the end of that thread for the details. There are many
things you could do; all of them have drawbacks.

> It'd be nice to have a more coherent theory about what needs to be copied
> or not, and not fail on files that could simply be ignored. Up to now
> we've resisted having any centrally defined knowledge of what can be
> inside a PG data directory, but maybe that bullet needs to be bitten.

Yeah, I have not really come up with a nice idea yet either, especially
since some users deploy their own custom files into the data directory,
so I am not completely sure that we need to do anything, nor that it is
worth the trouble. One saner strategy may be to keep your custom files
in a directory outside the main data folder...
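Stock postgresql.conf already supports that with its include directives; the paths below are examples:

```
# In postgresql.conf: keep site-specific settings outside PGDATA so that
# edited copies and editor backups never end up in the data directory.
include '/etc/postgresql/custom.conf'
# or read every *.conf file from a directory:
include_dir '/etc/postgresql/conf.d'
```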
--
Michael