BUG #13350: blindly fsyncing data dir considered harmful

Started by Noname over 10 years ago, 11 messages
#1Noname
andrew@tao11.riddles.org.uk

The following bug has been logged on the website:

Bug reference: 13350
Logged by: Andrew Gierth
Email address: andrew@tao11.riddles.org.uk
PostgreSQL version: 9.1.16
Operating system: Debian (and probably others)
Description:

The addition of a recursive fsync of the data dir on startup (in the absence
of a clean shutdown) causes startup to fail if the data dir contains
symlinks to files which the postgres user can't write to.

This is the standard configuration for many SSL-enabled setups, including
the standard debian packaging defaults. Accordingly, crash recovery now
ALWAYS fails on such systems without manual intervention.
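
The failure mode described above can be sketched as follows. This is a minimal Python illustration, not PostgreSQL's C code. The name `fsync_tree_strict`, the injectable `open_fn` parameter, and the `demo` helper are all hypothetical. It assumes each file is opened read-write before being fsynced. The denied `server.key` is simulated by a fake opener, so the sketch behaves the same when run as root.

```python
import errno
import os
import tempfile


def fsync_tree_strict(root, open_fn=os.open):
    """Fsync every file under root; the first file that cannot be
    opened or fsynced aborts the whole walk with an exception."""
    synced = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Assumption: files are opened read-write before fsync, so a
            # symlink to a file the user cannot write fails right here.
            fd = open_fn(path, os.O_RDWR)
            try:
                os.fsync(fd)
            finally:
                os.close(fd)
            synced.append(path)
    return synced


def demo():
    """Build a throwaway directory and show one denied file aborting the walk."""

    def deny_key(path, flags):
        # Stand-in for a root-owned SSL key: opening it is refused.
        if path.endswith("server.key"):
            raise PermissionError(errno.EACCES, "Permission denied", path)
        return os.open(path, flags)

    with tempfile.TemporaryDirectory() as datadir:
        for name in ("PG_VERSION", "server.key"):
            with open(os.path.join(datadir, name), "w") as f:
                f.write("x")
        try:
            fsync_tree_strict(datadir, open_fn=deny_key)
        except PermissionError as err:
            return f"startup aborted: {err.filename}"
        return "startup ok"
```

In this sketch a single unopenable entry is enough to stop the whole pass, which mirrors why one unwritable symlink target blocks crash recovery.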

--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

#2Stephen Frost
sfrost@snowman.net
In reply to: Noname (#1)
Re: [BUGS] BUG #13350: blindly fsyncing data dir considered harmful

All,

(sending to -core to request a release)

* andrew@tao11.riddles.org.uk (andrew@tao11.riddles.org.uk) wrote:

Operating system: Debian (and probably others)

The addition of a recursive fsync of the data dir on startup (in the absence
of a clean shutdown) causes startup to fail if the data dir contains
symlinks to files which the postgres user can't write to.

This is the standard configuration for many SSL-enabled setups, including
the standard debian packaging defaults. Accordingly, crash recovery now
ALWAYS fails on such systems without manual intervention.

Andrew did a great job summarizing the problem, don't know that there's
much to add there.

This was back-patched all the way and released with the latest round of
minor releases, and given that it means crash recovery fails for a large
number of deployed systems, I think we need to fix (or revert) the
recursive fsync change (d8ac77ab178ddb2ae043b8c463cd30c031e793d0 and
related) and do new releases very shortly.

Thanks!

Stephen

#3Magnus Hagander
magnus@hagander.net
In reply to: Stephen Frost (#2)
Re: [CORE] [BUGS] BUG #13350: blindly fsyncing data dir considered harmful

On May 25, 2015 5:12 PM, "Stephen Frost" <sfrost@snowman.net> wrote:

All,

(sending to -core to request a release)

* andrew@tao11.riddles.org.uk (andrew@tao11.riddles.org.uk) wrote:

Operating system: Debian (and probably others)

The addition of a recursive fsync of the data dir on startup (in the absence
of a clean shutdown) causes startup to fail if the data dir contains
symlinks to files which the postgres user can't write to.

This is the standard configuration for many SSL-enabled setups, including
the standard debian packaging defaults. Accordingly, crash recovery now
ALWAYS fails on such systems without manual intervention.

Andrew did a great job summarizing the problem, don't know that there's
much to add there.

This was back-patched all the way and released with the latest round of
minor releases, and given that it means crash recovery fails for a large
number of deployed systems, I think we need to fix (or revert) the
recursive fsync change (d8ac77ab178ddb2ae043b8c463cd30c031e793d0 and
related) and do new releases very shortly.

Agreed, this is a pretty bad regression and we need to at least do
something and put out a release asap - either revert or if we can find a
better way (see the other thread about this issue for some other ideas).

It happens to be the default shipment on Debian and Ubuntu but it's
definitely not a platform specific problem I believe, so we should put out
a "real" release and not expect packagers to carry a specific patch for it.

/Magnus

#4Stephen Frost
sfrost@snowman.net
In reply to: Magnus Hagander (#3)
Re: [CORE] [BUGS] BUG #13350: blindly fsyncing data dir considered harmful

* Magnus Hagander (magnus@hagander.net) wrote:

On May 25, 2015 5:12 PM, "Stephen Frost" <sfrost@snowman.net> wrote:

This was back-patched all the way and released with the latest round of
minor releases, and given that it means crash recovery fails for a large
number of deployed systems, I think we need to fix (or revert) the
recursive fsync change (d8ac77ab178ddb2ae043b8c463cd30c031e793d0 and
related) and do new releases very shortly.

Agreed, this is a pretty bad regression and we need to at least do
something and put out a release asap - either revert or if we can find a
better way (see the other thread about this issue for some other ideas).

It happens to be the default shipment on Debian and Ubuntu but it's
definitely not a platform specific problem I believe, so we should put out
a "real" release and not expect packagers to carry a specific patch for it.

Agreed, there are certainly other reasons why a file might exist which
can't be written to by the postgres user, we really can't have crash
recovery fail because of it.

Further, I believe that a lot of the .deb-based distributions use
the same technique of using symlinks (Ubuntu included, but even those
who aren't downstream of Debian), and they would all have to be updated
with such a patch. Sadly, suggesting to stay on a prior version (eg:
9.4.1) really isn't acceptable either, given the corruption risks
which 9.4.2 addressed.

Thanks!

Stephen

#5Greg Stark
stark@mit.edu
In reply to: Magnus Hagander (#3)
Re: [CORE] [BUGS] BUG #13350: blindly fsyncing data dir considered harmful

What exactly is failing?

Is it that fsync is returning -1 ? Should we just ignore errors from
fsync if it happens in this stage? That might be safer than
determining which files should or shouldn't be fsynced.

That would also have an impact on people starting up on a flaky file
system perhaps. I'm imagining either a database mounted on a
filesystem with corruption trying to extract what they can or perhaps
something like an NFS filesystem with dangling .nfs files. On the one
hand limping along as best we can is the general Postgres strategy
when the filesystem starts failing but on the other hand we have had
circumstances in the past when users had a database on a network
filesystem that wasn't really ready for use yet when Postgres tried to
start up and starting up anyways didn't do them any favours.

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#6Bruce Momjian
bruce@momjian.us
In reply to: Greg Stark (#5)
Re: [CORE] [BUGS] BUG #13350: blindly fsyncing data dir considered harmful

On Mon, May 25, 2015 at 04:37:59PM +0100, Greg Stark wrote:

What exactly is failing?

Is it that fsync is returning -1 ? Should we just ignore errors from
fsync if it happens in this stage? That might be safer than
determining which files should or shouldn't be fsynced.

Interesting idea. We could skip fsync -1 failures only for symbolic
links.
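
A rough sketch of that idea, in Python rather than PostgreSQL's C: tolerate a failure only when the path itself is a symbolic link, and still raise for ordinary files. The function name `fsync_path` and the `open_fn` parameter are hypothetical, and opening read-write before fsync is an assumption.

```python
import os


def fsync_path(path, open_fn=os.open):
    """Fsync one path. Returns True if synced, False if the failure was
    skipped because the path is a symlink; other failures are raised."""
    try:
        # os.open follows symlinks, so this opens the link's target.
        fd = open_fn(path, os.O_RDWR)
    except OSError:
        if os.path.islink(path):
            return False  # skip: failure on a symlinked file
        raise
    try:
        os.fsync(fd)
    except OSError:
        if os.path.islink(path):
            return False
        raise
    finally:
        os.close(fd)
    return True
```

One trade-off of this variant is that an unwritable regular file inside the data directory would still stop startup.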

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ Everyone has their own god. +


#7Andres Freund
andres@anarazel.de
In reply to: Bruce Momjian (#6)
Re: [CORE] [BUGS] BUG #13350: blindly fsyncing data dir considered harmful

On May 25, 2015 8:52:33 AM PDT, Bruce Momjian <bruce@momjian.us> wrote:

On Mon, May 25, 2015 at 04:37:59PM +0100, Greg Stark wrote:

What exactly is failing?

Is it that fsync is returning -1 ? Should we just ignore errors from
fsync if it happens in this stage? That might be safer than
determining which files should or shouldn't be fsynced.

Interesting idea. We could skip fsync -1 failures only for symbolic
links.

Now there's already a thread on hackers about this.

Andres

--- 
Please excuse brevity and formatting - I am writing this on my mobile phone.


#8Tom Lane
tgl@sss.pgh.pa.us
In reply to: Greg Stark (#5)
Re: [CORE] [BUGS] BUG #13350: blindly fsyncing data dir considered harmful

Greg Stark <stark@mit.edu> writes:

What exactly is failing?
Is it that fsync is returning -1 ?

According to the original report from Christoph Berg, it was open()
not fsync() that was failing, at least in permissions-based cases.

I'm not sure if we should just uniformly ignore all failures in this
phase. That would have the merit of clearly not creating any new
startup failure cases compared to the previous code, but as you say
sometimes it might mean ignoring real problems.

Another idea would be to elog(LOG) such failures but press on anyway.

Lastly, there is a difference between ignoring failures in fsync_fname()
and ignoring failures in walkdir(). The latter is presumably a general
purpose subroutine and so I would expect it to throw errors if it gets a
failure in, say, opendir(). Which it does. But do we want that in this
context? Unreadable directories underneath $PGDATA are probably not a
good thing, but OTOH this would still mean that there's a new startup
failure condition compared to before.
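
The distinction drawn above between per-file failures and directory-walk failures can be sketched like this. It is an illustrative Python sketch, not the actual `fsync_fname()` or `walkdir()` code. The names `fsync_tree_lenient`, `open_fn`, and `ignore_dir_errors` are hypothetical, and `log.warning` merely stands in for `elog(LOG)`.

```python
import logging
import os

log = logging.getLogger("fsync_sketch")


def fsync_tree_lenient(root, open_fn=os.open, ignore_dir_errors=False):
    """Fsync files under root, logging per-file failures and pressing on.

    Returns the list of paths that could not be synced. Directory read
    errors still abort the walk unless ignore_dir_errors is True.
    """
    failed = []

    def on_walk_error(err):
        # Directory-level failure, e.g. an unreadable subdirectory.
        if not ignore_dir_errors:
            raise err
        log.warning("could not read directory %s: %s", err.filename, err)
        failed.append(err.filename)

    for dirpath, _dirnames, filenames in os.walk(root, onerror=on_walk_error):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                # Assumption: files are opened read-write before fsync.
                fd = open_fn(path, os.O_RDWR)
            except OSError as err:
                log.warning("could not open %s: %s", path, err)
                failed.append(path)
                continue
            try:
                os.fsync(fd)
            except OSError as err:
                log.warning("could not fsync %s: %s", path, err)
                failed.append(path)
            finally:
                os.close(fd)
    return failed
```

With `ignore_dir_errors=False` the sketch keeps the stricter directory behaviour while no longer failing on individual files; setting it to `True` corresponds to ignoring both kinds of failure.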

regards, tom lane


#9Stephen Frost
sfrost@snowman.net
In reply to: Andres Freund (#7)
Re: [CORE] [BUGS] BUG #13350: blindly fsyncing data dir considered harmful

* Andres Freund (andres@anarazel.de) wrote:

On May 25, 2015 8:52:33 AM PDT, Bruce Momjian <bruce@momjian.us> wrote:

On Mon, May 25, 2015 at 04:37:59PM +0100, Greg Stark wrote:

What exactly is failing?

Is it that fsync is returning -1 ? Should we just ignore errors from
fsync if it happens in this stage? That might be safer than
determining which files should or shouldn't be fsynced.

Interesting idea. We could skip fsync -1 failures only for symbolic
links.

Now there's already a thread on hackers about this.

Agreed- we should probably keep the discussion about fixing it (or
addressing the original issue in some way that doesn't run into this
issue) on the other thread. This is simply to point out the user impact
and that we need to fix it quickly.

Thanks!

Stephen

#10Josh Berkus
josh@agliodbs.com
In reply to: Noname (#1)
Re: [CORE] [BUGS] BUG #13350: blindly fsyncing data dir considered harmful

All,

If it's any consolation, the folks at kernel.org are having a bad week
too: http://lwn.net/Articles/645720/

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


#11Andres Freund
andres@anarazel.de
In reply to: Josh Berkus (#10)
Re: [CORE] [BUGS] BUG #13350: blindly fsyncing data dir considered harmful

On 2015-05-25 14:11:34 -0700, Josh Berkus wrote:

If it's any consolation, the folks at kernel.org are having a bad week
too: http://lwn.net/Articles/645720/

This doesn't seem to come even close to their problems ;). A problem
that you can fix by changing permissions isn't that bad.

Greetings,

Andres Freund
