pgsql: Recursively fsync() the data directory after a crash.

Started by Robert Haasabout 11 years ago14 messagescomitters
Jump to latest
#1Robert Haas
robertmhaas@gmail.com

Recursively fsync() the data directory after a crash.

Otherwise, if there's another crash, some writes from after the first
crash might make it to disk while writes from before the crash fail
to make it to disk. This could lead to data corruption.

Back-patch to all supported versions.

Abhijit Menon-Sen, reviewed by Andres Freund and slightly revised
by me.

Branch
------
master

Details
-------
http://git.postgresql.org/pg/commitdiff/2ce439f3379aed857517c8ce207485655000fc8e

Modified Files
--------------
src/backend/access/transam/xlog.c | 42 ++++++++++++++
src/backend/storage/file/fd.c | 115 +++++++++++++++++++++++++++++++++++++
src/include/storage/fd.h | 2 +
3 files changed, 159 insertions(+)

--
Sent via pgsql-committers mailing list (pgsql-committers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-committers

#2Andrew Dunstan
andrew@dunslane.net
In reply to: Robert Haas (#1)
Re: pgsql: Recursively fsync() the data directory after a crash.

On 05/04/2015 02:23 PM, Robert Haas wrote:

Recursively fsync() the data directory after a crash.

Otherwise, if there's another crash, some writes from after the first
crash might make it to disk while writes from before the crash fail
to make it to disk. This could lead to data corruption.

Back-patch to all supported versions.

This appears to have broken Windows builds. At least it's broken by
using the two argument form of open instead of the three argument form,
for which we have a #define in the win32 case, and pg_win32_is_junction
is a typo - the first _ should not be there.

cheers

andrew

--
Sent via pgsql-committers mailing list (pgsql-committers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-committers

#3Robert Haas
robertmhaas@gmail.com
In reply to: Andrew Dunstan (#2)
Re: pgsql: Recursively fsync() the data directory after a crash.

On Mon, May 4, 2015 at 11:37 PM, Andrew Dunstan <andrew@dunslane.net> wrote:

On 05/04/2015 02:23 PM, Robert Haas wrote:

Recursively fsync() the data directory after a crash.

Otherwise, if there's another crash, some writes from after the first
crash might make it to disk while writes from before the crash fail
to make it to disk. This could lead to data corruption.

Back-patch to all supported versions.

This appears to have broken Windows builds. At least it's broken by using
the two argument form of open instead of the three argument form, for which
we have a #define in the win32 case, and pg_win32_is_junction is a typo -
the first _ should not be there.

Sorry about that. Working on it now.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-committers mailing list (pgsql-committers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-committers

#4Peter Eisentraut
peter_e@gmx.net
In reply to: Robert Haas (#1)
Re: pgsql: Recursively fsync() the data directory after a crash.

On 5/4/15 2:23 PM, Robert Haas wrote:

Recursively fsync() the data directory after a crash.

Otherwise, if there's another crash, some writes from after the first
crash might make it to disk while writes from before the crash fail
to make it to disk. This could lead to data corruption.

If there a reason why in pre_sync_fname(), the error message does not
contain a %m?

--
Sent via pgsql-committers mailing list (pgsql-committers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-committers

#5Robert Haas
robertmhaas@gmail.com
In reply to: Peter Eisentraut (#4)
Re: pgsql: Recursively fsync() the data directory after a crash.

On Sun, May 17, 2015 at 9:44 PM, Peter Eisentraut <peter_e@gmx.net> wrote:

On 5/4/15 2:23 PM, Robert Haas wrote:

Recursively fsync() the data directory after a crash.

Otherwise, if there's another crash, some writes from after the first
crash might make it to disk while writes from before the crash fail
to make it to disk. This could lead to data corruption.

If there a reason why in pre_sync_fname(), the error message does not
contain a %m?

For consistency with the rest of the file, I suppose. Not sure why
it's like that, but all the functions in the file do it that way.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-committers mailing list (pgsql-committers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-committers

#6Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Haas (#5)
Re: pgsql: Recursively fsync() the data directory after a crash.

Robert Haas <robertmhaas@gmail.com> writes:

On Sun, May 17, 2015 at 9:44 PM, Peter Eisentraut <peter_e@gmx.net> wrote:

If there a reason why in pre_sync_fname(), the error message does not
contain a %m?

For consistency with the rest of the file, I suppose. Not sure why
it's like that, but all the functions in the file do it that way.

Huh? All the ones that are reporting kernel call failures certainly
have %m.

regards, tom lane

--
Sent via pgsql-committers mailing list (pgsql-committers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-committers

#7Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#6)
Re: pgsql: Recursively fsync() the data directory after a crash.

On Mon, May 18, 2015 at 11:04 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Robert Haas <robertmhaas@gmail.com> writes:

On Sun, May 17, 2015 at 9:44 PM, Peter Eisentraut <peter_e@gmx.net> wrote:

If there a reason why in pre_sync_fname(), the error message does not
contain a %m?

For consistency with the rest of the file, I suppose. Not sure why
it's like that, but all the functions in the file do it that way.

Huh? All the ones that are reporting kernel call failures certainly
have %m.

Oops, you're right. I was looking at the wrong code. Yeah, that
should probably be fixed. I'm not sure it's a good idea to do that
right now though.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-committers mailing list (pgsql-committers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-committers

#8Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Haas (#7)
Re: pgsql: Recursively fsync() the data directory after a crash.

Robert Haas <robertmhaas@gmail.com> writes:

On Mon, May 18, 2015 at 11:04 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Huh? All the ones that are reporting kernel call failures certainly
have %m.

Oops, you're right. I was looking at the wrong code. Yeah, that
should probably be fixed. I'm not sure it's a good idea to do that
right now though.

It's a trivial enough change that I wouldn't see a problem with doing it
now. But if you do, please get it in by say 2PM EDT.

regards, tom lane

--
Sent via pgsql-committers mailing list (pgsql-committers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-committers

#9Bruce Momjian
bruce@momjian.us
In reply to: Tom Lane (#8)
Re: pgsql: Recursively fsync() the data directory after a crash.

On Mon, May 18, 2015 at 12:15:03PM -0400, Tom Lane wrote:

Robert Haas <robertmhaas@gmail.com> writes:

On Mon, May 18, 2015 at 11:04 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Huh? All the ones that are reporting kernel call failures certainly
have %m.

Oops, you're right. I was looking at the wrong code. Yeah, that
should probably be fixed. I'm not sure it's a good idea to do that
right now though.

It's a trivial enough change that I wouldn't see a problem with doing it
now. But if you do, please get it in by say 2PM EDT.

Uh, doesn't that change the translated strings? Is that a problem?

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ Everyone has their own god. +

--
Sent via pgsql-committers mailing list (pgsql-committers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-committers

#10Tom Lane
tgl@sss.pgh.pa.us
In reply to: Bruce Momjian (#9)
Re: pgsql: Recursively fsync() the data directory after a crash.

Bruce Momjian <bruce@momjian.us> writes:

On Mon, May 18, 2015 at 12:15:03PM -0400, Tom Lane wrote:

It's a trivial enough change that I wouldn't see a problem with doing it
now. But if you do, please get it in by say 2PM EDT.

Uh, doesn't that change the translated strings? Is that a problem?

Better an untranslated message than a translated one that lacks critical
info. But anyway, there are likely other instances of that same string
*with* %m ...

regards, tom lane

--
Sent via pgsql-committers mailing list (pgsql-committers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-committers

#11Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#10)
Re: pgsql: Recursively fsync() the data directory after a crash.

On Mon, May 18, 2015 at 12:39 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Bruce Momjian <bruce@momjian.us> writes:

On Mon, May 18, 2015 at 12:15:03PM -0400, Tom Lane wrote:

It's a trivial enough change that I wouldn't see a problem with doing it
now. But if you do, please get it in by say 2PM EDT.

Uh, doesn't that change the translated strings? Is that a problem?

Better an untranslated message than a translated one that lacks critical
info. But anyway, there are likely other instances of that same string
*with* %m ...

Probably not, because of the way the message is worded. But we could do this:

diff --git a/src/backend/storage/file/fd.c b/src/backend/storage/file/fd.c
index 6fa75d1..bed8478 100644
--- a/src/backend/storage/file/fd.c
+++ b/src/backend/storage/file/fd.c
@@ -2461,7 +2461,7 @@ pre_sync_fname(char *fname, bool isdir)
        if (fd < 0)
                ereport(FATAL,
-                               (errmsg("could not open file \"%s\"
before fsync",
+                               (errmsg("could not open file \"%s\": %m",
                                                fname)));

pg_flush_data(fd, 0, 0);

Does that sound good?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-committers mailing list (pgsql-committers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-committers

#12Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Haas (#11)
Re: pgsql: Recursively fsync() the data directory after a crash.

Robert Haas <robertmhaas@gmail.com> writes:

On Mon, May 18, 2015 at 12:39 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Better an untranslated message than a translated one that lacks critical
info. But anyway, there are likely other instances of that same string
*with* %m ...

Probably not, because of the way the message is worded. But we could do this:

diff --git a/src/backend/storage/file/fd.c b/src/backend/storage/file/fd.c
index 6fa75d1..bed8478 100644
--- a/src/backend/storage/file/fd.c
+++ b/src/backend/storage/file/fd.c
@@ -2461,7 +2461,7 @@ pre_sync_fname(char *fname, bool isdir)
if (fd < 0)
ereport(FATAL,
-                               (errmsg("could not open file \"%s\"
before fsync",
+                               (errmsg("could not open file \"%s\": %m",
fname)));

pg_flush_data(fd, 0, 0);

Does that sound good?

Works for me. The file/line info for the message would provide the
"before fsync" context anyway, if someone needed it.

regards, tom lane

--
Sent via pgsql-committers mailing list (pgsql-committers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-committers

#13Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Robert Haas (#11)
Re: pgsql: Recursively fsync() the data directory after a crash.

Robert Haas wrote:

On Mon, May 18, 2015 at 12:39 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Uh, doesn't that change the translated strings? Is that a problem?

Better an untranslated message than a translated one that lacks critical
info. But anyway, there are likely other instances of that same string
*with* %m ...

Probably not, because of the way the message is worded. But we could do this:

diff --git a/src/backend/storage/file/fd.c b/src/backend/storage/file/fd.c
index 6fa75d1..bed8478 100644
--- a/src/backend/storage/file/fd.c
+++ b/src/backend/storage/file/fd.c
@@ -2461,7 +2461,7 @@ pre_sync_fname(char *fname, bool isdir)
if (fd < 0)
ereport(FATAL,
-                               (errmsg("could not open file \"%s\"
before fsync",
+                               (errmsg("could not open file \"%s\": %m",
fname)));

pg_flush_data(fd, 0, 0);

Does that sound good?

+1. It's not like the extra two words of context would be of much use
anyway.

--
�lvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

--
Sent via pgsql-committers mailing list (pgsql-committers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-committers

#14Robert Haas
robertmhaas@gmail.com
In reply to: Alvaro Herrera (#13)
Re: pgsql: Recursively fsync() the data directory after a crash.

On Mon, May 18, 2015 at 1:00 PM, Alvaro Herrera
<alvherre@2ndquadrant.com> wrote:

Robert Haas wrote:

On Mon, May 18, 2015 at 12:39 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Uh, doesn't that change the translated strings? Is that a problem?

Better an untranslated message than a translated one that lacks critical
info. But anyway, there are likely other instances of that same string
*with* %m ...

Probably not, because of the way the message is worded. But we could do this:

diff --git a/src/backend/storage/file/fd.c b/src/backend/storage/file/fd.c
index 6fa75d1..bed8478 100644
--- a/src/backend/storage/file/fd.c
+++ b/src/backend/storage/file/fd.c
@@ -2461,7 +2461,7 @@ pre_sync_fname(char *fname, bool isdir)
if (fd < 0)
ereport(FATAL,
-                               (errmsg("could not open file \"%s\"
before fsync",
+                               (errmsg("could not open file \"%s\": %m",
fname)));

pg_flush_data(fd, 0, 0);

Does that sound good?

+1. It's not like the extra two words of context would be of much use
anyway.

Done.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-committers mailing list (pgsql-committers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-committers