pgsql: Speed up CREATE DATABASE by deferring the fsyncs until after

Started by Bruce Momjian about 16 years ago, 29 messages, hackers
#1 Bruce Momjian
bruce@momjian.us

Log Message:
-----------
Speed up CREATE DATABASE by deferring the fsyncs until after copying
all the data and using posix_fadvise to nudge the OS into flushing it
earlier. This also hopefully makes CREATE DATABASE avoid spamming the
cache.

Tests show a big speedup on Linux at least on some filesystems.

Idea and patch from Andres Freund.

Modified Files:
--------------
pgsql/src/backend/storage/file:
fd.c (r1.153 -> r1.154)
(http://anoncvs.postgresql.org/cvsweb.cgi/pgsql/src/backend/storage/file/fd.c?r1=1.153&r2=1.154)
pgsql/src/include/storage:
fd.h (r1.66 -> r1.67)
(http://anoncvs.postgresql.org/cvsweb.cgi/pgsql/src/include/storage/fd.h?r1=1.66&r2=1.67)
pgsql/src/port:
copydir.c (r1.25 -> r1.26)
(http://anoncvs.postgresql.org/cvsweb.cgi/pgsql/src/port/copydir.c?r1=1.25&r2=1.26)
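A minimal sketch of the two ideas in the log message, written against plain POSIX calls rather than PostgreSQL's fd.c wrappers: copy every file without fsyncing it, hint the kernel after each chunk with posix_fadvise(), and only then fsync all the copies. The helper names (copy_file_hinted, fsync_path, copy_all_then_fsync) are hypothetical, and using POSIX_FADV_DONTNEED as the "nudge" is an assumption about the approach, not a copy of the committed patch.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

#define COPY_BUF_SIZE 8192

/*
 * Copy one file WITHOUT fsyncing it.  After each chunk is written, hint
 * with posix_fadvise(POSIX_FADV_DONTNEED) that the pages will not be
 * reused.  The hint is purely advisory, so its result is ignored.
 * Returns 0 on success, -1 on error.
 */
static int copy_file_hinted(const char *from, const char *to)
{
    char buf[COPY_BUF_SIZE];
    off_t offset = 0;
    ssize_t nread = -1;
    int rc = -1;
    int src = open(from, O_RDONLY);
    int dst = (src < 0) ? -1 : open(to, O_WRONLY | O_CREAT | O_EXCL, 0600);

    if (dst < 0)
        goto done;
    while ((nread = read(src, buf, sizeof buf)) > 0) {
        for (ssize_t off = 0; off < nread;) {   /* cope with short writes */
            ssize_t w = write(dst, buf + off, (size_t) (nread - off));
            if (w < 0)
                goto done;
            off += w;
        }
        (void) posix_fadvise(dst, offset, nread, POSIX_FADV_DONTNEED);
        offset += nread;
    }
    if (nread == 0)
        rc = 0;                         /* reached EOF cleanly */
done:
    if (src >= 0)
        close(src);
    if (dst >= 0)
        close(dst);
    return rc;
}

/* Second pass: reopen a finished copy and force it to stable storage. */
static int fsync_path(const char *path)
{
    int fd = open(path, O_RDWR);
    if (fd < 0)
        return -1;
    int rc = fsync(fd);
    close(fd);
    return rc;
}

/* Copy every file first, then fsync them all in a separate loop. */
static int copy_all_then_fsync(const char *const from[],
                               const char *const to[], size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (copy_file_hinted(from[i], to[i]) != 0)
            return -1;
    for (size_t i = 0; i < n; i++)
        if (fsync_path(to[i]) != 0)
            return -1;
    return 0;
}
```

On Linux, POSIX_FADV_DONTNEED on dirty pages typically starts writeback without waiting for it, so by the time the second loop runs much of the data may already be on disk and each fsync() has less to wait for. Other kernels may treat the hint as a no-op, in which case this degrades to a plain copy followed by a batch of fsyncs.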

#2 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Bruce Momjian (#1)
Re: [COMMITTERS] pgsql: Speed up CREATE DATABASE by deferring the fsyncs until after

stark@postgresql.org (Greg Stark) writes:

Log Message:
-----------
Speed up CREATE DATABASE by deferring the fsyncs until after copying
all the data and using posix_fadvise to nudge the OS into flushing it
earlier. This also hopefully makes CREATE DATABASE avoid spamming the
cache.

The buildfarm indicates that this patch has got some serious issues.

regards, tom lane

#3 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tom Lane (#2)
Re: [COMMITTERS] pgsql: Speed up CREATE DATABASE by deferring the fsyncs until after

I wrote:

The buildfarm indicates that this patch has got some serious issues.

Actually, looking closer, some of the Windows machines started failing
after the *earlier* patch to add directory fsyncs.

regards, tom lane

#4 Andres Freund
andres@anarazel.de
In reply to: Tom Lane (#3)
Re: [COMMITTERS] pgsql: Speed up CREATE DATABASE by deferring the fsyncs until after

On Monday 15 February 2010 08:13:32 Tom Lane wrote:

I wrote:

The buildfarm indicates that this patch has got some serious issues.

Actually, looking closer, some of the Windows machines started failing
after the *earlier* patch to add directory fsyncs.

And not only the Windows machines. It seems sensible to add a configure check
for whether directory fsyncing works.
But I, at least, am not capable of writing good m4/configure.in/whatever without
strong supervision...

I will try it if nobody with more knowledge does, provided somebody will look
over it afterwards.

Andres
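A configure test of the kind suggested here would essentially compile and run a probe like the one below. This is a sketch under assumptions: the function name dir_fsync_works is hypothetical, and, as marcin points out in the next message, the answer describes only the filesystem the probed directory lives on, so a build-time result says nothing certain about where the data directory will end up.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical probe: returns 1 if fsync() on the directory at `path`
 * succeeds and 0 otherwise (errno is left describing the failure).
 * The result only describes the filesystem that `path` lives on.
 */
static int dir_fsync_works(const char *path)
{
    int fd = open(path, O_RDONLY);   /* directories can only be opened read-only */
    if (fd < 0)
        return 0;

    int ok = (fsync(fd) == 0);
    int saved_errno = errno;
    close(fd);
    errno = saved_errno;
    return ok;
}
```

A configure check could wrap this in a main() that returns !dir_fsync_works(".") and record whether the program exited successfully.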

#5 marcin mank
marcin.mank@gmail.com
In reply to: Andres Freund (#4)
Re: [COMMITTERS] pgsql: Speed up CREATE DATABASE by deferring the fsyncs until after

On Mon, Feb 15, 2010 at 9:36 AM, Andres Freund <andres@anarazel.de> wrote:

On Monday 15 February 2010 08:13:32 Tom Lane wrote:

Actually, looking closer, some of the Windows machines started failing
after the *earlier* patch to add directory fsyncs.

And not only the Windows machines. It seems sensible to add a configure check
for whether directory fsyncing works.

It looks like a thing that can be filesystem-dependent. Maybe a kind
of runtime check?

Greetings
Marcin Mańk

#6 Andres Freund
andres@anarazel.de
In reply to: marcin mank (#5)
Re: [COMMITTERS] pgsql: Speed up CREATE DATABASE by deferring the fsyncs until after

Hi Marcin,

Sounds rather unlikely to me. It's likely handled at an upper layer (the VFS in Linux's case) and only overridden when an optimized implementation is available.
Which OS do you see implementing that for only some of its filesystems?

A runtime check would mean creating, fsyncing and deleting a directory for every directory you're fsyncing, because they could be on different filesystems...

Andres
--
Sent from a mobile phone - please excuse brevity and formatting.
----- Original message -----


On Mon, Feb 15, 2010 at 9:36 AM, Andres Freund <andres@anarazel.de> wrote:

On Monday 15 February 2010 08:13:32 Tom Lane wrote:

Actually, looking closer, some of the Windows machines started failing
after the *earlier* patch to add directory fsyncs.

And not only the Windows machines. It seems sensible to add a configure check
for whether directory fsyncing works.

It looks like a thing that can be filesystem-dependent. Maybe a kind
of runtime check?

Greetings
Marcin Mańk

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#7 Bruce Momjian
bruce@momjian.us
In reply to: Andres Freund (#6)
Re: [COMMITTERS] pgsql: Speed up CREATE DATABASE by deferring the fsyncs until after

On Mon, Feb 15, 2010 at 10:02 AM, Andres Freund <andres@anarazel.de> wrote:

Hi Marcin,

Sounds rather unlikely to me. It's likely handled at an upper layer (the VFS in Linux's case) and only overridden when an optimized implementation is available.
Which OS do you see implementing that for only some of its filesystems?

A runtime check would mean creating, fsyncing and deleting a directory for every directory you're fsyncing, because they could be on different filesystems...

We could just not check the result code of the fsync. Or print a
warning the first time and stop trying subsequently.

When do we cut the alpha? If I look at it at about 10-11pm EST is that too late?

--
greg
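Greg's second suggestion, warning once and then giving up on directory fsyncs, could look roughly like this. The flag and function names are hypothetical. Note the trade-off: a single failure, even a transient one, disables directory fsyncs for the rest of the process, which is the error-hiding risk raised later in the thread.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <errno.h>
#include <fcntl.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Process-wide flag, set after the first failed directory fsync. */
static bool dir_fsync_disabled = false;

/*
 * Hypothetical helper: fsync the directory at `path`, but after the first
 * failure print one warning and skip all later attempts.
 * Returns 0 if the directory was fsynced, 1 if it was skipped or failed.
 */
static int fsync_dir_warn_once(const char *path)
{
    if (dir_fsync_disabled)
        return 1;                       /* gave up earlier: skip silently */

    int fd = open(path, O_RDONLY);
    int rc = (fd < 0) ? -1 : fsync(fd);
    int saved_errno = errno;
    if (fd >= 0)
        close(fd);

    if (rc != 0) {
        fprintf(stderr,
                "WARNING: could not fsync directory \"%s\": %s; "
                "directory fsyncs disabled from now on\n",
                path, strerror(saved_errno));
        dir_fsync_disabled = true;
        return 1;
    }
    return 0;
}
```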

#8 marcin mank
marcin.mank@gmail.com
In reply to: Andres Freund (#6)
Re: [COMMITTERS] pgsql: Speed up CREATE DATABASE by deferring the fsyncs until after

On Mon, Feb 15, 2010 at 11:02 AM, Andres Freund <andres@anarazel.de> wrote:

Hi Marcin,

Sounds rather unlikely to me. It's likely handled at an upper layer (the VFS in Linux's case) and only overridden when an optimized implementation is available.
Which OS do you see implementing that for only some of its filesystems?

I have a Windows XP dev machine, which runs virtualbox, which runs
ubuntu, which mounts a windows directory through vboxfs

fsync does error out on directories inside that mount.

btw: 8.4.2's initdb won't work there either, so this is not a regression.
The error is:
DEBUG: creating and filling new WAL file
LOG: could not link file "pg_xlog/xlogtemp.2367" to
"pg_xlog/000000010000000000000000" (initialization of log file 0,
segment 0): Operation not permitted
FATAL: could not open file "pg_xlog/000000010000000000000000" (log
file 0, segment 0): No such file or directory

But I am not that sure that, e.g., NFS or something like that won't complain.

Ignoring the return code seems the right choice.

Greetings
Marcin Mańk

#9 Bruce Momjian
bruce@momjian.us
In reply to: marcin mank (#8)
Re: [COMMITTERS] pgsql: Speed up CREATE DATABASE by deferring the fsyncs until after

On Mon, Feb 15, 2010 at 11:34 AM, marcin mank <marcin.mank@gmail.com> wrote:

LOG:  could not link file "pg_xlog/xlogtemp.2367" to
"pg_xlog/000000010000000000000000" (initialization of log file 0,

This is not related -- it seems your filesystem doesn't support hard
links. I thought we used "junctions" on versions of Windows that
support them which I would have expected would include XP but my
knowledge of Windows is thin and obsolete.

--
greg

#10 Andres Freund
andres@anarazel.de
In reply to: marcin mank (#8)
Re: [COMMITTERS] pgsql: Speed up CREATE DATABASE by deferring the fsyncs until after

On Monday 15 February 2010 12:34:44 marcin mank wrote:

On Mon, Feb 15, 2010 at 11:02 AM, Andres Freund <andres@anarazel.de> wrote:

Hi Marcin,

Sounds rather unlikely to me. It's likely handled at an upper layer (the VFS
in Linux's case) and only overridden when an optimized implementation is
available. Which OS do you see implementing that for only some of its
filesystems?

I have a Windows XP dev machine, which runs virtualbox, which runs
ubuntu, which mounts a windows directory through vboxfs

btw: 8.4.2's initdb won't work there either, so this is not a regression.
The error is:
DEBUG: creating and filling new WAL file
LOG: could not link file "pg_xlog/xlogtemp.2367" to
"pg_xlog/000000010000000000000000" (initialization of log file 0,
segment 0): Operation not permitted
FATAL: could not open file "pg_xlog/000000010000000000000000" (log
file 0, segment 0): No such file or directory

That does seem to be a different issue. Currently there are no fsyncs on
directories at all, so likely your setup is hosed anyway ;-)

But I am not that sure that, e.g., NFS or something like that won't
complain.

It does not.

Ignoring the return code seems the right choice.

It's also the error-hiding one. With delayed allocation, fsync could in
theory fail with ENOSPC...

Andres
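One way to avoid hiding real errors such as the ENOSPC case mentioned above is to ignore only the errno values that mean "this descriptor cannot be synced here" and report everything else. In the sketch below the tolerated set (EBADF, EINVAL) is an assumption: Linux documents EINVAL for descriptors that do not support synchronization, and some systems reportedly reject fsync() on a read-only descriptor with EBADF. The function name is hypothetical.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical helper: fsync the directory at `path`, tolerating only the
 * errno values assumed to mean "directory fsync unsupported here".
 * Returns 0 on success or tolerated failure, -1 on a real failure
 * (EIO, ENOSPC, a failed open(), ...) with errno set.
 */
static int fsync_dir_tolerant(const char *path)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    int rc = fsync(fd);
    int saved_errno = errno;
    close(fd);

    if (rc != 0 && (saved_errno == EBADF || saved_errno == EINVAL))
        return 0;                       /* unsupported here: not an error */
    errno = saved_errno;
    return rc;
}
```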

#11 Andres Freund
andres@anarazel.de
In reply to: Bruce Momjian (#9)
Re: [COMMITTERS] pgsql: Speed up CREATE DATABASE by deferring the fsyncs until after

On Monday 15 February 2010 12:45:39 Greg Stark wrote:

On Mon, Feb 15, 2010 at 11:34 AM, marcin mank <marcin.mank@gmail.com> wrote:

LOG: could not link file "pg_xlog/xlogtemp.2367" to
"pg_xlog/000000010000000000000000" (initialization of log file 0,

This is not related -- it seems your filesystem doesn't support hard
links. I thought we used "junctions" on versions of Windows that
support them which I would have expected would include XP but my
knowledge of Windows is thin and obsolete.

If I understood him correctly, marcin mounts a Windows share on Linux
via some vbox-proprietary pseudo filesystem. That won't get detected, and thus
no junctions will be used... (I doubt you can even create them via
vboxfs, or even SMB.)
I would consider that an unsupported setup. Agreed?

Andres

#12 Bruce Momjian
bruce@momjian.us
In reply to: Andres Freund (#11)
Re: [COMMITTERS] pgsql: Speed up CREATE DATABASE by deferring the fsyncs until after

On Mon, Feb 15, 2010 at 11:50 AM, Andres Freund <andres@anarazel.de> wrote:

If I understood him correctly, marcin mounts a Windows share on Linux
via some vbox-proprietary pseudo filesystem. That won't get detected, and thus
no junctions will be used... (I doubt you can even create them via
vboxfs, or even SMB.)
I would consider that an unsupported setup. Agreed?

I'm not sure which versions of Windows we support in general. But on
further thought I thought we only used hard links for xlog files on
systems where we knew they worked and just did a rename() on systems
without them. So I'm puzzled why we're trying to hard link on this
system. Perhaps we need to make this a run-time check instead of just
making it depend on the system.

--
greg
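Greg recalls that xlog segments are installed with a hard link where that is known to work and with rename() elsewhere. A run-time variant could try link() first and fall back only when the filesystem refuses hard links. This is a hypothetical sketch, not the xlog.c code: the function name is invented, and treating EPERM/ENOTSUP as "hard links unsupported" follows the Linux link(2) documentation (EPERM is what marcin's vboxfs mount returned).

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/*
 * Hypothetical helper: give a finished temp file its final name without
 * ever replacing an existing file.  Returns 0 on success, -1 with errno set.
 */
static int install_file(const char *tmppath, const char *path)
{
    if (link(tmppath, path) == 0)
        return unlink(tmppath);         /* drop the temporary name */
    if (errno != EPERM && errno != ENOTSUP)
        return -1;                      /* includes EEXIST: never clobber */

    /* Hard links unsupported on this filesystem: fall back to rename(),
     * which would silently replace `path`, so refuse if it exists.
     * Unlike link(), this check-then-rename is not atomic. */
    if (access(path, F_OK) == 0) {
        errno = EEXIST;
        return -1;
    }
    return rename(tmppath, path);
}
```

The access() check keeps the no-clobber behaviour that link() provides for free, but only approximately: a concurrent creator of the same name could still be overwritten between the check and the rename().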

#13 Andres Freund
andres@anarazel.de
In reply to: Bruce Momjian (#12)
Re: [COMMITTERS] pgsql: Speed up CREATE DATABASE by deferring the fsyncs until after

On Monday 15 February 2010 12:55:36 Greg Stark wrote:

On Mon, Feb 15, 2010 at 11:50 AM, Andres Freund <andres@anarazel.de> wrote:

If I understood him correctly, marcin mounts a Windows share on
Linux via some vbox-proprietary pseudo filesystem. That won't get
detected, and thus no junctions will be used... (I doubt you can even
create them via vboxfs, or even SMB.)
I would consider that an unsupported setup. Agreed?

I'm not sure which versions of Windows we support in general. But on
further thought I thought we only used hard links for xlog files on
systems where we knew they worked and just did a rename() on systems
without them. So I'm puzzled why we're trying to hard link on this
system. Perhaps we need to make this a run-time check instead of just
making it depend on the system.

Well, I guess Linux is normally a system where hardlinking is considered safe.
And I don't really see a problem with that; for example, we require NTFS on
Windows as well...
In the end it's only some strange filesystem that's causing the issue here...

Andres

#14 marcin mank
marcin.mank@gmail.com
In reply to: Andres Freund (#10)
Re: [COMMITTERS] pgsql: Speed up CREATE DATABASE by deferring the fsyncs until after

Yes, the issue with initdb failing is unrelated (and I have no problem
with the fs being unsupported). But fsync still DOES fail on
directories from the mount.

But I am not that sure that, e.g., NFS or something like that won't
complain.

It does not.

What if someone mounts an NFS share from a system that does not support
directory fsync (per the buildfarm: UnixWare, AIX) on Linux? I agree that
this is asking for trouble, but...

Greetings
Marcin Mańk

#15 Andres Freund
andres@anarazel.de
In reply to: marcin mank (#14)
Re: [COMMITTERS] pgsql: Speed up CREATE DATABASE by deferring the fsyncs until after

On Monday 15 February 2010 14:50:03 marcin mank wrote:

Yes, the issue with initdb failing is unrelated (and I have no problem
with the fs being unsupported). But fsync still DOES fail on
directories from the mount.

But I am not that sure that, e.g., NFS or something like that won't
complain.

It does not.

What if someone mounts an NFS share from a system that does not support
directory fsync (per the buildfarm: UnixWare, AIX) on Linux? I agree that
this is asking for trouble, but...

Then nothing. An fsync via NFS or similar is a local operation. No
"fsync" command is transported, i.e. the fsync controls the local cache,
not the remote one...

Andres

#16 Magnus Hagander
magnus@hagander.net
In reply to: Bruce Momjian (#9)
Re: [COMMITTERS] pgsql: Speed up CREATE DATABASE by deferring the fsyncs until after

2010/2/15 Greg Stark <stark@mit.edu>:

On Mon, Feb 15, 2010 at 11:34 AM, marcin mank <marcin.mank@gmail.com> wrote:

LOG:  could not link file "pg_xlog/xlogtemp.2367" to
"pg_xlog/000000010000000000000000" (initialization of log file 0,

This is not related -- it seems your filesystem doesn't support hard
links. I thought we used "junctions" on versions of Windows that
support them which I would have expected would include XP but my
knowledge of Windows is thin and obsolete.

Junctions are for symbolic links, and only valid for directories. NTFS
has "real" hardlinks through CreateHardLink(). No idea if that works on
remote filesystems though.

But AFAIK, we don't use that on Windows. But the rest of the thread
has indicated why this shows up anyway :)

--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/

#17 Andres Freund
andres@anarazel.de
In reply to: Bruce Momjian (#1)
Re: pgsql: Speed up CREATE DATABASE by deferring the fsyncs until after

On Monday 15 February 2010 01:50:57 Greg Stark wrote:

Log Message:
-----------
Speed up CREATE DATABASE by deferring the fsyncs until after copying
all the data and using posix_fadvise to nudge the OS into flushing it
earlier. This also hopefully makes CREATE DATABASE avoid spamming the
cache.

Tests show a big speedup on Linux at least on some filesystems.

Idea and patch from Andres Freund.

I just found a relatively big problem with one of your modifications to the
patch: you removed the
FreeDir(xldir);
xldir = AllocateDir(fromdir);
pair. Unfortunately it's crucial, because otherwise the DIR does not get
rewound, which resulted in *no* files getting fsync()ed (otherwise the loop
above wouldn't have finished yet...).
I think that was also causing the problems I pointed out in "Directory fsync
and other fun"...

Did you remove it because you didn't want to open the directory twice? I think
doing that is simpler than using rewinddir(); I have no idea how usable that
one is on Windows, for example.

Could you add it back?

Andres
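The effect of dropping the FreeDir()/AllocateDir() pair is easy to reproduce: once a directory stream has been read to the end, a second readdir() loop over the same stream sees nothing, so an fsync loop written that way silently does no work. The sketch below uses POSIX rewinddir() for brevity; closing and reopening the directory, as the original patch did, has the same effect. The function names are hypothetical, and the counting loops only stand in for the copy and fsync passes.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <dirent.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Count entries other than "." and ".." until readdir() reports the end. */
static int count_pass(DIR *d)
{
    int n = 0;
    struct dirent *de;

    while ((de = readdir(d)) != NULL)
        if (strcmp(de->d_name, ".") != 0 && strcmp(de->d_name, "..") != 0)
            n++;
    return n;
}

/*
 * Two loops over one directory stream: `first` stands in for the copy
 * loop, `second` for the fsync loop.  Without a rewind the second loop
 * starts at end-of-stream and sees nothing.
 */
static int two_passes(const char *path, bool rewind_between,
                      int *first, int *second)
{
    DIR *d = opendir(path);
    if (d == NULL)
        return -1;

    *first = count_pass(d);
    if (rewind_between)
        rewinddir(d);                   /* or closedir() + opendir() */
    *second = count_pass(d);
    closedir(d);
    return 0;
}
```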

#18 Bruce Momjian
bruce@momjian.us
In reply to: Andres Freund (#17)
Re: pgsql: Speed up CREATE DATABASE by deferring the fsyncs until after

On Sun, Feb 21, 2010 at 11:43 PM, Andres Freund <andres@anarazel.de> wrote:

Could you add it back?

Oops, sorry. Sigh. Done.

--
greg

#19 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andres Freund (#17)
Re: pgsql: Speed up CREATE DATABASE by deferring the fsyncs until after

Andres Freund <andres@anarazel.de> writes:

I just found a relatively big problem with one of your modifications to the
patch: you removed the
FreeDir(xldir);
xldir = AllocateDir(fromdir);
pair. Unfortunately it's crucial, because otherwise the DIR does not get
rewound, which resulted in *no* files getting fsync()ed (otherwise the loop
above wouldn't have finished yet...).
I think that was also causing the problems I pointed out in "Directory fsync
and other fun"...
and other fun"...

Actually, that code had *multiple* problems including stat'ing the wrong
file entirely, not to mention that this last commit failed to even
compile. I also think it should scan the todir not the fromdir, just on
general principles to avoid any possibility of race conditions.

regards, tom lane
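Tom's two corrections to the fsync pass, scanning the destination directory and stat'ing the same path that is about to be fsynced, might be combined as below. This is a sketch rather than the committed copydir.c: the function name is hypothetical, the 4096-byte path buffer is an arbitrary limit, and subdirectories are simply skipped here instead of being recursed into.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <dirent.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Hypothetical fsync pass over the *destination* directory.  Each path is
 * built from todir, and that same path is both lstat()ed and fsynced, so
 * the check and the sync always refer to the same file.  Returns the
 * number of regular files fsynced, or -1 on the first error.
 */
static int fsync_dir_contents(const char *todir)
{
    DIR *d = opendir(todir);
    if (d == NULL)
        return -1;

    char path[4096];
    int count = 0;
    struct dirent *de;

    while ((de = readdir(d)) != NULL) {
        struct stat st;

        if (strcmp(de->d_name, ".") == 0 || strcmp(de->d_name, "..") == 0)
            continue;
        int len = snprintf(path, sizeof path, "%s/%s", todir, de->d_name);
        if (len < 0 || (size_t) len >= sizeof path || lstat(path, &st) != 0) {
            count = -1;
            break;
        }
        if (!S_ISREG(st.st_mode))
            continue;                   /* subdirectories skipped in this sketch */

        int fd = open(path, O_RDWR);
        if (fd < 0 || fsync(fd) != 0) {
            if (fd >= 0)
                close(fd);
            count = -1;
            break;
        }
        close(fd);
        count++;
    }
    closedir(d);
    return count;
}
```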

#20 Bruce Momjian
bruce@momjian.us
In reply to: Tom Lane (#19)
Re: pgsql: Speed up CREATE DATABASE by deferring the fsyncs until after

On Mon, Feb 22, 2010 at 2:54 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Actually, that code had *multiple* problems including stat'ing the wrong
file entirely, not to mention that this last commit failed to even
compile.  I also think it should scan the todir not the fromdir, just on
general principles to avoid any possibility of race conditions.

Argh. I'll be less careless in the future, I promise.

I had concluded that scanning the original directory was odd but
better because it served to double-check that all the original files
actually made it and also because if there were any unrelated files
present there was no need to fsync them. But I agree it's odd and not
very general for copydir if we decide to use it anywhere other than
CREATE DATABASE.

--
greg

#21 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Bruce Momjian (#20)
#22 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tom Lane (#21)
#23 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tom Lane (#22)
#24 Andres Freund
andres@anarazel.de
In reply to: Tom Lane (#23)
#25 Bruce Momjian
bruce@momjian.us
In reply to: Tom Lane (#23)
#26 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Bruce Momjian (#25)
#27 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tom Lane (#22)
#28 Bruce Momjian
bruce@momjian.us
In reply to: Tom Lane (#26)
#29 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Bruce Momjian (#28)