fallocate / posix_fallocate for new WAL file creation (etc...)
Pertinent to another thread titled
[HACKERS] corrupt pages detected by enabling checksums
I hope to explore the possibility of using fallocate (or
posix_fallocate) for new WAL file creation.
Most modern Linux filesystems support fast fallocate/posix_fallocate,
reducing extent fragmentation (where extents are used) and frequently
offering a pretty significant speed improvement. In my tests, using
posix_fallocate (followed by pg_fsync) is at least 28 times quicker
than using the current method (which writes zeroes followed by
pg_fsync).
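The comparison described above can be approximated outside PostgreSQL with a small script. This is only a rough sketch, not the test actually used for the numbers quoted here: the 16 MB segment size, the 8 kB write size, and all function names are assumptions made for illustration, and results will vary widely by filesystem and hardware.

```python
import os
import tempfile
import time

SEGMENT_SIZE = 16 * 1024 * 1024  # assumed WAL segment size (16 MB)
BLOCK_SIZE = 8192                # assumed write size for the zero-fill loop


def create_zero_filled(path, size=SEGMENT_SIZE):
    """Create a file by writing zeroes block by block, then fsync it."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT | os.O_EXCL, 0o600)
    try:
        zeroes = bytes(BLOCK_SIZE)
        written = 0
        while written < size:
            # The slice keeps the last write from overshooting `size`.
            written += os.write(fd, zeroes[: size - written])
        os.fsync(fd)
    finally:
        os.close(fd)


def create_fallocated(path, size=SEGMENT_SIZE):
    """Create a file by reserving its space with posix_fallocate, then fsync it."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT | os.O_EXCL, 0o600)
    try:
        os.posix_fallocate(fd, 0, size)
        os.fsync(fd)
    finally:
        os.close(fd)


def time_creation(create, directory, rounds=5):
    """Return the average seconds per file over `rounds` creations."""
    total = 0.0
    for i in range(rounds):
        path = os.path.join(directory, f"seg_{create.__name__}_{i}")
        start = time.perf_counter()
        create(path)
        total += time.perf_counter() - start
        os.unlink(path)
    return total / rounds


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        print("zero-fill :", time_creation(create_zero_filled, d))
        print("fallocate :", time_creation(create_fallocated, d))
```

Running it in a directory on the filesystem that would hold pg_xlog gives a first impression of the per-file difference; it says nothing about how a running cluster behaves.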
I have written up a patch to use posix_fallocate in new WAL file
creation, including configuration by way of a GUC variable, but I've
not contributed to the PostgreSQL project before. Therefore, I'm
fairly certain the patch is not formatted properly and does not conform to the
appropriate style guides. Currently, the patch is based on 9.2, and is
quite small in size - 3.6KiB.
Advice on how to proceed is appreciated.
--
Jon
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Mon, May 13, 2013 at 08:54:39PM -0500, Jon Nelson wrote:
Advice on how to proceed is appreciated.
Thanks for hopping in!
Please re-base the patch against git master, as new features like this go
there. Please also send along the tests you're doing so others can
riff on them. Tests that find any weak points are also good.
Cheers,
David.
--
David Fetter <david@fetter.org> http://fetter.org/
Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter
Skype: davidfetter XMPP: david.fetter@gmail.com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics
Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate
On Mon, May 13, 2013 at 9:54 PM, Jon Nelson <jnelson+pgsql@jamponi.net> wrote:
Advice on how to proceed is appreciated.
Make sure to list it here:
https://commitfest.postgresql.org/action/commitfest_view/open
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Tue, May 14, 2013 at 9:43 PM, Robert Haas <robertmhaas@gmail.com> wrote:
On Mon, May 13, 2013 at 9:54 PM, Jon Nelson <jnelson+pgsql@jamponi.net> wrote:
Currently, the patch is based on 9.2, and is
quite small in size - 3.6KiB.
I have re-based and reformatted the code, and basic testing shows a
fairly significant reduction in WAL-file creation time.
I ran 'make test' and did additional local testing without issue.
Therefore, I am attaching the patch. I will try to add it to the
commitfest page.
--
Jon
Attachments:
0001-enhance-GUC-and-xlog-with-wal_use_fallocate-boolean-.patch (application/octet-stream, +63 -36)
Hi,
On 2013-05-15 16:26:15 -0500, Jon Nelson wrote:
I have re-based and reformatted the code, and basic testing shows a
reduction in WAL-file creation time of a fairly significant amount.
I ran 'make test' and did additional local testing without issue.
Therefore, I am attaching the patch. I will try to add it to the
commitfest page.
Some quick comments, without thinking about this much:
* needs a configure check for posix_fallocate. The current version will
e.g. fail to compile on Windows or many other non-Linux systems. Check
how it's done for posix_fadvise.
* Is wal file creation performance actually relevant? Is the performance
of a system running on fallocate()d wal files any different?
* According to the man page posix_fallocate doesn't set errno but rather
returns the error code.
* I wonder whether we ever want to actually disable this? Afair the libc
contains emulation for posix_fadvise if the filesystem doesn't support
it.
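The first point (only use posix_fallocate where it exists, and keep the old path otherwise) can be sketched as follows. This is an illustration only, not the patch: `hasattr()` stands in for the configure check, the `use_fallocate` flag is a hypothetical stand-in for the proposed GUC, and treating `EOPNOTSUPP`/`EINVAL` as "not supported here" is an assumption.

```python
import errno
import os
import tempfile


def allocate_file(fd, size, use_fallocate=True):
    """Reserve `size` bytes for fd and return the name of the method used."""
    # hasattr() plays the role of a compile-time availability check.
    if use_fallocate and hasattr(os, "posix_fallocate"):
        try:
            os.posix_fallocate(fd, 0, size)
            return "posix_fallocate"
        except OSError as e:
            # Assumption: these two codes mean the filesystem cannot do it,
            # so fall through to zero-filling; anything else is a real error.
            if e.errno not in (errno.EOPNOTSUPP, errno.EINVAL):
                raise
    # Portable fallback: write zeroes from the start of the file.
    os.lseek(fd, 0, os.SEEK_SET)
    block = bytes(8192)
    remaining = size
    while remaining > 0:
        remaining -= os.write(fd, block[:remaining])
    return "zero-fill"


with tempfile.TemporaryFile() as f:
    print(allocate_file(f.fileno(), 1024 * 1024))
```

Either way the caller ends up with a file of the requested size, which is what lets the setting exist on every platform even where the fast path is missing.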
Greetings,
Andres Freund
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Wed, May 15, 2013 at 4:34 PM, Andres Freund <andres@2ndquadrant.com> wrote:
Some quick comments, without thinking about this much:
Thank you for the kind feedback.
* needs a configure check for posix_fallocate. The current version will
e.g. fail to compile on Windows or many other non-Linux systems. Check
how it's done for posix_fadvise.
I will address as soon as I am able.
* Is wal file creation performance actually relevant? Is the performance
of a system running on fallocate()d wal files any different?
In my limited testing, I noticed a drop of approx. 100ms per WAL file.
I do not have a good idea for how to really stress the WAL-file
creation area without calling pg_start_backup and pg_stop_backup over
and over (with archiving enabled).
However, a file allocated with fallocate is (supposed to be) less
fragmented than one created by the traditional means.
* According to the man page posix_fallocate doesn't set errno but rather
returns the error code.
That's true. I originally wrote the patch using fallocate(2). What
would be appropriate here? Should I switch on the return value and the
six (6) or so relevant error codes?
* I wonder whether we ever want to actually disable this? Afair the libc
contains emulation for posix_fadvise if the filesystem doesn't support
it.
I know that glibc does, but I don't know about other libc implementations.
--
Jon
On Wed, May 15, 2013 at 4:46 PM, Jon Nelson <jnelson+pgsql@jamponi.net> wrote:
On Wed, May 15, 2013 at 4:34 PM, Andres Freund <andres@2ndquadrant.com> wrote:
* needs a configure check for posix_fallocate. The current version will
e.g. fail to compile on Windows or many other non-Linux systems. Check
how it's done for posix_fadvise.
The following patch includes the changes to configure.in.
I had to make other changes (not included here) because my local
system uses autoconf 2.69, but I did test this successfully.
That's true. I originally wrote the patch using fallocate(2). What
would be appropriate here? Should I switch on the return value and the
six (6) or so relevant error codes?
I addressed this, hopefully in a reasonable way.
--
Jon
Attachments:
fallocate.patch-v2 (application/octet-stream, +101 -35)
Jon Nelson wrote:
On Wed, May 15, 2013 at 4:46 PM, Jon Nelson <jnelson+pgsql@jamponi.net> wrote:
That's true. I originally wrote the patch using fallocate(2). What
would be appropriate here? Should I switch on the return value and the
six (6) or so relevant error codes?
I addressed this, hopefully in a reasonable way.
Would it work to just assign the value you got from posix_fallocate (if
nonzero) to errno and then use %m in the errmsg() call in ereport()?
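The idea can be illustrated from outside the server by calling libc's posix_fallocate directly and formatting whatever code it returns. This is a hedged sketch rather than anything from the patch: it assumes Linux, a libc that exports `posix_fallocate`, and a 64-bit `off_t`; `try_fallocate` and the message text are made up for the example, and `os.strerror(rc)` plays the part that `%m` plays once the code has been copied into errno.

```python
import ctypes
import errno
import os

# Assumption: Linux, libc exports posix_fallocate, off_t is 64 bits wide.
libc = ctypes.CDLL(None)
libc.posix_fallocate.argtypes = [ctypes.c_int, ctypes.c_int64, ctypes.c_int64]
libc.posix_fallocate.restype = ctypes.c_int


def try_fallocate(fd, size):
    """Return None on success, else a message built from the returned code."""
    rc = libc.posix_fallocate(fd, 0, size)
    if rc != 0:
        # The error arrives as the return value, so that is what gets
        # formatted here (the C-side analogue would be "errno = rc" plus %m).
        name = errno.errorcode.get(rc, str(rc))
        return f"could not allocate space: {os.strerror(rc)} ({name})"
    return None


# An invalid descriptor, so an error message is expected.
print(try_fallocate(-1, 4096))
```

The point of the design is that a single message path covers every error code, instead of a switch over the handful that the man page lists.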
--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Wed, May 15, 2013 at 10:17 PM, Alvaro Herrera
<alvherre@2ndquadrant.com> wrote:
Would it work to just assign the value you got from posix_fallocate (if
nonzero) to errno and then use %m in the errmsg() call in ereport()?
That strikes me as a better way. I'll work something up soon.
Thanks!
--
Jon
On Wed, May 15, 2013 at 10:36 PM, Jon Nelson <jnelson+pgsql@jamponi.net> wrote:
That strikes me as a better way. I'll work something up soon.
Please find attached version 3.
Am I doing this the right way? Should I be posting the full patch each
time, or incremental patches?
--
Jon
Attachments:
fallocate-v3.patch (application/octet-stream, +81 -35)
Jon Nelson wrote:
Am I doing this the right way? Should I be posting the full patch each
time, or incremental patches?
Full patch each time is okay. Context-format patch is even better.
--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On 5/16/13 9:16 AM, Jon Nelson wrote:
Am I doing this the right way? Should I be posting the full patch each
time, or incremental patches?
There are guidelines for getting your patch in the right format at
https://wiki.postgresql.org/wiki/Working_with_Git#Context_diffs_with_Git
that would improve this one. You have some formatting issues with tab
spacing at lines 120 through 133 in your v3 patch. And it looks like
there was a formatting change on line 146 that is making the diff larger
than it needs to be.
The biggest thing missing from this submission is information about what
performance testing you did. Ideally performance patches are submitted
with enough information for a reviewer to duplicate the same test the
author did, as well as hard before/after performance numbers from your
test system. It often turns out to be tricky to duplicate a performance gain, and
being able to run the same test used for initial development eliminates
a lot of the problems.
Second bit of nitpicking. There are already some GUC values that appear
or disappear based on compile time options. They're all debugging
related things though. I would prefer not to see this one go away when
its implementation isn't available. That's going to break any scripts
that SHOW the setting to see if it's turned on or not as a first
problem. I think the right model to follow here is the IFDEF setup used
for effective_io_concurrency. I wouldn't worry about this too much
though. Having a wal_use_fallocate GUC is good for testing. But if it
works out well, when it's ready for commit I don't see why anyone would
want it turned off on platforms where it works. There are already too
many performance tweaking GUCs. Something has to be very likely to be
changed from the default before it's worth adding one for it.
--
Greg Smith 2ndQuadrant US greg@2ndQuadrant.com Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com
On 2013-05-15 16:46:33 -0500, Jon Nelson wrote:
* Is wal file creation performance actually relevant? Is the performance
of a system running on fallocate()d wal files any different?
In my limited testing, I noticed a drop of approx. 100ms per WAL file.
I do not have a good idea for how to really stress the WAL-file
creation area without calling pg_start_backup and pg_stop_backup over
and over (with archiving enabled).
My point is that wal file creation usually isn't all that performance
sensitive. Once the cluster has enough WAL files it will usually recycle
them and thus never allocate new ones. So for this to be really
beneficial it would be interesting to show different performance during
normal running. You could also check how many extents a WAL file
is made of with fallocate in comparison to the old-style method
(filefrag will give you that for most filesystems).
Greetings,
Andres Freund
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Fri, May 17, 2013 at 4:47 AM, Andres Freund <andres@2ndquadrant.com> wrote:
So for this to be really
beneficial it would be interesting to show different performance during
normal running.
But why does it have to be *really* beneficial? We're already making
optional posix_fxxx calls and fallocate seems to do exactly what we
would want in this context. Even if the 100ms drop doesn't show up
all that often, I'd still take it just for the defragmentation
benefits and the patch is fairly tiny.
merlin
On Fri, May 17, 2013 at 8:29 AM, Merlin Moncure <mmoncure@gmail.com> wrote:
Even if the 100ms drop doesn't show up
all that often, I'd still take it just for the defragmentation
benefits and the patch is fairly tiny.
Here is sample output of filefrag on a somewhat busy database from our
testing environment that exactly duplicates our production workloads.
It does a lot of batch processing at night and a mix of 80% OLTP and 20%
OLAP during the day. This is on ext3. Interestingly, on ext4 servers
I never saw more than 2 extents per file (but those servers are mostly
not as busy).
[root@rpisatysw001 pg_xlog]# filefrag *
00000001000006D200000064: 490 extents found, perfection would be 1 extent
00000001000006D200000065: 33 extents found, perfection would be 1 extent
00000001000006D200000066: 43 extents found, perfection would be 1 extent
00000001000006D200000067: 71 extents found, perfection would be 1 extent
00000001000006D200000068: 43 extents found, perfection would be 1 extent
00000001000006D200000069: 156 extents found, perfection would be 1 extent
00000001000006D20000006A: 52 extents found, perfection would be 1 extent
00000001000006D20000006B: 108 extents found, perfection would be 1 extent
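Output in this format is easy to summarize when comparing the old and new allocation methods. The following is a small sketch that parses the lines shown above; the regular expression and function name are illustrative assumptions, and other filefrag versions may print a slightly different format.

```python
import re

# Matches lines such as "NAME: 490 extents found, perfection would be 1 extent".
LINE_RE = re.compile(r"^(?P<name>\S+): (?P<count>\d+) extents? found")


def parse_filefrag(output):
    """Map each file name to its extent count; unrecognized lines are skipped."""
    counts = {}
    for line in output.splitlines():
        m = LINE_RE.match(line.strip())
        if m:
            counts[m.group("name")] = int(m.group("count"))
    return counts


sample = """\
00000001000006D200000064: 490 extents found, perfection would be 1 extent
00000001000006D200000065: 33 extents found, perfection would be 1 extent
00000001000006D200000066: 43 extents found, perfection would be 1 extent
00000001000006D200000067: 71 extents found, perfection would be 1 extent
00000001000006D200000068: 43 extents found, perfection would be 1 extent
00000001000006D200000069: 156 extents found, perfection would be 1 extent
00000001000006D20000006A: 52 extents found, perfection would be 1 extent
00000001000006D20000006B: 108 extents found, perfection would be 1 extent
"""

counts = parse_filefrag(sample)
worst = max(counts, key=counts.get)
print("files:", len(counts))
print("mean extents:", sum(counts.values()) / len(counts))
print("worst:", worst, counts[worst])
```

For the eight segments listed, that works out to a mean of 124.5 extents per file, with the first segment the worst at 490.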
merlin
On 2013-05-17 15:48:38 -0500, Merlin Moncure wrote:
But why does it have to be *really* beneficial? We're already making
optional posix_fxxx calls and fallocate seems to do exactly what we
would want in this context. Even if the 100ms drop doesn't show up
all that often, I'd still take it just for the defragmentation
benefits and the patch is fairly tiny.
Well, it needs to be tested and so on. And it's a fairly critical code
path. I seem to remember that there were older glibc versions that
didn't do such a great job at emulating fallocate, for example.
Here is sample output of filefrag on a somewhat busy database from our
testing environment that exactly duplicates our production workloads.
It does a lot of batch processing at night and a mix of 80% OLTP and 20%
OLAP during the day. This is on ext3. Interestingly, on ext4 servers
I never saw more than 2 extents per file (but those servers are mostly
not as busy).
Ok, that's pretty bad. 490 extents in one file? Really? I'd consider
shutting down the cluster, copying the wal files in a moment where there
is enough free space. Just don't forget to sync afterwards.
ext4 is notably better at allocating space in growing files than ext3
due to delayed allocation (and other things), so it wouldn't surprise me
to see similar differences in fragmentation even if the load were comparable.
Ext3 doesn't have fallocate btw, so it wouldn't benefit from such a
patch anyway.
Greetings,
Andres Freund
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Fri, May 17, 2013 at 4:18 PM, Andres Freund <andres@2ndquadrant.com> wrote:
Ext3 doesn't have fallocate btw, so it wouldn't benefit from such a
patch anyway.
yeah -- I see your point. The object lesson isn't so much 'improve
postgres' as it is to 'use a modern filesystem'.
merlin
On Thu, May 16, 2013 at 7:05 PM, Greg Smith <greg@2ndquadrant.com> wrote:
There are guidelines for getting your patch in the right format at
https://wiki.postgresql.org/wiki/Working_with_Git#Context_diffs_with_Git
that would improve this one. You have some formatting issues with tab
spacing at lines 120 through 133 in your v3 patch. And it looks like there
was a formatting change on line 146 that is making the diff larger than it
needs to be.
I've corrected the formatting change (end-of-line whitespace was
stripped) on line 146.
The other whitespace changes are - I think - due to newly-indented
code due to a new code block.
Included please find a v4 patch which uses context diffs per the above url.
The biggest thing missing from this submission is information about what
performance testing you did. Ideally performance patches are submitted with
enough information for a reviewer to duplicate the same test the author did,
as well as hard before/after performance numbers from your test system. It
often turns out to be tricky to duplicate a performance gain, and being able to run
the same test used for initial development eliminates a lot of the problems.
This has been a bit of a struggle. While it's true that WAL file
creation doesn't happen with great frequency, and while it's also true
that - with strace and other tests - it can be proven that
fallocate(16MB) is much quicker than writing zeroes by hand,
proving that in the larger context of a running install has been
challenging.
Attached you'll find a small test script (t.sh) which creates a new
cluster in 'foo', changes some config values, starts the cluster, and
then times how long it takes pgbench to prepare a database. I've used
"wal_level = hot_standby" in the hopes that this generates the largest
number of WAL files (and I set the number of such files to 1024). The
hardware is an AMD 9150e with a 2-disk software RAID1 (SATA disks) on
kernel 3.9.2 and ext4 (x86_64, openSUSE 12.3). The test results are
not that surprising. The longer the test (the larger the scale factor)
the less of a difference using posix_fallocate makes. With a scale
factor of 100, I see an average of 10-11% reduction in the time taken
to initialize the database. With 300, it's about 5.5% and with 900,
it's between 0 and 1.2%. I will be doing more testing but this is what
I started with. I'm very open to suggestions.
Second bit of nitpicking. There are already some GUC values that appear or
disappear based on compile time options. They're all debugging related
things though. I would prefer not to see this one go away when its
implementation isn't available. That's going to break any scripts that SHOW
the setting to see if it's turned on or not as a first problem. I think the
right model to follow here is the IFDEF setup used for
effective_io_concurrency. I wouldn't worry about this too much though.
Having a wal_use_fallocate GUC is good for testing. But if it works out
well, when it's ready for commit I don't see why anyone would want it turned
off on platforms where it works. There are already too many performance
tweaking GUCs. Something has to be very likely to be changed from the
default before it's worth adding one for it.
Ack. I've revised the patch to always have the GUC (for now), default
to false, and if configure can't find posix_fallocate (or the user
disables it by way of pg_config_manual.h) then it remains a GUC that
simply can't be changed.
I'll also be re-running the tests.
--
Jon
On Sat, May 25, 2013 at 2:55 PM, Jon Nelson <jnelson+pgsql@jamponi.net> wrote:
This has been a bit of a struggle. While it's true that WAL file
creation doesn't happen with great frequency, and while it's also true
that - with strace and other tests - it can be proven that
fallocate(16MB) is much quicker than writing zeroes by hand,
proving that in the larger context of a running install has been
challenging.
It's nice to be able to test things in the context of a running
install, but sometimes a microbenchmark is just as good. I mean, if
posix_fallocate() is faster, then it's just faster, right? It's
likely to be pretty hard to get reproducible numbers for how much this
actually helps in the real world because write tests are inherently
pretty variable depending on a lot of factors we don't control, so
even if Jon has got the best possible test, the numbers may bounce
around so much that you can't really measure the (probably small) gain
from this approach. But that doesn't seem like a reason not to adopt
the approach and take whatever gain there is. At least, not that I
can see.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On 2013-05-28 10:03:58 -0400, Robert Haas wrote:
It's nice to be able to test things in the context of a running
install, but sometimes a microbenchmark is just as good. I mean, if
posix_fallocate() is faster, then it's just faster, right?
Well, it's a bit more complex than that. Fallocate doesn't actually
initialize the disk space in most filesystems, just marks it as
allocated and zeroed which is one of the reasons it can be noticeably
faster. But that can make the runtime overhead of writing to those pages
higher.
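One way to probe that concern in isolation is to time the first pass of writes over a preallocated file against the same pass over a zero-filled one. This is a rough sketch under stated assumptions, not a substitute for testing a running cluster: the 16 MB size, 8 kB pages, the sync-every-16-pages cadence, and the function names are all invented for illustration.

```python
import os
import tempfile
import time

SIZE = 16 * 1024 * 1024  # assumed WAL segment size
PAGE = 8192              # assumed write granularity


def make_file(path, size, preallocate):
    """Create a `size`-byte file via posix_fallocate or by writing zeroes."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT | os.O_EXCL, 0o600)
    if preallocate:
        os.posix_fallocate(fd, 0, size)
    else:
        zeroes = bytes(PAGE)
        for _ in range(size // PAGE):
            os.write(fd, zeroes)
    os.fsync(fd)
    return fd


def time_first_overwrite(fd, size, sync_every=16):
    """Time one sequential pass of page writes, syncing every few pages."""
    payload = b"x" * PAGE
    os.lseek(fd, 0, os.SEEK_SET)
    start = time.perf_counter()
    for i in range(size // PAGE):
        os.write(fd, payload)
        if (i + 1) % sync_every == 0:
            os.fsync(fd)
    os.fsync(fd)
    return time.perf_counter() - start


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        for label, pre in (("fallocate", True), ("zero-fill", False)):
            fd = make_file(os.path.join(d, label), SIZE, pre)
            print(label, time_first_overwrite(fd, SIZE))
            os.close(fd)
```

If the preallocated file is consistently slower on the first pass, that would support the point that the cost is deferred rather than removed; recycled segments would only pay it once.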
I wonder whether noticeably upping checkpoint segments and then running
a) a COPY into a large table
b) a pgbench on a previously initialized table
would show a difference.
Greetings,
Andres Freund
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services