pg_test_fsync performance

Started by Bruce Momjian about 14 years ago; 12 messages; hackers
#1 Bruce Momjian
bruce@momjian.us

I have heard complaints that /contrib/pg_test_fsync is too slow. I
thought it was impossible to speed up pg_test_fsync without reducing its
accuracy.

However, now that I have some consumer-grade SATA 2 drives, I noticed that
the slowness is really in the open_sync test:

        Compare open_sync with different write sizes:
        (This is designed to compare the cost of writing 16kB
        in different write open_sync sizes.)
                 1 * 16kB open_sync write          76.421 ops/sec
                 2 *  8kB open_sync writes         38.689 ops/sec
                 4 *  4kB open_sync writes         19.140 ops/sec
                 8 *  2kB open_sync writes          4.938 ops/sec
                16 *  1kB open_sync writes          2.480 ops/sec

These last few lines can take very long, so I developed the attached
patch that scales down the number of tests. This makes it more
reasonable to run pg_test_fsync.
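To put numbers on "very long", here is a back-of-the-envelope sketch in C. It assumes 2000 operations per test (the default printed in the pg_test_fsync run later in this thread) and that each reported op is one full 16kB round of writes; projected_seconds() and print_projection() are hypothetical helpers, not pg_test_fsync code.

```c
/*
 * Projected runtime of a fixed-count test at the open_sync rates above.
 * ASSUMPTIONS: 2000 operations per test, one reported op per 16kB round.
 */
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Seconds needed to finish `ops` operations at `ops_per_sec`. */
double projected_seconds(long ops, double ops_per_sec)
{
    assert(ops > 0 && ops_per_sec > 0.0);
    return (double) ops / ops_per_sec;
}

/* Print the projected runtime of each row; return the total in seconds. */
double print_projection(long ops)
{
    static const struct
    {
        const char *label;
        double      ops_per_sec;    /* measured rate from the table above */
    } rows[] = {
        {"1 * 16kB", 76.421},
        {"2 *  8kB", 38.689},
        {"4 *  4kB", 19.140},
        {"8 *  2kB", 4.938},
        {"16 * 1kB", 2.480},
    };
    double      total = 0.0;

    for (size_t i = 0; i < sizeof(rows) / sizeof(rows[0]); i++)
    {
        double      secs = projected_seconds(ops, rows[i].ops_per_sec);

        total += secs;
        printf("%-9s %8.1f s\n", rows[i].label, secs);
    }
    printf("%-9s %8.1f s\n", "total", total);
    return total;
}
```

At these rates print_projection(2000) works out to about 26 s for the 1 * 16kB row, about 806 s (over 13 minutes) for the 16 * 1kB row, and roughly 23 minutes for the whole comparison, so a fixed operation count hurts most on the slowest rows.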

I would like to apply this for PG 9.2.

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ It's impossible for everything to be true. +

Attachments:

test_fsync.diff (text/x-diff; charset=us-ascii) +15 -10

#2 Robert Haas
robertmhaas@gmail.com
In reply to: Bruce Momjian (#1)
Re: pg_test_fsync performance

On Mon, Feb 13, 2012 at 7:42 PM, Bruce Momjian <bruce@momjian.us> wrote:

> These last few lines can take very long, so I developed the attached
> patch that scales down the number of tests.  This makes it more
> reasonable to run pg_test_fsync.
>
> I would like to apply this for PG 9.2.

On my MacOS X, it's fsync_writethrough that's insanely slow:

[rhaas pg_test_fsync]$ ./pg_test_fsync
2000 operations per test
Direct I/O is not supported on this platform.

Compare file sync methods using one 8kB write:
(in wal_sync_method preference order, except fdatasync
is Linux's default)
        open_datasync           3523.267 ops/sec
        fdatasync               3360.023 ops/sec
        fsync                   2410.048 ops/sec
        fsync_writethrough        12.576 ops/sec
        open_sync               3649.475 ops/sec

Compare file sync methods using two 8kB writes:
(in wal_sync_method preference order, except fdatasync
is Linux's default)
        open_datasync           1885.284 ops/sec
        fdatasync               2544.652 ops/sec
        fsync                   3241.218 ops/sec
        fsync_writethrough      ^C

Instead of or in addition to a fixed number of operations per test, maybe
we should cut off each test after a certain amount of wall-clock time,
like 15 seconds. It's kind of insane to run one of these tests for 3
minutes.
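Robert's cutoff can be sketched as a loop bounded by both an operation count and a wall-clock limit. This is a minimal POSIX sketch, not pg_test_fsync code: run_bounded(), nap_op() and the test_op callback type are hypothetical, and nap_op() only sleeps as a stand-in for the write and sync call under test.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <time.h>

/* One test operation; a stand-in for the write() + sync call being timed. */
typedef void (*test_op) (void *arg);

/* Monotonic wall-clock time in seconds. */
static double now_seconds(void)
{
    struct timespec ts;

    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double) ts.tv_sec + (double) ts.tv_nsec / 1e9;
}

/* Hypothetical stand-in operation: sleep *(long *) arg nanoseconds (< 1e9). */
void nap_op(void *arg)
{
    struct timespec ts = {.tv_sec = 0, .tv_nsec = *(long *) arg};

    nanosleep(&ts, NULL);
}

/*
 * Run `op` until `max_ops` operations are done or `max_secs` of wall-clock
 * time has passed, whichever comes first.  Stores the completed count in
 * *ops_done and returns operations per second.  The clock is read after
 * every operation.
 */
double run_bounded(test_op op, void *arg, long max_ops, double max_secs,
                   long *ops_done)
{
    double      start;
    double      elapsed = 0.0;
    long        ops = 0;

    assert(max_ops > 0 && max_secs > 0.0);
    start = now_seconds();
    while (ops < max_ops)
    {
        op(arg);
        ops++;
        elapsed = now_seconds() - start;
        if (elapsed >= max_secs)
            break;
    }
    *ops_done = ops;
    return elapsed > 0.0 ? (double) ops / elapsed : 0.0;
}
```

With max_ops = 2000 and max_secs = 15.0, a fsync_writethrough test running at about 12.6 ops/sec would stop after roughly 15 seconds (on the order of 190 operations) instead of running for about 159 seconds.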

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#3 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Haas (#2)
Re: pg_test_fsync performance

Robert Haas <robertmhaas@gmail.com> writes:

> Instead of or in addition to a fixed number of operations per test, maybe
> we should cut off each test after a certain amount of wall-clock time,
> like 15 seconds.

+1, I was about to suggest the same thing. Running any of these tests
for a fixed number of iterations will result in drastic degradation of
accuracy as soon as the machine's behavior changes noticeably from what
you were expecting. Run them for a fixed time period instead. Or maybe
do a few, then check elapsed time and estimate a number of iterations to
use, if you're worried about the cost of doing gettimeofday after each
write.
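The second option, a short probe followed by an extrapolated iteration count, needs only a little arithmetic. A hedged sketch in C: elapsed_seconds() and estimate_iterations() are hypothetical helper names, and the probe itself (timing a handful of writes between two gettimeofday() calls) is left to the caller.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <sys/time.h>

/* Seconds between two gettimeofday() readings. */
double elapsed_seconds(const struct timeval *start, const struct timeval *end)
{
    return (double) (end->tv_sec - start->tv_sec) +
        (double) (end->tv_usec - start->tv_usec) / 1e6;
}

/*
 * Given that a probe batch of `probe_ops` operations took `probe_secs`,
 * estimate how many operations will fill `target_secs`.  Rounds up and
 * never returns less than 1.
 */
long estimate_iterations(long probe_ops, double probe_secs, double target_secs)
{
    double      est;
    long        n;

    assert(probe_ops > 0 && probe_secs > 0.0 && target_secs > 0.0);
    est = (double) probe_ops * (target_secs / probe_secs);
    n = (long) est;
    if ((double) n < est)
        n++;                    /* round up */
    return n < 1 ? 1 : n;
}
```

A caller would time, say, 10 writes, pass the measured seconds and a target such as 2.0 to estimate_iterations(), then run that many writes with a single pair of clock reads around the whole batch. The estimate is only as good as the probe: if the device's behavior changes after the probe, the run ends up shorter or longer than intended.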

regards, tom lane

#4 Bruce Momjian
bruce@momjian.us
In reply to: Tom Lane (#3)
Re: pg_test_fsync performance

On Mon, Feb 13, 2012 at 08:28:03PM -0500, Tom Lane wrote:

> +1, I was about to suggest the same thing. Running any of these tests
> for a fixed number of iterations will result in drastic degradation of
> accuracy as soon as the machine's behavior changes noticeably from what
> you were expecting. Run them for a fixed time period instead. Or maybe
> do a few, then check elapsed time and estimate a number of iterations to
> use, if you're worried about the cost of doing gettimeofday after each
> write.

Good idea, and it worked out very well. I changed the -o loops
parameter to -s seconds which calls alarm() after (default) 2 seconds,
and then once the operation completes, computes a duration per
operation.
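The alarm() approach can be sketched as follows: a SIGALRM handler sets a flag, the test loop checks the flag between operations, and the per-operation cost is the elapsed time divided by the completed count. This is an illustration under assumptions, not the attached patch: it uses sigaction() with SA_RESTART, run_for_seconds() and process_alarm() are hypothetical names, and it relies on POSIX signals.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <signal.h>
#include <stdlib.h>             /* mkstemp(), used by callers for the test file */
#include <string.h>
#include <time.h>
#include <unistd.h>

/* Set from the SIGALRM handler; polled between operations. */
static volatile sig_atomic_t alarm_triggered = 0;

static void process_alarm(int signo)
{
    (void) signo;
    alarm_triggered = 1;
}

/*
 * Rewrite and fsync() `len` bytes at offset 0 of `fd` until SIGALRM arrives
 * `secs` seconds after the start.  SA_RESTART lets an operation that is in
 * progress when the alarm fires finish before the loop notices the flag.
 * Returns completed operations; stores wall-clock seconds in *elapsed.
 */
long run_for_seconds(int fd, const char *buf, size_t len, unsigned secs,
                     double *elapsed)
{
    struct sigaction sa;
    struct timespec start, stop;
    long        ops = 0;

    assert(secs > 0 && len > 0);
    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = process_alarm;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART;
    sigaction(SIGALRM, &sa, NULL);

    alarm_triggered = 0;
    clock_gettime(CLOCK_MONOTONIC, &start);
    alarm(secs);
    while (!alarm_triggered)
    {
        if (lseek(fd, 0, SEEK_SET) < 0 ||
            write(fd, buf, len) != (ssize_t) len ||
            fsync(fd) != 0)
            break;              /* I/O error: stop early */
        ops++;
    }
    alarm(0);                   /* cancel the alarm if we stopped early */
    clock_gettime(CLOCK_MONOTONIC, &stop);

    *elapsed = (double) (stop.tv_sec - start.tv_sec) +
        (double) (stop.tv_nsec - start.tv_nsec) / 1e9;
    return ops;
}
```

Microseconds per operation are then *elapsed * 1e6 / ops. Because the loop only tests a flag, it does not call a clock function after every write.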

The test now runs in 30 seconds and produces similar output to the
longer version.

Attachments:

test_fsync.diff (text/x-diff; charset=us-ascii) +97 -80

#5 Bruce Momjian
bruce@momjian.us
In reply to: Bruce Momjian (#4)
Re: pg_test_fsync performance

On Mon, Feb 13, 2012 at 09:54:06PM -0500, Bruce Momjian wrote:

> Good idea, and it worked out very well. I changed the -o loops
> parameter to -s seconds which calls alarm() after (default) 2 seconds,
> and then once the operation completes, computes a duration per
> operation.

Updated patch applied, with an additional fix for the usage message and
the use of macros for starting/stopping each test.

I like this method much better because not only does it speed up the
test, but it also allows the write test, which completes very quickly,
to run longer and report more accurate numbers.

#6 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Bruce Momjian (#4)
Re: pg_test_fsync performance

Bruce Momjian <bruce@momjian.us> writes:

> Good idea, and it worked out very well. I changed the -o loops
> parameter to -s seconds which calls alarm() after (default) 2 seconds,
> and then once the operation completes, computes a duration per
> operation.

I was kind of wondering how portable alarm() is, and the answer
according to the buildfarm is that it isn't.

regards, tom lane

#7 Marko Kreen
markokr@gmail.com
In reply to: Tom Lane (#6)
Re: pg_test_fsync performance

On Tue, Feb 14, 2012 at 05:59:06PM -0500, Tom Lane wrote:

> I was kind of wondering how portable alarm() is, and the answer
> according to the buildfarm is that it isn't.

I'm using the following simplistic alarm() implementation for win32:

https://github.com/markokr/libusual/blob/master/usual/signal.c#L21

This works with the fake sigaction()/SIGALRM hack below it, which
remembers the function to call.

Good enough for simple stats printing, and avoids win32-specific
code spreading around.

--
marko

#8 Bruce Momjian
bruce@momjian.us
In reply to: Marko Kreen (#7)
Re: pg_test_fsync performance

On Wed, Feb 15, 2012 at 01:35:05AM +0200, Marko Kreen wrote:

> I'm using the following simplistic alarm() implementation for win32:
>
>   https://github.com/markokr/libusual/blob/master/usual/signal.c#L21
>
> This works with the fake sigaction()/SIGALRM hack below it, which
> remembers the function to call.
>
> Good enough for simple stats printing, and avoids win32-specific
> code spreading around.

Wow, I wasn't even aware this compiled in Win32; I thought it was
ifdef'ed out. Anyway, I am looking at SetTimer as a way of making this
work. (Me wonders if the GoGrid Windows images have compilers.)

I see backend/port/win32/timer.c so I might go with a simple "create a
thread, sleep(2), set flag, exit" solution.

#9 Magnus Hagander
magnus@hagander.net
In reply to: Bruce Momjian (#8)
Re: pg_test_fsync performance

On Wed, Feb 15, 2012 at 02:23, Bruce Momjian <bruce@momjian.us> wrote:

> Wow, I wasn't even aware this compiled in Win32;  I thought it was
> ifdef'ed out.  Anyway, I am looking at SetTimer as a way of making this
> work.  (Me wonders if the GoGrid Windows images have compilers.)

They don't, since most of the compilers people would ask for don't
allow that kind of redistribution.

Ping me on im if you need one preconfigured, though...

--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/

#10 Bruce Momjian
bruce@momjian.us
In reply to: Magnus Hagander (#9)
Re: pg_test_fsync performance

On Wed, Feb 15, 2012 at 09:54:04AM +0100, Magnus Hagander wrote:

> On Wed, Feb 15, 2012 at 02:23, Bruce Momjian <bruce@momjian.us> wrote:

>> Wow, I wasn't even aware this compiled in Win32;  I thought it was
>> ifdef'ed out.  Anyway, I am looking at SetTimer as a way of making this
>> work.  (Me wonders if the GoGrid Windows images have compilers.)

> They don't, since most of the compilers people would ask for don't
> allow that kind of redistribution.

Shame.

> Ping me on im if you need one preconfigured, though...

How do you do that? Also, once you create a Windows VM on a public
cloud, how do you connect to it? SSH?

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ It's impossible for everything to be true. +

#11 Bruce Momjian
bruce@momjian.us
In reply to: Bruce Momjian (#8)
Re: pg_test_fsync performance

On Tue, Feb 14, 2012 at 08:23:10PM -0500, Bruce Momjian wrote:

> Wow, I wasn't even aware this compiled in Win32; I thought it was
> ifdef'ed out. Anyway, I am looking at SetTimer as a way of making this
> work. (Me wonders if the GoGrid Windows images have compilers.)
>
> I see backend/port/win32/timer.c so I might go with a simple "create a
> thread, sleep(2), set flag, exit" solution.

Yeah, two Windows buildfarm machines have now successfully compiled my
patches, so I guess I fixed it; patch attached.

The fix was surprisingly easy given the use of threads; scheduling the
timeout in the operating system was just too invasive.
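The "create a thread, sleep(2), set flag, exit" approach can be sketched with POSIX threads as a portable stand-in; the actual fix targets Windows threads, and its code is not reproduced here. start_timer(), timer_thread() and loop_until_timer() are hypothetical names, and the 1 ms nap in the loop stands in for one write and sync.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdint.h>
#include <time.h>
#include <unistd.h>

/* Set by the timer thread; polled by the test loop. */
static atomic_int timer_expired;

/* Timer thread body: sleep for the requested seconds, set the flag, exit. */
static void *timer_thread(void *arg)
{
    sleep((unsigned) (uintptr_t) arg);
    atomic_store(&timer_expired, 1);
    return NULL;
}

/* Start the timer thread; returns 0 on success, like pthread_create(). */
int start_timer(pthread_t *tid, unsigned secs)
{
    assert(secs > 0);
    atomic_store(&timer_expired, 0);
    return pthread_create(tid, NULL, timer_thread, (void *) (uintptr_t) secs);
}

/*
 * Stand-in for the write() + sync test loop: take 1 ms naps until the
 * timer thread sets the flag, counting completed iterations.
 */
long loop_until_timer(void)
{
    const struct timespec nap = {.tv_sec = 0, .tv_nsec = 1000000L};
    long        ops = 0;

    while (!atomic_load(&timer_expired))
    {
        nanosleep(&nap, NULL);
        ops++;
    }
    return ops;
}
```

Compared with alarm(), nothing here depends on signals: the worker only sleeps and sets an atomic flag, so the same structure carries over to platforms where alarm() is unavailable.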

I would like to eventually know if this fix actually produces the right
output. How would I test that? Are the buildfarm output binaries
available somewhere? Should I add this as a 9.2 TODO item?

Attachments:

test_fsync.diff (text/x-diff; charset=us-ascii) +32 -0

#12 Magnus Hagander
magnus@hagander.net
In reply to: Bruce Momjian (#10)
Re: pg_test_fsync performance

On Wed, Feb 15, 2012 at 16:14, Bruce Momjian <bruce@momjian.us> wrote:

> On Wed, Feb 15, 2012 at 09:54:04AM +0100, Magnus Hagander wrote:

>> Ping me on im if you need one preconfigured, though...

> How do you do that?  Also, once you create a Windows VM on a public
> cloud, how do you connect to it?  SSH?

rdesktop.
