pg_test_fsync performance

Started by Bruce Momjianalmost 14 years ago12 messages
#1Bruce Momjian
bruce@momjian.us
1 attachment(s)

I have heard complaints that /contrib/pg_test_fsync is too slow. I
thought it was impossible to speed up pg_test_fsync without reducing its
accuracy.

However, now that I some consumer-grade SATA 2 drives, I noticed that
the slowness is really in the open_sync test:

Compare open_sync with different write sizes:
(This is designed to compare the cost of writing 16kB
in different write open_sync sizes.)
1 * 16kB open_sync write 76.421 ops/sec
2 * 8kB open_sync writes 38.689 ops/sec
4 * 4kB open_sync writes 19.140 ops/sec
8 * 2kB open_sync writes 4.938 ops/sec
16 * 1kB open_sync writes 2.480 ops/sec

These last few lines can take very long, so I developed the attached
patch that scales down the number of tests. This makes it more
reasonable to run pg_test_fsync.

I would like to apply this for PG 9.2.

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ It's impossible for everything to be true. +

Attachments:

test_fsync.difftext/x-diff; charset=us-asciiDownload
diff --git a/contrib/pg_test_fsync/pg_test_fsync.c b/contrib/pg_test_fsync/pg_test_fsync.c
new file mode 100644
index 042b02b..970deda
*** a/contrib/pg_test_fsync/pg_test_fsync.c
--- b/contrib/pg_test_fsync/pg_test_fsync.c
*************** test_open_syncs(void)
*** 358,368 ****
  	printf("(This is designed to compare the cost of writing 16kB\n");
  	printf("in different write open_sync sizes.)\n");
  
! 	test_open_sync("16kB open_sync write", 16);
! 	test_open_sync(" 8kB open_sync writes", 8);
! 	test_open_sync(" 4kB open_sync writes", 4);
! 	test_open_sync(" 2kB open_sync writes", 2);
! 	test_open_sync(" 1kB open_sync writes", 1);
  }
  
  /*
--- 358,368 ----
  	printf("(This is designed to compare the cost of writing 16kB\n");
  	printf("in different write open_sync sizes.)\n");
  
! 	test_open_sync(" 1 * 16kB open_sync write", 16);
! 	test_open_sync(" 2 *  8kB open_sync writes", 8);
! 	test_open_sync(" 4 *  4kB open_sync writes", 4);
! 	test_open_sync(" 8 *  2kB open_sync writes", 2);
! 	test_open_sync("16 *  1kB open_sync writes", 1);
  }
  
  /*
*************** test_open_sync(const char *msg, int writ
*** 376,381 ****
--- 376,385 ----
  				ops,
  				writes;
  #endif
+ 	int	save_ops_per_test = ops_per_test;
+ 
+ 	/* This test can be long, so scale down the number of tests */
+ 	ops_per_test = ops_per_test * writes_size / 16;
  
  	printf(LABEL_FORMAT, msg);
  	fflush(stdout);
*************** test_open_sync(const char *msg, int writ
*** 402,407 ****
--- 406,412 ----
  #else
  	printf(NA_FORMAT, "n/a\n");
  #endif
+ 	ops_per_test = save_ops_per_test;
  }
  
  static void
#2Robert Haas
robertmhaas@gmail.com
In reply to: Bruce Momjian (#1)
Re: pg_test_fsync performance

On Mon, Feb 13, 2012 at 7:42 PM, Bruce Momjian <bruce@momjian.us> wrote:

I have heard complaints that /contrib/pg_test_fsync is too slow.  I
thought it was impossible to speed up pg_test_fsync without reducing its
accuracy.

However, now that I some consumer-grade SATA 2 drives, I noticed that
the slowness is really in the open_sync test:

       Compare open_sync with different write sizes:
       (This is designed to compare the cost of writing 16kB
       in different write open_sync sizes.)
                1 * 16kB open_sync write          76.421 ops/sec
                2 *  8kB open_sync writes         38.689 ops/sec
                4 *  4kB open_sync writes         19.140 ops/sec
                8 *  2kB open_sync writes          4.938 ops/sec
               16 *  1kB open_sync writes          2.480 ops/sec

These last few lines can take very long, so I developed the attached
patch that scales down the number of tests.  This makes it more
reasonable to run pg_test_fsync.

I would like to apply this for PG 9.2.

On my MacOS X, it's fsync_writethrough that's insanely slow:

[rhaas pg_test_fsync]$ ./pg_test_fsync
2000 operations per test
Direct I/O is not supported on this platform.

Compare file sync methods using one 8kB write:
(in wal_sync_method preference order, except fdatasync
is Linux's default)
open_datasync 3523.267 ops/sec
fdatasync 3360.023 ops/sec
fsync 2410.048 ops/sec
fsync_writethrough 12.576 ops/sec
open_sync 3649.475 ops/sec

Compare file sync methods using two 8kB writes:
(in wal_sync_method preference order, except fdatasync
is Linux's default)
open_datasync 1885.284 ops/sec
fdatasync 2544.652 ops/sec
fsync 3241.218 ops/sec
fsync_writethrough ^C

Instead of or in addition to a fixed number operations per test, maybe
we should cut off each test after a certain amount of wall-clock time,
like 15 seconds. It's kind of insane to run one of these tests for 3
minutes.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#3Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Haas (#2)
Re: pg_test_fsync performance

Robert Haas <robertmhaas@gmail.com> writes:

Instead of or in addition to a fixed number operations per test, maybe
we should cut off each test after a certain amount of wall-clock time,
like 15 seconds.

+1, I was about to suggest the same thing. Running any of these tests
for a fixed number of iterations will result in drastic degradation of
accuracy as soon as the machine's behavior changes noticeably from what
you were expecting. Run them for a fixed time period instead. Or maybe
do a few, then check elapsed time and estimate a number of iterations to
use, if you're worried about the cost of doing gettimeofday after each
write.

regards, tom lane

#4Bruce Momjian
bruce@momjian.us
In reply to: Tom Lane (#3)
1 attachment(s)
Re: pg_test_fsync performance

On Mon, Feb 13, 2012 at 08:28:03PM -0500, Tom Lane wrote:

Robert Haas <robertmhaas@gmail.com> writes:

Instead of or in addition to a fixed number operations per test, maybe
we should cut off each test after a certain amount of wall-clock time,
like 15 seconds.

+1, I was about to suggest the same thing. Running any of these tests
for a fixed number of iterations will result in drastic degradation of
accuracy as soon as the machine's behavior changes noticeably from what
you were expecting. Run them for a fixed time period instead. Or maybe
do a few, then check elapsed time and estimate a number of iterations to
use, if you're worried about the cost of doing gettimeofday after each
write.

Good idea, and it worked out very well. I changed the -o loops
parameter to -s seconds which calls alarm() after (default) 2 seconds,
and then once the operation completes, computes a duration per
operation.

The test now runs in 30 seconds and produces similar output to the
longer version.

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ It's impossible for everything to be true. +

Attachments:

test_fsync.difftext/x-diff; charset=us-asciiDownload
diff --git a/contrib/pg_test_fsync/pg_test_fsync.c b/contrib/pg_test_fsync/pg_test_fsync.c
new file mode 100644
index 3fcb087..8554426
*** a/contrib/pg_test_fsync/pg_test_fsync.c
--- b/contrib/pg_test_fsync/pg_test_fsync.c
***************
*** 27,41 ****
  #define NA_FORMAT			"%18s"
  #define OPS_FORMAT			"%9.3f ops/sec"
  
  static const char *progname;
  
! static int	ops_per_test = 2000;
  static int	needs_unlink = 0;
  static char full_buf[XLOG_SEG_SIZE],
  		   *buf,
  		   *filename = FSYNC_FILENAME;
  static struct timeval start_t,
  			stop_t;
  
  
  static void handle_args(int argc, char *argv[]);
--- 27,50 ----
  #define NA_FORMAT			"%18s"
  #define OPS_FORMAT			"%9.3f ops/sec"
  
+ #define START_TIMER	\
+ do { \
+ 	alarm_triggered = false; \
+ 	alarm(secs_per_test); \
+ 	gettimeofday(&start_t, NULL); \
+ } while (0)
+ 
+ 
  static const char *progname;
  
! static int	secs_per_test = 2;
  static int	needs_unlink = 0;
  static char full_buf[XLOG_SEG_SIZE],
  		   *buf,
  		   *filename = FSYNC_FILENAME;
  static struct timeval start_t,
  			stop_t;
+ static bool alarm_triggered = false;
  
  
  static void handle_args(int argc, char *argv[]);
*************** static void test_sync(int writes_per_op)
*** 46,57 ****
  static void test_open_syncs(void);
  static void test_open_sync(const char *msg, int writes_size);
  static void test_file_descriptor_sync(void);
  static void signal_cleanup(int sig);
  
  #ifdef HAVE_FSYNC_WRITETHROUGH
  static int	pg_fsync_writethrough(int fd);
  #endif
! static void print_elapse(struct timeval start_t, struct timeval stop_t);
  static void die(const char *str);
  
  
--- 55,67 ----
  static void test_open_syncs(void);
  static void test_open_sync(const char *msg, int writes_size);
  static void test_file_descriptor_sync(void);
+ static void process_alarm(int sig);
  static void signal_cleanup(int sig);
  
  #ifdef HAVE_FSYNC_WRITETHROUGH
  static int	pg_fsync_writethrough(int fd);
  #endif
! static void print_elapse(struct timeval start_t, struct timeval stop_t, int ops);
  static void die(const char *str);
  
  
*************** main(int argc, char *argv[])
*** 65,70 ****
--- 75,81 ----
  	/* Prevent leaving behind the test file */
  	signal(SIGINT, signal_cleanup);
  	signal(SIGTERM, signal_cleanup);
+ 	signal(SIGALRM, process_alarm);
  #ifdef SIGHUP
  	/* Not defined on win32 */
  	signal(SIGHUP, signal_cleanup);
*************** handle_args(int argc, char *argv[])
*** 96,102 ****
  {
  	static struct option long_options[] = {
  		{"filename", required_argument, NULL, 'f'},
! 		{"ops-per-test", required_argument, NULL, 'o'},
  		{NULL, 0, NULL, 0}
  	};
  	int			option;			/* Command line option */
--- 107,113 ----
  {
  	static struct option long_options[] = {
  		{"filename", required_argument, NULL, 'f'},
! 		{"secs-per-test", required_argument, NULL, 's'},
  		{NULL, 0, NULL, 0}
  	};
  	int			option;			/* Command line option */
*************** handle_args(int argc, char *argv[])
*** 117,123 ****
  		}
  	}
  
! 	while ((option = getopt_long(argc, argv, "f:o:",
  								 long_options, &optindex)) != -1)
  	{
  		switch (option)
--- 128,134 ----
  		}
  	}
  
! 	while ((option = getopt_long(argc, argv, "f:s:",
  								 long_options, &optindex)) != -1)
  	{
  		switch (option)
*************** handle_args(int argc, char *argv[])
*** 126,133 ****
  				filename = strdup(optarg);
  				break;
  
! 			case 'o':
! 				ops_per_test = atoi(optarg);
  				break;
  
  			default:
--- 137,144 ----
  				filename = strdup(optarg);
  				break;
  
! 			case 's':
! 				secs_per_test = atoi(optarg);
  				break;
  
  			default:
*************** handle_args(int argc, char *argv[])
*** 148,154 ****
  		exit(1);
  	}
  
! 	printf("%d operations per test\n", ops_per_test);
  #if PG_O_DIRECT != 0
  	printf("O_DIRECT supported on this platform for open_datasync and open_sync.\n");
  #else
--- 159,165 ----
  		exit(1);
  	}
  
! 	printf("%d seconds per test\n", secs_per_test);
  #if PG_O_DIRECT != 0
  	printf("O_DIRECT supported on this platform for open_datasync and open_sync.\n");
  #else
*************** test_sync(int writes_per_op)
*** 220,227 ****
  	{
  		if ((tmpfile = open(filename, O_RDWR | O_DSYNC | PG_O_DIRECT, 0)) == -1)
  			die("could not open output file");
! 		gettimeofday(&start_t, NULL);
! 		for (ops = 0; ops < ops_per_test; ops++)
  		{
  			for (writes = 0; writes < writes_per_op; writes++)
  				if (write(tmpfile, buf, XLOG_BLCKSZ) != XLOG_BLCKSZ)
--- 231,238 ----
  	{
  		if ((tmpfile = open(filename, O_RDWR | O_DSYNC | PG_O_DIRECT, 0)) == -1)
  			die("could not open output file");
! 		START_TIMER;
! 		for (ops = 0; alarm_triggered == false; ops++)
  		{
  			for (writes = 0; writes < writes_per_op; writes++)
  				if (write(tmpfile, buf, XLOG_BLCKSZ) != XLOG_BLCKSZ)
*************** test_sync(int writes_per_op)
*** 231,237 ****
  		}
  		gettimeofday(&stop_t, NULL);
  		close(tmpfile);
! 		print_elapse(start_t, stop_t);
  	}
  #else
  	printf(NA_FORMAT, "n/a\n");
--- 242,248 ----
  		}
  		gettimeofday(&stop_t, NULL);
  		close(tmpfile);
! 		print_elapse(start_t, stop_t, ops);
  	}
  #else
  	printf(NA_FORMAT, "n/a\n");
*************** test_sync(int writes_per_op)
*** 246,253 ****
  #ifdef HAVE_FDATASYNC
  	if ((tmpfile = open(filename, O_RDWR, 0)) == -1)
  		die("could not open output file");
! 	gettimeofday(&start_t, NULL);
! 	for (ops = 0; ops < ops_per_test; ops++)
  	{
  		for (writes = 0; writes < writes_per_op; writes++)
  			if (write(tmpfile, buf, XLOG_BLCKSZ) != XLOG_BLCKSZ)
--- 257,264 ----
  #ifdef HAVE_FDATASYNC
  	if ((tmpfile = open(filename, O_RDWR, 0)) == -1)
  		die("could not open output file");
! 	START_TIMER;
! 	for (ops = 0; alarm_triggered == false; ops++)
  	{
  		for (writes = 0; writes < writes_per_op; writes++)
  			if (write(tmpfile, buf, XLOG_BLCKSZ) != XLOG_BLCKSZ)
*************** test_sync(int writes_per_op)
*** 258,264 ****
  	}
  	gettimeofday(&stop_t, NULL);
  	close(tmpfile);
! 	print_elapse(start_t, stop_t);
  #else
  	printf(NA_FORMAT, "n/a\n");
  #endif
--- 269,275 ----
  	}
  	gettimeofday(&stop_t, NULL);
  	close(tmpfile);
! 	print_elapse(start_t, stop_t, ops);
  #else
  	printf(NA_FORMAT, "n/a\n");
  #endif
*************** test_sync(int writes_per_op)
*** 271,278 ****
  
  	if ((tmpfile = open(filename, O_RDWR, 0)) == -1)
  		die("could not open output file");
! 	gettimeofday(&start_t, NULL);
! 	for (ops = 0; ops < ops_per_test; ops++)
  	{
  		for (writes = 0; writes < writes_per_op; writes++)
  			if (write(tmpfile, buf, XLOG_BLCKSZ) != XLOG_BLCKSZ)
--- 282,289 ----
  
  	if ((tmpfile = open(filename, O_RDWR, 0)) == -1)
  		die("could not open output file");
! 	START_TIMER;
! 	for (ops = 0; alarm_triggered == false; ops++)
  	{
  		for (writes = 0; writes < writes_per_op; writes++)
  			if (write(tmpfile, buf, XLOG_BLCKSZ) != XLOG_BLCKSZ)
*************** test_sync(int writes_per_op)
*** 284,290 ****
  	}
  	gettimeofday(&stop_t, NULL);
  	close(tmpfile);
! 	print_elapse(start_t, stop_t);
  
  /*
   * If fsync_writethrough is available, test as well
--- 295,301 ----
  	}
  	gettimeofday(&stop_t, NULL);
  	close(tmpfile);
! 	print_elapse(start_t, stop_t, ops);
  
  /*
   * If fsync_writethrough is available, test as well
*************** test_sync(int writes_per_op)
*** 295,302 ****
  #ifdef HAVE_FSYNC_WRITETHROUGH
  	if ((tmpfile = open(filename, O_RDWR, 0)) == -1)
  		die("could not open output file");
! 	gettimeofday(&start_t, NULL);
! 	for (ops = 0; ops < ops_per_test; ops++)
  	{
  		for (writes = 0; writes < writes_per_op; writes++)
  			if (write(tmpfile, buf, XLOG_BLCKSZ) != XLOG_BLCKSZ)
--- 306,313 ----
  #ifdef HAVE_FSYNC_WRITETHROUGH
  	if ((tmpfile = open(filename, O_RDWR, 0)) == -1)
  		die("could not open output file");
! 	START_TIMER;
! 	for (ops = 0; alarm_triggered == false; ops++)
  	{
  		for (writes = 0; writes < writes_per_op; writes++)
  			if (write(tmpfile, buf, XLOG_BLCKSZ) != XLOG_BLCKSZ)
*************** test_sync(int writes_per_op)
*** 308,314 ****
  	}
  	gettimeofday(&stop_t, NULL);
  	close(tmpfile);
! 	print_elapse(start_t, stop_t);
  #else
  	printf(NA_FORMAT, "n/a\n");
  #endif
--- 319,325 ----
  	}
  	gettimeofday(&stop_t, NULL);
  	close(tmpfile);
! 	print_elapse(start_t, stop_t, ops);
  #else
  	printf(NA_FORMAT, "n/a\n");
  #endif
*************** test_sync(int writes_per_op)
*** 327,334 ****
  	}
  	else
  	{
! 		gettimeofday(&start_t, NULL);
! 		for (ops = 0; ops < ops_per_test; ops++)
  		{
  			for (writes = 0; writes < writes_per_op; writes++)
  				if (write(tmpfile, buf, XLOG_BLCKSZ) != XLOG_BLCKSZ)
--- 338,345 ----
  	}
  	else
  	{
! 		START_TIMER;
! 		for (ops = 0; alarm_triggered == false; ops++)
  		{
  			for (writes = 0; writes < writes_per_op; writes++)
  				if (write(tmpfile, buf, XLOG_BLCKSZ) != XLOG_BLCKSZ)
*************** test_sync(int writes_per_op)
*** 338,344 ****
  		}
  		gettimeofday(&stop_t, NULL);
  		close(tmpfile);
! 		print_elapse(start_t, stop_t);
  	}
  #else
  	printf(NA_FORMAT, "n/a\n");
--- 349,355 ----
  		}
  		gettimeofday(&stop_t, NULL);
  		close(tmpfile);
! 		print_elapse(start_t, stop_t, ops);
  	}
  #else
  	printf(NA_FORMAT, "n/a\n");
*************** test_open_sync(const char *msg, int writ
*** 385,392 ****
  		printf(NA_FORMAT, "n/a*\n");
  	else
  	{
! 		gettimeofday(&start_t, NULL);
! 		for (ops = 0; ops < ops_per_test; ops++)
  		{
  			for (writes = 0; writes < 16 / writes_size; writes++)
  				if (write(tmpfile, buf, writes_size * 1024) !=
--- 396,403 ----
  		printf(NA_FORMAT, "n/a*\n");
  	else
  	{
! 		START_TIMER;
! 		for (ops = 0; alarm_triggered == false; ops++)
  		{
  			for (writes = 0; writes < 16 / writes_size; writes++)
  				if (write(tmpfile, buf, writes_size * 1024) !=
*************** test_open_sync(const char *msg, int writ
*** 397,403 ****
  		}
  		gettimeofday(&stop_t, NULL);
  		close(tmpfile);
! 		print_elapse(start_t, stop_t);
  	}
  #else
  	printf(NA_FORMAT, "n/a\n");
--- 408,414 ----
  		}
  		gettimeofday(&stop_t, NULL);
  		close(tmpfile);
! 		print_elapse(start_t, stop_t, ops);
  	}
  #else
  	printf(NA_FORMAT, "n/a\n");
*************** test_file_descriptor_sync(void)
*** 427,434 ****
  	printf(LABEL_FORMAT, "write, fsync, close");
  	fflush(stdout);
  
! 	gettimeofday(&start_t, NULL);
! 	for (ops = 0; ops < ops_per_test; ops++)
  	{
  		if ((tmpfile = open(filename, O_RDWR, 0)) == -1)
  			die("could not open output file");
--- 438,445 ----
  	printf(LABEL_FORMAT, "write, fsync, close");
  	fflush(stdout);
  
! 	START_TIMER;
! 	for (ops = 0; alarm_triggered == false; ops++)
  	{
  		if ((tmpfile = open(filename, O_RDWR, 0)) == -1)
  			die("could not open output file");
*************** test_file_descriptor_sync(void)
*** 447,453 ****
  		close(tmpfile);
  	}
  	gettimeofday(&stop_t, NULL);
! 	print_elapse(start_t, stop_t);
  
  	/*
  	 * Now open, write, close, open again and fsync This simulates processes
--- 458,464 ----
  		close(tmpfile);
  	}
  	gettimeofday(&stop_t, NULL);
! 	print_elapse(start_t, stop_t, ops);
  
  	/*
  	 * Now open, write, close, open again and fsync This simulates processes
*************** test_file_descriptor_sync(void)
*** 456,463 ****
  	printf(LABEL_FORMAT, "write, close, fsync");
  	fflush(stdout);
  
! 	gettimeofday(&start_t, NULL);
! 	for (ops = 0; ops < ops_per_test; ops++)
  	{
  		if ((tmpfile = open(filename, O_RDWR, 0)) == -1)
  			die("could not open output file");
--- 467,474 ----
  	printf(LABEL_FORMAT, "write, close, fsync");
  	fflush(stdout);
  
! 	START_TIMER;
! 	for (ops = 0; alarm_triggered == false; ops++)
  	{
  		if ((tmpfile = open(filename, O_RDWR, 0)) == -1)
  			die("could not open output file");
*************** test_file_descriptor_sync(void)
*** 472,478 ****
  		close(tmpfile);
  	}
  	gettimeofday(&stop_t, NULL);
! 	print_elapse(start_t, stop_t);
  
  }
  
--- 483,489 ----
  		close(tmpfile);
  	}
  	gettimeofday(&stop_t, NULL);
! 	print_elapse(start_t, stop_t, ops);
  
  }
  
*************** test_non_sync(void)
*** 489,496 ****
  	printf(LABEL_FORMAT, "write");
  	fflush(stdout);
  
! 	gettimeofday(&start_t, NULL);
! 	for (ops = 0; ops < ops_per_test; ops++)
  	{
  		if ((tmpfile = open(filename, O_RDWR, 0)) == -1)
  			die("could not open output file");
--- 500,507 ----
  	printf(LABEL_FORMAT, "write");
  	fflush(stdout);
  
! 	START_TIMER;
! 	for (ops = 0; alarm_triggered == false; ops++)
  	{
  		if ((tmpfile = open(filename, O_RDWR, 0)) == -1)
  			die("could not open output file");
*************** test_non_sync(void)
*** 499,505 ****
  		close(tmpfile);
  	}
  	gettimeofday(&stop_t, NULL);
! 	print_elapse(start_t, stop_t);
  }
  
  static void
--- 510,516 ----
  		close(tmpfile);
  	}
  	gettimeofday(&stop_t, NULL);
! 	print_elapse(start_t, stop_t, ops);
  }
  
  static void
*************** pg_fsync_writethrough(int fd)
*** 533,548 ****
   * print out the writes per second for tests
   */
  static void
! print_elapse(struct timeval start_t, struct timeval stop_t)
  {
  	double		total_time = (stop_t.tv_sec - start_t.tv_sec) +
  	(stop_t.tv_usec - start_t.tv_usec) * 0.000001;
! 	double		per_second = ops_per_test / total_time;
  
  	printf(OPS_FORMAT "\n", per_second);
  }
  
  static void
  die(const char *str)
  {
  	fprintf(stderr, "%s: %s\n", str, strerror(errno));
--- 544,565 ----
   * print out the writes per second for tests
   */
  static void
! print_elapse(struct timeval start_t, struct timeval stop_t, int ops)
  {
  	double		total_time = (stop_t.tv_sec - start_t.tv_sec) +
  	(stop_t.tv_usec - start_t.tv_usec) * 0.000001;
! 	double		per_second = ops / total_time;
  
  	printf(OPS_FORMAT "\n", per_second);
  }
  
  static void
+ process_alarm(int sig)
+ {
+ 	alarm_triggered = true;
+ }
+ 
+ static void
  die(const char *str)
  {
  	fprintf(stderr, "%s: %s\n", str, strerror(errno));
diff --git a/doc/src/sgml/pgtestfsync.sgml b/doc/src/sgml/pgtestfsync.sgml
new file mode 100644
index 2889059..ba9f0c1
*** a/doc/src/sgml/pgtestfsync.sgml
--- b/doc/src/sgml/pgtestfsync.sgml
*************** pg_test_fsync [options]
*** 47,59 ****
       </varlistentry>
  
       <varlistentry>
!       <term><option>-o</option></term>
!       <term><option>--ops-per-test</option></term>
        <listitem>
         <para>
!         Specifies the number of operations per test.  The more operations
          per test, the greater the test's accuracy, but the longer it takes
!         to run.  The default is 2000.
         </para>
        </listitem>
       </varlistentry>
--- 47,59 ----
       </varlistentry>
  
       <varlistentry>
!       <term><option>-s</option></term>
!       <term><option>--secs-per-test</option></term>
        <listitem>
         <para>
!         Specifies the number of seconds for each test.  The more time
          per test, the greater the test's accuracy, but the longer it takes
!         to run.  The default is 2 seconds.
         </para>
        </listitem>
       </varlistentry>
#5Bruce Momjian
bruce@momjian.us
In reply to: Bruce Momjian (#4)
Re: pg_test_fsync performance

On Mon, Feb 13, 2012 at 09:54:06PM -0500, Bruce Momjian wrote:

On Mon, Feb 13, 2012 at 08:28:03PM -0500, Tom Lane wrote:

Robert Haas <robertmhaas@gmail.com> writes:

Instead of or in addition to a fixed number operations per test, maybe
we should cut off each test after a certain amount of wall-clock time,
like 15 seconds.

+1, I was about to suggest the same thing. Running any of these tests
for a fixed number of iterations will result in drastic degradation of
accuracy as soon as the machine's behavior changes noticeably from what
you were expecting. Run them for a fixed time period instead. Or maybe
do a few, then check elapsed time and estimate a number of iterations to
use, if you're worried about the cost of doing gettimeofday after each
write.

Good idea, and it worked out very well. I changed the -o loops
parameter to -s seconds which calls alarm() after (default) 2 seconds,
and then once the operation completes, computes a duration per
operation.

Update patch applied, with additional fix for usage message, and use of
macros for start/stop testing.

I like this method much better because not only does it speed up the
test, but it also allows the write test, which completes very quickly,
to run longer and report more accurate numbers.

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ It's impossible for everything to be true. +

#6Tom Lane
tgl@sss.pgh.pa.us
In reply to: Bruce Momjian (#4)
Re: pg_test_fsync performance

Bruce Momjian <bruce@momjian.us> writes:

On Mon, Feb 13, 2012 at 08:28:03PM -0500, Tom Lane wrote:

+1, I was about to suggest the same thing. Running any of these tests
for a fixed number of iterations will result in drastic degradation of
accuracy as soon as the machine's behavior changes noticeably from what
you were expecting. Run them for a fixed time period instead. Or maybe
do a few, then check elapsed time and estimate a number of iterations to
use, if you're worried about the cost of doing gettimeofday after each
write.

Good idea, and it worked out very well. I changed the -o loops
parameter to -s seconds which calls alarm() after (default) 2 seconds,
and then once the operation completes, computes a duration per
operation.

I was kind of wondering how portable alarm() is, and the answer
according to the buildfarm is that it isn't.

regards, tom lane

#7Marko Kreen
markokr@gmail.com
In reply to: Tom Lane (#6)
Re: pg_test_fsync performance

On Tue, Feb 14, 2012 at 05:59:06PM -0500, Tom Lane wrote:

Bruce Momjian <bruce@momjian.us> writes:

On Mon, Feb 13, 2012 at 08:28:03PM -0500, Tom Lane wrote:

+1, I was about to suggest the same thing. Running any of these tests
for a fixed number of iterations will result in drastic degradation of
accuracy as soon as the machine's behavior changes noticeably from what
you were expecting. Run them for a fixed time period instead. Or maybe
do a few, then check elapsed time and estimate a number of iterations to
use, if you're worried about the cost of doing gettimeofday after each
write.

Good idea, and it worked out very well. I changed the -o loops
parameter to -s seconds which calls alarm() after (default) 2 seconds,
and then once the operation completes, computes a duration per
operation.

I was kind of wondering how portable alarm() is, and the answer
according to the buildfarm is that it isn't.

I'm using following simplistic alarm() implementation for win32:

https://github.com/markokr/libusual/blob/master/usual/signal.c#L21

this works with fake sigaction()/SIGALARM hack below - to remember
function to call.

Good enough for simple stats printing, and avoids win32-specific
code spreading around.

--
marko

#8Bruce Momjian
bruce@momjian.us
In reply to: Marko Kreen (#7)
Re: pg_test_fsync performance

On Wed, Feb 15, 2012 at 01:35:05AM +0200, Marko Kreen wrote:

On Tue, Feb 14, 2012 at 05:59:06PM -0500, Tom Lane wrote:

Bruce Momjian <bruce@momjian.us> writes:

On Mon, Feb 13, 2012 at 08:28:03PM -0500, Tom Lane wrote:

+1, I was about to suggest the same thing. Running any of these tests
for a fixed number of iterations will result in drastic degradation of
accuracy as soon as the machine's behavior changes noticeably from what
you were expecting. Run them for a fixed time period instead. Or maybe
do a few, then check elapsed time and estimate a number of iterations to
use, if you're worried about the cost of doing gettimeofday after each
write.

Good idea, and it worked out very well. I changed the -o loops
parameter to -s seconds which calls alarm() after (default) 2 seconds,
and then once the operation completes, computes a duration per
operation.

I was kind of wondering how portable alarm() is, and the answer
according to the buildfarm is that it isn't.

I'm using following simplistic alarm() implementation for win32:

https://github.com/markokr/libusual/blob/master/usual/signal.c#L21

this works with fake sigaction()/SIGALARM hack below - to remember
function to call.

Good enough for simple stats printing, and avoids win32-specific
code spreading around.

Wow, I wasn't even aware this compiled in Win32; I thought it was
ifdef'ed out. Anyway, I am looking at SetTimer as a way of making this
work. (Me wonders if the GoGrid Windows images have compilers.)

I see backend/port/win32/timer.c so I might go with a simple "create a
thread, sleep(2), set flag, exit" solution.

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ It's impossible for everything to be true. +

#9Magnus Hagander
magnus@hagander.net
In reply to: Bruce Momjian (#8)
Re: pg_test_fsync performance

On Wed, Feb 15, 2012 at 02:23, Bruce Momjian <bruce@momjian.us> wrote:

On Wed, Feb 15, 2012 at 01:35:05AM +0200, Marko Kreen wrote:

On Tue, Feb 14, 2012 at 05:59:06PM -0500, Tom Lane wrote:

Bruce Momjian <bruce@momjian.us> writes:

On Mon, Feb 13, 2012 at 08:28:03PM -0500, Tom Lane wrote:

+1, I was about to suggest the same thing.  Running any of these tests
for a fixed number of iterations will result in drastic degradation of
accuracy as soon as the machine's behavior changes noticeably from what
you were expecting.  Run them for a fixed time period instead.  Or maybe
do a few, then check elapsed time and estimate a number of iterations to
use, if you're worried about the cost of doing gettimeofday after each
write.

Good idea, and it worked out very well.  I changed the -o loops
parameter to -s seconds which calls alarm() after (default) 2 seconds,
and then once the operation completes, computes a duration per
operation.

I was kind of wondering how portable alarm() is, and the answer
according to the buildfarm is that it isn't.

I'm using following simplistic alarm() implementation for win32:

  https://github.com/markokr/libusual/blob/master/usual/signal.c#L21

this works with fake sigaction()/SIGALARM hack below - to remember
function to call.

Good enough for simple stats printing, and avoids win32-specific
code spreading around.

Wow, I wasn't even aware this compiled in Win32;  I thought it was
ifdef'ed out.  Anyway, I am looking at SetTimer as a way of making this
work.  (Me wonders if the GoGrid Windows images have compilers.)

They don't, since most of the compilers people would ask for don't
allow that kind of redistribution.

Ping me on im if you need one preconfigured, though...

--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/

#10Bruce Momjian
bruce@momjian.us
In reply to: Magnus Hagander (#9)
Re: pg_test_fsync performance

On Wed, Feb 15, 2012 at 09:54:04AM +0100, Magnus Hagander wrote:

On Wed, Feb 15, 2012 at 02:23, Bruce Momjian <bruce@momjian.us> wrote:

On Wed, Feb 15, 2012 at 01:35:05AM +0200, Marko Kreen wrote:

On Tue, Feb 14, 2012 at 05:59:06PM -0500, Tom Lane wrote:

Bruce Momjian <bruce@momjian.us> writes:

On Mon, Feb 13, 2012 at 08:28:03PM -0500, Tom Lane wrote:

+1, I was about to suggest the same thing. �Running any of these tests
for a fixed number of iterations will result in drastic degradation of
accuracy as soon as the machine's behavior changes noticeably from what
you were expecting. �Run them for a fixed time period instead. �Or maybe
do a few, then check elapsed time and estimate a number of iterations to
use, if you're worried about the cost of doing gettimeofday after each
write.

Good idea, and it worked out very well. �I changed the -o loops
parameter to -s seconds which calls alarm() after (default) 2 seconds,
and then once the operation completes, computes a duration per
operation.

I was kind of wondering how portable alarm() is, and the answer
according to the buildfarm is that it isn't.

I'm using following simplistic alarm() implementation for win32:

� https://github.com/markokr/libusual/blob/master/usual/signal.c#L21

this works with fake sigaction()/SIGALARM hack below - to remember
function to call.

Good enough for simple stats printing, and avoids win32-specific
code spreading around.

Wow, I wasn't even aware this compiled in Win32; �I thought it was
ifdef'ed out. �Anyway, I am looking at SetTimer as a way of making this
work. �(Me wonders if the GoGrid Windows images have compilers.)

They don't, since most of the compilers people would ask for don't
allow that kind of redistribution.

Shame.

Ping me on im if you need one preconfigured, though...

How do you do that? Also, once you create a Windows VM on a public
cloud, how do you connect to it? SSH?

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ It's impossible for everything to be true. +

#11Bruce Momjian
bruce@momjian.us
In reply to: Bruce Momjian (#8)
1 attachment(s)
Re: pg_test_fsync performance

On Tue, Feb 14, 2012 at 08:23:10PM -0500, Bruce Momjian wrote:

On Wed, Feb 15, 2012 at 01:35:05AM +0200, Marko Kreen wrote:

On Tue, Feb 14, 2012 at 05:59:06PM -0500, Tom Lane wrote:

Bruce Momjian <bruce@momjian.us> writes:

On Mon, Feb 13, 2012 at 08:28:03PM -0500, Tom Lane wrote:

+1, I was about to suggest the same thing. Running any of these tests
for a fixed number of iterations will result in drastic degradation of
accuracy as soon as the machine's behavior changes noticeably from what
you were expecting. Run them for a fixed time period instead. Or maybe
do a few, then check elapsed time and estimate a number of iterations to
use, if you're worried about the cost of doing gettimeofday after each
write.

Good idea, and it worked out very well. I changed the -o loops
parameter to -s seconds which calls alarm() after (default) 2 seconds,
and then once the operation completes, computes a duration per
operation.

I was kind of wondering how portable alarm() is, and the answer
according to the buildfarm is that it isn't.

I'm using following simplistic alarm() implementation for win32:

https://github.com/markokr/libusual/blob/master/usual/signal.c#L21

this works with fake sigaction()/SIGALARM hack below - to remember
function to call.

Good enough for simple stats printing, and avoids win32-specific
code spreading around.

Wow, I wasn't even aware this compiled in Win32; I thought it was
ifdef'ed out. Anyway, I am looking at SetTimer as a way of making this
work. (Me wonders if the GoGrid Windows images have compilers.)

I see backend/port/win32/timer.c so I might go with a simple "create a
thread, sleep(2), set flag, exit" solution.

Yeah, two Windows buildfarm machines have now successfully compiled my
patches, so I guess I fixed it; patch attached.

The fix was surprisingly easy given the use of threads; scheduling the
timeout in the operating system was just too invasive.

I would like to eventually know if this fix actually produces the right
output. How would I test that? Are the buildfarm output binaries
available somewhere? Should I add this as a 9.2 TODO item?

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ It's impossible for everything to be true. +

Attachments:

test_fsync.difftext/x-diff; charset=us-asciiDownload
diff --git a/contrib/pg_test_fsync/pg_test_fsync.c b/contrib/pg_test_fsync/pg_test_fsync.c
new file mode 100644
index 02a9e21..7f92bc8
*** a/contrib/pg_test_fsync/pg_test_fsync.c
--- b/contrib/pg_test_fsync/pg_test_fsync.c
***************
*** 28,39 ****
--- 28,54 ----
  #define OPS_FORMAT			"%9.3f ops/sec"
  
  /* These are macros to avoid timing the function call overhead. */
+ #ifndef WIN32
  #define START_TIMER	\
  do { \
  	alarm_triggered = false; \
  	alarm(secs_per_test); \
  	gettimeofday(&start_t, NULL); \
  } while (0)
+ #else
+ /* WIN32 doesn't support alarm, so we create a thread and sleep there */
+ #define START_TIMER	\
+ do { \
+ 	alarm_triggered = false; \
+ 	if (CreateThread(NULL, 0, process_alarm, NULL, 0, NULL) == \
+ 		INVALID_HANDLE_VALUE) \
+ 	{ \
+ 		fprintf(stderr, "Cannot create thread for alarm\n"); \
+ 		exit(1); \
+ 	} \
+ 	gettimeofday(&start_t, NULL); \
+ } while (0)
+ #endif
  
  #define STOP_TIMER	\
  do { \
*************** static void test_sync(int writes_per_op)
*** 62,68 ****
--- 77,87 ----
  static void test_open_syncs(void);
  static void test_open_sync(const char *msg, int writes_size);
  static void test_file_descriptor_sync(void);
+ #ifndef WIN32
  static void process_alarm(int sig);
+ #else
+ static DWORD WINAPI process_alarm(LPVOID param);
+ #endif
  static void signal_cleanup(int sig);
  
  #ifdef HAVE_FSYNC_WRITETHROUGH
*************** main(int argc, char *argv[])
*** 82,88 ****
--- 101,109 ----
  	/* Prevent leaving behind the test file */
  	signal(SIGINT, signal_cleanup);
  	signal(SIGTERM, signal_cleanup);
+ #ifndef WIN32
  	signal(SIGALRM, process_alarm);
+ #endif
  #ifdef SIGHUP
  	/* Not defined on win32 */
  	signal(SIGHUP, signal_cleanup);
*************** print_elapse(struct timeval start_t, str
*** 550,560 ****
--- 571,592 ----
  	printf(OPS_FORMAT "\n", per_second);
  }
  
+ #ifndef WIN32
  static void
  process_alarm(int sig)
  {
  	alarm_triggered = true;
  }
+ #else
+ static DWORD WINAPI
+ process_alarm(LPVOID param)
+ {
+ 	/* WIN32 doesn't support alarm, so we create a thread and sleep here */
+ 	Sleep(secs_per_test * 1000);
+ 	alarm_triggered = true;
+ 	ExitThread(0);
+ }
+ #endif
  
  static void
  die(const char *str)
#12Magnus Hagander
magnus@hagander.net
In reply to: Bruce Momjian (#10)
Re: pg_test_fsync performance

On Wed, Feb 15, 2012 at 16:14, Bruce Momjian <bruce@momjian.us> wrote:

On Wed, Feb 15, 2012 at 09:54:04AM +0100, Magnus Hagander wrote:

On Wed, Feb 15, 2012 at 02:23, Bruce Momjian <bruce@momjian.us> wrote:

On Wed, Feb 15, 2012 at 01:35:05AM +0200, Marko Kreen wrote:

On Tue, Feb 14, 2012 at 05:59:06PM -0500, Tom Lane wrote:

Bruce Momjian <bruce@momjian.us> writes:

On Mon, Feb 13, 2012 at 08:28:03PM -0500, Tom Lane wrote:

+1, I was about to suggest the same thing.  Running any of these tests
for a fixed number of iterations will result in drastic degradation of
accuracy as soon as the machine's behavior changes noticeably from what
you were expecting.  Run them for a fixed time period instead.  Or maybe
do a few, then check elapsed time and estimate a number of iterations to
use, if you're worried about the cost of doing gettimeofday after each
write.

Good idea, and it worked out very well.  I changed the -o loops
parameter to -s seconds which calls alarm() after (default) 2 seconds,
and then once the operation completes, computes a duration per
operation.

I was kind of wondering how portable alarm() is, and the answer
according to the buildfarm is that it isn't.

I'm using following simplistic alarm() implementation for win32:

  https://github.com/markokr/libusual/blob/master/usual/signal.c#L21

this works with fake sigaction()/SIGALARM hack below - to remember
function to call.

Good enough for simple stats printing, and avoids win32-specific
code spreading around.

Wow, I wasn't even aware this compiled in Win32;  I thought it was
ifdef'ed out.  Anyway, I am looking at SetTimer as a way of making this
work.  (Me wonders if the GoGrid Windows images have compilers.)

They don't, since most of the compilers people would ask for don't
allow that kind of redistribution.

Shame.

Ping me on im if you need one preconfigured, though...

How do you do that?  Also, once you create a Windows VM on a public
cloud, how do you connect to it?  SSH?

rdesktop.

--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/