effects of posix_fadvise on WAL logs

Started by Mark Wong over 16 years ago, 2 messages
#1 Mark Wong
markwkm@gmail.com
1 attachment(s)

Hi all,

Does anyone have any tests that showcase benefits from the
posix_fadvise changes in xlog.c? I tried running some tests with dbt2
to see if any performance changes could be seen with 8.4beta2. I
thought an OLTP type test with a lot of inserts and updates would be a
good test. Unfortunately, I don't think I see anything interesting.
I was hoping to see less page cache activity, but maybe I'm not
looking correctly. Maybe there isn't enough activity to the WAL
relative to the rest of the database to show anything interesting?
Here are the tests I ran:

Baseline on 8.4beta2, using wal_sync_method=fsync:
http://207.173.203.223/~markwkm/community6/dbt2/m1500-8.4beta2/m1500.8.4beta2.2/report/

Next, set wal_sync_method=open_sync for posix_fadvise:
http://207.173.203.223/~markwkm/community6/dbt2/m1500-8.4beta2/m1500.8.4beta2.osync1/report/

Now using the attached patch, with wal_sync_method=open_sync:
http://207.173.203.223/~markwkm/community6/dbt2/m1500-8.4beta2/m1500.8.4beta2.osync2/report/

I created the patch because currently posix_fadvise is used right
before the file handle to the WAL log is closed. I think
posix_fadvise needs to be called when the file is opened.
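
For readers unfamiliar with the call, here is a minimal standalone sketch
of posix_fadvise with POSIX_FADV_DONTNEED. It is not PostgreSQL code: the
helper name write_and_drop_cache and the file it writes are hypothetical,
chosen only for illustration. It flushes before advising because, on Linux
at least, posix_fadvise(2) documents that pages not yet written out are
unaffected by the advice.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L	/* expose posix_fadvise under strict modes */
#endif
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (an assumption for illustration, not from xlog.c):
 * write nbytes of zeros to path, flush them, then advise the kernel that
 * the cached pages of the file are no longer needed.
 *
 * Returns 0 on success, -1 on an open/write/fsync/close failure, or the
 * positive error number returned by posix_fadvise.
 */
int
write_and_drop_cache(const char *path, size_t nbytes)
{
	char		buf[8192];
	size_t		written = 0;
	int			fd;
	int			rc;

	fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600);
	if (fd < 0)
		return -1;

	memset(buf, 0, sizeof(buf));
	while (written < nbytes)
	{
		size_t		chunk = nbytes - written;
		ssize_t		n;

		if (chunk > sizeof(buf))
			chunk = sizeof(buf);
		n = write(fd, buf, chunk);
		if (n <= 0)
		{
			close(fd);
			return -1;
		}
		written += (size_t) n;
	}

	/* Flush first: per Linux posix_fadvise(2), dirty pages are unaffected. */
	if (fsync(fd) != 0)
	{
		close(fd);
		return -1;
	}

	/* offset 0 with len 0 covers the whole file. */
	rc = posix_fadvise(fd, 0, 0, POSIX_FADV_DONTNEED);

	if (close(fd) != 0)
		return -1;
	return rc;
}
```

Calling write_and_drop_cache("fadvise_sketch.tmp", 16384) should return 0
on a system that supports the call. Whether the advice has any measurable
effect depends on the kernel and on when it is given.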

Regards,
Mark Wong

Attachments:

pgsql-xlog-posix_fadvise-20090425.patch (application/octet-stream)
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 820b439..7d5c277 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -2111,6 +2111,12 @@ XLogNeedsFlush(XLogRecPtr record)
  * inside a critical section (eg, during checkpoint there is no reason to
  * take down the system on failure).  They will promote to PANIC if we are
  * in a critical section.
+ *
+ * WAL segment files will not be re-read in normal operation, so we advise
+ * the OS to release any cached pages.  But do not do so if WAL archiving
+ * is active, because archiver process could use the cache to read the WAL
+ * segment.  Also, don't bother with it if we are using O_DIRECT, since
+ * the kernel is presumably not caching in that case.
  */
 static int
 XLogFileInit(uint32 log, uint32 seg,
@@ -2143,7 +2149,14 @@ XLogFileInit(uint32 log, uint32 seg,
 								path, log, seg)));
 		}
 		else
+		{
+#if defined(USE_POSIX_FADVISE) && defined(POSIX_FADV_DONTNEED)
+			if (!XLogArchivingActive() &&
+					(get_sync_bit(sync_method) & PG_O_DIRECT) == 0)
+				(void) posix_fadvise(fd, 0, 0, POSIX_FADV_DONTNEED);
+#endif
 			return fd;
+		}
 	}
 
 	/*
@@ -2166,6 +2179,12 @@ XLogFileInit(uint32 log, uint32 seg,
 				(errcode_for_file_access(),
 				 errmsg("could not create file \"%s\": %m", tmppath)));
 
+#if defined(USE_POSIX_FADVISE) && defined(POSIX_FADV_DONTNEED)
+	if (!XLogArchivingActive() &&
+			(get_sync_bit(sync_method) & PG_O_DIRECT) == 0)
+		(void) posix_fadvise(fd, 0, 0, POSIX_FADV_DONTNEED);
+#endif
+
 	/*
 	 * Zero-fill the file.	We have to do this the hard way to ensure that all
 	 * the file space has really been allocated --- on platforms that allow
@@ -2244,6 +2263,12 @@ XLogFileInit(uint32 log, uint32 seg,
 		   errmsg("could not open file \"%s\" (log file %u, segment %u): %m",
 				  path, log, seg)));
 
+#if defined(USE_POSIX_FADVISE) && defined(POSIX_FADV_DONTNEED)
+	if (!XLogArchivingActive() &&
+			(get_sync_bit(sync_method) & PG_O_DIRECT) == 0)
+		(void) posix_fadvise(fd, 0, 0, POSIX_FADV_DONTNEED);
+#endif
+
 	return fd;
 }
 
@@ -2563,19 +2588,6 @@ XLogFileClose(void)
 {
 	Assert(openLogFile >= 0);
 
-	/*
-	 * WAL segment files will not be re-read in normal operation, so we advise
-	 * the OS to release any cached pages.  But do not do so if WAL archiving
-	 * is active, because archiver process could use the cache to read the WAL
-	 * segment.  Also, don't bother with it if we are using O_DIRECT, since
-	 * the kernel is presumably not caching in that case.
-	 */
-#if defined(USE_POSIX_FADVISE) && defined(POSIX_FADV_DONTNEED)
-	if (!XLogArchivingActive() &&
-		(get_sync_bit(sync_method) & PG_O_DIRECT) == 0)
-		(void) posix_fadvise(openLogFile, 0, 0, POSIX_FADV_DONTNEED);
-#endif
-
 	if (close(openLogFile))
 		ereport(PANIC,
 				(errcode_for_file_access(),
#2 Greg Smith
gsmith@gregsmith.com
In reply to: Mark Wong (#1)
Re: effects of posix_fadvise on WAL logs

On Tue, 26 May 2009, Mark Wong wrote:

> Maybe there isn't enough activity to the WAL relative to the rest of the
> database to show anything interesting?

Maybe you could reduce checkpoint_segments and focus on UPDATEs? That's
how I've been able to generate the most WAL activity relative to database
writes in the past, because of the full_page_writes behavior. Quoth the
docs: "To ensure data page consistency, the first modification of a data
page after each checkpoint results in logging the entire page content. In
that case, a smaller checkpoint interval increases the volume of output to
the WAL log, partially negating the goal of using a smaller interval, and
in any case causing more disk I/O."

You've got checkpoint_segments set to 3000 in your tests and
checkpoint_timeout set to 1 hour, which means the tests you ran are really
generating minimal WAL volume.

--
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD