Uh, I change my mind about commit_delay + commit_siblings (sort of)

Started by Peter Geogheganover 13 years ago45 messages
#1Peter Geoghegan
peter@2ndquadrant.com
1 attachment(s)

The attached very simple patch moves the commit_delay +
commit_siblings sleep into XLogFlush, where the leader alone sleeps.
This appears to be a much more effective site for a delay.

Benchmark results, with and without a delay of 3000ms (commit_siblings
is 0 in both cases) are available from:

http://leadercommitdelay.staticloud.com

The only way this differed from my usual setup for these benchmarks
was that, as it happened, shared_buffers was set to 256MB, which would
of course have affected wal_buffers actual value, which was 8MB.

I surmise that the reason this works so well is that it increases the
rate of batching at lower client counts, without needlessly delaying
each and every follower beyond when their transaction is likely to
have committed. A more detailed analysis will have to wait for
tomorrow.

--
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services

Attachments:

move_delay_2012_05_29.v1.patchapplication/octet-stream; name=move_delay_2012_05_29.v1.patchDownload
diff src/backend/access/transam/xact.c
index c71a10e..9b85441
*** a/src/backend/access/transam/xact.c
--- b/src/backend/access/transam/xact.c
*************** RecordTransactionCommit(void)
*** 1118,1139 ****
  	if ((wrote_xlog && synchronous_commit > SYNCHRONOUS_COMMIT_OFF) ||
  		forceSyncCommit || nrels > 0)
  	{
- 		/*
- 		 * Synchronous commit case:
- 		 *
- 		 * Sleep before flush! So we can flush more than one commit records
- 		 * per single fsync.  (The idea is some other backend may do the
- 		 * XLogFlush while we're sleeping.  This needs work still, because on
- 		 * most Unixen, the minimum select() delay is 10msec or more, which is
- 		 * way too long.)
- 		 *
- 		 * We do not sleep if enableFsync is not turned on, nor if there are
- 		 * fewer than CommitSiblings other backends with active transactions.
- 		 */
- 		if (CommitDelay > 0 && enableFsync &&
- 			MinimumActiveBackends(CommitSiblings))
- 			pg_usleep(CommitDelay);
-
  		XLogFlush(XactLastRecEnd);

  		/*
--- 1118,1123 ----
diff src/backend/access/transam/xlog.c
index d3650bd..6715618
*** a/src/backend/access/transam/xlog.c
--- b/src/backend/access/transam/xlog.c
*************** int			wal_level = WAL_LEVEL_MINIMAL;
*** 85,90 ****
--- 85,94 ----
  bool		XLOG_DEBUG = false;
  #endif

+ /* This should appear in a header */
+ extern int	CommitDelay;
+ extern int	CommitSiblings;
+
  /*
   * XLOGfileslop is the maximum number of preallocated future XLOG segments.
   * When we are done with an old XLOG segment file, we will recycle it as a
*************** XLogFlush(XLogRecPtr record)
*** 2111,2116 ****
--- 2115,2137 ----
  			 */
  			continue;
  		}
+
+ 		/*
+ 		 * Synchronous commit case:
+ 		 *
+ 		 * Sleep before flush! So we can flush more than one commit records
+ 		 * per single fsync.  (The idea is some other backend may do the
+ 		 * XLogFlush while we're sleeping.  This needs work still, because on
+ 		 * most Unixen, the minimum select() delay is 10msec or more, which is
+ 		 * way too long.)
+ 		 *
+ 		 * We do not sleep if enableFsync is not turned on, nor if there are
+ 		 * fewer than CommitSiblings other backends with active transactions.
+ 		 */
+ 		if (CommitDelay > 0 && enableFsync &&
+ 			MinimumActiveBackends(CommitSiblings))
+ 			pg_usleep(CommitDelay);
+
  		/* Got the lock */
  		LogwrtResult = XLogCtl->LogwrtResult;
  		if (!XLByteLE(record, LogwrtResult.Flush))
#2Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Peter Geoghegan (#1)
Re: Uh, I change my mind about commit_delay + commit_siblings (sort of)

On 29.05.2012 04:18, Peter Geoghegan wrote:

The attached very simple patch moves the commit_delay +
commit_siblings sleep into XLogFlush, where the leader alone sleeps.
This appears to be a much more effective site for a delay.

Benchmark results, with and without a delay of 3000ms (commit_siblings
is 0 in both cases) are available from:

http://leadercommitdelay.staticloud.com

Can you elaborate how to read those results? What do "insert delay test"
and "patch, commit_delay=0" mean? Which graph should I be looking at?

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

#3Peter Geoghegan
peter@2ndquadrant.com
In reply to: Heikki Linnakangas (#2)
1 attachment(s)
Re: Uh, I change my mind about commit_delay + commit_siblings (sort of)

On 29 May 2012 07:16, Heikki Linnakangas
<heikki.linnakangas@enterprisedb.com> wrote:

On 29.05.2012 04:18, Peter Geoghegan wrote:

Benchmark results, with and without a delay of 3000ms (commit_siblings
is 0 in both cases) are available from:

http://leadercommitdelay.staticloud.com

Sorry, I do of course mean a commit_delay of 3000 (microseconds, not
milliseconds).

Can you elaborate how to read those results? What do "insert delay test" and
"patch, commit_delay=0" mean? Which graph should I be looking at?

It's very simple. It's an insert.sql based workload, for the patch,
with and without commit_delay set to 3000 (and a commit_delay of 0
makes the commit_delay code inert, just as before). Setting
commit_delay appears to be far more effective with the patch.

Here is a new benchmark:

http://leadermastercommitdelay.staticloud.com/

This is more or less my usual group commit benchmark - median of 3
runs of insert.sql across a wide range of client counts. The
pgbench-tools config is attached for your reference.

On this occasion, I set shared_buffers to 1GB once more (and
consequently, wal_buffers was 16MB).

The green line should look familiar to you. With commit_delay = 0,
it's equivalent to commit_delay=0 on master. In other words, it's very
similar to a number of benchmarks that have already been performed. I
believe that the numbers will line up pretty well with my original
group commit benchmark from back in January, for example, as that was
also performed on my Thinkpad.

The effect that Jeff Janes observed on master is apparent here at low
client counts - master with commit_delay = 3000 is almost as effective
at 8 clients as it is for the patch. After that, the effect is almost
entirely eliminated, which is consistent with my earlier observations
about commit_delay post new group commit.

There obviously appears to be a further significant increase in
transaction throughput with this configuration and patch. Even the
average latency of the patch with commit_delay=3000 is a bit better
than either master with group_commit = 3000 or baseline (as I've said
baseline could have equally well been on master or with master + the
patch).

I suppose my main beef with commit_delay before, apart from the simple
fact that it wasn't actually useful to any significant degree, was
that the code didn't usefully interact with the group commit
improvements, and we didn't revisit commit_delay for 9.2 even in light
of its assumptions not necessarily continuing to hold. Perhaps we
should revisit commit_delay for 9.2 now - we never adequately answered
the question of how useful it was in light of new group commit, so
revisiting that question can be justified as closing an open item,
rather than adding new functionality.

--
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services

Attachments:

configapplication/octet-stream; name=configDownload
#4Robert Haas
robertmhaas@gmail.com
In reply to: Peter Geoghegan (#1)
Re: Uh, I change my mind about commit_delay + commit_siblings (sort of)

On Mon, May 28, 2012 at 9:18 PM, Peter Geoghegan <peter@2ndquadrant.com> wrote:

The attached very simple patch moves the commit_delay +
commit_siblings sleep into XLogFlush, where the leader alone sleeps.
This appears to be a much more effective site for a delay.

This is a clever idea, but I think it needs some fine-tuning: as
written, this will sleep for any flush, not just a flush of a commit
record. One idea might be to add a flag to XLogFlush indicating
whether a commit-delay sleep is permissible. Callers other than
RecordTransactionCommit() could pass false; RecordTransactionCommit()
could pass true.

The comments need some updating, too.

Like Heikki, I find the test results that you posted pretty hard to
understand. This is sort of a general beef I have with pgbench-tools:
there are no explanations anywhere about what the graphs actually
mean. The first three graphs seem fairly useless, and the third one
doesn't even appear to contain any data. The fourth and fifth graphs
seem like the most useful part of the output, but the lack of an
explanation of what's being graphed really hinders understanding.
Apparently, the fourth graph is TPS vs. scale factor (at an
unspecified number of clients) and the fifth graph is TPS vs. clients
(at an unspecified scale factor). Maybe we're just averaging across
the unspecified parameter in each case, but if so that doesn't seem
very useful.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#5Peter Geoghegan
peter@2ndquadrant.com
In reply to: Robert Haas (#4)
1 attachment(s)
Re: Uh, I change my mind about commit_delay + commit_siblings (sort of)

On 29 May 2012 17:10, Robert Haas <robertmhaas@gmail.com> wrote:

This is a clever idea,

Thanks.

but I think it needs some fine-tuning: as
written, this will sleep for any flush, not just a flush of a commit
record.  One idea might be to add a flag to XLogFlush indicating
whether a commit-delay sleep is permissible.  Callers other than
RecordTransactionCommit() could pass false; RecordTransactionCommit()
could pass true.

The comments need some updating, too.

Uh, yeah, I posted that at 2am, which was about an hour after I
initially had the idea. Attached patch revises the comments and
documentation in a way that I think is appropriate, though it is still
essentially the same patch. I did consider the fact that the delay
might occur from one of a number of XLogFlush() callsites that did not
previously delay. However, even if I do what you suggest, it is just
dumb luck as to whether or not that makes any difference, since those
other sites may well be exactly as delayed, since they're still going
to have to wait behind the WALWriteLock, which the leader now holds
while sleeping anyway. I imagined that the fact that those callsites
were excluded before had something to do with ameliorating historic
commit_delay problems, but those problems have already been
significantly reduced.

Why do you think that doing this for all XLogFlush() callsites might
be problematic?

Like Heikki, I find the test results that you posted pretty hard to
understand.  This is sort of a general beef I have with pgbench-tools:
there are no explanations anywhere about what the graphs actually
mean. The first three graphs seem fairly useless, and the third one
doesn't even appear to contain any data.  The fourth and fifth graphs
seem like the most useful part of the output, but the lack of an
explanation of what's being graphed really hinders understanding.
Apparently, the fourth graph is TPS vs. scale factor (at an
unspecified number of clients) and the fifth graph is TPS vs. clients
(at an unspecified scale factor).  Maybe we're just averaging across
the unspecified parameter in each case, but if so that doesn't seem
very useful.

I'm sure that Greg will be happy to consider a github pull request to
improve the tool in these areas. I myself am used to interpreting
pgbench-tools output, but I suppose it is a little confusing in
various ways. As you probably realise, I was directing your attention
towards clients-set.png . I am in the habit of hacking pgbench-tools
to output much bigger gnu-plot images.

--
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services

Attachments:

move_delay_2012_05_29.v2.patchapplication/octet-stream; name=move_delay_2012_05_29.v2.patchDownload
diff doc/src/sgml/config.sgml
index 074afee..8ff21a9
*** a/doc/src/sgml/config.sgml
--- b/doc/src/sgml/config.sgml
*************** SET ENABLE_SEQSCAN TO OFF;
*** 1869,1888 ****
          When the commit data for a transaction is flushed to disk, any
          additional commits ready at that time are also flushed out.
          <varname>commit_delay</varname> adds a time delay, set in
!         microseconds, before a transaction attempts to
!         flush the WAL buffer out to disk.  A nonzero delay can allow more
!         transactions to be committed with only one flush operation, if
!         system load is high enough that additional transactions become
!         ready to commit within the given interval. But the delay is
!         just wasted if no other transactions become ready to
!         commit. Therefore, the delay is only performed if at least
!         <varname>commit_siblings</varname> other transactions are
!         active at the instant that a server process has written its
!         commit record.
          The default <varname>commit_delay</> is zero (no delay).
-         Since all pending commit data will be written at every flush
-         regardless of this setting, it is rare that adding delay
-         by increasing this parameter will actually improve performance.
         </para>
        </listitem>
       </varlistentry>
--- 1869,1888 ----
          When the commit data for a transaction is flushed to disk, any
          additional commits ready at that time are also flushed out.
          <varname>commit_delay</varname> adds a time delay, set in
!         microseconds, before a leading transaction participating in
!         group commit attempts to flush the WAL buffer out to disk.
!         This can add an additional latency of of up to
!         <varname>commit_delay</varname> microseconds for each transaction.
!         A nonzero delay can allow more transactions to be committed with
!         only one flush operation, if system load is high enough that
!         additional transactions become ready to commit within the
!         given interval. However, the delay is just wasted if no other
!         transactions become ready to commit. Therefore, the delay
!         is only performed if at least <varname>commit_siblings</varname>
!         other transactions are active immediately before the leader
!         backend participating in group commit proceeds with flushing
!         WAL.
          The default <varname>commit_delay</> is zero (no delay).
         </para>
        </listitem>
       </varlistentry>
diff doc/src/sgml/wal.sgml
index 0afb9d6..a98132d
*** a/doc/src/sgml/wal.sgml
--- b/doc/src/sgml/wal.sgml
***************
*** 376,384 ****
     <acronym>WAL</acronym> to disk, in the hope that a single flush
     executed by one such transaction can also serve other transactions
     committing at about the same time.  Setting <varname>commit_delay</varname>
!    can only help when there are many concurrently committing transactions,
!    and it is difficult to tune it to a value that actually helps rather
!    than hurt throughput.
    </para>

   </sect1>
--- 376,382 ----
     <acronym>WAL</acronym> to disk, in the hope that a single flush
     executed by one such transaction can also serve other transactions
     committing at about the same time.  Setting <varname>commit_delay</varname>
!    can only help when there are many concurrently committing transactions.
    </para>

   </sect1>
diff src/backend/access/transam/xact.c
index c71a10e..513172b
*** a/src/backend/access/transam/xact.c
--- b/src/backend/access/transam/xact.c
*************** bool		XactDeferrable;
*** 67,75 ****

  int			synchronous_commit = SYNCHRONOUS_COMMIT_ON;

- int			CommitDelay = 0;	/* precommit delay in microseconds */
- int			CommitSiblings = 5; /* # concurrent xacts needed to sleep */
-
  /*
   * MyXactAccessedTempRel is set when a temporary relation is accessed.
   * We don't allow PREPARE TRANSACTION in that case.  (This is global
--- 67,72 ----
*************** RecordTransactionCommit(void)
*** 1118,1139 ****
  	if ((wrote_xlog && synchronous_commit > SYNCHRONOUS_COMMIT_OFF) ||
  		forceSyncCommit || nrels > 0)
  	{
- 		/*
- 		 * Synchronous commit case:
- 		 *
- 		 * Sleep before flush! So we can flush more than one commit records
- 		 * per single fsync.  (The idea is some other backend may do the
- 		 * XLogFlush while we're sleeping.  This needs work still, because on
- 		 * most Unixen, the minimum select() delay is 10msec or more, which is
- 		 * way too long.)
- 		 *
- 		 * We do not sleep if enableFsync is not turned on, nor if there are
- 		 * fewer than CommitSiblings other backends with active transactions.
- 		 */
- 		if (CommitDelay > 0 && enableFsync &&
- 			MinimumActiveBackends(CommitSiblings))
- 			pg_usleep(CommitDelay);
-
  		XLogFlush(XactLastRecEnd);

  		/*
--- 1115,1120 ----
diff src/backend/access/transam/xlog.c
index d3650bd..3ef8d88
*** a/src/backend/access/transam/xlog.c
--- b/src/backend/access/transam/xlog.c
*************** bool		fullPageWrites = true;
*** 80,85 ****
--- 80,87 ----
  bool		log_checkpoints = false;
  int			sync_method = DEFAULT_SYNC_METHOD;
  int			wal_level = WAL_LEVEL_MINIMAL;
+ int			CommitDelay = 0;	/* precommit delay in microseconds */
+ int			CommitSiblings = 5; /* # concurrent xacts needed to sleep */

  #ifdef WAL_DEBUG
  bool		XLOG_DEBUG = false;
*************** XLogFlush(XLogRecPtr record)
*** 2111,2116 ****
--- 2113,2132 ----
  			 */
  			continue;
  		}
+
+ 		/*
+ 		 * Sleep before flush! By adding a delay here, we may give further
+ 		 * backends the opportunity to join the backlog of group commit
+ 		 * followers; this can significantly improve transaction throughput, at
+ 		 * the risk of increasing transaction latency.
+ 		 *
+ 		 * We do not sleep if enableFsync is not turned on, nor if there are
+ 		 * fewer than CommitSiblings other backends with active transactions.
+ 		 */
+ 		if (CommitDelay > 0 && enableFsync &&
+ 			MinimumActiveBackends(CommitSiblings))
+ 			pg_usleep(CommitDelay);
+
  		/* Got the lock */
  		LogwrtResult = XLogCtl->LogwrtResult;
  		if (!XLByteLE(record, LogwrtResult.Flush))
diff src/backend/utils/misc/guc.c
index d75ab43..9b86ac1
*** a/src/backend/utils/misc/guc.c
--- b/src/backend/utils/misc/guc.c
*************** static struct config_int ConfigureNamesI
*** 2031,2037 ****
  	{
  		{"commit_delay", PGC_USERSET, WAL_SETTINGS,
  			gettext_noop("Sets the delay in microseconds between transaction commit and "
! 						 "flushing WAL to disk."),
  			NULL
  		},
  		&CommitDelay,
--- 2031,2037 ----
  	{
  		{"commit_delay", PGC_USERSET, WAL_SETTINGS,
  			gettext_noop("Sets the delay in microseconds between transaction commit and "
! 						 "flushing WAL to disk for the group commit leader."),
  			NULL
  		},
  		&CommitDelay,
#6Robert Haas
robertmhaas@gmail.com
In reply to: Peter Geoghegan (#5)
Re: Uh, I change my mind about commit_delay + commit_siblings (sort of)

On Tue, May 29, 2012 at 12:47 PM, Peter Geoghegan <peter@2ndquadrant.com> wrote:

Why do you think that doing this for all XLogFlush() callsites might
be problematic?

Well, consider the one in the background writer, for example. That's
just a periodic flush, so I see no benefit in having it acquire the
lock and then wait some more. It already did wait. And what about
the case where we're flushing while holding WALInsertLock because the
buffer's full? Clearly waiting is useless in that case - nobody can
join the group commit for exactly the same reason that we're doing the
flush in the first place: no buffer space.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#7Peter Geoghegan
peter@2ndquadrant.com
In reply to: Robert Haas (#6)
Re: Uh, I change my mind about commit_delay + commit_siblings (sort of)

On 29 May 2012 17:58, Robert Haas <robertmhaas@gmail.com> wrote:

On Tue, May 29, 2012 at 12:47 PM, Peter Geoghegan <peter@2ndquadrant.com> wrote:

Why do you think that doing this for all XLogFlush() callsites might
be problematic?

Well, consider the one in the background writer, for example.  That's
just a periodic flush, so I see no benefit in having it acquire the
lock and then wait some more.  It already did wait.

No benefit for the BGWriter, no, but a benefit for the cluster as a
whole, which can see not only an improvement in transaction
throughput, but also of average and worst-case latency. The whole idea
of a group commit feature is that one backend altruistically takes
other backends with it. Besides, for the background writer or
checkpointer, that operate in terms of minutes, seconds and
milliseconds, having their transaction block an extra 2 milliseconds
is hardly a real problem. Both the BGWriter and checkpointer do an
amount of work that is adaptive per cycle anyway.

And what about the case where we're flushing while holding WALInsertLock because the
buffer's full?  Clearly waiting is useless in that case - nobody can
join the group commit for exactly the same reason that we're doing the
flush in the first place: no buffer space.

Maybe you could have the leader check that condition, and not wait
accordingly. If the buffer is full, chances are very high that there
already is a leader, either sleeping or following through with an
XLogWrite(), whose work is well underway.

The general solution that you've proposed - adding an actually_wait
bool parameter to XLogFlush() - won't work, unless, for example, the
"buffer is full" call happens to become the leader, which is generally
quite unlikely. Otherwise it's stuck waiting behind WALWriteLock along
with everyone else, on average for a duration of half (CommitDelay +
avg flushtime), while the leader sleeps and then works. If something
is only effective a small minority of the time, why bother? I doubt
that you can reasonably have the BGWriter or whatever cut in line
ahead of an ever-growing queue of backends calling XLogFlush(), if
that's what you're thinking - now all those backends have to wait an
additional period of up to however long the flush takes.

In any case, the standard for changing the exact behaviour of
commit_delay/commit_siblings ought to be quite low, because those
settings are already so heavily laden with problems that it is
somewhat doubtful that they've ever been used by someone in production
sensibly. Obviously having a delay that turns out to be in some way
suboptimal is obviously something I'd hope to avoid entirely, but I've
done a good job of considerably improving that situation. If we get
this into 9.2, perhaps a more adaptive implementation can follow in
9.3.

--
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services

#8Simon Riggs
simon@2ndQuadrant.com
In reply to: Robert Haas (#6)
Re: Uh, I change my mind about commit_delay + commit_siblings (sort of)

On 29 May 2012 17:58, Robert Haas <robertmhaas@gmail.com> wrote:

On Tue, May 29, 2012 at 12:47 PM, Peter Geoghegan <peter@2ndquadrant.com> wrote:

Why do you think that doing this for all XLogFlush() callsites might
be problematic?

Well, consider the one in the background writer, for example.  That's
just a periodic flush, so I see no benefit in having it acquire the
lock and then wait some more.  It already did wait.  And what about
the case where we're flushing while holding WALInsertLock because the
buffer's full?  Clearly waiting is useless in that case - nobody can
join the group commit for exactly the same reason that we're doing the
flush in the first place: no buffer space.

When I read this the first time, I was in full agreement.

On closer inspection neither point is valid, though both points were
worth considering.

Well, consider the one in the background writer, for example.  That's
just a periodic flush, so I see no benefit in having it acquire the
lock and then wait some more.  It already did wait.

We use XLogBackgroundFlush() not XLogFlush() from background processes.

And what about
the case where we're flushing while holding WALInsertLock because the
buffer's full?  Clearly waiting is useless in that case - nobody can
join the group commit for exactly the same reason that we're doing the
flush in the first place: no buffer space.

We don't flush WAL in that case, we just write it.

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

#9Robert Haas
robertmhaas@gmail.com
In reply to: Simon Riggs (#8)
Re: Uh, I change my mind about commit_delay + commit_siblings (sort of)

On Wed, May 30, 2012 at 4:36 AM, Simon Riggs <simon@2ndquadrant.com> wrote:

When I read this the first time, I was in full agreement.

On closer inspection neither point is valid, though both points were
worth considering.

Well, consider the one in the background writer, for example.  That's
just a periodic flush, so I see no benefit in having it acquire the
lock and then wait some more.  It already did wait.

We use XLogBackgroundFlush() not XLogFlush() from background processes.

And what about
the case where we're flushing while holding WALInsertLock because the
buffer's full?  Clearly waiting is useless in that case - nobody can
join the group commit for exactly the same reason that we're doing the
flush in the first place: no buffer space.

We don't flush WAL in that case, we just write it.

OK, but there are a lot of places where we call XLogFlush(), and it's
far from obvious that it's a win to do this in all of those cases. At
least, nobody's done that analysis. XLogFlush is called from:

- WriteTruncateXlogRec(), which is called from TruncateCLOG()
- SlruPhysicalWritePage()
- EndPrepare()
- RecordTransactionCommitPrepared()
- RecordTransactionCommit()
- xact_redo_commit_internal()
- CreateCheckPoint()
- RelationTruncate()
- FlushBuffer()
- write_relmap_file()

Most of those actually do look like reasonable places to try to get
grouped flushing behavior, but:

1. It seems wrong to do it in xact_redo_commit_internal(). It won't
matter if commit_siblings>0 since there won't be any other backends
with transaction IDs anyway, but if commit_siblings==0 then we'll
sleep for no possible benefit.

2. Doing it in FlushBuffer() seems slightly iffy since we might be
sitting on a buffer lock. But maybe it's a win anyway, or just not
worth worrying about.

Another thing to think about is that if we do this across the board
rather than just for commits, then commit_delay and commit_siblings
will really be totally misnamed - they really ought to be flush_delay
and flush_siblings at that point.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#10Peter Geoghegan
peter@2ndquadrant.com
In reply to: Robert Haas (#9)
Re: Uh, I change my mind about commit_delay + commit_siblings (sort of)

On 30 May 2012 13:24, Robert Haas <robertmhaas@gmail.com> wrote:

Most of those actually do look like reasonable places to try to get
grouped flushing behavior, but:

1. It seems wrong to do it in xact_redo_commit_internal().  It won't
matter if commit_siblings>0 since there won't be any other backends
with transaction IDs anyway, but if commit_siblings==0 then we'll
sleep for no possible benefit.

2. Doing it in FlushBuffer() seems slightly iffy since we might be
sitting on a buffer lock.  But maybe it's a win anyway, or just not
worth worrying about.

Typical values of commit_delay are usually a fraction of spinning rust
latency, so I don't think that FlushBuffer() has any business really
caring.

These problems seem rather minor compared to the existing problems
with the settings. As I've already outlined, I doubt it's possible to
really remove the delays here without an over-engineered solution. In
short, I suspect it isn't worth it. We must trust DBAs to set
commit_siblings appropriately if they've set commit_delay.

Another thing to think about is that if we do this across the board
rather than just for commits, then commit_delay and commit_siblings
will really be totally misnamed - they really ought to be flush_delay
and flush_siblings at that point.

Seems reasonable. It would also have the advantage of avoiding having
the new implementation tarred with the same brush as commit_delay.

--
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services

#11Simon Riggs
simon@2ndquadrant.com
In reply to: Robert Haas (#9)
Re: Uh, I change my mind about commit_delay + commit_siblings (sort of)

On 30 May 2012 13:24, Robert Haas <robertmhaas@gmail.com> wrote:

OK, but there are a lot of places where we call XLogFlush(), and it's
far from obvious that it's a win to do this in all of those cases.  At
least, nobody's done that analysis.  XLogFlush is called from:

- WriteTruncateXlogRec(), which is called from TruncateCLOG()
- SlruPhysicalWritePage()
- EndPrepare()
- RecordTransactionCommitPrepared()
- RecordTransactionCommit()
- xact_redo_commit_internal()
- CreateCheckPoint()
- RelationTruncate()
- FlushBuffer()
- write_relmap_file()

Most of those actually do look like reasonable places to try to get
grouped flushing behavior, but:

1. It seems wrong to do it in xact_redo_commit_internal().  It won't
matter if commit_siblings>0 since there won't be any other backends
with transaction IDs anyway, but if commit_siblings==0 then we'll
sleep for no possible benefit.

Agreed

2. Doing it in FlushBuffer() seems slightly iffy since we might be
sitting on a buffer lock.  But maybe it's a win anyway, or just not
worth worrying about.

Agreed.

The remaining cases aren't worth worrying about, apart from
SlruPhysicalWritePage() which happens during visibility checks and
needs to happen as quickly as possible also.

I would say the additional contention from waiting outweighs the
benefit of the wait in those 3 places, so skipping the wait is wise.

Should be implemented as

#define XLogFlush(lsn) XLogFlushInternal(lsn, true)
#define XLogFlushNoWait(lsn) XLogFlushInternal(lsn, false)

so we can drop the no wait version in without touching the other call points

Another thing to think about is that if we do this across the board
rather than just for commits, then commit_delay and commit_siblings
will really be totally misnamed - they really ought to be flush_delay
and flush_siblings at that point.

I think if we were to rename the parameters, then they should be called

group_commit_delay
group_commit_siblings

Since "Group Commit" is the accepted term for this behaviour, even
though, as you point out that the behaviour isn't just about commit.

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

#12Peter Geoghegan
peter@2ndquadrant.com
In reply to: Simon Riggs (#11)
Re: Uh, I change my mind about commit_delay + commit_siblings (sort of)

On 30 May 2012 15:25, Simon Riggs <simon@2ndquadrant.com> wrote:

1. It seems wrong to do it in xact_redo_commit_internal().  It won't
matter if commit_siblings>0 since there won't be any other backends
with transaction IDs anyway, but if commit_siblings==0 then we'll
sleep for no possible benefit.

Agreed

2. Doing it in FlushBuffer() seems slightly iffy since we might be
sitting on a buffer lock.  But maybe it's a win anyway, or just not
worth worrying about.

Agreed.

As I've pointed out, we cannot meaningfully skip the wait for
auxiliary processes alone (except perhaps by having commit_siblings
set sufficiently high).

The remaining cases aren't worth worrying about, apart from
SlruPhysicalWritePage() which happens during visibility checks and
needs to happen as quickly as possible also.

I'm not so sure. It says in that function:

/*
* We must determine the largest async-commit LSN for the page. This
* is a bit tedious, but since this entire function is a slow path
* anyway, it seems better to do this here than to maintain a per-page
* LSN variable (which'd need an extra comparison in the
* transaction-commit path).
*/

I would say the additional contention from waiting outweighs the
benefit of the wait in those 3 places, so skipping the wait is wise.

MinimumActiveBackends() reports the "count backends (other than
myself) that are in active transactions", so unnecessary calls will
have to occur when we have active transactions >= CommitSiblings, not
connections >= CommitSiblings as was previously the case.

What if we were to skip the wait during recovery only, by specially
setting CommitDelay to 0 in the start-up process? Would that satisfy
everyone's concerns about unhelpful delays? I'm not sure how this
might interact with hot standby.

The only concrete way I can see of avoiding a possibly useless wait
(setting commit_siblings prudently alone is a very big help) would be
semantics that are not generally acceptable for XLogFlush(): a way of
calling XLogFlush() such that we either become the leader and don't
wait OR join the queue and return immediately under the assumption
that the leader will successfully flush up to our LSN. It is true that
part of the premise of having a leader backend was that "XLogFlush is
pretty much all critical section, so if it fails we're going to PANIC
anyway".

That said, at least some of the problematic XLogFlush() call sites
named require that we verify that we've flushed up to their LSN before
control returns to them. FlushBuffer() does for sure.

I think if we were to rename the parameters, then they should be called

group_commit_delay
group_commit_siblings

Since "Group Commit" is the accepted term for this behaviour, even
though, as you point out that the behaviour isn't just about commit.

+1. I understand that that is the precise definition of group commit
in Oracle, for example, where the feature is explicitly framed as a
trade-off between throughput and latency (any of the other things that
we might have called group commit at various times are not). The fact
that group commit implies that non-commit flushes will also be batched
seems rather academic, on reflection.

--
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services

#13Simon Riggs
simon@2ndQuadrant.com
In reply to: Peter Geoghegan (#12)
Re: Uh, I change my mind about commit_delay + commit_siblings (sort of)

On 30 May 2012 17:19, Peter Geoghegan <peter@2ndquadrant.com> wrote:

On 30 May 2012 15:25, Simon Riggs <simon@2ndquadrant.com> wrote:

1. It seems wrong to do it in xact_redo_commit_internal().  It won't
matter if commit_siblings>0 since there won't be any other backends
with transaction IDs anyway, but if commit_siblings==0 then we'll
sleep for no possible benefit.

Agreed

I've looked at this more closely now and I can see that the call to
XLogFlush() that is made from xact_redo_commit_internal() doesn't ever
actually flush WAL, so whether we delay or not is completely
irrelevant.

So un-agreed. No change required to patch there.

2. Doing it in FlushBuffer() seems slightly iffy since we might be
sitting on a buffer lock.  But maybe it's a win anyway, or just not
worth worrying about.

Agreed.

As I've pointed out, we cannot meaningfully skip the wait for
auxiliary processes alone (except perhaps by having commit_siblings
set sufficiently high).

The remaining cases aren't worth worrying about, apart from
SlruPhysicalWritePage() which happens during visibility checks and
needs to happen as quickly as possible also.

I'm not so sure. It says in that function:

               /*
                * We must determine the largest async-commit LSN for the page. This
                * is a bit tedious, but since this entire function is a slow path
                * anyway, it seems better to do this here than to maintain a per-page
                * LSN variable (which'd need an extra comparison in the
                * transaction-commit path).
                */

I would say the additional contention from waiting outweighs the
benefit of the wait in those 3 places, so skipping the wait is wise.

MinimumActiveBackends() reports the "count backends (other than
myself) that are in active transactions", so unnecessary calls will
have to occur when we have active transactions >= CommitSiblings, not
connections >= CommitSiblings as was previously the case.

What if we were to skip the wait during recovery only, by specially
setting CommitDelay to 0 in the start-up process? Would that satisfy
everyone's concerns about unhelpful delays? I'm not sure how this
might interact with hot standby.

Hmm, that was a good idea, but as of my comments above, that isn't
required or useful.

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

#14Peter Geoghegan
peter@2ndquadrant.com
In reply to: Simon Riggs (#13)
Re: Uh, I change my mind about commit_delay + commit_siblings (sort of)

On 31 May 2012 11:19, Simon Riggs <simon@2ndquadrant.com> wrote:

I've looked at this more closely now and I can see that the call to
XLogFlush() that is made from xact_redo_commit_internal() doesn't ever
actually flush WAL, so whether we delay or not is completely
irrelevant.

So un-agreed. No change required to patch there.

So, does that clear up the question of it being acceptable to add a
delay to every existing XLogFlush() call site? I think so.

Aside from the outstanding question of what to rename
commit_delay/commit_siblings to, and how we might want to reframe
those settings in the docs, I think that's everything.

--
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services

#15Robert Haas
robertmhaas@gmail.com
In reply to: Simon Riggs (#13)
Re: Uh, I change my mind about commit_delay + commit_siblings (sort of)

On Thu, May 31, 2012 at 6:19 AM, Simon Riggs <simon@2ndquadrant.com> wrote:

I've looked at this more closely now and I can see that the call to
XLogFlush() that is made from xact_redo_commit_internal() doesn't ever
actually flush WAL, so whether we delay or not is completely
irrelevant.

So un-agreed. No change required to patch there.

I think Peter's suggestion of forcibly setting the delay to 0 in the
startup process is a good one, though. It's one line of code, and if
it isn't strictly necessary today, it still seems like good
future-proofing.

I am not very happy about the idea of renaming commit_* to
group_commit_*. It's basically a cosmetic renaming, and breaking
existing configuration files for cosmetic purposes does not seem
warranted to me, especially when the old and new names are so close.
I certainly don't think we can do that in 9.2, now that beta1 has
already shipped. Modifying the default contents of postgresql.conf
after we've shipped beta has been a historical no-no for reasons that
escape me at the moment, but IIRC they're not stupid reasons.

Frankly, I think this whole thing should be pushed to 9.3. The
commit_delay and commit_siblings knobs suck, but they've sucked for a
long time, and it won't kill anybody to wait another release cycle to
fix them. We have plenty of more important things queued up for 9.3
already, and I don't believe there's any compelling reason to think
that this particular thing needs preferential treatment.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#16Simon Riggs
simon@2ndQuadrant.com
In reply to: Robert Haas (#15)
Re: Uh, I change my mind about commit_delay + commit_siblings (sort of)

On 31 May 2012 13:16, Robert Haas <robertmhaas@gmail.com> wrote:

On Thu, May 31, 2012 at 6:19 AM, Simon Riggs <simon@2ndquadrant.com> wrote:

I've looked at this more closely now and I can see that the call to
XLogFlush() that is made from xact_redo_commit_internal() doesn't ever
actually flush WAL, so whether we delay or not is completely
irrelevant.

So un-agreed. No change required to patch there.

I think Peter's suggestion of forcibly setting the delay to 0 in the
startup process is a good one, though.  It's one line of code, and if
it isn't strictly necessary today, it still seems like good
future-proofing.

Adding a line that does nothing is not a good idea. The Startup
process flushes very, very few WAL messages, so the setting is
irrelevant.

I am not very happy about the idea of renaming commit_* to
group_commit_*.  It's basically a cosmetic renaming, and breaking
existing configuration files for cosmetic purposes does not seem
warranted to me, especially when the old and new names are so close.
I certainly don't think we can do that in 9.2, now that beta1 has
already shipped.  Modifying the default contents of postgresql.conf
after we've shipped beta has been a historical no-no for reasons that
escape me at the moment, but IIRC they're not stupid reasons.

Frankly, I think this whole thing should be pushed to 9.3.  The
commit_delay and commit_siblings knobs suck, but they've sucked for a
long time, and it won't kill anybody to wait another release cycle to
fix them.  We have plenty of more important things queued up for 9.3
already, and I don't believe there's any compelling reason to think
that this particular thing needs preferential treatment.

No problem with pushing a variable rename through to 9.3. To be
honest, I don't care whether we rename them or not.

What matters is that we have a patch that provides a massive
performance gain in write performance in just a few lines of code, and
that should be committed to 9.2.

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

#17Peter Geoghegan
peter@2ndquadrant.com
In reply to: Robert Haas (#15)
Re: Uh, I change my mind about commit_delay + commit_siblings (sort of)

On 31 May 2012 13:16, Robert Haas <robertmhaas@gmail.com> wrote:

On Thu, May 31, 2012 at 6:19 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
Frankly, I think this whole thing should be pushed to 9.3.  The
commit_delay and commit_siblings knobs suck, but they've sucked for a
long time, and it won't kill anybody to wait another release cycle to
fix them.  We have plenty of more important things queued up for 9.3
already, and I don't believe there's any compelling reason to think
that this particular thing needs preferential treatment.

Why do you think that? Those knobs are now quite ineffective, though
we never even considered that when the group commit delay patch was
committed. The entire body of research and commentary that exists on
commit_delay has been invalidated for 9.2. If that isn't something
that needs to be addressed before release, I don't know what is. The
fact that the patch can sometimes double transaction throughput for an
absolutely trivial change, moving 2 lines of code, is also a good
reason to not bump this for another year.

--
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services

#18Robert Haas
robertmhaas@gmail.com
In reply to: Peter Geoghegan (#17)
Re: Uh, I change my mind about commit_delay + commit_siblings (sort of)

On Thu, May 31, 2012 at 8:38 AM, Peter Geoghegan <peter@2ndquadrant.com> wrote:

On 31 May 2012 13:16, Robert Haas <robertmhaas@gmail.com> wrote:

On Thu, May 31, 2012 at 6:19 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
Frankly, I think this whole thing should be pushed to 9.3.  The
commit_delay and commit_siblings knobs suck, but they've sucked for a
long time, and it won't kill anybody to wait another release cycle to
fix them.  We have plenty of more important things queued up for 9.3
already, and I don't believe there's any compelling reason to think
that this particular thing needs preferential treatment.

Why do you think that? Those knobs are now quite ineffective, though
we never even considered that when the group commit delay patch was
committed.  The entire body of research and commentary that exists on
commit_delay has been invalidated for 9.2. If that isn't something
that needs to be addressed before release, I don't know what is. The
fact that the patch can sometimes double transaction throughput for an
absolutely trivial change, moving 2 lines of code, is also a good
reason to not bump this for another year.

Fixing regressions before release is essential; improving performance
is not - especially when the improvement relates to a little-used
feature that you were proposing to get rid of two weeks ago. It can't
simultaneously be so unimportant that we should remove it altogether
and so important that it's must-fix-before-release, and if one test
can completely overturn your view of which category this falls into,
that seems like a reason for taking some more time to think it over
and, perhaps, run more tests. We don't have a lot of latitude to
maneuver at this point - anything we do now is going to go straight
out into the wild. Caution is appropriate.

However, rather than arguing about it, let's see if anyone else has an opinion.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#19Tom Lane
tgl@sss.pgh.pa.us
In reply to: Simon Riggs (#16)
Re: Uh, I change my mind about commit_delay + commit_siblings (sort of)

Simon Riggs <simon@2ndQuadrant.com> writes:

On 31 May 2012 13:16, Robert Haas <robertmhaas@gmail.com> wrote:

Frankly, I think this whole thing should be pushed to 9.3.

What matters is that we have a patch that provides a massive
performance gain in write performance in just a few lines of code, and
that should be committed to 9.2.

I agree with Robert on this. This patch hasn't had *nearly* enough
testing to justify cramming it into 9.2 at this point. AFAIK the
claim of "massive performance gain" is based on a single test case run
by a single person, which doesn't even give me any confidence that it
doesn't break anything, much less that it's a win across the board.

If we want to finish the beta cycle in a reasonable time period and get
back to actual development, we have to refrain from adding more
possibly-destabilizing development work to 9.2. And that is what
this is.

Add it to the upcoming CF, please.

regards, tom lane

#20Simon Riggs
simon@2ndQuadrant.com
In reply to: Tom Lane (#19)
Re: Uh, I change my mind about commit_delay + commit_siblings (sort of)

On 31 May 2012 15:00, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Simon Riggs <simon@2ndQuadrant.com> writes:

On 31 May 2012 13:16, Robert Haas <robertmhaas@gmail.com> wrote:

Frankly, I think this whole thing should be pushed to 9.3.

What matters is that we have a patch that provides a massive
performance gain in write performance in just a few lines of code, and
that should be committed to 9.2.

I agree with Robert on this.  This patch hasn't had *nearly* enough
testing to justify cramming it into 9.2 at this point.  AFAIK the
claim of "massive performance gain" is based on a single test case run
by a single person, which doesn't even give me any confidence that it
doesn't break anything, much less that it's a win across the board.

I agree with you. You would be mistaken if you thought that I think
Peter's laptop was sufficient proof for anyone to commit something and
I've already said exactly that to him.

My description of "massive performance gain" is appropriate based on
the measurements so far.

If we want to finish the beta cycle in a reasonable time period and get
back to actual development, we have to refrain from adding more
possibly-destabilizing development work to 9.2.  And that is what
this is.

In what way is it possibly destabilising? I see nothing in the patch
to merit that claim, so presumably you haven't read the patch yet?

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

#21Tom Lane
tgl@sss.pgh.pa.us
In reply to: Simon Riggs (#20)
Re: Uh, I change my mind about commit_delay + commit_siblings (sort of)

Simon Riggs <simon@2ndQuadrant.com> writes:

On 31 May 2012 15:00, Tom Lane <tgl@sss.pgh.pa.us> wrote:

If we want to finish the beta cycle in a reasonable time period and get
back to actual development, we have to refrain from adding more
possibly-destabilizing development work to 9.2. �And that is what
this is.

In what way is it possibly destabilising?

I'm prepared to believe that it only affects performance, but it could
be destabilizing to that. It needs proper review and testing, and the
next CF is the right environment for that to happen.

regards, tom lane

#22Peter Geoghegan
peter@2ndquadrant.com
In reply to: Robert Haas (#18)
Re: Uh, I change my mind about commit_delay + commit_siblings (sort of)

On 31 May 2012 14:58, Robert Haas <robertmhaas@gmail.com> wrote:

Fixing regressions before release is essential; improving performance
is not - especially when the improvement relates to a little-used
feature that you were proposing to get rid of two weeks ago.

Yes, the fact that I wanted to get rid of commit_delay is well
established - I called for its deprecation in a dedicated thread, and
during my talk at pgCon. Bruce's confusion as to how that interacted
with what I've been calling "new group commit" was actually what
crystallised my position here: it is trying, for the most part, to do
the same thing as new group commit, but in an entirely orthogonal way.
Bruce's confusion actually reflected the confusion of the code. So I'm
in a sense removing the overlap between commit_delay used to do but
now but shouldn't try to do anymore (make commits coincide, giving
good benchmark results) and what new group commit now does, while
preserving commit_delay's ability to trade off latency for throughput.

I didn't have an answer to the question of how we might continue to
offer a throughput/latency trade-off to users before, but knew that
with 9.2, commit_delay was totally ineffective anyway. The realisation
that it could be made effective by working with rather than against
new group commit changed my mind.

It can't simultaneously be so unimportant that we should remove it altogether
and so important that it's must-fix-before-release, and if one test
can completely overturn your view of which category this falls into,
that seems like a reason for taking some more time to think it over
and, perhaps, run more tests.  We don't have a lot of latitude to
maneuver at this point - anything we do now is going to go straight
out into the wild.  Caution is appropriate.

The patch can be justified as a way of removing the tension between
new group commit and commit_delay. Removing commit_delay would also do
this, but then there'd be no way to make the aforementioned trade-off
that we previously offered. I suspect that if it restored the peaks
and valleys of commit_delay's changes to throughput in 9.1, over and
above a new group commit baseline, this would be more readily
accepted. I hope the patch isn't being punished for being effective.
Yes, it does offer a large boost to performance, but that happens to
be incidental, unlikely though that sounds.

You've called this a clever idea. I actually don't agree. I was fairly
surprised that no one noticed this earlier. It is rather obviously the
case that a delay that hopes to maximise the batching of commits at
the expense of latency should occur only in a single leader backend
that will proceed with the flush for the batch, and not within each
and every backend as it commits.

--
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services

#23Peter Geoghegan
peter@2ndquadrant.com
In reply to: Tom Lane (#21)
Re: Uh, I change my mind about commit_delay + commit_siblings (sort of)

On 31 May 2012 16:23, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Simon Riggs <simon@2ndQuadrant.com> writes:

In what way is it possibly destabilising?

I'm prepared to believe that it only affects performance, but it could
be destabilizing to that.  It needs proper review and testing, and the
next CF is the right environment for that to happen.

It couldn't possibly be as destabilising to performance as
commit_delay was in 9.1.

--
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services

#24Peter Geoghegan
peter@2ndquadrant.com
In reply to: Peter Geoghegan (#23)
Re: Uh, I change my mind about commit_delay + commit_siblings (sort of)

On 31 May 2012 16:26, Peter Geoghegan <peter@2ndquadrant.com> wrote:

On 31 May 2012 16:23, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Simon Riggs <simon@2ndQuadrant.com> writes:

In what way is it possibly destabilising?

I'm prepared to believe that it only affects performance, but it could
be destabilizing to that.  It needs proper review and testing, and the
next CF is the right environment for that to happen.

It couldn't possibly be as destabilising to performance as
commit_delay was in 9.1.

Furthermore, it couldn't possibly affect performance in any way unless
commit_delay is set. I've just moved the delay site so that its only
executed by the group commit leader. The leader would execute the code
anyway, but now the followers don't.

--
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services

#25Peter Geoghegan
peter@2ndquadrant.com
In reply to: Tom Lane (#21)
Re: Uh, I change my mind about commit_delay + commit_siblings (sort of)

This is a benchmark I performed on the same hardware, for tpc-b.sql,
with a commit_delay of 0 and 4000 for the patch:

http://tpcbdelay.staticloud.com/

There is a rather large improvement in throughput here. Robert
previously complained that our group commit implementation didn't do
very well on that benchmark. This seems to make all the difference.

--
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services

#26Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Peter Geoghegan (#25)
Re: Uh, I change my mind about commit_delay + commit_siblings (sort of)

Tom Lane wrote:

Simon Riggs writes:

On 31 May 2012 15:00, Tom Lane wrote:

If we want to finish the beta cycle in a reasonable time period
and get back to actual development, we have to refrain from
adding more possibly-destabilizing development work to 9.2. And
that is what this is.

In what way is it possibly destabilising?

I'm prepared to believe that it only affects performance, but it
could be destabilizing to that. It needs proper review and testing,
and the next CF is the right environment for that to happen.

+1

This is not a bug fix or even a fix for a performance regression.
The train has left the station; the next one will be along shortly.

-Kevin

#27Robert Haas
robertmhaas@gmail.com
In reply to: Kevin Grittner (#26)
1 attachment(s)
Re: Uh, I change my mind about commit_delay + commit_siblings (sort of)

On Sat, Jun 2, 2012 at 10:44 AM, Kevin Grittner
<Kevin.Grittner@wicourts.gov> wrote:

Tom Lane  wrote:

Simon Riggs  writes:

On 31 May 2012 15:00, Tom Lane  wrote:

If we want to finish the beta cycle in a reasonable time period
and get back to actual development, we have to refrain from
adding more possibly-destabilizing development work to 9.2. And
that is what this is.

In what way is it possibly destabilising?

I'm prepared to believe that it only affects performance, but it
could be destabilizing to that. It needs proper review and testing,
and the next CF is the right environment for that to happen.

+1

This is not a bug fix or even a fix for a performance regression.
The train has left the station; the next one will be along shortly.

And here it is. There are a couple of outstanding issues here upon
which it would be helpful to get input from a few more people:

1. Are there any call sites from which this should be disabled, either
because the optimization won't help, or because the caller is holding
a lock that we don't want them sitting on for a moment longer than
necessary? I think the worst case is that we're doing something like
splitting the root page of a btree, and now nobody can walk the btree
until the flush finishes, and here we are doing an "unnecessary"
sleep. I am not sure where I can construct a workload where this is a
real as opposed to a theoretical problem, though. So maybe we should
just not worry about it. It suffices for this to be an improvement
over the status quo; it doesn't have to be perfect. Thoughts?

2. Should we rename the GUCs, since this patch will cause them to
control WAL flush in general, as opposed to commit specifically?
Peter Geoghegan and Simon were arguing that we should retitle it to
group_commit_delay rather than just commit_delay, but that doesn't
seem to be much of an improvement in describing what the new behavior
will actually be, and I am doubtful that it is worth creating a naming
incompatibility with previous releases for a cosmetic change. I
suggested wal_flush_delay, but there's no consensus on that.
Opinions?

Also, I think I see a straightforward, if minor, improvement. Right
now, the patch has this:

* Sleep before flush! By adding a delay here, we may give further
* backends the opportunity to join the backlog of group commit
* followers; this can significantly improve transaction throughput, at
* the risk of increasing transaction latency.
*
* We do not sleep if enableFsync is not turned on, nor if there are
* fewer than CommitSiblings other backends with active transactions.
*/
if (CommitDelay > 0 && enableFsync &&
MinimumActiveBackends(CommitSiblings))
pg_usleep(CommitDelay);

/* Got the lock */
LogwrtResult = XLogCtl->LogwrtResult;
if (!XLByteLE(record, LogwrtResult.Flush))
{
/* try to write/flush later additions to XLOG as well */
if (LWLockConditionalAcquire(WALInsertLock, LW_EXCLUSIVE))

The XLByteLE test four lines from the bottom should happen before we
consider whether to sleep, because it's possible that we'll discover
that our flush request has already been satisfied and exit without
doing anything, in which case the sleep is a waste. The attached
version adjusts the logic accordingly.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachments:

move_delay_2012_06_27.patchapplication/octet-stream; name=move_delay_2012_06_27.patchDownload
*** a/doc/src/sgml/config.sgml
--- b/doc/src/sgml/config.sgml
***************
*** 1869,1888 **** SET ENABLE_SEQSCAN TO OFF;
          When the commit data for a transaction is flushed to disk, any
          additional commits ready at that time are also flushed out.
          <varname>commit_delay</varname> adds a time delay, set in
!         microseconds, before a transaction attempts to
!         flush the WAL buffer out to disk.  A nonzero delay can allow more
!         transactions to be committed with only one flush operation, if
!         system load is high enough that additional transactions become
!         ready to commit within the given interval. But the delay is
!         just wasted if no other transactions become ready to
!         commit. Therefore, the delay is only performed if at least
!         <varname>commit_siblings</varname> other transactions are
!         active at the instant that a server process has written its
!         commit record.
          The default <varname>commit_delay</> is zero (no delay).
-         Since all pending commit data will be written at every flush
-         regardless of this setting, it is rare that adding delay
-         by increasing this parameter will actually improve performance.
         </para>
        </listitem>
       </varlistentry>
--- 1869,1888 ----
          When the commit data for a transaction is flushed to disk, any
          additional commits ready at that time are also flushed out.
          <varname>commit_delay</varname> adds a time delay, set in
!         microseconds, before a leading transaction participating in
!         group commit attempts to flush the WAL buffer out to disk.
!         This can add an additional latency of of up to
!         <varname>commit_delay</varname> microseconds for each transaction.
!         A nonzero delay can allow more transactions to be committed with
!         only one flush operation, if system load is high enough that
!         additional transactions become ready to commit within the
!         given interval. However, the delay is just wasted if no other
!         transactions become ready to commit. Therefore, the delay
!         is only performed if at least <varname>commit_siblings</varname>
!         other transactions are active immediately before the leader
!         backend participating in group commit proceeds with flushing
!         WAL.
          The default <varname>commit_delay</> is zero (no delay).
         </para>
        </listitem>
       </varlistentry>
*** a/doc/src/sgml/wal.sgml
--- b/doc/src/sgml/wal.sgml
***************
*** 376,384 ****
     <acronym>WAL</acronym> to disk, in the hope that a single flush
     executed by one such transaction can also serve other transactions
     committing at about the same time.  Setting <varname>commit_delay</varname>
!    can only help when there are many concurrently committing transactions,
!    and it is difficult to tune it to a value that actually helps rather
!    than hurt throughput.
    </para>
  
   </sect1>
--- 376,382 ----
     <acronym>WAL</acronym> to disk, in the hope that a single flush
     executed by one such transaction can also serve other transactions
     committing at about the same time.  Setting <varname>commit_delay</varname>
!    can only help when there are many concurrently committing transactions.
    </para>
  
   </sect1>
*** a/src/backend/access/transam/xact.c
--- b/src/backend/access/transam/xact.c
***************
*** 68,76 **** bool		XactDeferrable;
  
  int			synchronous_commit = SYNCHRONOUS_COMMIT_ON;
  
- int			CommitDelay = 0;	/* precommit delay in microseconds */
- int			CommitSiblings = 5; /* # concurrent xacts needed to sleep */
- 
  /*
   * MyXactAccessedTempRel is set when a temporary relation is accessed.
   * We don't allow PREPARE TRANSACTION in that case.  (This is global
--- 68,73 ----
***************
*** 1123,1144 **** RecordTransactionCommit(void)
  	if ((wrote_xlog && synchronous_commit > SYNCHRONOUS_COMMIT_OFF) ||
  		forceSyncCommit || nrels > 0)
  	{
- 		/*
- 		 * Synchronous commit case:
- 		 *
- 		 * Sleep before flush! So we can flush more than one commit records
- 		 * per single fsync.  (The idea is some other backend may do the
- 		 * XLogFlush while we're sleeping.  This needs work still, because on
- 		 * most Unixen, the minimum select() delay is 10msec or more, which is
- 		 * way too long.)
- 		 *
- 		 * We do not sleep if enableFsync is not turned on, nor if there are
- 		 * fewer than CommitSiblings other backends with active transactions.
- 		 */
- 		if (CommitDelay > 0 && enableFsync &&
- 			MinimumActiveBackends(CommitSiblings))
- 			pg_usleep(CommitDelay);
- 
  		XLogFlush(XactLastRecEnd);
  
  		/*
--- 1120,1125 ----
*** a/src/backend/access/transam/xlog.c
--- b/src/backend/access/transam/xlog.c
***************
*** 80,85 **** bool		fullPageWrites = true;
--- 80,87 ----
  bool		log_checkpoints = false;
  int			sync_method = DEFAULT_SYNC_METHOD;
  int			wal_level = WAL_LEVEL_MINIMAL;
+ int			CommitDelay = 0;	/* precommit delay in microseconds */
+ int			CommitSiblings = 5; /* # concurrent xacts needed to sleep */
  
  #ifdef WAL_DEBUG
  bool		XLOG_DEBUG = false;
***************
*** 2085,2118 **** XLogFlush(XLogRecPtr record)
  			 */
  			continue;
  		}
! 		/* Got the lock */
  		LogwrtResult = XLogCtl->LogwrtResult;
! 		if (!XLByteLE(record, LogwrtResult.Flush))
  		{
! 			/* try to write/flush later additions to XLOG as well */
! 			if (LWLockConditionalAcquire(WALInsertLock, LW_EXCLUSIVE))
! 			{
! 				XLogCtlInsert *Insert = &XLogCtl->Insert;
! 				uint32		freespace = INSERT_FREESPACE(Insert);
  
! 				if (freespace == 0)		/* buffer is full */
! 					WriteRqstPtr = XLogCtl->xlblocks[Insert->curridx];
! 				else
! 				{
! 					WriteRqstPtr = XLogCtl->xlblocks[Insert->curridx];
! 					WriteRqstPtr -= freespace;
! 				}
! 				LWLockRelease(WALInsertLock);
! 				WriteRqst.Write = WriteRqstPtr;
! 				WriteRqst.Flush = WriteRqstPtr;
! 			}
  			else
  			{
! 				WriteRqst.Write = WriteRqstPtr;
! 				WriteRqst.Flush = record;
  			}
! 			XLogWrite(WriteRqst, false, false);
  		}
  		LWLockRelease(WALWriteLock);
  		/* done */
  		break;
--- 2087,2135 ----
  			 */
  			continue;
  		}
! 
! 		/* Got the lock; recheck whether request is satisfied */
  		LogwrtResult = XLogCtl->LogwrtResult;
! 		if (XLByteLE(record, LogwrtResult.Flush))
! 			break;
! 
! 		/*
! 		 * Sleep before flush! By adding a delay here, we may give further
! 		 * backends the opportunity to join the backlog of group commit
! 		 * followers; this can significantly improve transaction throughput, at
! 		 * the risk of increasing transaction latency.
! 		 *
! 		 * We do not sleep if enableFsync is not turned on, nor if there are
! 		 * fewer than CommitSiblings other backends with active transactions.
! 		 */
! 		if (CommitDelay > 0 && enableFsync &&
! 			MinimumActiveBackends(CommitSiblings))
! 			pg_usleep(CommitDelay);
! 
! 		/* try to write/flush later additions to XLOG as well */
! 		if (LWLockConditionalAcquire(WALInsertLock, LW_EXCLUSIVE))
  		{
! 			XLogCtlInsert *Insert = &XLogCtl->Insert;
! 			uint32		freespace = INSERT_FREESPACE(Insert);
  
! 			if (freespace == 0)		/* buffer is full */
! 				WriteRqstPtr = XLogCtl->xlblocks[Insert->curridx];
  			else
  			{
! 				WriteRqstPtr = XLogCtl->xlblocks[Insert->curridx];
! 				WriteRqstPtr -= freespace;
  			}
! 			LWLockRelease(WALInsertLock);
! 			WriteRqst.Write = WriteRqstPtr;
! 			WriteRqst.Flush = WriteRqstPtr;
  		}
+ 		else
+ 		{
+ 			WriteRqst.Write = WriteRqstPtr;
+ 			WriteRqst.Flush = record;
+ 		}
+ 		XLogWrite(WriteRqst, false, false);
+ 
  		LWLockRelease(WALWriteLock);
  		/* done */
  		break;
#28Peter Geoghegan
peter@2ndquadrant.com
In reply to: Robert Haas (#27)
Re: Uh, I change my mind about commit_delay + commit_siblings (sort of)

On 27 June 2012 23:01, Robert Haas <robertmhaas@gmail.com> wrote:

1. Are there any call sites from which this should be disabled, either
because the optimization won't help, or because the caller is holding
a lock that we don't want them sitting on for a moment longer than
necessary?  I think the worst case is that we're doing something like
splitting the root page of a btree, and now nobody can walk the btree
until the flush finishes, and here we are doing an "unnecessary"
sleep.  I am not sure where I can construct a workload where this is a
real as opposed to a theoretical problem, though.  So maybe we should
just not worry about it.  It suffices for this to be an improvement
over the status quo; it doesn't have to be perfect.  Thoughts?

I also find it hard to believe this is a concern. As I've said before
though, it hasn't been adequately explained just how you're supposed
to skip the delay if you're in this situation. It's highly unlikely
that you'll be the one doing the delaying anyway. Skipping the delay
iff you're the leader when this might possibly be a problem will only
*conceivably* be effective in a tiny minority of cases, so this seems
to make little sense to me.

Look at the original benchmark - there are latency numbers there too.
The patch actually performs slightly better than baseline in terms of
latency (worst and average cases) as well as throughput. To get back
to bus analogies again, if people start treating buses like Taxis, you
get slightly better minimum latency figures (not included in any
benchmark performed), because one or two people immediately
high-jacked a bus and left on an hour long journey rather than waiting
another 15 minutes for more passengers to come along. That doesn't
seem like it should be something to concern ourselves with. More to
the point, I think that there are "laws of physics" reasons why these
backends that might be particularly adversely affected by a delay (due
to some eventuality like the one you described) cannot *really* avoid
a delay. Of course this isn't an absolute, and certainly won't apply
when there are relatively few clients, when the delay really is
useless, but then we trust the dba to set commit_siblings (and indeed
commit_delay) correctly, and it's hardly that big of a deal, since
it's only typically a fraction of a single fsync() longer at worst -
certainly not a total wait that is longer than the total wait the
backend could reasonably expect to have.

Incidentally, I'm guessing someone is going to come up with a
rule-of-thumb for setting commit_delay that sounds something like
"half the latency of your wal_sync_method as reported by
pg_test_fsync()".

2. Should we rename the GUCs, since this patch will cause them to
control WAL flush in general, as opposed to commit specifically?
Peter Geoghegan and Simon were arguing that we should retitle it to
group_commit_delay rather than just commit_delay, but that doesn't
seem to be much of an improvement in describing what the new behavior
will actually be, and I am doubtful that it is worth creating a naming
incompatibility with previous releases for a cosmetic change.  I
suggested wal_flush_delay, but there's no consensus on that.
Opinions?

My main problem with the existing name is that every single article on
commit_delay since it was introduce over ten years ago has been on
balance very negative. Greg Smith's book, the docs, my talk at PgCon,
and any other material in existence about commit_delay that I'm aware
of says the same thing.

commit_delay in master is essentially the same now as it was in 2000;
it's just that commit_siblings is a bit smarter, in that it is now
based on number of active transactions specifically.

Now, based on the fact that the doc changes concerning the practical
details of setting commit_delay survived your editorialisations, I
take it that you accept my view that this is going to fundamentally
alter the advice that surrounds commit_delay, from "just don't touch
it" to something like "this is something that you should probably
consider, that may be very compelling in certain situations".

Let's not shoot ourselves in the foot by keeping the old, disreputable
name. You felt that wal_flush_delay more accurately reflected what the
new setting actually does. You were right, and I'd be perfectly happy
to go along with that.

The XLByteLE test four lines from the bottom should happen before we
consider whether to sleep, because it's possible that we'll discover
that our flush request has already been satisfied and exit without
doing anything, in which case the sleep is a waste.  The attached
version adjusts the logic accordingly.

Seems like a minor point, unlikely to make any appreciable difference,
but there's no reason to think that it will have any negative effect
compared to what I've proposed either, so that's fine with me.

--
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services

#29Simon Riggs
simon@2ndQuadrant.com
In reply to: Robert Haas (#27)
Re: Uh, I change my mind about commit_delay + commit_siblings (sort of)

On 27 June 2012 23:01, Robert Haas <robertmhaas@gmail.com> wrote:

1. Are there any call sites from which this should be disabled, either
because the optimization won't help, or because the caller is holding
a lock that we don't want them sitting on for a moment longer than
necessary?  I think the worst case is that we're doing something like
splitting the root page of a btree, and now nobody can walk the btree
until the flush finishes, and here we are doing an "unnecessary"
sleep.  I am not sure where I can construct a workload where this is a
real as opposed to a theoretical problem, though.  So maybe we should
just not worry about it.  It suffices for this to be an improvement
over the status quo; it doesn't have to be perfect.  Thoughts?

Let's put this in now, without too much fiddling. There is already a
GUC to disable it, so measurements can be made to see if this causes
any problems.

If there are problems, we fix them. We can't second guess everything.

2. Should we rename the GUCs, since this patch will cause them to
control WAL flush in general, as opposed to commit specifically?
Peter Geoghegan and Simon were arguing that we should retitle it to
group_commit_delay rather than just commit_delay, but that doesn't
seem to be much of an improvement in describing what the new behavior
will actually be, and I am doubtful that it is worth creating a naming
incompatibility with previous releases for a cosmetic change.  I
suggested wal_flush_delay, but there's no consensus on that.
Opinions?

Again, leave the naming of that for later. The idea of a rename came
from yourself, IIRC.

The XLByteLE test four lines from the bottom should happen before we
consider whether to sleep, because it's possible that we'll discover
that our flush request has already been satisfied and exit without
doing anything, in which case the sleep is a waste.  The attached
version adjusts the logic accordingly.

I thought there already was a test like that earlier in the flush.

I agree there needs to be one.

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

#30Robert Haas
robertmhaas@gmail.com
In reply to: Simon Riggs (#29)
Re: Uh, I change my mind about commit_delay + commit_siblings (sort of)

On Thu, Jun 28, 2012 at 4:21 AM, Simon Riggs <simon@2ndquadrant.com> wrote:

Let's put this in now, without too much fiddling. There is already a
GUC to disable it, so measurements can be made to see if this causes
any problems.

If there are problems, we fix them. We can't second guess everything.

Fair enough.

2. Should we rename the GUCs, since this patch will cause them to
control WAL flush in general, as opposed to commit specifically?
Peter Geoghegan and Simon were arguing that we should retitle it to
group_commit_delay rather than just commit_delay, but that doesn't
seem to be much of an improvement in describing what the new behavior
will actually be, and I am doubtful that it is worth creating a naming
incompatibility with previous releases for a cosmetic change.  I
suggested wal_flush_delay, but there's no consensus on that.
Opinions?

Again, leave the naming of that for later. The idea of a rename came
from yourself, IIRC.

Deciding on a name is not such a hard thing that leaving it till later
solves any problem. Let's just make a decision and be done with it.

The XLByteLE test four lines from the bottom should happen before we
consider whether to sleep, because it's possible that we'll discover
that our flush request has already been satisfied and exit without
doing anything, in which case the sleep is a waste.  The attached
version adjusts the logic accordingly.

I thought there already was a test like that earlier in the flush.

I agree there needs to be one.

There are several of them floating around; the point here is just to
make sure that the sleep is after all of them.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#31Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Robert Haas (#30)
Re: Uh, I change my mind about commit_delay + commit_siblings (sort of)

On 28.06.2012 15:18, Robert Haas wrote:

On Thu, Jun 28, 2012 at 4:21 AM, Simon Riggs<simon@2ndquadrant.com> wrote:

2. Should we rename the GUCs, since this patch will cause them to
control WAL flush in general, as opposed to commit specifically?
Peter Geoghegan and Simon were arguing that we should retitle it to
group_commit_delay rather than just commit_delay, but that doesn't
seem to be much of an improvement in describing what the new behavior
will actually be, and I am doubtful that it is worth creating a naming
incompatibility with previous releases for a cosmetic change. I
suggested wal_flush_delay, but there's no consensus on that.
Opinions?

Again, leave the naming of that for later. The idea of a rename came
from yourself, IIRC.

Deciding on a name is not such a hard thing that leaving it till later
solves any problem. Let's just make a decision and be done with it.

FWIW, I think commit_delay is just fine. In practice, it's mostly
commits that are affected, anyway. If we try to be more exact, I think
it's just going to be more confusing to users.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

#32Peter Geoghegan
peter@2ndquadrant.com
In reply to: Heikki Linnakangas (#31)
Re: Uh, I change my mind about commit_delay + commit_siblings (sort of)

On 28 June 2012 18:57, Heikki Linnakangas
<heikki.linnakangas@enterprisedb.com> wrote:

FWIW, I think commit_delay is just fine. In practice, it's mostly commits
that are affected, anyway. If we try to be more exact, I think it's just
going to be more confusing to users.

You think it will confuse users less if we start telling them to use
something that we have a very long history of telling them not to use?
The docs say "it is rare that adding delay by increasing this
parameter will actually improve performance". The book PostgreSQL
high-performance says of commit_delay (and commit_siblings) "These are
not effective parameters to tune in most cases. It is extremely
difficult to show any speed-up by adjusting them, and quite easy to
slow every transaction down by tweaking them". Who would be brave
enough to use that in production? Unless you still think these
statements are true, and I don't see why you would, it seems like a
really bad judgement to not change the name.

Is anyone aware of a non-zero commit_delay in the wild today? I
personally am not.

--
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services

#33Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Peter Geoghegan (#32)
Re: Uh, I change my mind about commit_delay + commit_siblings (sort of)

Peter Geoghegan <peter@2ndquadrant.com> wrote:

Is anyone aware of a non-zero commit_delay in the wild today? I
personally am not.

http://archives.postgresql.org/pgsql-performance/2011-11/msg00083.php

-Kevin

#34Peter Geoghegan
peter@2ndquadrant.com
In reply to: Kevin Grittner (#33)
Re: Uh, I change my mind about commit_delay + commit_siblings (sort of)

On 28 June 2012 19:25, Kevin Grittner <Kevin.Grittner@wicourts.gov> wrote:

Peter Geoghegan <peter@2ndquadrant.com> wrote:

Is anyone aware of a non-zero commit_delay in the wild today? I
personally am not.

http://archives.postgresql.org/pgsql-performance/2011-11/msg00083.php

In that thread, Robert goes on to say to the OP that has set commit_delay:

On Fri, Nov 4, 2011 at 2:45 PM, Claudio Freire <klaussfreire@gmail.com> wrote:

I don't think 1 second can be such a big difference for the bgwriter,
but I might be wrong.

Well, the default value is 200 ms. And I've never before heard of
anyone tuning it up, except maybe to save on power consumption on a
system with very low utilization. Nearly always you want to reduce
it.

The wal_writer makes me doubt, though. If logged activity was higher
than 8MB/s, then that setting would block it all.
I guess I really should lower it.

Here again, you've set it to ten times the default value. That
doesn't seem like a good idea. I would start with the default and
tune down.

So, let me rephrase my question: Is anyone aware of a non-zero
commit_delay in the wild today with sensible reasoning behind it?

--
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services

#35Robert Haas
robertmhaas@gmail.com
In reply to: Peter Geoghegan (#32)
Re: Uh, I change my mind about commit_delay + commit_siblings (sort of)

On Thu, Jun 28, 2012 at 2:18 PM, Peter Geoghegan <peter@2ndquadrant.com> wrote:

You think it will confuse users less if we start telling them to use
something that we have a very long history of telling them not to use?

I don't buy this line of reasoning at all. If we're going to rename
the GUC, it should be for accuracy, not PR value. If we start
renaming something every time we improve it, we're going to go nuts.
We improved lots of things in 9.2; they didn't all get renamed.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#36Peter Geoghegan
peter@2ndquadrant.com
In reply to: Robert Haas (#35)
Re: Uh, I change my mind about commit_delay + commit_siblings (sort of)

On 28 June 2012 19:55, Robert Haas <robertmhaas@gmail.com> wrote:

On Thu, Jun 28, 2012 at 2:18 PM, Peter Geoghegan <peter@2ndquadrant.com> wrote:

You think it will confuse users less if we start telling them to use
something that we have a very long history of telling them not to use?

I don't buy this line of reasoning at all.  If we're going to rename
the GUC, it should be for accuracy, not PR value.  If we start
renaming something every time we improve it, we're going to go nuts.
We improved lots of things in 9.2; they didn't all get renamed.

That is a false equivalence, and you know it.

Who said anything about PR value? I'm concerned with not confusing users.

--
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services

#37Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Haas (#35)
Re: Uh, I change my mind about commit_delay + commit_siblings (sort of)

Robert Haas <robertmhaas@gmail.com> writes:

On Thu, Jun 28, 2012 at 2:18 PM, Peter Geoghegan <peter@2ndquadrant.com> wrote:

You think it will confuse users less if we start telling them to use
something that we have a very long history of telling them not to use?

I don't buy this line of reasoning at all. If we're going to rename
the GUC, it should be for accuracy, not PR value. If we start
renaming something every time we improve it, we're going to go nuts.
We improved lots of things in 9.2; they didn't all get renamed.

See VACUUM FULL for a recent counterexample --- we basically jacked it
up and drove a new implementation underneath, but we didn't change the
name, despite the fact that we were obsoleting a whole lot more folk
knowledge than exists around commit_delay.

Of course, there were application-compatibility reasons not to rename
that command, which wouldn't apply so much to commit_delay. But still,
we have precedent for expecting that we can fix external documentation
rather than trying to code around whatever it says.

regards, tom lane

#38Peter Geoghegan
peter@2ndquadrant.com
In reply to: Tom Lane (#37)
Re: Uh, I change my mind about commit_delay + commit_siblings (sort of)

On 28 June 2012 20:00, Tom Lane <tgl@sss.pgh.pa.us> wrote:

See VACUUM FULL for a recent counterexample --- we basically jacked it
up and drove a new implementation underneath, but we didn't change the
name, despite the fact that we were obsoleting a whole lot more folk
knowledge than exists around commit_delay.

Of course, there were application-compatibility reasons not to rename
that command, which wouldn't apply so much to commit_delay.  But still,
we have precedent for expecting that we can fix external documentation
rather than trying to code around whatever it says.

I'm sure you're right to some degree. We can rely on some, maybe even
most users to go and learn about these things, or hear about them
somehow. But why should we do it that way, when what I've proposed is
so much simpler, and has no plausible downside?

http://archives.postgresql.org/pgsql-hackers/2005-06/msg01463.php

Here, you say:

""
The issue here is not "is group commit a good idea in the abstract?".
It is "is the commit_delay implementation of the idea worth a dime?"
... and the evidence we have all points to the answer "NO". We should
not let theoretical arguments blind us to this.
""

What happens when someone reads this, or many other such statements
were made before or since, when they are considering setting
commit_delay now?

The VACUUM FULL implementation is now very close to being nothing more
than a synonym of CLUSTER. The only reason it happened that way was
because it was easier than just deprecating VACUUM FULL. The old
VACUUM FULL actually had something to recommend it. It seems unlikely
that the same thing can be said for commit_delay.

--
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services

#39Robert Haas
robertmhaas@gmail.com
In reply to: Peter Geoghegan (#36)
Re: Uh, I change my mind about commit_delay + commit_siblings (sort of)

On Thu, Jun 28, 2012 at 2:58 PM, Peter Geoghegan <peter@2ndquadrant.com> wrote:

On 28 June 2012 19:55, Robert Haas <robertmhaas@gmail.com> wrote:

On Thu, Jun 28, 2012 at 2:18 PM, Peter Geoghegan <peter@2ndquadrant.com> wrote:

You think it will confuse users less if we start telling them to use
something that we have a very long history of telling them not to use?

I don't buy this line of reasoning at all.  If we're going to rename
the GUC, it should be for accuracy, not PR value.  If we start
renaming something every time we improve it, we're going to go nuts.
We improved lots of things in 9.2; they didn't all get renamed.

That is a false equivalence, and you know it.

A false equivalence between what and what?

Who said anything about PR value? I'm concerned with not confusing users.

My point here is that we don't normally rename things because they
work better than they used to. Whether that is called PR value or not
confusing users, we don't normally do it.

We sometimes rename things because they do something *different* than
what they used to do. But not just because they work better.

Anyway, it seems that no one other than you and I is very excited
about renaming this for whatever reason, so maybe we should leave it
at that.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#40Peter Geoghegan
peter@2ndquadrant.com
In reply to: Robert Haas (#39)
Re: Uh, I change my mind about commit_delay + commit_siblings (sort of)

On 28 June 2012 20:55, Robert Haas <robertmhaas@gmail.com> wrote:

I don't buy this line of reasoning at all.  If we're going to rename
the GUC, it should be for accuracy, not PR value.  If we start
renaming something every time we improve it, we're going to go nuts.
We improved lots of things in 9.2; they didn't all get renamed.

That is a false equivalence, and you know it.

A false equivalence between what and what?

Changing the name in order to create some buzz about it, or to
highlight an improvement, is quite different to changing the name
because the practical advice surrounding the setting has suddenly
altered, almost diametrically, and after a very long period.

Who said anything about PR value? I'm concerned with not confusing users.

My point here is that we don't normally rename things because they
work better than they used to.  Whether that is called PR value or not
confusing users, we don't normally do it.

That makes sense. I don't dispute that. We aren't talking about an
incremental improvement here, are we? We're talking about night and
day. That is probably an unprecedented state of affairs, so I don't
see any reason to invoke precedent.

The reason the VACUUM FULL situation isn't at all comparable here is because:

For VACUUM FULL:

User googles "VACUUM FULL", gets something from the 8.* docs, or
something from that era. (This happens more often than not for any
given Postgres term, as you rightly complained about at PgCon).

They see something that says "you should probably use cluster
instead". They do so, and are no worse off for it.

For commit_delay:

User googles commit_delay, and sees something from that same period
that says "you don't want to use this". They naturally assume that
they don't. They lose whatever benefit they might have had from the
new implementation.

We sometimes rename things because they do something *different* than
what they used to do.  But not just because they work better.

It does do something different. That's why you proposed changing the name.

Anyway, it seems that no one other than you and I is very excited
about renaming this for whatever reason, so maybe we should leave it
at that.

I think not changing the name is a really bad decision, and I am
personally unhappy about it. I immediately agreed to your proposed
adjustment to the patch without really thinking that it mattered,
because it had no plausible downside, so why not?

That's all I have to say about the matter. If it isn't obvious that
the name should be changed, based on what I've already said, it never
will be.

--
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services

#41Daniel Farina
daniel@heroku.com
In reply to: Peter Geoghegan (#40)
Re: Uh, I change my mind about commit_delay + commit_siblings (sort of)

On Thu, Jun 28, 2012 at 1:32 PM, Peter Geoghegan <peter@2ndquadrant.com> wrote:

Anyway, it seems that no one other than you and I is very excited
about renaming this for whatever reason, so maybe we should leave it
at that.

I think not changing the name is a really bad decision, and I am
personally unhappy about it. I immediately agreed to your proposed
adjustment to the patch without really thinking that it mattered,
because it had no plausible downside, so why not?

That's all I have to say about the matter. If it isn't obvious that
the name should be changed, based on what I've already said, it never
will be.

All in all, I don't think this can be a very productive discussion
unless someone just pitches a equal or better name overall in terms of
conciseness and descriptiveness. I'd rather optimize for those
attributes. Old advice is old; that's the nature of the beast.

The one benefit I can think of for name shuffling is avoiding
accidental negation of the optimization when porting Postgres
configuration from older clusters, and even then I don't think the
rename-on-change strategy is a scalable one; a tuning-linting
tool is probably the only scalable option if one is concerned about
making sure that those forward-porting their configurations or
managing multiple postgres versions don't shoot themselves in the
foot.

--
fdr

#42Peter Geoghegan
peter@2ndquadrant.com
In reply to: Daniel Farina (#41)
Re: Uh, I change my mind about commit_delay + commit_siblings (sort of)

On 28 June 2012 22:22, Daniel Farina <daniel@heroku.com> wrote:

All in all, I don't think this can be a very productive discussion
unless someone just pitches a equal or better name overall in terms of
conciseness and descriptiveness.  I'd rather optimize for those
attributes.  Old advice is old; that's the nature of the beast.

Robert suggested wal_flush_delay, which does more accurately describe
what happens now.

Old advice is old, but we frequently go to reasonable lengths to
protect users from making predictable mistakes. The position that we
shouldn't change the name might make sense if there was any downside
to doing so, but there isn't.

--
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services

#43Daniel Farina
daniel@heroku.com
In reply to: Peter Geoghegan (#42)
Re: Uh, I change my mind about commit_delay + commit_siblings (sort of)

On Thu, Jun 28, 2012 at 2:32 PM, Peter Geoghegan <peter@2ndquadrant.com> wrote:

On 28 June 2012 22:22, Daniel Farina <daniel@heroku.com> wrote:

All in all, I don't think this can be a very productive discussion
unless someone just pitches a equal or better name overall in terms of
conciseness and descriptiveness. I'd rather optimize for those
attributes. Old advice is old; that's the nature of the beast.

Robert suggested wal_flush_delay, which does more accurately describe
what happens now.

Well, I learned something from reading this name, having not followed
the mechanism too closely. I like it.

--
fdr

#44Robert Haas
robertmhaas@gmail.com
In reply to: Daniel Farina (#43)
Re: Uh, I change my mind about commit_delay + commit_siblings (sort of)

On Thu, Jun 28, 2012 at 6:01 PM, Daniel Farina <daniel@heroku.com> wrote:

On Thu, Jun 28, 2012 at 2:32 PM, Peter Geoghegan <peter@2ndquadrant.com> wrote:

On 28 June 2012 22:22, Daniel Farina <daniel@heroku.com> wrote:

All in all, I don't think this can be a very productive discussion
unless someone just pitches a equal or better name overall in terms of
conciseness and descriptiveness. I'd rather optimize for those
attributes. Old advice is old; that's the nature of the beast.

Robert suggested wal_flush_delay, which does more accurately describe
what happens now.

Well, I learned something from reading this name, having not followed
the mechanism too closely. I like it.

I've committed this now. In the absence of a clear consensus to
rename the GUC, I contented myself with a further overhaul of the
documentation, which will hopefully make things clear at least for
people who read the documentation.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#45Bruce Momjian
bruce@momjian.us
In reply to: Peter Geoghegan (#38)
Re: Uh, I change my mind about commit_delay + commit_siblings (sort of)

On Thu, Jun 28, 2012 at 08:17:53PM +0100, Peter Geoghegan wrote:

On 28 June 2012 20:00, Tom Lane <tgl@sss.pgh.pa.us> wrote:

See VACUUM FULL for a recent counterexample --- we basically jacked it
up and drove a new implementation underneath, but we didn't change the
name, despite the fact that we were obsoleting a whole lot more folk
knowledge than exists around commit_delay.

Of course, there were application-compatibility reasons not to rename
that command, which wouldn't apply so much to commit_delay. �But still,
we have precedent for expecting that we can fix external documentation
rather than trying to code around whatever it says.

I'm sure you're right to some degree. We can rely on some, maybe even
most users to go and learn about these things, or hear about them
somehow. But why should we do it that way, when what I've proposed is
so much simpler, and has no plausible downside?

FYI, the release notes are the big place we should talk about this new,
better behavior.

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ It's impossible for everything to be true. +