commit_delay, siblings

Started by Josh Berkus over 20 years ago, 21 messages
#1Josh Berkus
josh@agliodbs.com

Hackers:

I've been trying to get a test result for 8.1 that shows that we can eliminate
commit_delay and commit_siblings, as I believe that these settings no longer
have any real effect on performance. However, the checkpointing performance
issues have so far prevented me from getting a good test result for this.

Just a warning, because I might bring it up after feature freeze.

--
Josh Berkus
Aglio Database Solutions
San Francisco

#2Tom Lane
tgl@sss.pgh.pa.us
In reply to: Josh Berkus (#1)
Re: commit_delay, siblings

Josh Berkus <josh@agliodbs.com> writes:

I've been trying to get a test result for 8.1 that shows that we can eliminate
commit_delay and commit_siblings, as I believe that these settings no longer
have any real effect on performance.

I don't think they ever did :-(. The theory is good, but useful values
for commit_delay would probably be under a millisecond, and there isn't
any portable way to sleep for such short periods. We've been leaving
them there just in case somebody can find a use for 'em, but I wouldn't
object to taking them out.

regards, tom lane
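
The point about sleep granularity can be checked empirically. The following is a minimal sketch, assuming a POSIX system with nanosleep() and clock_gettime(); the helper name measure_sleep_us is hypothetical and not part of PostgreSQL. It requests a short sleep and reports how long the process was actually suspended, which on a coarse scheduler tick can be far longer than requested.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <time.h>

/*
 * Hypothetical helper (not a PostgreSQL function): ask for a sleep of
 * requested_us microseconds and return the elapsed time in microseconds
 * as seen on the monotonic clock.
 */
int64_t measure_sleep_us(int64_t requested_us)
{
    struct timespec req, rem, start, end;
    int64_t elapsed_ns;

    assert(requested_us >= 0);
    req.tv_sec = (time_t) (requested_us / 1000000);
    req.tv_nsec = (long) (requested_us % 1000000) * 1000;

    clock_gettime(CLOCK_MONOTONIC, &start);

    /* If a signal cuts the sleep short, resume with the remaining time. */
    while (nanosleep(&req, &rem) == -1 && errno == EINTR)
        req = rem;

    clock_gettime(CLOCK_MONOTONIC, &end);

    elapsed_ns = (int64_t) (end.tv_sec - start.tv_sec) * 1000000000
               + (int64_t) (end.tv_nsec - start.tv_nsec);
    return elapsed_ns / 1000;
}
```

Calling measure_sleep_us(200) on the target platform and comparing the result with 200 shows how much a sub-millisecond request overshoots there; the number depends entirely on the kernel and its timer configuration.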

#3Hans-Jürgen Schönig
postgres@cybertec.at
In reply to: Tom Lane (#2)
Re: commit_delay, siblings

Tom Lane wrote:

Josh Berkus <josh@agliodbs.com> writes:

I've been trying to get a test result for 8.1 that shows that we can eliminate
commit_delay and commit_siblings, as I believe that these settings no longer
have any real effect on performance.

I don't think they ever did :-(. The theory is good, but useful values
for commit_delay would probably be under a millisecond, and there isn't
any portable way to sleep for such short periods. We've been leaving
them there just in case somebody can find a use for 'em, but I wouldn't
object to taking them out.

regards, tom lane

We have done extensive testing some time ago.
We could not see any difference on any platform we have tested (AIX,
Linux, Solaris). I don't think that there is one at all - at least not
on common systems.

best regards,

hans

--
Cybertec Geschwinde u Schoenig
Schoengrabern 134, A-2020 Hollabrunn, Austria
Tel: +43/664/393 39 74
www.cybertec.at, www.postgresql.at

#4Josh Berkus
josh@agliodbs.com
In reply to: Hans-Jürgen Schönig (#3)
Re: commit_delay, siblings

Hans, Tom,

We have done extensive testing some time ago.
We could not see any difference on any platform we have tested (AIX,
Linux, Solaris). I don't think that there is one at all - at least not
on common systems.

Keen then. Any objections to removing the GUC? We desperately need means
to cut down on GUC options.

--
--Josh

Josh Berkus
Aglio Database Solutions
San Francisco

#5Greg Stark
gsstark@mit.edu
In reply to: Hans-Jürgen Schönig (#3)
Re: commit_delay, siblings

Hans-Jürgen Schönig <postgres@cybertec.at> writes:

The theory is good, but useful values for commit_delay would probably be
under a millisecond, and there isn't any portable way to sleep for such
short periods.

Just because there's no "portable" way to be sure it'll work doesn't mean
there's no point in trying. If one user sets it to 5ms and it's effective for
him there's no reason to take out the option for him just because it doesn't
work out as well on all platforms.

Linux, for example has moved to higher clock speeds precisely because things
like movie and music players need to be able to control their timing to much
more precision than 10ms.

--
greg

#6Qingqing Zhou
zhouqq@cs.toronto.edu
In reply to: Josh Berkus (#1)
Re: commit_delay, siblings

"Josh Berkus" <josh@agliodbs.com> writes

Hackers:

I've been trying to get a test result for 8.1 that shows that we can eliminate
commit_delay and commit_siblings, as I believe that these settings no longer
have any real effect on performance. However, the checkpointing performance
issues have so far prevented me from getting a good test result for this.

In my understanding, the commit_delay/commit_siblings combination simulates
the background xlog writer mechanism found in some databases such as Oracle.

This might be a separate issue. We have code in xlogflush() like:

    /* done already? */
    if (!XLByteLE(record, LogwrtResult.Flush))
    {
        /* now wait for the write lock */
        LWLockAcquire(WALWriteLock, LW_EXCLUSIVE);
        if (XLByteLE(record, LogwrtResult.Flush))
            LWLockRelease(WALWriteLock); /* if done already, then release the lock */
        else
            /* do it */

If the test results show that the "LWLockRelease(WALWriteLock)" branch is
actually taken often, then it indicates that we waste some time acquiring
WALWriteLock. Would commit_delay/commit_siblings help, or would it be better
to have a background xlog writer that notifies us when xlogflush completes
(so we don't compete for this lock)?

Regards,
Qingqing
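
The wasted lock acquisition described in this message can be made concrete with a toy model. This is a sketch only: positions are plain integers rather than XLogRecPtr values, no real locking is performed, and the names WalModel, needs_lock and flush_with_lock_held are hypothetical. It also assumes that whoever holds the lock flushes everything inserted so far, not just its own record.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy stand-in for the shared WAL state (hypothetical, not PostgreSQL's). */
typedef struct {
    uint64_t inserted;            /* end of WAL generated so far */
    uint64_t flushed;             /* end of WAL known to be on disk */
    int      flushes;             /* number of physical flushes performed */
    int      wasted_acquisitions; /* lock taken, but nothing left to do */
} WalModel;

/* Step 1, the unlocked "done already?" test. */
bool needs_lock(const WalModel *wal, uint64_t record)
{
    return record > wal->flushed;
}

/*
 * Step 2, executed once the caller has obtained the write lock.
 * Returns true if this caller performed the flush, false if another
 * backend had already flushed far enough while the caller was waiting.
 */
bool flush_with_lock_held(WalModel *wal, uint64_t record)
{
    assert(record <= wal->inserted);

    if (record <= wal->flushed) {
        wal->wasted_acquisitions++;   /* corresponds to the early LWLockRelease */
        return false;
    }
    wal->flushed = wal->inserted;     /* assumption: flush everything pending */
    wal->flushes++;
    return true;
}
```

If two backends both pass needs_lock() before either holds the lock, the first one to get it flushes for both, and the second one's acquisition is counted as wasted. Counting that branch in a real build is what the proposed test would measure.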

#7Tom Lane
tgl@sss.pgh.pa.us
In reply to: Qingqing Zhou (#6)
Re: commit_delay, siblings

"Qingqing Zhou" <zhouqq@cs.toronto.edu> writes:

Would commit_delay/commit_siblings help, or would it be better
to have a background xlog writer that notifies us when xlogflush completes
(so we don't compete for this lock)?

The existing bgwriter already does a certain amount of xlog flushing
(since it must flush WAL at least as far as the LSN of any dirty page it
wants to write out).  However I'm not sure that this is very effective
--- in a few strace tests that I've done, it seemed that committing
backends still ended up doing the bulk of the xlog writes, especially
if they were doing small transactions.  It'd be interesting to look into
making the bgwriter (or a new dedicated xlog bgwriter) responsible for
all xlog writes.  You could imagine a loop like

    forever do
        if (something new in xlog)
            write and flush it;
        else
            sleep 10 msec;
    done

together with some kind of IPC to waken backends once xlog was flushed
past the point they needed. (Designing that is the hard part.)

But in any case, the existing commit_delay doesn't seem like it's got
anything to do with a path to a better answer, so this is not an
argument against removing it.

regards, tom lane
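
The loop sketched in this message can be written out as a single-threaded model. This is only an illustration under stated assumptions: XlogModel, xlog_writer_iteration and commit_is_durable are hypothetical names, the "flush" just advances a counter, and the IPC that wakes waiting backends (the part called hard above) is reduced to a predicate the backend would poll.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the state a dedicated xlog writer would watch. */
typedef struct {
    uint64_t inserted;    /* end of xlog generated by backends */
    uint64_t flushed;     /* end of xlog the writer has flushed */
    int      flush_calls; /* iterations that found new xlog */
    int      sleeps;      /* iterations that found nothing to do */
} XlogModel;

/*
 * One pass of the "forever do" loop. Returns true if new xlog was
 * written and flushed, false if the writer would sleep instead.
 */
bool xlog_writer_iteration(XlogModel *x)
{
    if (x->inserted > x->flushed) {
        x->flushed = x->inserted;  /* "write and flush it" */
        x->flush_calls++;
        return true;
    }
    x->sleeps++;                   /* the real loop would sleep 10 msec here */
    return false;
}

/* A committing backend may return once its commit record is flushed. */
bool commit_is_durable(const XlogModel *x, uint64_t commit_record)
{
    return commit_record <= x->flushed;
}
```

In this model several commit records inserted between two iterations are covered by one flush, which is the grouping effect a dedicated writer would provide; the cost is that a commit may wait up to one sleep interval.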

#8Qingqing Zhou
zhouqq@cs.toronto.edu
In reply to: Josh Berkus (#1)
Re: commit_delay, siblings

"Tom Lane" <tgl@sss.pgh.pa.us> writes

together with some kind of IPC to waken backends once xlog was flushed
past the point they needed. (Designing that is the hard part.)

I think we could use the ProcSendSignal()/ProcWaitForSignal() mechanism to cope
with the problem, because it won't lose any wake-ups.

So there will be a MaxBackend-sized shared memory array in which each cell is a

    XLogRecPtr recptr; /* record request */
    bool status;       /* execution results */

structure. The initial value of the cell is <(0, 0), *doesn't matter*>.
Also, we need a spinlock to protect the "recptr" value since it is not a
sig_atomic_t value.

A backend that requests an xlogflush will do:

    spinlock_acquire;
    fill in the XLogRecPtr value;
    spinlock_release;
    ProcWaitForSignal();

After being woken up, it will examine the "status" value and act accordingly.

The xlog-writer is the only one that does real xlog writes in postmaster mode.
It does not work in standalone mode or recovery mode. It works based on a
periodic loop plus a wake-up when the xlog buffer is 70% full. A cancel/die
interrupt could happen during the wait, so we will plug in a
ProcCancelWaitForSignal() at AbortTransaction() or error handling in the
xlog-writer loop. There could also be various error conditions in its life.
Any error that happens during xlogflush will be a PANIC. Some small errors in
the loop will hopefully be recoverable. If everything is good, it would scan
the array and for each cell do:

    spinlock_acquire;
    make a local copy of XLogRecPtr;
    spinlock_release;

    if (recptr is (0, 0))
        nothing to do; /* no request at all */

    if (recptr is satisfied)
        set XLogRecPtr to (0, 0);
        status = true; /* successfully done */
        ProcSendSignal(targetbackendid);
    else
        check if the recptr is past the end of the xlog file, if so
            set XLogRecPtr to (0, 0);
            set status = false; /* bad request */
            ProcSendSignal(targetbackendid);

I am not sure how to check for a bad recptr. Currently we could do this by
comparing the request and the real flush point after xlogwrite(request).
However, this does not seem to be a solution for the xlog writer case.

Regards,
Qingqing
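
The scan over the request array described in this message could look roughly like the sketch below. Everything here is an assumption made for illustration: positions are single integers with 0 standing in for the (0, 0) "no request" value, the spinlock and ProcSendSignal() calls are replaced by a plain read and a "signalled" flag, and the names FlushRequest and scan_requests are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* One cell per backend; recptr == 0 means "no request pending". */
typedef struct {
    uint64_t recptr;     /* record request */
    bool     status;     /* execution result, valid once signalled */
    bool     signalled;  /* stands in for ProcSendSignal() */
} FlushRequest;

/*
 * One pass of the xlog writer over the request array.
 * flushed is the current flush point, end_of_log the end of valid xlog.
 * Returns the number of backends that were woken.
 */
int scan_requests(FlushRequest *cells, int ncells,
                  uint64_t flushed, uint64_t end_of_log)
{
    int woken = 0;

    for (int i = 0; i < ncells; i++) {
        /* In the real design this copy is taken under the spinlock. */
        uint64_t recptr = cells[i].recptr;

        if (recptr == 0)
            continue;                    /* no request at all */

        if (recptr <= flushed) {
            cells[i].status = true;      /* successfully done */
        } else if (recptr > end_of_log) {
            cells[i].status = false;     /* bad request */
        } else {
            continue;                    /* valid but not flushed yet */
        }
        cells[i].recptr = 0;
        cells[i].signalled = true;
        woken++;
    }
    return woken;
}
```

Requests that are valid but not yet flushed are simply left in place for a later pass, which is the case the pseudocode above leaves implicit.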

#9Bruce Momjian
pgman@candle.pha.pa.us
In reply to: Josh Berkus (#1)
Re: commit_delay, siblings

Josh Berkus wrote:

Hackers:

I've been trying to get a test result for 8.1 that shows that we can eliminate
commit_delay and commit_siblings, as I believe that these settings no longer
have any real effect on performance. However, the checkpointing performance
issues have so far prevented me from getting a good test result for this.

Just a warning, because I might bring it up after feature freeze.

If we yank them ( and I agree) I think we have to do it before feature
freeze.

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073

#10Josh Berkus
josh@agliodbs.com
In reply to: Bruce Momjian (#9)
Re: commit_delay, siblings

Bruce,

Just a warning, because I might bring it up after feature freeze.

If we yank them ( and I agree) I think we have to do it before feature
freeze.

I believe that we have consensus to yank them. Hans says that he did
extensive testing back as far as 7.4 and the options had no effect.

--
Josh Berkus
Aglio Database Solutions
San Francisco

#11Tatsuo Ishii
t-ishii@sra.co.jp
In reply to: Josh Berkus (#10)
Re: commit_delay, siblings

Just a warning, because I might bring it up after feature freeze.

If we yank them ( and I agree) I think we have to do it before feature
freeze.

I believe that we have consensus to yank them. Hans says that he did
extensive testing back as far as 7.4 and the options had no effect.

My opinion is, we'd better test with at least 8.0, or even better with
current. I think I can do the testing after Jul 1 if those features
are remained. I have a dual Xeon system with a 15000RPM SCSI disk
system in my office.
--
Tatsuo Ishii

#12Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tatsuo Ishii (#11)
Re: commit_delay, siblings

Tatsuo Ishii <t-ishii@sra.co.jp> writes:

If we yank them ( and I agree) I think we have to do it before feature
freeze.

I believe that we have consensus to yank them. Hans says that he did
extensive testing back as far as 7.4 and the options had no effect.

My opinion is, we'd better test with at least 8.0, or even better with
current. I think I can do the testing after Jul 1 if those features
are remained. I have a dual Xeon system with a 15000RPM SCSI disk
system in my office.

Well, the proposal is on the table, and the implementation is pretty
obvious. If you want to be sticky about the feature freeze rule,
someone could generate a diff to remove the variables and post it to
-patches before July 1, and then it would be fully per-rules to evaluate
it after July 1. I vote not to require ourselves to go through that
pushup.

If Tatsuo can do some testing next week, I'm happy to hold off removing
the variables until then.

regards, tom lane

#13Alvaro Herrera
alvherre@surnet.cl
In reply to: Tom Lane (#12)
Re: commit_delay, siblings

On Tue, Jun 28, 2005 at 10:35:43AM -0400, Tom Lane wrote:

Tatsuo Ishii <t-ishii@sra.co.jp> writes:

If we yank them ( and I agree) I think we have to do it before feature
freeze.

I believe that we have consensus to yank them. Hans says that he did
extensive testing back as far as 7.4 and the options had no effect.

My opinion is, we'd better test with at least 8.0, or even better with
current. I think I can do the testing after Jul 1 if those features
are remained. I have a dual Xeon system with a 15000RPM SCSI disk
system in my office.

Well, the proposal is on the table, and the implementation is pretty
obvious. If you want to be sticky about the feature freeze rule,
someone could generate a diff to remove the variables and post it to
-patches before July 1, and then it would be fully per-rules to evaluate
it after July 1.

That'd be needlessly legalistic ... I propose we stick to the "spirit"
of the rules, rather than the letter.

I vote not to require ourselves to go through that pushup.

I agree.

--
Alvaro Herrera (<alvherre[a]surnet.cl>)
"Cada quien es cada cual y baja las escaleras como quiere" (JMSerrat)

#14Josh Berkus
josh@agliodbs.com
In reply to: Tom Lane (#12)
Re: commit_delay, siblings

Tom,

Incidentally, I have tests in the queue. It's just that the STP has been
very unreliable for the last month so I've not been able to get definitive
test results.

More important than commit_*, is, of course the WAL/CRC stuff for
checkpoint cost, which I'm also getting impatient to test. Will be
setting up my own test machines today ...

--
--Josh

Josh Berkus
Aglio Database Solutions
San Francisco

#15Simon Riggs
simon@2ndquadrant.com
In reply to: Josh Berkus (#4)
Re: commit_delay, siblings

On Wed, 2005-06-22 at 11:11 -0700, Josh Berkus wrote:

Hans, Tom,

We have done extensive testing some time ago.
We could not see any difference on any platform we have tested (AIX,
Linux, Solaris). I don't think that there is one at all - at least not
on common systems.

Keen then. Any objections to removing the GUC? We desperately need means
to cut down on GUC options.

Group commit is a well-documented technique for improving performance,
but the gains only show themselves on very busy systems. It is possible
in earlier testing any apparent value was actually hidden by the
BufMgrLock issues we have now resolved in 8.1. We now see XLogInsert as
being very nearly the highest routine on the oprofile. That tells me
that it could now be time for group commit to show us some value, if any
exists.

DB2 and Berkeley-DB use group commit, while other rdbms use log writer
processes which effectively provide the same thing. It would surprise me
if we were unable to make use of such a technique, and worry me too.

I would ask that we hold off on their execution, at least for the
complete 8.1 beta performance test cycle. We may yet see gains albeit,
as Tom points out, that benefit may only be possible on only some
platforms.

Best Regards, Simon Riggs

#16Michael Paesold
mpaesold@gmx.at
In reply to: Josh Berkus (#1)
Re: commit_delay, siblings

Simon Riggs wrote:

Group commit is a well-documented technique for improving performance,
but the gains only show themselves on very busy systems. It is possible
in earlier testing any apparent value was actually hidden by the
BufMgrLock issues we have now resolved in 8.1. We now see XLogInsert as
being very nearly the highest routine on the oprofile. That tells me
that it could now be time for group commit to show us some value, if any
exists.

DB2 and Berkeley-DB use group commit, while other rdbms use log writer
processes which effectively provide the same thing. It would surprise me
if we were unable to make use of such a technique, and worry me too.

I would ask that we hold off on their execution, at least for the
complete 8.1 beta performance test cycle. We may yet see gains albeit,
as Tom points out, that benefit may only be possible on only some
platforms.

I don't remember the details exactly, but isn't it so that postgres has some
kind of group commits even without the commit_delay option? I.e. when
several backends are waiting for commit concurrently, the one to get to
commit will actually commit wal for all waiting transactions to disk?

I remember the term "ganged wal writes" or something similar. Tom, can you
elaborate on this? Please tell me if I am totally off track. ;-)

Best Regards,
Michael Paesold

#17Bruce Momjian
pgman@candle.pha.pa.us
In reply to: Simon Riggs (#15)
Re: commit_delay, siblings

Simon Riggs wrote:

On Wed, 2005-06-22 at 11:11 -0700, Josh Berkus wrote:

Hans, Tom,

We have done extensive testing some time ago.
We could not see any difference on any platform we have tested (AIX,
Linux, Solaris). I don't think that there is one at all - at least not
on common systems.

Keen then. Any objections to removing the GUC? We desperately need means
to cut down on GUC options.

Group commit is a well-documented technique for improving performance,
but the gains only show themselves on very busy systems. It is possible
in earlier testing any apparent value was actually hidden by the
BufMgrLock issues we have now resolved in 8.1. We now see XLogInsert as
being very nearly the highest routine on the oprofile. That tells me
that it could now be time for group commit to show us some value, if any
exists.

DB2 and Berkeley-DB use group commit, while other rdbms use log writer
processes which effectively provide the same thing. It would surprise me
if we were unable to make use of such a technique, and worry me too.

I would ask that we hold off on their execution, at least for the
complete 8.1 beta performance test cycle. We may yet see gains albeit,
as Tom points out, that benefit may only be possible on only some
platforms.

Interesting. I didn't know other databases used group commits. Your
idea of keeping it for the 8.1 testing cycle has merit.

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073

#18Kenneth Marshall
ktm@it.is.rice.edu
In reply to: Simon Riggs (#15)
Re: commit_delay, siblings

On Wed, Jun 29, 2005 at 08:14:36AM +0100, Simon Riggs wrote:

Group commit is a well-documented technique for improving performance,
but the gains only show themselves on very busy systems. It is possible
in earlier testing any apparent value was actually hidden by the
BufMgrLock issues we have now resolved in 8.1. We now see XLogInsert as
being very nearly the highest routine on the oprofile. That tells me
that it could now be time for group commit to show us some value, if any
exists.

DB2 and Berkeley-DB use group commit, while other rdbms use log writer
processes which effectively provide the same thing. It would surprise me
if we were unable to make use of such a technique, and worry me too.

I would ask that we hold off on their execution, at least for the
complete 8.1 beta performance test cycle. We may yet see gains albeit,
as Tom points out, that benefit may only be possible on only some
platforms.

Best Regards, Simon Riggs

I would like to weigh in on Simon's side on this issue. The fact that
no benefit has been seen from group commit yet may be due in part
to the current WAL fsync structure where a page at a time is sync'd.
I saw a patch/test just recently mentioned that showed dramatic
performance improvements, up to the level of "fsync = off", by writing
multiple blocks with a gather algorithm. I would hope that with a
similar patch, we should begin to see the benefit of the commit_delay
GUC.

Ken Marshall

#19Tom Lane
tgl@sss.pgh.pa.us
In reply to: Simon Riggs (#15)
Re: commit_delay, siblings

Simon Riggs <simon@2ndquadrant.com> writes:

Group commit is a well-documented technique for improving performance,

The issue here is not "is group commit a good idea in the abstract?".
It is "is the commit_delay implementation of the idea worth a dime?"
... and the evidence we have all points to the answer "NO". We should
not let theoretical arguments blind us to this.

I posted an analysis some time ago showing that under heavy load,
we already have the effect of ganged commits, without commit_delay:
http://archives.postgresql.org/pgsql-hackers/2002-10/msg00331.php

It's likely that there is more we can and should do, but that doesn't
mean that commit_delay is the right answer. commit_delay doesn't do
anything to encourage ganging of writes, it just inserts an arbitrary
delay that's not synchronized to anything, and is probably an order
of magnitude too large anyway on most platforms.

I would ask that we hold off on their execution, at least for the
complete 8.1 beta performance test cycle.

I'm willing to wait a week while Tatsuo runs some fresh tests. I'm
not willing to wait indefinitely for evidence that I'm privately
certain will not be forthcoming.

regards, tom lane
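
For readers unfamiliar with the two settings, the decision they control can be sketched as a single predicate. This paraphrases the documented behaviour (commit_delay is a wait in microseconds before the WAL flush, applied only when at least commit_siblings other transactions are active); the function and parameter names are hypothetical, and the actual code path in the backend is not reproduced here.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical paraphrase of the commit_delay / commit_siblings rule:
 * a committing backend sleeps before flushing only if a delay is
 * configured, fsync is in use, and enough other transactions are open
 * that one of them might commit during the sleep.
 */
bool should_sleep_before_flush(int commit_delay_us,
                               int commit_siblings,
                               int other_active_xacts,
                               bool fsync_enabled)
{
    assert(commit_delay_us >= 0 && commit_siblings >= 0);

    return commit_delay_us > 0
        && fsync_enabled
        && other_active_xacts >= commit_siblings;
}
```

Nothing in this rule ties the sleep to when another backend will actually reach its commit, which is the objection raised above: the delay is a fixed guess rather than a synchronization point.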

#20Simon Riggs
simon@2ndquadrant.com
In reply to: Tom Lane (#19)
Re: commit_delay, siblings

On Wed, 2005-06-29 at 10:16 -0400, Tom Lane wrote:

Simon Riggs <simon@2ndquadrant.com> writes:

Group commit is a well-documented technique for improving performance,

The issue here is not "is group commit a good idea in the abstract?".
It is "is the commit_delay implementation of the idea worth a dime?"
... and the evidence we have all points to the answer "NO". We should
not let theoretical arguments blind us to this.

OK, sometimes I sound too theoretical when I do my World History of
RDBMS notes, :-) ... all I meant was "lets hold off till we've measured
it".

I would ask that we hold off on their execution, at least for the
complete 8.1 beta performance test cycle.

I'm willing to wait a week while Tatsuo runs some fresh tests. I'm
not willing to wait indefinitely for evidence that I'm privately
certain will not be forthcoming.

I'm inclined to agree with you, but I see no need to move quickly. The
code's been there a while now.

Best Regards, Simon Riggs

#21Tatsuo Ishii
t-ishii@sra.co.jp
In reply to: Tom Lane (#19)
Re: commit_delay, siblings

Hi,

Simon Riggs <simon@2ndquadrant.com> writes:

Group commit is a well-documented technique for improving performance,

The issue here is not "is group commit a good idea in the abstract?".
It is "is the commit_delay implementation of the idea worth a dime?"
... and the evidence we have all points to the answer "NO". We should
not let theoretical arguments blind us to this.

I posted an analysis some time ago showing that under heavy load,
we already have the effect of ganged commits, without commit_delay:
http://archives.postgresql.org/pgsql-hackers/2002-10/msg00331.php

It's likely that there is more we can and should do, but that doesn't
mean that commit_delay is the right answer. commit_delay doesn't do
anything to encourage ganging of writes, it just inserts an arbitrary
delay that's not synchronized to anything, and is probably an order
of magnitude too large anyway on most platforms.

I would ask that we hold off on their execution, at least for the
complete 8.1 beta performance test cycle.

I'm willing to wait a week while Tatsuo runs some fresh tests. I'm
not willing to wait indefinitely for evidence that I'm privately
certain will not be forthcoming.

regards, tom lane

Here are the results from the testing I did this morning.

Summary:
The effect of commit_delay cannot be ignored. I got an almost 3 times
performance difference among different commit_delay settings.

Details:

Xeon 2.8GHz x2, HT on, mem 2GB, Ultra 320 SCSI, 15000RPM
Redhat AS 3/kernel 2.4.21( 2.4.21-9.30AXsmp)
PostgreSQL current (July 2 12:18 JST)

FS:
Filesystem         Size  Used  Avail  Use%  Mounted on
/dev/cciss/c0d0p3   28G  2.1G    25G    8%  /
/dev/cciss/c0d0p1  985M   28M   907M    3%  /boot
/dev/cciss/c0d1p1   67G  1.7G    62G    3%  /data1
/dev/cciss/c0d2p1   67G   33M    64G    1%  /data2
/dev/cciss/c0d3p1   67G   33M    64G    1%  /data3
none               1.3G     0   1.3G    0%  /dev/shm

OS & PostgreSQL binaries are on /. data is on /data1.

All postgresql.conf directives are set to defaults except:

max_connections = 512
shared_buffers = 10000

Benchmarking is done using pgbench. The test database was initialized
by the following command:

    pgbench -i -s 100 test (10,000,000 rows in accounts table)

case 1: commit_delay = 0
$ time pgbench -N -c 128 -t 100 test (128 concurrent users)
starting vacuum...end.
transaction type: Update only accounts
scaling factor: 100
number of clients: 128
number of transactions per client: 100
number of transactions actually processed: 12800/12800
tps = 47.400291 (including connections establishing)
tps = 47.509689 (excluding connections establishing)

real 4m30.065s
user 0m3.530s
sys 0m11.210s

case 2: commit_delay = 10
starting vacuum...end.
transaction type: Update only accounts
scaling factor: 100
number of clients: 128
number of transactions per client: 100
number of transactions actually processed: 12800/12800
tps = 140.024294 (including connections establishing)
tps = 141.038901 (excluding connections establishing)

real 1m31.431s
user 0m2.340s
sys 0m5.850s

case 3: commit_delay = 50
starting vacuum...end.
transaction type: Update only accounts
scaling factor: 100
number of clients: 128
number of transactions per client: 100
number of transactions actually processed: 12800/12800
tps = 137.207500 (including connections establishing)
tps = 138.083489 (excluding connections establishing)

real 1m33.312s
user 0m2.790s
sys 0m6.490s

case 4: commit_delay = 100
starting vacuum...end.
transaction type: Update only accounts
scaling factor: 100
number of clients: 128
number of transactions per client: 100
number of transactions actually processed: 12800/12800
tps = 133.458149 (including connections establishing)
tps = 134.298841 (excluding connections establishing)

real 1m35.931s
user 0m2.750s
sys 0m7.030s

As you can see, commit_delay = 10 outperforms commit_delay = 0 by a
factor of about 3.
--
Tatsuo Ishii
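
The "about 3 times" figure can be checked directly from the numbers posted above. The sketch below only does arithmetic on the reported values; the helper names ratio and to_seconds are hypothetical.

```c
#include <assert.h>

/* How many times larger candidate is than baseline. */
double ratio(double baseline, double candidate)
{
    assert(baseline > 0.0);
    return candidate / baseline;
}

/* Convert a "NmS.SSSs" wall-clock reading from time(1) into seconds. */
double to_seconds(int minutes, double seconds)
{
    return minutes * 60.0 + seconds;
}

/* Reported tps, excluding connection establishment. */
static const double tps_delay_0   = 47.509689;
static const double tps_delay_10  = 141.038901;
static const double tps_delay_50  = 138.083489;
static const double tps_delay_100 = 134.298841;
```

With these values, ratio(tps_delay_0, tps_delay_10) comes out at roughly 2.97, and the wall-clock times (4m30.065s versus 1m31.431s) give roughly 2.95, so both measures agree with the summary. The three non-zero settings differ from each other by only about 5 percent, so in this run most of the gain comes from enabling the delay at all rather than from its exact value.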