CommitDelay performance improvement

Started by Tom Lane about 25 years ago, 51 messages, hackers
#1 Tom Lane
tgl@sss.pgh.pa.us

Looking at the XLOG stuff, I notice that we already have a field
(logRec) in the per-backend PROC structures that shows whether a
transaction is currently in progress with at least one change made
(ie at least one XLOG entry written).

It would be very easy to extend the existing code so that the commit
delay is not done unless there is at least one other backend with
nonzero logRec --- or, more generally, at least N other backends with
nonzero logRec. We cannot tell if any of them are actually nearing
their commits, but this seems better than just blindly waiting. Larger
values of N would presumably improve the odds that at least one of them
is nearing its commit.

A further refinement, still quite cheap to implement since the info is
in the PROC struct, would be to not count backends that are blocked
waiting for locks. These guys are less likely to be ready to commit
in the next few milliseconds than the guys who are actively running;
indeed they cannot commit until someone else has committed/aborted to
release the lock they need.
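A minimal C sketch of the proposed test, assuming a hypothetical `ProcSlot` array standing in for the per-backend PROC entries. Only `logRec` is named in the message. The `in_use` and `waiting_for_lock` fields, and the parameter names `commit_delay_us` and `min_siblings` (the threshold N), are illustrative. The sketch counts other backends that have written XLOG in their current transaction and are not blocked on a lock, and delays only when that count reaches N:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a PROC entry: only the fields this test needs. */
typedef struct
{
    bool in_use;           /* slot belongs to a live backend */
    bool logRec;           /* current xact has written at least one XLOG record */
    bool waiting_for_lock; /* backend is blocked on a lock */
} ProcSlot;

/* Count backends other than `self` that have made changes in their current
 * transaction and are not blocked waiting for a lock. */
static size_t
count_active_writers(const ProcSlot *procs, size_t nprocs, size_t self)
{
    size_t n = 0;

    assert(self < nprocs);
    for (size_t i = 0; i < nprocs; i++)
    {
        if (i == self || !procs[i].in_use)
            continue;
        if (procs[i].logRec && !procs[i].waiting_for_lock)
            n++;
    }
    return n;
}

/* Sleep before fsync only if a delay is configured and at least
 * min_siblings (the "N" above) other backends look like potential committers. */
static bool
should_commit_delay(const ProcSlot *procs, size_t nprocs, size_t self,
                    long commit_delay_us, size_t min_siblings)
{
    if (commit_delay_us <= 0)
        return false;
    return count_active_writers(procs, nprocs, self) >= min_siblings;
}
```

With `min_siblings` set to 0 this reduces to the existing blind delay. PostgreSQL's `commit_siblings` setting (default 5) later filled the role of N.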

Comments? What should the threshold N be ... or do we need to make
that a tunable parameter?

regards, tom lane

#2 Bruce Momjian
bruce@momjian.us
In reply to: Tom Lane (#1)
Re: CommitDelay performance improvement

Looking at the XLOG stuff, I notice that we already have a field
(logRec) in the per-backend PROC structures that shows whether a
transaction is currently in progress with at least one change made
(ie at least one XLOG entry written).

It would be very easy to extend the existing code so that the commit
delay is not done unless there is at least one other backend with
nonzero logRec --- or, more generally, at least N other backends with
nonzero logRec. We cannot tell if any of them are actually nearing
their commits, but this seems better than just blindly waiting. Larger
values of N would presumably improve the odds that at least one of them
is nearing its commit.

Why not just set a flag in there when someone nears commit and clear
when they are about to commit?

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026

#3 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Bruce Momjian (#2)
Re: CommitDelay performance improvement

Bruce Momjian <pgman@candle.pha.pa.us> writes:

Why not just set a flag in there when someone nears commit and clear
when they are about to commit?

Define "nearing commit", in such a way that you can specify where you
plan to set that flag.

regards, tom lane

#4 Bruce Momjian
bruce@momjian.us
In reply to: Tom Lane (#3)
Re: CommitDelay performance improvement

Bruce Momjian <pgman@candle.pha.pa.us> writes:

Why not just set a flag in there when someone nears commit and clear
when they are about to commit?

Define "nearing commit", in such a way that you can specify where you
plan to set that flag.

Is there significant time between entry of CommitTransaction() and the
fsync()? Maybe not.

#5 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Bruce Momjian (#4)
Re: CommitDelay performance improvement

Bruce Momjian <pgman@candle.pha.pa.us> writes:

Is there significant time between entry of CommitTransaction() and the
fsync()? Maybe not.

I doubt it. No I/O anymore, anyway, unless the commit record happens to
overrun an xlog block boundary.

regards, tom lane

#6 Bruce Momjian
bruce@momjian.us
In reply to: Tom Lane (#5)
Re: CommitDelay performance improvement

Bruce Momjian <pgman@candle.pha.pa.us> writes:

Is there significant time between entry of CommitTransaction() and the
fsync()? Maybe not.

I doubt it. No I/O anymore, anyway, unless the commit record happens to
overrun an xlog block boundary.

That's what I was afraid of. Since we don't write the dirty blocks to
the kernel anymore, we don't really have much happening before someone
says they are about to commit. In the old days, we were write()'ing
those buffers, and we had some delay and kernel calls in there.

Guess that idea is dead.

#7 Nathan Myers
ncm@zembu.com
In reply to: Tom Lane (#1)
Re: CommitDelay performance improvement

On Fri, Feb 23, 2001 at 11:32:21AM -0500, Tom Lane wrote:

A further refinement, still quite cheap to implement since the info is
in the PROC struct, would be to not count backends that are blocked
waiting for locks. These guys are less likely to be ready to commit
in the next few milliseconds than the guys who are actively running;
indeed they cannot commit until someone else has committed/aborted to
release the lock they need.

Comments? What should the threshold N be ... or do we need to make
that a tunable parameter?

Once you make it tuneable, you're stuck with it. You can always add
a knob later, after somebody discovers a real need.

Nathan Myers
ncm@zembu.com

#8 Bruce Momjian
bruce@momjian.us
In reply to: Nathan Myers (#7)
Re: CommitDelay performance improvement

On Fri, Feb 23, 2001 at 11:32:21AM -0500, Tom Lane wrote:

A further refinement, still quite cheap to implement since the info is
in the PROC struct, would be to not count backends that are blocked
waiting for locks. These guys are less likely to be ready to commit
in the next few milliseconds than the guys who are actively running;
indeed they cannot commit until someone else has committed/aborted to
release the lock they need.

Comments? What should the threshold N be ... or do we need to make
that a tunable parameter?

Once you make it tuneable, you're stuck with it. You can always add
a knob later, after somebody discovers a real need.

I wonder if Tom should implement it, but leave it at zero until people
can report that a non-zero helps. We already have the parameter, we can
just make it smarter and let people test it.

#9 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Nathan Myers (#7)
Re: CommitDelay performance improvement

ncm@zembu.com (Nathan Myers) writes:

Comments? What should the threshold N be ... or do we need to make
that a tunable parameter?

Once you make it tuneable, you're stuck with it. You can always add
a knob later, after somebody discovers a real need.

If we had a good idea what the default level should be, I'd be willing
to go without a knob. I'm thinking of a default of about 5 (ie, at
least 5 other active backends to trigger a commit delay) ... but I'm not
so confident of that that I think it needn't be tunable. It's really
dependent on your average and peak transaction lengths, and that's
going to vary across installations, so unless we want to try to make it
self-adjusting, a knob seems like a good idea.

A self-adjusting delay might well be a great idea, BTW, but I'm trying
to be conservative about how much complexity we should add right now.

regards, tom lane

#10 Bruce Momjian
bruce@momjian.us
In reply to: Tom Lane (#9)
Re: CommitDelay performance improvement

ncm@zembu.com (Nathan Myers) writes:

Comments? What should the threshold N be ... or do we need to make
that a tunable parameter?

Once you make it tuneable, you're stuck with it. You can always add
a knob later, after somebody discovers a real need.

If we had a good idea what the default level should be, I'd be willing
to go without a knob. I'm thinking of a default of about 5 (ie, at
least 5 other active backends to trigger a commit delay) ... but I'm not
so confident of that that I think it needn't be tunable. It's really
dependent on your average and peak transaction lengths, and that's
going to vary across installations, so unless we want to try to make it
self-adjusting, a knob seems like a good idea.

A self-adjusting delay might well be a great idea, BTW, but I'm trying
to be conservative about how much complexity we should add right now.

OH, so you are saying N backends should have dirtied buffers before
doing the delay? Hmm, that seems almost untunable to me.

Let's suppose we decide to sleep. When we wake up, can we know that
someone else has fsync'ed for us? And if they have, should we be more
likely to fsync() in the future?

#11 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Bruce Momjian (#10)
Re: CommitDelay performance improvement

Bruce Momjian <pgman@candle.pha.pa.us> writes:

A self-adjusting delay might well be a great idea, BTW, but I'm trying
to be conservative about how much complexity we should add right now.

OH, so you are saying N backends should have dirtied buffers before
doing the delay? Hmm, that seems almost untunable to me.

Let's suppose we decide to sleep. When we wake up, can we know that
someone else has fsync'ed for us?

XLogFlush will find that it has nothing to do, so yes we can.

And if they have, should we be more
likely to fsync() in the future?

You mean less likely. My thought for a self-adjusting delay was to
ratchet the delay up a little every time it succeeds in avoiding an
fsync, and down a little every time it fails to do so. No change when
we don't delay at all (because of no other active backends). But
testing this and making sure it behaves reasonably seems like more work
than we should try to accomplish before 7.1.
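A rough C sketch of the self-adjusting delay described above. All names are hypothetical: `XLogPos` is assumed to be a byte offset into the WAL stream, and `AdaptiveDelay` is an illustrative per-backend state struct. Whether a sleep "succeeded" is judged the way XLogFlush would judge it: the log is already flushed past the end of our commit record. The clamp to `[min_us, max_us]` is an added assumption, not part of the proposal as stated:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t XLogPos; /* hypothetical: byte offset into the WAL stream */

/* Hypothetical tuning state for one backend's commit delay. */
typedef struct
{
    long delay_us; /* current delay before fsync */
    long step_us;  /* how much to ratchet per decision */
    long min_us;   /* lower clamp */
    long max_us;   /* upper clamp, so the delay cannot grow without limit */
} AdaptiveDelay;

/* After the sleep, the fsync was avoided if someone else already flushed the
 * log past the end of our commit record (XLogFlush would find nothing to do). */
static bool
flush_was_avoided(XLogPos commit_end, XLogPos flushed_upto)
{
    return flushed_upto >= commit_end;
}

/* Ratchet the delay up after a sleep that avoided an fsync and down after one
 * that did not.  Callers should not call this when no delay was attempted. */
static void
adaptive_delay_update(AdaptiveDelay *d, bool avoided_fsync)
{
    assert(d->min_us <= d->max_us);
    d->delay_us += avoided_fsync ? d->step_us : -d->step_us;
    if (d->delay_us > d->max_us)
        d->delay_us = d->max_us;
    if (d->delay_us < d->min_us)
        d->delay_us = d->min_us;
}
```

Without `max_us`, a run of successful delays during a busy period could keep pushing the sleep upward with nothing to stop it.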

regards, tom lane

#12 Bruce Momjian
bruce@momjian.us
In reply to: Tom Lane (#11)
Re: CommitDelay performance improvement

And if they have, should we be more
likely to fsync() in the future?

I meant more likely to sleep().

You mean less likely. My thought for a self-adjusting delay was to
ratchet the delay up a little every time it succeeds in avoiding an
fsync, and down a little every time it fails to do so. No change when
we don't delay at all (because of no other active backends). But
testing this and making sure it behaves reasonably seems like more work
than we should try to accomplish before 7.1.

It could be tough. Imagine the delay increasing to 3 seconds? Seems
there has to be an upper bound on the sleep. The more you delay, the
more likely you will be to find someone to fsync you. Are we waking
processes up after we have fsync()'ed them? If so, we can keep
increasing the delay.

#13 Nathan Myers
ncm@zembu.com
In reply to: Tom Lane (#9)
Re: CommitDelay performance improvement

On Fri, Feb 23, 2001 at 05:18:19PM -0500, Tom Lane wrote:

ncm@zembu.com (Nathan Myers) writes:

Comments? What should the threshold N be ... or do we need to make
that a tunable parameter?

Once you make it tuneable, you're stuck with it. You can always add
a knob later, after somebody discovers a real need.

If we had a good idea what the default level should be, I'd be willing
to go without a knob. I'm thinking of a default of about 5 (ie, at
least 5 other active backends to trigger a commit delay) ... but I'm not
so confident of that that I think it needn't be tunable. It's really
dependent on your average and peak transaction lengths, and that's
going to vary across installations, so unless we want to try to make it
self-adjusting, a knob seems like a good idea.

A self-adjusting delay might well be a great idea, BTW, but I'm trying
to be conservative about how much complexity we should add right now.

When thinking about tuning N, I like to consider what are the interesting
possible values for N:

0: Ignore any other potential committers.
1: The minimum possible responsiveness to other committers.
5: Tom's guess for what might be a good choice.
10: Harry's guess.
~0: Always delay.

I would rather release with N=1 than with 0, because it actually responds
to conditions. What N might best be, >1, probably varies on a lot of
hard-to-guess parameters.

It seems to me that comparing various choices (and other, more interesting,
algorithms) to the N=1 case would be more productive than comparing them
to the N=0 case, so releasing at N=1 would yield better statistics for
actually tuning in 7.2.

Nathan Myers
ncm@zembu.com

#14 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Bruce Momjian (#12)
Re: CommitDelay performance improvement

Bruce Momjian <pgman@candle.pha.pa.us> writes:

It could be tough. Imagine the delay increasing to 3 seconds? Seems
there has to be an upper bound on the sleep. The more you delay, the
more likely you will be to find someone to fsync you.

Good point, and an excellent illustration of the fact that
self-adjusting algorithms aren't that easy to get right the first
time ;-)

Are we waking processes up after we have fsync()'ed them?

Not at the moment. That would be another good mechanism to investigate
for 7.2; but right now there's no infrastructure that would allow a
backend to discover which other ones were sleeping for fsync.

regards, tom lane

#15 Bruce Momjian
bruce@momjian.us
In reply to: Nathan Myers (#13)
Re: CommitDelay performance improvement

When thinking about tuning N, I like to consider what are the interesting
possible values for N:

0: Ignore any other potential committers.
1: The minimum possible responsiveness to other committers.
5: Tom's guess for what might be a good choice.
10: Harry's guess.
~0: Always delay.

I would rather release with N=1 than with 0, because it actually responds
to conditions. What N might best be, >1, probably varies on a lot of
hard-to-guess parameters.

It seems to me that comparing various choices (and other, more interesting,
algorithms) to the N=1 case would be more productive than comparing them
to the N=0 case, so releasing at N=1 would yield better statistics for
actually tuning in 7.2.

We don't release code because it has better tuning opportunities for
later releases. What we can do is give people parameters where the
default is safe, and they can play and report to us.

#16 Bruce Momjian
bruce@momjian.us
In reply to: Tom Lane (#14)
Re: CommitDelay performance improvement

Bruce Momjian <pgman@candle.pha.pa.us> writes:

It could be tough. Imagine the delay increasing to 3 seconds? Seems
there has to be an upper bound on the sleep. The more you delay, the
more likely you will be to find someone to fsync you.

Good point, and an excellent illustration of the fact that
self-adjusting algorithms aren't that easy to get right the first
time ;-)

I see. I am concerned that anything done to 7.1 at this point may cause
problems with performance under certain circumstances. Let's see what
the new code shows our testers.

Are we waking processes up after we have fsync()'ed them?

Not at the moment. That would be another good mechanism to investigate
for 7.2; but right now there's no infrastructure that would allow a
backend to discover which other ones were sleeping for fsync.

Can we put the backends to sleep waiting for a lock, and have them wake
up later?

#17 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Bruce Momjian (#16)
Re: CommitDelay performance improvement

Bruce Momjian <pgman@candle.pha.pa.us> writes:

Can we put the backends to sleep waiting for a lock, and have them wake
up later?

Locks don't have timeouts. There is no existing mechanism that will
serve this purpose; we'll have to create a new one.
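One possible shape for such a new mechanism is a timed wait on a condition variable: a committing backend sleeps until someone has flushed past its commit record or its timeout expires, and whoever fsyncs wakes all waiters. This is sketched with POSIX threads purely for illustration. `FlushState`, `flush_state_advance`, and `wait_for_flush` are hypothetical names. Between separate backend processes, the mutex and condition variable would have to live in shared memory with the process-shared attribute, or a different primitive would be needed:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdint.h>
#include <time.h>

typedef uint64_t XLogPos; /* hypothetical WAL position type */

/* Hypothetical shared state: how far the log is known to be on disk. */
typedef struct
{
    pthread_mutex_t lock;
    pthread_cond_t  flushed;
    XLogPos         flushed_upto;
} FlushState;

static void
flush_state_init(FlushState *fs)
{
    pthread_mutex_init(&fs->lock, NULL);
    pthread_cond_init(&fs->flushed, NULL);
    fs->flushed_upto = 0;
}

/* Called by whichever backend just fsync'ed the log up to `upto`:
 * record the new position and wake every waiter so each can recheck. */
static void
flush_state_advance(FlushState *fs, XLogPos upto)
{
    pthread_mutex_lock(&fs->lock);
    if (upto > fs->flushed_upto)
        fs->flushed_upto = upto;
    pthread_cond_broadcast(&fs->flushed);
    pthread_mutex_unlock(&fs->lock);
}

/* Sleep until the log is flushed past `target` or `timeout_us` elapses.
 * Returns true if our commit record is already durable, false if the
 * caller must fsync itself. */
static bool
wait_for_flush(FlushState *fs, XLogPos target, long timeout_us)
{
    struct timespec deadline;
    bool            done;
    int             rc = 0;

    clock_gettime(CLOCK_REALTIME, &deadline);
    deadline.tv_sec += timeout_us / 1000000;
    deadline.tv_nsec += (timeout_us % 1000000) * 1000;
    if (deadline.tv_nsec >= 1000000000L)
    {
        deadline.tv_sec++;
        deadline.tv_nsec -= 1000000000L;
    }

    pthread_mutex_lock(&fs->lock);
    /* Spurious wakeups return 0 and loop; ETIMEDOUT or any error stops. */
    while (fs->flushed_upto < target && rc == 0)
        rc = pthread_cond_timedwait(&fs->flushed, &fs->lock, &deadline);
    done = fs->flushed_upto >= target;
    pthread_mutex_unlock(&fs->lock);
    return done;
}
```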

regards, tom lane

#18 Nathan Myers
ncm@zembu.com
In reply to: Bruce Momjian (#15)
Re: CommitDelay performance improvement

On Fri, Feb 23, 2001 at 06:37:06PM -0500, Bruce Momjian wrote:

When thinking about tuning N, I like to consider what are the interesting
possible values for N:

0: Ignore any other potential committers.
1: The minimum possible responsiveness to other committers.
5: Tom's guess for what might be a good choice.
10: Harry's guess.
~0: Always delay.

I would rather release with N=1 than with 0, because it actually
responds to conditions. What N might best be, >1, probably varies on
a lot of hard-to-guess parameters.

It seems to me that comparing various choices (and other, more
interesting, algorithms) to the N=1 case would be more productive
than comparing them to the N=0 case, so releasing at N=1 would yield
better statistics for actually tuning in 7.2.

We don't release code because it has better tuning opportunities for
later releases. What we can do is give people parameters where the
default is safe, and they can play and report to us.

Perhaps I misunderstood. I had perceived N=1 as a conservative choice
that was nevertheless preferable to N=0.

Nathan Myers
ncm@zembu.com

#19 Bruce Momjian
bruce@momjian.us
In reply to: Nathan Myers (#18)
Re: CommitDelay performance improvement

It seems to me that comparing various choices (and other, more
interesting, algorithms) to the N=1 case would be more productive
than comparing them to the N=0 case, so releasing at N=1 would yield
better statistics for actually tuning in 7.2.

We don't release code because it has better tuning opportunities for
later releases. What we can do is give people parameters where the
default is safe, and they can play and report to us.

Perhaps I misunderstood. I had perceived N=1 as a conservative choice
that was nevertheless preferable to N=0.

I think zero delay is the conservative choice at this point, unless we
hear otherwise from testers.

#20 Bruce Momjian
bruce@momjian.us
In reply to: Tom Lane (#17)
Re: CommitDelay performance improvement

Bruce Momjian <pgman@candle.pha.pa.us> writes:

Can we put the backends to sleep waiting for a lock, and have them wake
up later?

Locks don't have timeouts. There is no existing mechanism that will
serve this purpose; we'll have to create a new one.

That is what I suspected.

Having thought about it, we currently have a few options:

1) let every backend fsync on its own
2) try to delay backends so they all fsync() at the same time
3) delay fsync until after commit

Items 2 and 3 attempt to bunch up fsyncs. Option 2 has backends waiting
to fsync() on the expectation that some other backend may commit soon.
Option 3 may turn out to be the best solution. No matter how smart we
make the code, we will never know for sure if someone is about to commit
and whether it is worth waiting.

My idea would be to let committing backends return "COMMIT" to the user,
and set a need_fsync flag that is guaranteed to cause an fsync within X
milliseconds. This way, if other backends commit in the next X
millisecond, they can all use one fsync().
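Option 3 can be modeled as a small state machine. This assumes a hypothetical `DeferredFlush` struct and a caller-supplied millisecond clock so the behavior is deterministic. A commit only sets `need_fsync` and returns at once. A flusher then issues one fsync covering everything pending once the oldest pending commit has waited X ms (`window_ms`). The X ms guarantee holds only if the flusher ticks at least that often:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical state for option 3.  Times are milliseconds on a
 * caller-supplied clock. */
typedef struct
{
    bool    need_fsync;
    int64_t first_pending_ms; /* when the oldest unflushed commit happened */
    int64_t window_ms;        /* the "X milliseconds" guarantee */
    int     fsyncs_issued;
} DeferredFlush;

/* Commit path: note that an fsync is owed, then return COMMIT immediately. */
static void
deferred_commit(DeferredFlush *df, int64_t now_ms)
{
    if (!df->need_fsync)
    {
        df->need_fsync = true;
        df->first_pending_ms = now_ms;
    }
}

/* Flusher tick: one fsync covers every commit since the previous one.
 * Returns true if an fsync was issued on this tick. */
static bool
deferred_tick(DeferredFlush *df, int64_t now_ms)
{
    assert(!df->need_fsync || now_ms >= df->first_pending_ms);
    if (df->need_fsync && now_ms - df->first_pending_ms >= df->window_ms)
    {
        /* a real implementation would fsync() the log file here */
        df->fsyncs_issued++;
        df->need_fsync = false;
        return true;
    }
    return false;
}
```

With a 20 ms window, commits at 0, 5 and 19 ms share a single fsync issued at the 20 ms tick. The cost is that all three were reported as committed before their records reached disk.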

Now, I know many will complain that we are returning COMMIT while not
having the stuff on the platter. But consider: we only lose data from an
OS crash or hardware failure. Do people who commit something, and whose
machine then crashes 2 milliseconds after the commit, really expect the
data to be on the disk when they restart? Maybe they do, but it seems
the benefit of grouped fsync()s is large enough that many will say they
would rather have this option.

This was my point long ago: we could offer sub-second reliability
with no-fsync performance if we just had some process running that wrote
dirty pages and fsynced every 20 milliseconds.

#21 Philip Warner
pjw@rhyme.com.au
In reply to: Nathan Myers (#13)

#22 Philip Warner
pjw@rhyme.com.au
In reply to: Bruce Momjian (#20)

#23 Philip Warner
pjw@rhyme.com.au
In reply to: Tom Lane (#1)

#24 Bruce Momjian
bruce@momjian.us
In reply to: Philip Warner (#22)

#25 Bruce Momjian
bruce@momjian.us
In reply to: Philip Warner (#22)

#26 Nathan Myers
ncm@zembu.com
In reply to: Bruce Momjian (#19)

#27 Philip Warner
pjw@rhyme.com.au
In reply to: Bruce Momjian (#25)

#28 Bruce Momjian
bruce@momjian.us
In reply to: Philip Warner (#27)

#29 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Bruce Momjian (#20)

#30 Bruce Momjian
bruce@momjian.us
In reply to: Tom Lane (#29)

#31 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Philip Warner (#21)

#32 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Nathan Myers (#13)

#33 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Nathan Myers (#26)

#34 Bruce Momjian
bruce@momjian.us
In reply to: Tom Lane (#31)

#35 Nathan Myers
ncm@zembu.com
In reply to: Tom Lane (#33)

#36 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Nathan Myers (#26)

#37 Philip Warner
pjw@rhyme.com.au
In reply to: Tom Lane (#36)

#38 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Philip Warner (#37)

#39 Nathan Myers
ncm@zembu.com
In reply to: Tom Lane (#36)

#40 Philip Warner
pjw@rhyme.com.au
In reply to: Nathan Myers (#39)

#41 Hiroshi Inoue
Inoue@tpf.co.jp
In reply to: Tom Lane (#36)

#42
In reply to: Philip Warner (#27)

#43 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Hiroshi Inoue (#41)

#44 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Philip Warner (#40)

#45 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Nathan Myers (#39)

#46 Hiroshi Inoue
Inoue@tpf.co.jp
In reply to: Tom Lane (#36)

#47 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Hiroshi Inoue (#46)

#48 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tom Lane (#47)

#49 Zeugswetter Andreas SB
ZeugswetterA@wien.spardat.at
In reply to: Tom Lane (#48)

#50 Zeugswetter Andreas SB
ZeugswetterA@wien.spardat.at
In reply to: Zeugswetter Andreas SB (#49)

#51 Philip Warner
pjw@rhyme.com.au
In reply to: Zeugswetter Andreas SB (#50)