AW: RE: [COMMITTERS] pgsql/src/backend/access/transam ( xact.c xlog.c)

vmikheev@SECTORBASE.COM

over 25 years ago

In reply to: Zeugswetter Andreas SB (#1)

RE: RE: [COMMITTERS] pgsql/src/backend/access/transam ( xact.c xlog.c)

Earlier, Vadim was talking about arranging to share fsyncs of the WAL
log file across transactions (after writing your commit record to the
log, sleep a few milliseconds to see if anyone else fsyncs before you
do; if not, issue the fsync yourself). That would offer less-than-
one-fsync-per-transaction performance without giving up any
guarantees.
If people feel a compulsion to have a tunable parameter, let 'em tune
the length of the pre-fsync sleep ...

Already implemented (though without the ability to tune this parameter,
xact.c:CommitDelay, yet). Currently CommitDelay is 5, so a backend
sleeps 1/200 sec (5 ms) before checking/forcing the log fsync.
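
The sleep-then-check scheme described above can be sketched roughly as follows. This is a minimal illustration, not the code in xact.c or xlog.c: the WalState struct, its fields, and the functions sleep_ms and commit_flush are hypothetical names, and the locking that real shared state would need is left out.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical stand-in for the shared WAL state; not the real xlog.c layout. */
typedef struct {
    int      fd;          /* open log file */
    uint64_t written_to;  /* log position written (but maybe not yet synced) */
    uint64_t flushed_to;  /* log position known to be on disk */
} WalState;

/* Sleep for `ms` milliseconds (ms >= 0). */
static void sleep_ms(int ms)
{
    struct timespec ts;
    ts.tv_sec = ms / 1000;
    ts.tv_nsec = (long)(ms % 1000) * 1000000L;
    nanosleep(&ts, NULL);
}

/*
 * Called after the commit record ending at `commit_pos` has been written.
 * Returns 1 if this caller issued the fsync, 0 if an earlier flush already
 * covered the record, -1 if fsync failed.
 */
int commit_flush(WalState *wal, uint64_t commit_pos, int commit_delay_ms)
{
    uint64_t target;

    assert(wal != NULL);

    if (commit_delay_ms > 0)
        sleep_ms(commit_delay_ms);   /* give other committers time to flush */

    if (wal->flushed_to >= commit_pos)
        return 0;                    /* someone else synced past our record */

    target = wal->written_to;        /* everything written so far gets synced */
    if (fsync(wal->fd) != 0) {
        perror("fsync");
        return -1;
    }
    wal->flushed_to = target;
    return 1;
}
```

In this sketch the caller would report the commit to the client only after commit_flush returns 0 or 1, so the delay sits on the commit path.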

Vadim

bruce@momjian.us

over 25 years ago

In reply to: Mikheev, Vadim (#2)

Re: RE: [COMMITTERS] pgsql/src/backend/access/transam ( xact.c xlog.c)

Earlier, Vadim was talking about arranging to share fsyncs of the WAL
log file across transactions (after writing your commit record to the
log, sleep a few milliseconds to see if anyone else fsyncs before you
do; if not, issue the fsync yourself). That would offer less-than-
one-fsync-per-transaction performance without giving up any
guarantees.
If people feel a compulsion to have a tunable parameter, let 'em tune
the length of the pre-fsync sleep ...

Already implemented (without ability to tune this parameter -
xact.c:CommitDelay, - yet). Currently CommitDelay is 5, so
backend sleeps 1/200 sec before checking/forcing log fsync.

But it returns _completed_ to the client before sleeping, right?

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026

vmikheev@SECTORBASE.COM

over 25 years ago

In reply to: Bruce Momjian (#3)

Re: RE: [COMMITTERS] pgsql/src/backend/access/transam ( xact.c xlog.c)

Earlier, Vadim was talking about arranging to share fsyncs of the WAL
log file across transactions (after writing your commit record to the
log, sleep a few milliseconds to see if anyone else fsyncs before you
do; if not, issue the fsync yourself). That would offer less-than-
one-fsync-per-transaction performance without giving up any
guarantees.
If people feel a compulsion to have a tunable parameter, let 'em tune
the length of the pre-fsync sleep ...

Already implemented (without ability to tune this parameter -
xact.c:CommitDelay, - yet). Currently CommitDelay is 5, so
backend sleeps 1/200 sec before checking/forcing log fsync.

But it returns _completed_ to the client before sleeping, right?

No.

Vadim

Zeugswetter Andreas SB

ZeugswetterA@wien.spardat.at

over 25 years ago

In reply to: Mikheev, Vadim (#2)

Earlier, Vadim was talking about arranging to share fsyncs of the WAL
log file across transactions (after writing your commit record to the
log, sleep a few milliseconds to see if anyone else fsyncs before you
do; if not, issue the fsync yourself). That would offer less-than-
one-fsync-per-transaction performance without giving up any
guarantees.
If people feel a compulsion to have a tunable parameter, let 'em tune
the length of the pre-fsync sleep ...

Already implemented (without ability to tune this parameter -
xact.c:CommitDelay, - yet). Currently CommitDelay is 5, so
backend sleeps 1/200 sec before checking/forcing log fsync.

This should definitely be made tunable (per installation is imho sufficient);
there is no use in waiting if the DBA knows there is very little concurrency.
IIRC DB/2 defaults to not using this "commit pooling".

Andreas

bruce@momjian.us

over 25 years ago

In reply to: Mikheev, Vadim (#4)

Re: RE: [COMMITTERS] pgsql/src/backend/access/transam ( xact.c xlog.c)

Earlier, Vadim was talking about arranging to share fsyncs of the WAL
log file across transactions (after writing your commit record to the
log, sleep a few milliseconds to see if anyone else fsyncs before you
do; if not, issue the fsync yourself). That would offer less-than-
one-fsync-per-transaction performance without giving up any
guarantees.
If people feel a compulsion to have a tunable parameter, let 'em tune
the length of the pre-fsync sleep ...

Already implemented (without ability to tune this parameter -
xact.c:CommitDelay, - yet). Currently CommitDelay is 5, so
backend sleeps 1/200 sec before checking/forcing log fsync.

But it returns _completed_ to the client before sleeping, right?

No.

Eww, so we have this 1/200 second delay for every transaction. Seems
bad to me.

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026

bright@wintelcom.net

over 25 years ago

In reply to: Bruce Momjian (#6)

Re: RE: [COMMITTERS] pgsql/src/backend/access/transam ( xact.c xlog.c)

* Bruce Momjian <pgman@candle.pha.pa.us> [001116 08:59] wrote:

Earlier, Vadim was talking about arranging to share fsyncs of the WAL
log file across transactions (after writing your commit record to the
log, sleep a few milliseconds to see if anyone else fsyncs before you
do; if not, issue the fsync yourself). That would offer less-than-
one-fsync-per-transaction performance without giving up any
guarantees.
If people feel a compulsion to have a tunable parameter, let 'em tune
the length of the pre-fsync sleep ...

Already implemented (without ability to tune this parameter -
xact.c:CommitDelay, - yet). Currently CommitDelay is 5, so
backend sleeps 1/200 sec before checking/forcing log fsync.

But it returns _completed_ to the client before sleeping, right?

No.

Eww, so we have this 1/200 second delay for every transaction. Seems
bad to me.

I think as long as it becomes a tunable this isn't a bad idea at
all. Fixing it at 1/200 isn't so great because people not wrapping
large amounts of inserts/updates with transaction blocks will
suffer.

--
-Alfred Perlstein - [bright@wintelcom.net|alfred@freebsd.org]
"I have the heart of a child; I keep it in a jar on my desk."

Don Baccus

dhogaza@pacifier.com

over 25 years ago

In reply to: Alfred Perlstein (#7)

Re: RE: [COMMITTERS] pgsql/src/backend/access/transam ( xact.c xlog.c)

At 09:32 AM 11/16/00 -0800, Alfred Perlstein wrote:

* Bruce Momjian <pgman@candle.pha.pa.us> [001116 08:59] wrote:

Eww, so we have this 1/200 second delay for every transaction. Seems
bad to me.

I think as long as it becomes a tunable this isn't a bad idea at
all. Fixing it at 1/200 isn't so great because people not wrapping
large amounts of inserts/updates with transaction blocks will
suffer.

I think the default should probably be no delay, and the documentation
on enabling this needs to be clear and obvious (i.e. hard to miss).

- Don Baccus, Portland OR <dhogaza@pacifier.com>
Nature photos, on-line guides, Pacific Northwest
Rare Bird Alert Service and other goodies at
http://donb.photo.net.

bruce@momjian.us

over 25 years ago

In reply to: Don Baccus (#8)

Re: RE: [COMMITTERS] pgsql/src/backend/access/transam ( xact.c xlog.c)

At 09:32 AM 11/16/00 -0800, Alfred Perlstein wrote:

* Bruce Momjian <pgman@candle.pha.pa.us> [001116 08:59] wrote:

Eww, so we have this 1/200 second delay for every transaction. Seems
bad to me.

I think as long as it becomes a tunable this isn't a bad idea at
all. Fixing it at 1/200 isn't so great because people not wrapping
large amounts of inserts/updates with transaction blocks will
suffer.

I think the default should probably be no delay, and the documentation
on enabling this needs to be clear and obvious (i.e. hard to miss).

I just talked to Tom Lane about this. I think a sleep(0) just before
the flush would be the best. It would relinquish the cpu slice if
another process is ready to run. If no other backend is running, it
probably just returns. If there is another one, it gives it a chance to
complete. On return from sleep(0), it can check if it still needs to
flush. This would tend to bunch up flushers so they flush only once,
while not delaying cases where only one backend is running.
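
A rough sketch of the yield-then-recheck idea, for illustration only. It swaps in POSIX sched_yield() for the sleep(0) call discussed here, since sched_yield() is a call that explicitly asks the scheduler to run another ready process. The FlushState struct and the function name are hypothetical, and real shared state would need proper locking.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <sched.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical shared value: log position already fsync'ed by some backend. */
typedef struct {
    volatile uint64_t flushed_to;
} FlushState;

/*
 * Give up the CPU once, then report whether the commit record ending at
 * `commit_pos` still needs a flush. If another backend ran and flushed
 * in the meantime, the caller can skip its own fsync.
 */
bool still_needs_flush_after_yield(const FlushState *st, uint64_t commit_pos)
{
    assert(st != NULL);
    sched_yield();                      /* let any other ready backend run */
    return st->flushed_to < commit_pos; /* re-check only after the yield */
}
```

If no other process is ready to run, sched_yield() normally returns right away, which matches the "no delay when only one backend is running" goal.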

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026

#10

Don Baccus

dhogaza@pacifier.com

over 25 years ago

In reply to: Bruce Momjian (#9)

Re: RE: [COMMITTERS] pgsql/src/backend/access/transam ( xact.c xlog.c)

At 02:13 PM 11/16/00 -0500, Bruce Momjian wrote:

I think the default should probably be no delay, and the documentation
on enabling this needs to be clear and obvious (i.e. hard to miss).

I just talked to Tom Lane about this. I think a sleep(0) just before
the flush would be the best. It would relinquish the cpu slice if
another process is ready to run. If no other backend is running, it
probably just returns. If there is another one, it gives it a chance to
complete. On return from sleep(0), it can check if it still needs to
flush. This would tend to bunch up flushers so they flush only once,
while not delaying cases where only one backend is running.

This sounds like an interesting approach, yes.

- Don Baccus, Portland OR <dhogaza@pacifier.com>
Nature photos, on-line guides, Pacific Northwest
Rare Bird Alert Service and other goodies at
http://donb.photo.net.

#11

ler@lerctr.org

over 25 years ago

In reply to: Don Baccus (#10)

Re: RE: [COMMITTERS] pgsql/src/backend/access/transam ( xact.c xlog.c)

* Don Baccus <dhogaza@pacifier.com> [001116 13:46]:

At 02:13 PM 11/16/00 -0500, Bruce Momjian wrote:

I think the default should probably be no delay, and the documentation
on enabling this needs to be clear and obvious (i.e. hard to miss).

I just talked to Tom Lane about this. I think a sleep(0) just before
the flush would be the best. It would relinquish the cpu slice if
another process is ready to run. If no other backend is running, it
probably just returns. If there is another one, it gives it a chance to
complete. On return from sleep(0), it can check if it still needs to
flush. This would tend to bunch up flushers so they flush only once,
while not delaying cases where only one backend is running.

This sounds like an interesting approach, yes.

Question: Is sleep(0) guaranteed to at least give up control?

The way I read my UnixWare 7's man page, it might not, since alarm(0)
just cancels the alarm...

Larry

- Don Baccus, Portland OR <dhogaza@pacifier.com>
Nature photos, on-line guides, Pacific Northwest
Rare Bird Alert Service and other goodies at
http://donb.photo.net.

--
Larry Rosenman http://www.lerctr.org/~ler
Phone: +1 972-414-9812 (voice) Internet: ler@lerctr.org
US Mail: 1905 Steamboat Springs Drive, Garland, TX 75044-6749

#12

bruce@momjian.us

over 25 years ago

In reply to: Don Baccus (#10)

Re: RE: [COMMITTERS] pgsql/src/backend/access/transam ( xact.c xlog.c)

At 02:13 PM 11/16/00 -0500, Bruce Momjian wrote:

I think the default should probably be no delay, and the documentation
on enabling this needs to be clear and obvious (i.e. hard to miss).

I just talked to Tom Lane about this. I think a sleep(0) just before
the flush would be the best. It would relinquish the cpu slice if
another process is ready to run. If no other backend is running, it
probably just returns. If there is another one, it gives it a chance to
complete. On return from sleep(0), it can check if it still needs to
flush. This would tend to bunch up flushers so they flush only once,
while not delaying cases where only one backend is running.

This sounds like an interesting approach, yes.

In OS kernel design, you try to avoid process herding bottlenecks.
Here, we want them herded, and giving up the CPU may be the best way to
do it.

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026

#13

bruce@momjian.us

over 25 years ago

In reply to: Larry Rosenman (#11)

Re: RE: [COMMITTERS] pgsql/src/backend/access/transam ( xact.c xlog.c)

* Don Baccus <dhogaza@pacifier.com> [001116 13:46]:

At 02:13 PM 11/16/00 -0500, Bruce Momjian wrote:

I think the default should probably be no delay, and the documentation
on enabling this needs to be clear and obvious (i.e. hard to miss).

I just talked to Tom Lane about this. I think a sleep(0) just before
the flush would be the best. It would relinquish the cpu slice if
another process is ready to run. If no other backend is running, it
probably just returns. If there is another one, it gives it a chance to
complete. On return from sleep(0), it can check if it still needs to
flush. This would tend to bunch up flushers so they flush only once,
while not delaying cases where only one backend is running.

This sounds like an interesting approach, yes.

Question: Is sleep(0) guaranteed to at least give up control?

The way I read my UnixWare 7's man page, it might not, since alarm(0)
just cancels the alarm...

Well, it certainly is a kernel call, and most OS's re-evaluate on kernel
call return.

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026

#14

ler@lerctr.org

over 25 years ago

In reply to: Bruce Momjian (#13)

Re: RE: [COMMITTERS] pgsql/src/backend/access/transam ( xact.c xlog.c)

* Bruce Momjian <pgman@candle.pha.pa.us> [001116 14:02]:

This sounds like an interesting approach, yes.

Question: Is sleep(0) guaranteed to at least give up control?

The way I read my UnixWare 7's man page, it might not, since alarm(0)
just cancels the alarm...

Well, it certainly is a kernel call, and most OS's re-evaluate on kernel
call return.

BUT, do we know for sure that sleep(0) is not optimized in the library
to just return?
--
Larry Rosenman http://www.lerctr.org/~ler
Phone: +1 972-414-9812 (voice) Internet: ler@lerctr.org
US Mail: 1905 Steamboat Springs Drive, Garland, TX 75044-6749

#15

bright@wintelcom.net

over 25 years ago

In reply to: Bruce Momjian (#12)

Re: RE: [COMMITTERS] pgsql/src/backend/access/transam ( xact.c xlog.c)

* Bruce Momjian <pgman@candle.pha.pa.us> [001116 11:59] wrote:

At 02:13 PM 11/16/00 -0500, Bruce Momjian wrote:

I think the default should probably be no delay, and the documentation
on enabling this needs to be clear and obvious (i.e. hard to miss).

I just talked to Tom Lane about this. I think a sleep(0) just before
the flush would be the best. It would relinquish the cpu slice if
another process is ready to run. If no other backend is running, it
probably just returns. If there is another one, it gives it a chance to
complete. On return from sleep(0), it can check if it still needs to
flush. This would tend to bunch up flushers so they flush only once,
while not delaying cases where only one backend is running.

This sounds like an interesting approach, yes.

In OS kernel design, you try to avoid process herding bottlenecks.
Here, we want them herded, and giving up the CPU may be the best way to
do it.

Yes, but if everyone yields you're back where you started, and with
128 or more backends do you really want to cause possibly that many
context switches per fsync?

--
-Alfred Perlstein - [bright@wintelcom.net|alfred@freebsd.org]
"I have the heart of a child; I keep it in a jar on my desk."

#16

bruce@momjian.us

over 25 years ago

In reply to: Alfred Perlstein (#15)

Re: RE: [COMMITTERS] pgsql/src/backend/access/transam ( xact.c xlog.c)

In OS kernel design, you try to avoid process herding bottlenecks.
Here, we want them herded, and giving up the CPU may be the best way to
do it.

Yes, but if everyone yields you're back where you started, and with
128 or more backends do you really want to cause possibly that many
context switches per fsync?

You are going to make a kernel call and yield anyway to fsync, so why not
yield first? If someone else does the fsync in the meantime, we don't need
to do it. I am suggesting re-checking the need for fsync after the return
from sleep(0).

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026

#17

bright@wintelcom.net

over 25 years ago

In reply to: Larry Rosenman (#14)

Re: RE: [COMMITTERS] pgsql/src/backend/access/transam ( xact.c xlog.c)

* Larry Rosenman <ler@lerctr.org> [001116 12:09] wrote:

* Bruce Momjian <pgman@candle.pha.pa.us> [001116 14:02]:

This sounds like an interesting approach, yes.

Question: Is sleep(0) guaranteed to at least give up control?

The way I read my UnixWare 7's man page, it might not, since alarm(0)
just cancels the alarm...

Well, it certainly is a kernel call, and most OS's re-evaluate on kernel
call return.

BUT, do we know for sure that sleep(0) is not optimized in the library
to just return?

sleep(3) should conform to the POSIX specification; if anyone has the
reference, they can check it to see what the effect of sleep(0)
should be.

--
-Alfred Perlstein - [bright@wintelcom.net|alfred@freebsd.org]
"I have the heart of a child; I keep it in a jar on my desk."

#18

bruce@momjian.us

over 25 years ago

In reply to: Larry Rosenman (#14)

Re: RE: [COMMITTERS] pgsql/src/backend/access/transam ( xact.c xlog.c)

* Bruce Momjian <pgman@candle.pha.pa.us> [001116 14:02]:

This sounds like an interesting approach, yes.

Question: Is sleep(0) guaranteed to at least give up control?

The way I read my UnixWare 7's man page, it might not, since alarm(0)
just cancels the alarm...

Well, it certainly is a kernel call, and most OS's re-evaluate on kernel
call return.

BUT, do we know for sure that sleep(0) is not optimized in the library
to just return?

We can only do our best here. I think guessing whether other backends
are _about_ to commit is pretty shaky, and sleeping every time is a
waste. This seems the cleanest.

Funny you should mention the optimization. I just checked BSDI and saw:

u_int
sleep(secs)
    u_int secs;
{
    struct timeval nt, ot;
    long diff;
    int rc;

    if (secs == 0)
        return (0);
    ...

So maybe we need another _fake_ kernel call, or a select/usleep with a
very small value.
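
For reference, the select()-with-a-small-timeout variant mentioned above could look something like this. It is only a sketch: short_sleep_usec is a hypothetical helper name, and how long the process really sleeps depends on the kernel's timer resolution, which may be much coarser than the value requested.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <sys/select.h>
#include <sys/time.h>

/*
 * Sleep for at least `usec` microseconds by calling select() with no
 * file descriptors and only a timeout. Unlike a library sleep(0) that
 * returns immediately, this always enters the kernel.
 * Returns 0 when the timeout expires, -1 on error (for example EINTR).
 */
int short_sleep_usec(long usec)
{
    struct timeval tv;

    assert(usec >= 0);
    tv.tv_sec = usec / 1000000L;
    tv.tv_usec = usec % 1000000L;
    return select(0, NULL, NULL, NULL, &tv);
}
```

A caller wanting the 1/200 sec delay discussed in this thread would pass 5000.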

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026

#19

Peter Eisentraut

peter_e@gmx.net

over 25 years ago

In reply to: Bruce Momjian (#13)

Re: RE: [COMMITTERS] pgsql/src/backend/access/transam ( xact.c xlog.c)

Bruce Momjian writes:

The way I read my UnixWare 7's man page, it might not, since alarm(0)
just cancels the alarm...

Well, it certainly is a kernel call, and most OS's re-evaluate on kernel
call return.

In glibc, sleep(0) just does "return 0;", so if the compiler has a good
day the call will disappear completely.

--
Peter Eisentraut peter_e@gmx.net http://yi.org/peter-e/

#20

vmikheev@SECTORBASE.COM

over 25 years ago

In reply to: Alfred Perlstein (#17)

RE: RE: [COMMITTERS] pgsql/src/backend/access/transam ( xact.c xlog.c)

BUT, do we know for sure that sleep(0) is not optimized in
the library to just return?

We can only do our best here. I think guessing whether other backends
are _about_ to commit is pretty shaky, and sleeping every time is a
waste. This seems the cleanest.

A long time ago you, Bruce, gave me a gift: a book about transaction
processing (thanks again -:)). Sleeping before the fsync at commit is
described there as a standard technique, and the reason is clear.
Man, the cost of fsync is very high! { write (64 bytes) + fsync() }
takes ~ 1/50 sec. Yes, an additional 1/200 sec or so results in worse
performance when there is only one backend running, but it greatly
increases overall performance for 100 simultaneous backends. I.e., this
delay is a trade-off to gain better scalability.
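
To put rough numbers on this trade-off, here is a small sketch using the figures above (fsync about 1/50 sec, delay 1/200 sec). The model is an assumption made purely for illustration: without sharing, every commit pays for its own serialized fsync; with sharing, all backends waiting in the delay are covered by a single fsync per delay-plus-fsync cycle. Both function names are hypothetical.

```c
#include <assert.h>

/* Commits per second when each commit issues its own serialized fsync. */
double commits_per_sec_unshared(double fsync_sec)
{
    assert(fsync_sec > 0.0);
    return 1.0 / fsync_sec;
}

/*
 * Idealized upper bound when `backends` commits share one fsync per
 * (delay + fsync) cycle. With backends == 1 this is simply the rate of
 * a lone backend that always pays the extra delay.
 */
double commits_per_sec_shared(int backends, double fsync_sec, double delay_sec)
{
    assert(backends > 0 && fsync_sec > 0.0 && delay_sec >= 0.0);
    return backends / (fsync_sec + delay_sec);
}
```

Under these assumptions a lone backend drops from 50 to 40 commits per second (1/0.02 versus 1/0.025), while 100 backends sharing each flush could reach up to 4000 per second (100/0.025) instead of 50. Real workloads will land somewhere below that bound.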

I agree that it must be configurable, smaller or probably 0 by
default, could use the approximate # of simultaneously running backends
for guessing (postmaster could maintain this number in shmem and
backends could just read it without any locking - an exact number is
not required), and must be well described as a tuning parameter in the
documentation. Anyway, I object to sleep(0).
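
One way the "configurable, guess from the number of running backends" idea might be expressed is sketched below. The CommitTuning struct, its field names, and effective_commit_delay_ms are all hypothetical; the backend count is passed in as a plain integer, standing in for the approximate value a backend would read from shared memory without locking.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-installation tunables. */
typedef struct {
    int commit_delay_ms;      /* 0 disables the pre-fsync sleep */
    int min_active_backends;  /* sleep only with at least this many running */
} CommitTuning;

/*
 * Decide how long to sleep before checking/forcing the log fsync.
 * `approx_active_backends` may be slightly stale; an exact count is
 * not needed for this heuristic.
 */
int effective_commit_delay_ms(const CommitTuning *t, int approx_active_backends)
{
    assert(t != NULL);
    if (t->commit_delay_ms <= 0)
        return 0;   /* delay turned off (the suggested default) */
    if (approx_active_backends < t->min_active_backends)
        return 0;   /* too little concurrency for waiting to pay off */
    return t->commit_delay_ms;
}
```

With a default of commit_delay_ms = 0 a single-user installation never waits, and a busy site can opt in by raising it.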

Vadim

#21

bright@wintelcom.net

over 25 years ago

In reply to: Bruce Momjian (#16)

#22

tgl@sss.pgh.pa.us

over 25 years ago

In reply to: Alfred Perlstein (#21)

#23

vmikheev@SECTORBASE.COM

over 25 years ago

In reply to: Alfred Perlstein (#21)

#24

bright@wintelcom.net

over 25 years ago

In reply to: Tom Lane (#22)

#25

tgl@sss.pgh.pa.us

over 25 years ago

In reply to: Alfred Perlstein (#24)

#26

tgl@sss.pgh.pa.us

over 25 years ago

In reply to: Mikheev, Vadim (#20)

#27

Tom

tom@sdf.com

over 25 years ago

In reply to: Alfred Perlstein (#17)

#28

bruce@momjian.us

over 25 years ago

In reply to: Tom (#27)

#29

ler@lerctr.org

over 25 years ago

In reply to: Bruce Momjian (#28)

#30

bruce@momjian.us

over 25 years ago

In reply to: Larry Rosenman (#29)

#31

ler@lerctr.org

over 25 years ago

In reply to: Bruce Momjian (#30)

#32

bruce@momjian.us

over 25 years ago

In reply to: Tom (#27)

#33

bruce@momjian.us

over 25 years ago

In reply to: Tom Lane (#26)

#34

tgl@sss.pgh.pa.us

over 25 years ago

In reply to: Bruce Momjian (#32)

#35

bruce@momjian.us

over 25 years ago

In reply to: Tom Lane (#34)

#36

bruce@momjian.us

over 25 years ago

In reply to: Bruce Momjian (#35)

#37

ler@lerctr.org

over 25 years ago

In reply to: Tom Lane (#34)

#38

bruce@momjian.us

over 25 years ago

In reply to: Larry Rosenman (#37)

#39

bruce@momjian.us

over 25 years ago

In reply to: Bruce Momjian (#38)

#40

tgl@sss.pgh.pa.us

over 25 years ago

In reply to: Bruce Momjian (#38)

#41

bruce@momjian.us

over 25 years ago

In reply to: Tom Lane (#40)

#42

tgl@sss.pgh.pa.us

over 25 years ago

In reply to: Bruce Momjian (#39)

#43

Peter Eisentraut

peter_e@gmx.net

over 25 years ago

In reply to: Larry Rosenman (#37)

#44

Peter Eisentraut

peter_e@gmx.net

over 25 years ago

In reply to: Tom Lane (#42)

#45

bruce@momjian.us

over 25 years ago

In reply to: Peter Eisentraut (#44)

#46

tgl@sss.pgh.pa.us

over 25 years ago

In reply to: Peter Eisentraut (#44)

#47

Martin Devera

devik@cdi.cz

over 25 years ago

In reply to: Mikheev, Vadim (#23)

#48

Don Baccus

dhogaza@pacifier.com

over 25 years ago

In reply to: Martin Devera (#47)

#49

bruce@momjian.us

over 25 years ago

In reply to: Tom Lane (#46)

#50

vmikheev@SECTORBASE.COM

over 25 years ago

In reply to: Bruce Momjian (#32)

#51

vmikheev@SECTORBASE.COM

over 25 years ago

In reply to: Mikheev, Vadim (#23)

#52

Kevin Brown

kevin@sysexperts.com

over 25 years ago

In reply to: Bruce Momjian (#33)

#53

tgl@sss.pgh.pa.us

over 25 years ago

In reply to: Kevin Brown (#52)

#54

bruce@momjian.us

about 25 years ago

In reply to: Mikheev, Vadim (#20)

#55

Manuel Cabido

manny@msuiit.edu.ph

about 25 years ago

In reply to: Bruce Momjian (#54)

#56