fsynch of pg_log write..

Started by Don Baccus over 26 years ago, 10 messages

dhogaza@pacifier.com

over 26 years ago

After the discussion about implementing a flag that
would selectively disable fsynch on the pg_log file,
I visited xact.c and tried a little test.

The code in RecordTransactionCommit looks essentially like
(ignoring stuff related to leaks)

FlushBufferPool /* flush and fsync the data blocks */
TransactionIdCommit /* log the fact that the transaction's done */
FlushBufferPool /* flush and fsync pg_log and whatever else
has changed during this brief period of time */

I just added a couple of lines of code that saves
disableFsync and sets it true before the second call
to FlushBufferPool, restoring it to its original state
afterwards.
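
The change described above can be sketched roughly as follows. This is only a stand-alone illustration, not the actual xact.c code: FlushBufferPool and TransactionIdCommit are stubbed out with simplified, argument-free signatures, and the fsync_calls and commit_records counters are hypothetical names added just to make the effect observable.

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-in for the backend's global switch (true means "skip fsync"). */
static bool disableFsync = false;

/* Hypothetical counters, used only to observe what the sketch does. */
static int fsync_calls = 0;
static int commit_records = 0;

/* Hypothetical stub: would write dirty buffers, then fsync unless disabled. */
static void FlushBufferPool(void)
{
    if (!disableFsync)
        fsync_calls++;
}

/* Hypothetical stub: would mark the transaction committed in pg_log. */
static void TransactionIdCommit(void)
{
    commit_records++;
}

/* Commit path with the hack: the second flush runs with fsync suppressed. */
static void RecordTransactionCommit(void)
{
    bool saved;

    FlushBufferPool();        /* data blocks: written and fsynced */
    TransactionIdCommit();    /* pg_log updated */

    saved = disableFsync;     /* remember the current setting */
    disableFsync = true;      /* suppress fsync for the pg_log flush */
    FlushBufferPool();
    disableFsync = saved;     /* restore the original setting */
}
```

With these stubs, one call to RecordTransactionCommit() performs a single fsync (the first flush) and leaves disableFsync as it found it.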

Running without "-F", my disk is blessedly silent when
I access my web pages that hit the database several times
with read-only selects used to customize the presentation
to the user.

Cool!

So...does it sound like I'm doing the right thing?

- Don Baccus, Portland OR <dhogaza@pacifier.com>
Nature photos, on-line guides, and other goodies at
http://donb.photo.net

Vadim Mikheev

vadim@krs.ru

over 26 years ago

In reply to: Don Baccus (#1)

Re: [HACKERS] fsynch of pg_log write..

Don Baccus wrote:

FlushBufferPool /* flush and fsync the data blocks */
TransactionIdCommit /* log the fact that the transaction's done */
FlushBufferPool /* flush and fsync pg_log and whatever else
has changed during this brief period of time */

I just added a couple of lines of code that saves
disableFsync and sets it true before the second call
to FlushBufferPool, restoring it to its original state
afterwards.

...

So...does it sound like I'm doing the right thing?

It's bad in the case of concurrent writes, because the
second FlushBufferPool "flushes whatever else has changed during
this brief period of time".

Right way is just set some flag in WriteBuffer()/WriteNoReleaseBuffer()
and don't do

FlushBufferPool
TransactionIdCommit
FlushBufferPool

at all when this flag is not set.
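
A rough stand-alone sketch of that idea, under stated assumptions: the flag name xact_dirtied_buffer is hypothetical, the three functions are stubs with simplified signatures, and the counters exist only so the behaviour can be checked. It is not the code that went into any release.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag: set whenever this transaction writes a buffer. */
static bool xact_dirtied_buffer = false;

/* Hypothetical counters, used only to observe what the sketch does. */
static int flush_calls = 0;
static int commit_records = 0;

/* Hypothetical stubs standing in for the real buffer manager and log calls. */
static void FlushBufferPool(void)     { flush_calls++; }
static void TransactionIdCommit(void) { commit_records++; }

/* Stub for WriteBuffer()/WriteNoReleaseBuffer(): just records the write. */
static void WriteBuffer(void)
{
    xact_dirtied_buffer = true;
}

/* Commit path: skip flush/commit/flush entirely for read-only transactions. */
static void RecordTransactionCommit(void)
{
    if (xact_dirtied_buffer)
    {
        FlushBufferPool();
        TransactionIdCommit();
        FlushBufferPool();
    }
    xact_dirtied_buffer = false;  /* reset for the next transaction */
}
```

In this sketch a transaction that never calls WriteBuffer() commits without touching the disk, while a writer still gets both flushes.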

I'll do it for 6.5.1 if no one else...

Vadim

Zeugswetter Andreas IZ5

Andreas.Zeugswetter@telecom.at

over 26 years ago

In reply to: Vadim Mikheev (#2)

Re: [HACKERS] fsynch of pg_log write..

Vadim wrote:

Right way is just set some flag in WriteBuffer()/WriteNoReleaseBuffer()
and don't do

FlushBufferPool
TransactionIdCommit
FlushBufferPool

at all when this flag is not set.

While this is even much better for select only transactions
it will still do the second flush for writers.
This flush is not needed for those, that are only interested
in consistency, and don't care if the last transaction before
system/backend crash is lost.
It can actually really only be the very last transaction reported
ok to any client, that is rolled back, since all other xactions
will be flushed by this same first FlushBufferPool
(since BufferSync currently flushes all dirty Pages).
So IMHO a switch to avoid the second FlushBufferPool
would still be useful, even with this suggested fix.

Andreas

Don Baccus

dhogaza@pacifier.com

over 26 years ago

In reply to: Zeugswetter Andreas IZ5 (#3)

Re: [HACKERS] fsynch of pg_log write..

At 10:18 AM 6/25/99 +0200, Zeugswetter Andreas IZ5 wrote:

Vadim wrote:

Right way is just set some flag in WriteBuffer()/WriteNoReleaseBuffer()
and don't do

FlushBufferPool
TransactionIdCommit
FlushBufferPool

at all when this flag is not set.

While this is even much better for select only transactions
it will still do the second flush for writers.
This flush is not needed for those, that are only interested
in consistency, and don't care if the last transaction before
system/backend crash is lost.
It can actually really only be the very last transaction reported
ok to any client, that is rolled back, since all other xactions
will be flushed by this same first FlushBufferPool
(since BufferSync currently flushes all dirty Pages).
So IMHO a switch to avoid the second FlushBufferPool
would still be useful, even with this suggested fix.

That was what I was wondering when I saw Vadim's post,
but seeing as yesterday was the first time I'd ever
dug into the Postgres source, I didn't really feel I
was on solid ground.

Obviously, skipping the entire flush/log id/flush cycle
for read only selects is the RIGHT thing to do. As is
ensuring that flushing the buffers only flushes those
modified by the transaction in question rather than
flushing the world...

For now, though, I don't mind living with my simple
hack if indeed it simply means I risk losing a transaction
during a crash. Or, actually, have simply increased that risk
(the sequence flush/log id/CRASH is possible, after all).

I'm a lot more comfortable with this than with the potential
damage done during a crash when fsync'ing both log file and
data is disabled, when the log can then be written by the
system followed by a crash before the data tuples make it
to disk.

- Don Baccus, Portland OR <dhogaza@pacifier.com>
Nature photos, on-line guides, and other goodies at
http://donb.photo.net

Don Baccus

dhogaza@pacifier.com

over 26 years ago

In reply to: Don Baccus (#4)

Re: [HACKERS] fsynch of pg_log write..

At 11:23 AM 6/25/99 +0200, Zeugswetter Andreas IZ5 wrote:

So...does it sound like I'm doing the right thing?

I don't see how this could be, since the first FlushBuffer still does
the sync.

Oh-uh, you're right of course. The first select doesn't hit
the disk, the next one does during the first flush. Silly me,
where was my head?

Sigh.

So, Vadim's "right way" is also the "only way", it would appear.

- Don Baccus, Portland OR <dhogaza@pacifier.com>
Nature photos, on-line guides, and other goodies at
http://donb.photo.net

Zeugswetter Andreas IZ5

Andreas.Zeugswetter@telecom.at

over 26 years ago

In reply to: Don Baccus (#5)

Re: [HACKERS] fsynch of pg_log write..

For now, though, I don't mind living with my simple
hack if indeed it simply means I risk losing a transaction
during a crash. Or, actually, have simply increased that risk
(the sequence flush/log id/CRASH is possible, after all).

No. This is why Vadim wants the second flush. If the machine
crashes like you describe the client will not be told "transaction
committed". The problem is when a client is told something,
that is not true after a crash, which can happen if the second
flush is left out.
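
The distinction can be put into a toy model. The sketch below is only an illustration under simplifying assumptions: the commit is reduced to four steps, a crash loses everything that was not explicitly flushed, and all type and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Steps of the simplified commit sequence, in order. */
enum { STEP_DATA_FLUSH = 1, STEP_LOG_COMMIT, STEP_SECOND_FLUSH, STEP_ACK_CLIENT };

typedef struct
{
    bool commit_on_disk;  /* commit record survived the crash */
    bool client_acked;    /* client was told "transaction committed" */
} CrashOutcome;

/*
 * Run the sequence up to and including last_step_completed, then crash.
 * Assumes that without the second flush the commit record never reaches
 * the disk before the crash.
 */
static CrashOutcome commit_then_crash(bool do_second_flush, int last_step_completed)
{
    CrashOutcome out = { false, false };
    bool commit_in_memory = false;

    assert(last_step_completed >= 0 && last_step_completed <= STEP_ACK_CLIENT);

    if (last_step_completed >= STEP_LOG_COMMIT)
        commit_in_memory = true;
    if (last_step_completed >= STEP_SECOND_FLUSH && do_second_flush)
        out.commit_on_disk = commit_in_memory;
    if (last_step_completed >= STEP_ACK_CLIENT)
        out.client_acked = true;

    return out;
}

/* The bad case: the client heard "committed" but the commit is gone. */
static bool client_was_misled(CrashOutcome out)
{
    return out.client_acked && !out.commit_on_disk;
}
```

In this model a crash after flush/log id loses the transaction but never misleads the client, because the acknowledgement had not been sent yet. Only the variant without the second flush can end with an acknowledged commit that is missing after the crash.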

I'm a lot more comfortable with this than with the potential
damage done during a crash when fsync'ing both log file and
data is disabled, when the log can then be written by the
system followed by a crash before the data tuples make it
to disk.

Yes, this is a much better situation.

Andreas

Bruce Momjian

maillist@candle.pha.pa.us

over 26 years ago

In reply to: Zeugswetter Andreas IZ5 (#6)

Re: [HACKERS] fsynch of pg_log write..

For now, though, I don't mind living with my simple
hack if indeed it simply means I risk losing a transaction
during a crash. Or, actually, have simply increased that risk
(the sequence flush/log id/CRASH is possible, after all).

No. This is why Vadim wants the second flush. If the machine
crashes like you describe the client will not be told "transaction
committed". The problem is when a client is told something,
that is not true after a crash, which can happen if the second
flush is left out.

But commercial db's do that. They return 'done' for every query, while
they write their log files every X seconds. We need to allow this. No
reason to be more reliable than commercial db's by default. Or, at
least we need to give them the option because the speed advantage is
huge.

I'm a lot more comfortable with this than with the potential
damage done during a crash when fsync'ing both log file and
data is disabled, when the log can then be written by the
system followed by a crash before the data tuples make it
to disk.

Yes, this is a much better situation.

-- 
  Bruce Momjian                        |  http://www.op.net/~candle
  maillist@candle.pha.pa.us            |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026

Zeugswetter Andreas IZ5

Andreas.Zeugswetter@telecom.at

over 26 years ago

In reply to: Bruce Momjian (#7)

AW: [HACKERS] fsynch of pg_log write..

committed". The problem is when a client is told something,
that is not true after a crash, which can happen if the second
flush is left out.

But commercial db's do that. They return 'done' for every query, while
they write their log files every X seconds. We need to allow this. No
reason to be more reliable than commercial db's by default. Or, at
least we need to give them the option because the speed advantage is
huge.

I agree, but commercial db's don't do that.
Oracle does not (only on Linux).
Informix only does it when you specially create the database
(create database dada with buffered log;) I always use it :-)
Informix has a log buffer, which is flushed at transaction commit
(unbuffered logging) or when the buffer is full (buffered logging).
None of them do any "every X seconds stuff".
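
The two logging modes can be sketched with a toy log buffer. This is only an illustration of the flush-at-commit versus flush-when-full rule described above, not how Informix implements it; the buffer size, type, and function names are all hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define LOG_BUF_SIZE 64  /* hypothetical, tiny so the behaviour is easy to follow */

typedef struct
{
    char   data[LOG_BUF_SIZE];
    size_t used;      /* bytes currently waiting in the buffer */
    bool   buffered;  /* true: flush only when full; false: flush at commit */
    int    flushes;   /* number of times the buffer went to disk */
} LogBuffer;

/* Pretend to write the buffer to the log file and empty it. */
static void log_flush(LogBuffer *lb)
{
    if (lb->used > 0)
    {
        lb->flushes++;
        lb->used = 0;
    }
}

/* Append one record, flushing first if it would not fit. */
static void log_append(LogBuffer *lb, const char *rec, size_t len)
{
    assert(len <= LOG_BUF_SIZE);
    if (lb->used + len > LOG_BUF_SIZE)
        log_flush(lb);
    memcpy(lb->data + lb->used, rec, len);
    lb->used += len;
}

/* Commit: unbuffered logging flushes now, buffered logging does nothing. */
static void log_commit(LogBuffer *lb)
{
    if (!lb->buffered)
        log_flush(lb);
}
```

With buffered set to true, committed records can sit in memory until later records push them out, which is the window in which a crash loses transactions that were already reported as done.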

Andreas

Bruce Momjian

maillist@candle.pha.pa.us

over 26 years ago

In reply to: Zeugswetter Andreas IZ5 (#8)

Re: AW: [HACKERS] fsynch of pg_log write..

committed". The problem is when a client is told something,
that is not true after a crash, which can happen if the second
flush is left out.

But commercial db's do that. They return 'done' for every query, while
they write their log files every X seconds. We need to allow this. No
reason to be more reliable than commercial db's by default. Or, at
least we need to give them the option because the speed advantage is
huge.

I agree, but commercial db's don't do that.
Oracle does not (only on Linux).
Informix only does it when you specially create the database
(create database dada with buffered log;) I always use it :-)
Informix has a log buffer, which is flushed at transaction commit
(unbuffered logging) or when the buffer is full (buffered logging).
None of them do any "every X seconds stuff".

Yes! All my clients use Informix buffered logging. Now, these are law
firms running their billing systems using Informix. The 'buffer full'
write is kind of limited in that it does not give a good time limit on
vulnerability. It has to do this because it wants to write a full tape
block. Newer versions worked around this with some kind of intermediate
fix (not sure). Anyway, having a time limit in the fsync will give us
good performance with a reliable/limited exposure to risk.
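
One way such a time limit could look is sketched below. It is only a guess at the shape of the idea, not a proposed patch: the struct, the function names, and the five-second figure in the usage note are hypothetical, and the actual fsync call is replaced by a counter.

```c
#include <assert.h>
#include <stdbool.h>
#include <time.h>

typedef struct
{
    time_t last_sync;          /* when the log was last forced to disk */
    double max_delay_seconds;  /* upper bound on how stale the disk may get */
    int    syncs;              /* hypothetical counter standing in for fsync() */
} TimedSync;

/* True if the time limit has expired and the log should be synced now. */
static bool timed_sync_due(const TimedSync *ts, time_t now)
{
    return difftime(now, ts->last_sync) >= ts->max_delay_seconds;
}

/* Called at commit: sync only if the last sync is older than the limit. */
static void commit_with_time_limit(TimedSync *ts, time_t now)
{
    assert(ts->max_delay_seconds >= 0.0);
    if (timed_sync_due(ts, now))
    {
        /* A real backend would fsync the log file here. */
        ts->syncs++;
        ts->last_sync = now;
    }
}
```

With max_delay_seconds set to 5, commits arriving within five seconds of the last sync return without forcing the disk. Note that this sketch only checks the clock when a commit arrives, so a real version would also need a timer to cover a backend that goes idle right after an unsynced commit.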

-- 
  Bruce Momjian                        |  http://www.op.net/~candle
  maillist@candle.pha.pa.us            |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026

Vadim Mikheev

vadim@krs.ru

over 26 years ago

In reply to: Zeugswetter Andreas IZ5 (#3)

Re: [HACKERS] fsynch of pg_log write..

Zeugswetter Andreas IZ5 wrote:

Vadim wrote:

Right way is just set some flag in WriteBuffer()/WriteNoReleaseBuffer()
and don't do

FlushBufferPool
TransactionIdCommit
FlushBufferPool

at all when this flag is not set.

While this is even much better for select only transactions
it will still do the second flush for writers.
This flush is not needed for those, that are only interested
in consistency, and don't care if the last transaction before
system/backend crash is lost.
It can actually really only be the very last transaction reported
ok to any client, that is rolled back, since all other xactions
will be flushed by this same first FlushBufferPool
(since BufferSync currently flushes all dirty Pages).
So IMHO a switch to avoid the second FlushBufferPool

^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

would still be useful, even with this suggested fix.

I didn't object to this.

Vadim