fsync of pg_log write
After the discussion about implementing a flag that
would selectively disable fsync on the pg_log file,
I visited xact.c and tried a little test.
The code in RecordTransactionCommit looks essentially like this
(ignoring stuff related to leaks):
    FlushBufferPool       /* flush and fsync the data blocks */
    TransactionIdCommit   /* log the fact that the transaction's done */
    FlushBufferPool       /* flush and fsync pg_log and whatever else
                             has changed during this brief period of time */
I just added a couple of lines of code that save
disableFsync and set it to true before the second call
to FlushBufferPool, restoring it to its original state
afterwards.
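A minimal sketch of that change, for illustration only. The names FlushBufferPool, TransactionIdCommit and disableFsync come from the message above; the function bodies, the argument-less signatures, the fsync_calls counter and the RecordTransactionCommitSketch wrapper are stand-ins invented here and are not the backend's actual code.

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-ins for backend state; only the name disableFsync is from the thread. */
static bool disableFsync = false;  /* true means "write, but do not fsync" */
static int  fsync_calls  = 0;      /* counts simulated fsyncs (illustration only) */

/* Stub: the real function writes out dirty buffers; here we model only the fsync. */
static void FlushBufferPool(void)
{
    if (!disableFsync)
        fsync_calls++;
}

/* Stub: the real function records the commit in pg_log. */
static void TransactionIdCommit(void)
{
}

/* Hypothetical wrapper showing the save/set/restore pattern described above. */
void RecordTransactionCommitSketch(void)
{
    bool saved;

    FlushBufferPool();        /* flush and fsync the data blocks */
    TransactionIdCommit();    /* log the fact that the transaction is done */

    saved = disableFsync;     /* remember the caller's setting */
    disableFsync = true;      /* suppress the fsync for the second flush only */
    FlushBufferPool();        /* pg_log is written but not forced to disk */
    disableFsync = saved;     /* restore the original state */

    assert(disableFsync == saved);  /* sanity check: setting is back as found */
}
```

With disableFsync initially false, one call performs a single simulated fsync (the first flush) instead of two.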
Running without "-F", my disk is blessedly silent when
I access my web pages that hit the database several times
with read-only selects used to customize the presentation
to the user.
Cool!
So...does it sound like I'm doing the right thing?
- Don Baccus, Portland OR <dhogaza@pacifier.com>
Nature photos, on-line guides, and other goodies at
http://donb.photo.net
Don Baccus wrote:
> FlushBufferPool /* flush and fsync the data blocks */
> TransactionIdCommit /* log the fact that the transaction's done */
> FlushBufferPool /* flush and fsync pg_log and whatever else
> has changed during this brief period of time */
> I just added a couple of lines of code that save
> disableFsync and set it to true before the second call
> to FlushBufferPool, restoring it to its original state
> afterwards.
> ...
> So...does it sound like I'm doing the right thing?
It's bad in the case of concurrent writes, because the
second FlushBufferPool "flushes whatever else has changed during
this brief period of time".
The right way is just to set some flag in WriteBuffer()/WriteNoReleaseBuffer()
and not do
    FlushBufferPool
    TransactionIdCommit
    FlushBufferPool
at all when this flag is not set.
I'll do it for 6.5.1 if no one else...
Vadim
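The approach Vadim outlines can be sketched roughly as follows. This is only an illustration under stated assumptions: the thread says "some flag" without naming it, so xact_wrote_buffer is a hypothetical name, and WriteBufferSketch, RecordTransactionCommitSketch and the two counters are stand-ins rather than backend code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag: set whenever the current transaction dirties a buffer. */
static bool xact_wrote_buffer = false;

/* Counters that exist only so the sketch can be observed. */
static int flush_calls = 0;
static int commit_log_writes = 0;

/* Stand-in for WriteBuffer()/WriteNoReleaseBuffer(): note that we wrote. */
void WriteBufferSketch(void)
{
    xact_wrote_buffer = true;
}

static void FlushBufferPool(void)     { flush_calls++; }
static void TransactionIdCommit(void) { commit_log_writes++; }

/* Skip the whole flush / log id / flush cycle for read-only transactions. */
void RecordTransactionCommitSketch(void)
{
    if (xact_wrote_buffer) {
        FlushBufferPool();
        TransactionIdCommit();
        FlushBufferPool();
    }
    xact_wrote_buffer = false;  /* reset for the next transaction */
}
```

A transaction that only runs selects never calls the write path, so its commit touches neither the buffer pool nor pg_log in this model.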
Vadim wrote:
> The right way is just to set some flag in WriteBuffer()/WriteNoReleaseBuffer()
> and not do
>     FlushBufferPool
>     TransactionIdCommit
>     FlushBufferPool
> at all when this flag is not set.
While this is much better for select-only transactions,
it will still do the second flush for writers.
This flush is not needed by those who are only interested
in consistency and don't care if the last transaction before
a system/backend crash is lost.
In fact, only the very last transaction reported
ok to any client can be rolled back, since all other xactions
will be flushed by this same first FlushBufferPool
(since BufferSync currently flushes all dirty pages).
So IMHO a switch to avoid the second FlushBufferPool
would still be useful, even with this suggested fix.
Andreas
At 10:18 AM 6/25/99 +0200, Zeugswetter Andreas IZ5 wrote:
> Vadim wrote:
>> The right way is just to set some flag in WriteBuffer()/WriteNoReleaseBuffer()
>> and not do
>>     FlushBufferPool
>>     TransactionIdCommit
>>     FlushBufferPool
>> at all when this flag is not set.
> While this is much better for select-only transactions,
> it will still do the second flush for writers.
> This flush is not needed by those who are only interested
> in consistency and don't care if the last transaction before
> a system/backend crash is lost.
> In fact, only the very last transaction reported
> ok to any client can be rolled back, since all other xactions
> will be flushed by this same first FlushBufferPool
> (since BufferSync currently flushes all dirty pages).
> So IMHO a switch to avoid the second FlushBufferPool
> would still be useful, even with this suggested fix.
That was what I was wondering when I saw Vadim's post,
but seeing as yesterday was the first time I'd ever
dug into the Postgres source, I didn't really feel I
was on solid ground.
Obviously, skipping the entire flush/log id/flush cycle
for read only selects is the RIGHT thing to do. As is
ensuring that flushing the buffers only flushes those
modified by the transaction in question rather than
flushing the world...
For now, though, I don't mind living with my simple
hack if indeed it simply means I risk losing a transaction
during a crash. Or, actually, have simply increased that risk
(the sequence flush/log id/CRASH is possible, after all).
I'm a lot more comfortable with this than with the potential
damage done during a crash when fsync'ing both log file and
data is disabled, when the log can then be written by the
system followed by a crash before the data tuples make it
to disk.
- Don Baccus, Portland OR <dhogaza@pacifier.com>
At 11:23 AM 6/25/99 +0200, Zeugswetter Andreas IZ5 wrote:
>> So...does it sound like I'm doing the right thing?
> I don't see how this could be, since the first FlushBuffer still does
> the sync.
Oh-uh, you're right of course. The first select doesn't hit
the disk, the next one does during the first flush. Silly me,
where was my head?
Sigh.
So, Vadim's "right way" is also the "only way", it would appear.
- Don Baccus, Portland OR <dhogaza@pacifier.com>
> For now, though, I don't mind living with my simple
> hack if indeed it simply means I risk losing a transaction
> during a crash. Or, actually, have simply increased that risk
> (the sequence flush/log id/CRASH is possible, after all).
No. This is why Vadim wants the second flush. If the machine
crashes like you describe the client will not be told "transaction
committed". The problem is when a client is told something,
that is not true after a crash, which can happen if the second
flush is left out.
> I'm a lot more comfortable with this than with the potential
> damage done during a crash when fsync'ing both log file and
> data is disabled, when the log can then be written by the
> system followed by a crash before the data tuples make it
> to disk.
Yes, this is a much better situation.
Andreas
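The distinction Andreas draws (a lost commit the client never heard about versus a lost commit the client was told succeeded) can be checked with a small model. This is a sketch under a worst-case assumption made here, namely that nothing written without an fsync reaches disk before the crash. The step names and the function acked_commit_can_be_lost are hypothetical and model only the ordering discussed in the thread.

```c
#include <assert.h>
#include <stdbool.h>

/* Commit steps in the order discussed in the thread. */
enum step { FLUSH_DATA, WRITE_LOG, FLUSH_LOG, ACK_CLIENT, NUM_STEPS };

/*
 * Returns true if there is a crash point after which a client has been
 * told "committed" while the commit record is not durable.
 * fsync_log says whether the second flush really forces pg_log to disk.
 */
bool acked_commit_can_be_lost(bool fsync_log)
{
    for (int crash_after = 0; crash_after < NUM_STEPS; crash_after++) {
        bool log_durable = false;
        bool client_acked = false;

        /* Run every step up to and including the one we crash after. */
        for (int s = 0; s <= crash_after; s++) {
            if (s == FLUSH_LOG && fsync_log)
                log_durable = true;
            if (s == ACK_CLIENT)
                client_acked = true;
        }
        if (client_acked && !log_durable)
            return true;
    }
    return false;
}
```

With the second flush in place, a crash after "log id" but before the flush loses the commit, yet the client was never acknowledged. Without it, a crash right after the acknowledgement leaves the client believing something that is no longer true.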
>> For now, though, I don't mind living with my simple
>> hack if indeed it simply means I risk losing a transaction
>> during a crash. Or, actually, have simply increased that risk
>> (the sequence flush/log id/CRASH is possible, after all).
> No. This is why Vadim wants the second flush. If the machine
> crashes like you describe the client will not be told "transaction
> committed". The problem is when a client is told something,
> that is not true after a crash, which can happen if the second
> flush is left out.
But commercial db's do that. They return 'done' for every query, while
they write their log files every X seconds. We need to allow this. No
reason to be more reliable than commercial db's by default. Or, at
least we need to give them the option because the speed advantage is
huge.
>> I'm a lot more comfortable with this than with the potential
>> damage done during a crash when fsync'ing both log file and
>> data is disabled, when the log can then be written by the
>> system followed by a crash before the data tuples make it
>> to disk.
> Yes, this is a much better situation.
--
Bruce Momjian | http://www.op.net/~candle
maillist@candle.pha.pa.us | (610) 853-3000
+ If your life is a hard drive, | 830 Blythe Avenue
+ Christ can be your backup. | Drexel Hill, Pennsylvania 19026
>> committed". The problem is when a client is told something,
>> that is not true after a crash, which can happen if the second
>> flush is left out.
> But commercial db's do that. They return 'done' for every query, while
> they write their log files every X seconds. We need to allow this. No
> reason to be more reliable than commercial db's by default. Or, at
> least we need to give them the option because the speed advantage is
> huge.
I agree, but commercial db's don't do that.
Oracle does not (only on Linux).
Informix only does it when you specially create the database
(create database dada with buffered log;) I always use it :-)
Informix has a log buffer, which is flushed at transaction commit
(unbuffered logging) or when the buffer is full (buffered logging).
None of them do any "every X seconds stuff".
Andreas
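The two logging modes described above (flush at every commit versus flush only when the buffer fills) can be modelled with a toy log buffer. This sketch reflects only the behaviour stated in the message, not Informix's actual implementation. The struct, the function names and the 64-byte capacity are all invented for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define LOG_BUF_SIZE 64   /* arbitrary capacity chosen for the example */

struct log_buffer {
    char   data[LOG_BUF_SIZE];
    size_t used;       /* bytes currently waiting in the buffer */
    bool   buffered;   /* true: flush only when full; false: flush at commit */
    int    flushes;    /* number of simulated writes to disk */
};

/* Simulate forcing the buffer to disk. */
static void log_flush(struct log_buffer *lb)
{
    if (lb->used > 0) {
        lb->flushes++;
        lb->used = 0;
    }
}

/* Append one commit record; returns false if it can never fit. */
bool log_commit(struct log_buffer *lb, const char *rec, size_t len)
{
    if (len > LOG_BUF_SIZE)
        return false;
    if (lb->used + len > LOG_BUF_SIZE)
        log_flush(lb);                 /* make room for the new record */
    memcpy(lb->data + lb->used, rec, len);
    lb->used += len;
    if (!lb->buffered || lb->used == LOG_BUF_SIZE)
        log_flush(lb);                 /* unbuffered: every commit; buffered: when full */
    return true;
}
```

In buffered mode, commits still sitting in the buffer at crash time are lost even though the client was told they succeeded, which is the trade-off under discussion.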
>>> committed". The problem is when a client is told something,
>>> that is not true after a crash, which can happen if the second
>>> flush is left out.
>> But commercial db's do that. They return 'done' for every query, while
>> they write their log files every X seconds. We need to allow this. No
>> reason to be more reliable than commercial db's by default. Or, at
>> least we need to give them the option because the speed advantage is
>> huge.
> I agree, but commercial db's don't do that.
> Oracle does not (only on Linux).
> Informix only does it when you specially create the database
> (create database dada with buffered log;) I always use it :-)
> Informix has a log buffer, which is flushed at transaction commit
> (unbuffered logging) or when the buffer is full (buffered logging).
> None of them do any "every X seconds stuff".
Yes! All my clients use Informix buffered logging. Now, these are law
firms running their billing systems using Informix. The 'buffer full'
write is kind of limited in that it does not give a good time limit on
vulnerability. It has to do this because it wants to write a full tape
block. Newer versions worked around this with some kind of intermediate
fix (not sure). Anyway, having a time limit in the fsync will give us
good performance with a reliable/limited exposure to risk.
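One way to read "a time limit in the fsync" is: at commit, force the log to disk only if the last sync is older than some limit. The sketch below assumes that reading. The struct, the field max_delay and the function commit_maybe_sync are hypothetical names, and the current time is passed in explicitly so the behaviour is easy to follow. A real implementation would presumably also need a timer so that an idle backend does not leave commits unsynced past the limit.

```c
#include <assert.h>
#include <stdbool.h>
#include <time.h>

struct sync_policy {
    time_t last_sync;   /* when the log was last forced to disk */
    double max_delay;   /* hypothetical exposure limit, in seconds */
    int    syncs;       /* number of simulated fsync calls */
};

/*
 * Called at commit time. Forces a sync only if at least max_delay seconds
 * have passed since the previous one; returns true when it did so.
 */
bool commit_maybe_sync(struct sync_policy *p, time_t now)
{
    if (difftime(now, p->last_sync) >= p->max_delay) {
        p->syncs++;          /* a real backend would fsync pg_log here */
        p->last_sync = now;
        return true;
    }
    return false;
}
```

Under this policy the work that can be lost in a crash is bounded by elapsed time rather than by how quickly a fixed-size buffer happens to fill.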
--
Bruce Momjian
Zeugswetter Andreas IZ5 wrote:
> Vadim wrote:
>> The right way is just to set some flag in WriteBuffer()/WriteNoReleaseBuffer()
>> and not do
>>     FlushBufferPool
>>     TransactionIdCommit
>>     FlushBufferPool
>> at all when this flag is not set.
> While this is much better for select-only transactions,
> it will still do the second flush for writers.
> This flush is not needed by those who are only interested
> in consistency and don't care if the last transaction before
> a system/backend crash is lost.
> In fact, only the very last transaction reported
> ok to any client can be rolled back, since all other xactions
> will be flushed by this same first FlushBufferPool
> (since BufferSync currently flushes all dirty pages).
> So IMHO a switch to avoid the second FlushBufferPool
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> would still be useful, even with this suggested fix.
I didn't object to this.
Vadim