COMMIT NOWAIT Performance Option
Proposal: implement a new option for COMMIT to enhance performance,
providing a MySQL-like trade-off between performance and robustness for
*only* those that want it.
COMMIT NOWAIT
This form of COMMIT will *not* perform XLogFlush(), but will rely on a
special background process to perform regular WAL fsyncs (see later).
COMMIT NOWAIT can co-exist with the normal form of COMMIT and does not
threaten the consistency or robustness of other COMMIT modes. Read that
again and think about it, before we go further, please. Normal COMMIT
still guarantees to flush all of WAL up to the point of the commit,
whether or not the previous commits have requested that.
Mixing COMMIT NOWAIT with other modes does not affect the performance of
other backends - those that specify that mode are faster, those that do
not simply go at the same speed they did before. This is important,
because it allows us to have a fully robust server, yet with certain
critical applications going along much faster. No need for an
all-or-nothing approach at db cluster level.
Unlike fsync = off, WAL is always consistent and the server can be
recovered easily, though with some potential for data loss for
transactions that chose the COMMIT NOWAIT option. That may sound like a
hole, but normal COMMITs that rely on data written by COMMIT NOWAIT
transactions are still safe, because the normal COMMIT is still bound by
the guarantee to go to disk. The buffer manager/WAL interlock is not
affected by this change and remains in place, as it should.
This implements the TODO item:
--Allow buffered WAL writes and fsync
"Instead of guaranteeing recovery of all committed transactions, this
would provide improved performance by delaying WAL writes and fsync so
an abrupt operating system restart might lose a few seconds of committed
transactions but still be consistent. We could perhaps remove the
'fsync' parameter (which results in an inconsistent database) in
favor of this capability."
Why do we want this?? Because some apps have *lots* of data and many
really don't care whether they lose a few records. Honestly, I've met
people that want this, even after 2 hours of discussion and
understanding. Plus probably lots of MySQLers also.
User Control
------------
New commit mode is available by explicit command, or as a default
setting that will be applied to all COMMITs, or both.
The full syntax would be COMMIT [WRITE] NOWAIT [IMMEDIATE], for Oracle
compatibility (why choose incompatibility?). Note that this is not a
transaction start setting like Isolation Level; this happens at end of
transaction. The syntax for END is unchanged, defaulting to normal
behaviour unless overridden.
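For illustration only, a sketch of how the proposed (not yet existing)
syntax might look in a session; the table names here are hypothetical:

```sql
BEGIN;
INSERT INTO clickstream_log (ts, url) VALUES (now(), '/home');
COMMIT NOWAIT;   -- proposed: return without waiting for the WAL flush

BEGIN;
UPDATE accounts SET balance = balance - 100.00 WHERE id = 42;
COMMIT;          -- unchanged: flushes all WAL up to this commit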
New userset GUC, commit_wait_default = on (default) | off
We change the meaning of the commit_delay parameter:
- If commit_delay = 0 then commit_wait_default cannot be set off.
- WAL will be flushed every commit_delay milliseconds; if no flush is
required this will do nothing very quickly, so there is little overhead
if no COMMIT NOWAIT commits have been made.
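As a configuration sketch (parameter names and semantics are this
proposal's, not existing PostgreSQL parameters):

```sql
-- postgresql.conf:
--   commit_delay = 100        -- WALWriter flushes WAL this often
--   commit_wait_default = on  -- COMMIT keeps today's flush-and-wait behaviour

-- A session that opts in to the relaxed behaviour for all its COMMITs:
SET commit_wait_default = off;
```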
Implementation
--------------
COMMIT NOWAIT in xact.c simply ignores XLogFlush and returns.
Who does the XLogFlush? Well, my recommendation is a totally new
process, WALWriter. But I can see that many of you will say bgwriter
should be the process to do this work. IMHO doing WAL flushes will take
time, and that's time that bgwriter really needs for other things; plus
it can't really guarantee to flush regularly when it's doing
checkpoints.
When commit_delay > 0 the WALWriter will start up; it will shut down if
commit_delay = 0.
WALWriter will XLogFlush every commit_delay milliseconds.
A prototype patch is posted to -patches, which is WORK IN PROGRESS.
The following TODO items remain:
1. discuss which process will issue regular XLogFlush(). If agreed,
implement WALWriter process to perform this task. (Yes, the patch isn't
fully implemented, yet).
2. remove fsync parameter
3. Prevent COMMIT NOWAIT when commit_delay = 0
4. Discuss whether commit_delay is OK to usurp; twas just an earlier
suggestion from someone else, can go either way.
5. docs
The remaining items can be completed very quickly if this proposal is
acceptable. (I wrote this over Christmas, so it turning up now isn't a
rushed proposal and I'm pretty certain it ain't broke).
Comments?
--
Simon Riggs
EnterpriseDB http://www.enterprisedb.com
Why do we want this?? Because some apps have *lots* of data and many
really don't care whether they lose a few records. Honestly, I've met
people that want this, even after 2 hours of discussion and
understanding. Plus probably lots of MySQLers also.
Most users will take speed over data loss any day. Whether we want to
admit it or not.
I think this feature is a good middle ground.
Sincerely,
Joshua D. Drake
--
=== The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
Providing the most comprehensive PostgreSQL solutions since 1997
http://www.commandprompt.com/
Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
PostgreSQL Replication: http://www.commandprompt.com/products/
Simon Riggs wrote:
This form of COMMIT will *not* perform XLogFlush(), but will rely on a
special background process to perform regular WAL fsyncs (see later).
[...]
OK, so do I have this right?
A is issuing COMMIT NOWAIT once a second
B is issuing COMMIT every other second
Checkpoints happen every 10 seconds
Every 10 seconds we have a checkpoint and all WAL+data are on-disk.
Every 2 seconds a standard COMMIT occurs and A+B are synced to WAL
In between, COMMIT NOWAIT occurs and no data is written to disk - neither
WAL nor data-blocks.
So, if I have a load of connections issuing standard COMMITs regularly
then I'll not see much performance-gain. However, I can guarantee any
data visible to those transactions is committed to disk.
If I have only one connection to the database and that uses only COMMIT
NOWAIT, it should approach the speed of fsync=off.
So this is for:
1. data-loading where I don't want to use fsync=off (for whatever reason)
2. Data-logging type apps
Have I got the gist of that?
--
Richard Huxton
Archonet Ltd
On Mon, 2007-02-26 at 23:25 +0000, Richard Huxton wrote:
OK, so do I have this right?
A is issuing COMMIT NOWAIT once a second
B is issuing COMMIT every other second
Checkpoints happen every 10 seconds

Every 10 seconds we have a checkpoint and all WAL+data are on-disk.
Every 2 seconds a standard COMMIT occurs and A+B are synced to WAL
In-between COMMIT NOWAIT occurs and no data is written to disk - neither
WAL nor data-blocks.
Right, but there'd also be a WAL flush every 100ms or so, according to
how you set the parameter. So the window of data loss is controllable
for both efficiency and user selected robustness for those transactions
that want it.
(Plus the point that a COMMIT doesn't send data to disk anyway, only
WAL. This proposal does nothing to change that, I should add).
So, if I have a load of connections issuing standard COMMITs regularly
then I'll not see much performance-gain. However, I can guarantee any
data visible to those transactions is committed to disk.

If I have only one connection to the database and that uses only COMMIT
NOWAIT, it should approach the speed of fsync=off.

So this is for:
1. data-loading where I don't want to use fsync=off (for whatever reason)
2. Data-logging type apps

Have I got the gist of that?
Pretty much.
Only the people using COMMIT NOWAIT see a performance gain, and of
course only if they are doing write transactions of relatively short
duration.
The interesting point is you can have a huge data grinding app, yet with
other tables alongside that hold more important data. In that scenario,
90% of the data would be COMMIT NOWAIT, whilst the small important data
is safe.
--
Simon Riggs
EnterpriseDB http://www.enterprisedb.com
Simon Riggs wrote:
The interesting point is you can have a huge data grinding app, yet with
other tables alongside that hold more important data. In that scenario,
90% of the data would be COMMIT NOWAIT, whilst the small important data
is safe.
Does this mean that the regular COMMIT is slower because it has to
force more data to disk? I imagine that this isn't the case, because
it's not the write itself that's slow; rather, it's the wait until the
fsync on WAL is reported complete. However, did you measure this?
--
Alvaro Herrera http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
On Feb 26, 2007, at 18:58 , Simon Riggs wrote:
Proposal: Implement a new option for COMMIT, for enhancing performance,
providing a MySQL-like trade-off between performance and robustness for
*only* those that want it.

COMMIT NOWAIT
Isn't the domain of transient data relegated to certain tables
instead of specific transactions? Wouldn't it be easier to create
un-WAL-logged global temp tables?
-M
On Mon, 2007-02-26 at 22:56 +0000, Simon Riggs wrote:
Proposal: Implement a new option for COMMIT, for enhancing performance,
providing a MySQL-like trade-off between performance and robustness for
*only* those that want it.

COMMIT NOWAIT
This form of COMMIT will *not* perform XLogFlush(), but will rely on a
special background process to perform regular WAL fsyncs (see later).
I would think that the type of people who would want this feature would
want it to be a property of the connection, not per-statement. For one
thing, many applications use solitary SQL statements without surrounding
them in BEGIN/COMMIT.
Regards,
Jeff Davis
Jeff Davis wrote:
I would think that the type of people who would want this feature would
want it to be a property of the connection, not per-statement. For one
thing, many applications use solitary SQL statements without surrounding
them in BEGIN/COMMIT.
You can set the GUC locally in your connection, of course. (You can
even do it per role or per database by using ALTER/SET commands)
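For example (ALTER ROLE/ALTER DATABASE ... SET are existing commands;
the parameter name is the hypothetical one from this proposal, and the
role/database names are made up):

```sql
-- Default for one role:
ALTER ROLE bulk_loader SET commit_wait_default = off;
-- Default for one database:
ALTER DATABASE metrics SET commit_wait_default = off;
-- Or just for the current session:
SET commit_wait_default = off;
```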
--
Alvaro Herrera http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.
"Simon Riggs" <simon@2ndquadrant.com> writes:
COMMIT NOWAIT can co-exist with the normal form of COMMIT and does not
threaten the consistency or robustness of other COMMIT modes. Read that
again and think about it, before we go further, please.
I read that, and thought about it, and don't think I believe it. The
problem is that there are more interconnections between different
transactions than you're allowing for. In particular you need to
justify that the behavior is safe for non-transactional operations like
btree page splits and pg_clog buffer page writes. The idea that's
particularly bothering me at the moment is that after a system crash,
we might come back up in a state where a NOWAIT transaction appears
committed when its updates didn't all get to disk. "Database corrupt"
is a situation that threatens all your transactions...
New commit mode is available by explicit command, or as a default
setting that will be applied to all COMMITs, or both.
I dislike introducing new nonstandard syntax ("Oracle compatible" is not
standard). If we did this I'd vote for control via a GUC setting only;
I think that is more useful anyway, as an application can be made to run
with such a setting without invasive source code changes.
regards, tom lane
On Tue, February 27, 2007 06:06, Joshua D. Drake wrote:
Why do we want this?? Because some apps have *lots* of data and many
really don't care whether they lose a few records. Honestly, I've met
people that want this, even after 2 hours of discussion and
understanding. Plus probably lots of MySQLers also.

Most users will take speed over data loss any day. Whether we want to
admit it or not.
In that case, wouldn't it make just as much sense to have an equivalent
for this special transaction mode on individual statements, without
transaction context? I'm guessing that those who don't really know or
want transactions would never start one, instead running lots of loose
statements that otherwise get committed individually.
Jeroen
On Mon, Feb 26, 2007 at 10:56:58PM +0000, Simon Riggs wrote:
2. remove fsync parameter
Why? Wouldn't fsync=off still speed up checkpoints? ISTM you'd still
want this for things like database restores.
--
Jim Nasby jim@nasby.net
EnterpriseDB http://enterprisedb.com 512.569.9461 (cell)
On Tue, Feb 27, 2007 at 11:05:45AM +0700, Jeroen T. Vermeulen wrote:
In that case, wouldn't it make just as much sense to have an equivalent
for this special transaction mode on individual statements, without
transaction context? I'm guessing that those who don't really know or
want transactions would never start one, instead running lots of loose
statements that otherwise get committed individually.
I don't think it makes sense to optimize for people who can't be
bothered to learn about a transaction. In any case, that option is
there; you just set the GUC in the session.
--
Jim Nasby jim@nasby.net
EnterpriseDB http://enterprisedb.com 512.569.9461 (cell)
On Mon, 2007-02-26 at 22:50 -0600, Jim C. Nasby wrote:
On Mon, Feb 26, 2007 at 10:56:58PM +0000, Simon Riggs wrote:
2. remove fsync parameter
Why? Wouldn't fsync=off still speed up checkpoints? ISTM you'd still
want this for things like database restores.
Well, it seemed to be part of the documented TODO, so I included it as a
discussion point.
I'm happy to leave it, personally, FWIW.
--
Simon Riggs
EnterpriseDB http://www.enterprisedb.com
On Mon, 2007-02-26 at 21:20 -0300, Alvaro Herrera wrote:
Simon Riggs wrote:
The interesting point is you can have a huge data grinding app, yet with
other tables alongside that hold more important data. In that scenario,
90% of the data would be COMMIT NOWAIT, whilst the small important data
is safe.

Does this mean that the regular COMMIT is slower because it has to
force more data to disk? I imagine that this isn't the case, because
it's not the write itself that's slow; rather, it's the wait until the
fsync on WAL is reported complete. However, did you measure this?
No, I've not measured that aspect specifically. But the overall effect
depends upon your hardware. (Laptops work great :-)
A COMMIT will write all of WAL, no matter how much that is. This is no
different from now, where a normal COMMIT executing while another
backend is doing work may have to write more than just its own WAL.
We actually consider this a good thing: piggyback writes go out of their
way to ensure we write as much WAL as possible in one go.
There is no doubt a point where the fsync delay has been set wrong *and*
there are so few normal COMMITs that they do become slower. But slower
isn't the same as "starved"; it's just the result of having to do the
work of others. The WALwriter will be doing the heavy lifting, if set
correctly.
--
Simon Riggs
EnterpriseDB http://www.enterprisedb.com
On Mon, 2007-02-26 at 23:04 -0500, Tom Lane wrote:
"Simon Riggs" <simon@2ndquadrant.com> writes:
COMMIT NOWAIT can co-exist with the normal form of COMMIT and does not
threaten the consistency or robustness of other COMMIT modes. Read that
again and think about it, before we go further, please.

I read that, and thought about it, and don't think I believe it.
I don't mind saying it's taken a while to believe it myself. And I very
much want to have the concept tested so we all believe it. Doubt=>risk.
The
problem is that there are more interconnections between different
transactions than you're allowing for. In particular you need to
justify that the behavior is safe for non-transactional operations like
btree page splits and pg_clog buffer page writes. The idea that's
particularly bothering me at the moment is that after a system crash,
we might come back up in a state where a NOWAIT transaction appears
committed when its updates didn't all get to disk. "Database corrupt"
is a situation that threatens all your transactions...
OK, I can see it's possible for the following WAL history to occur:
t1 btree split part1
t2 COMMIT;
<---------- risk of inconsistent WAL
t1 btree split part2
The COMMIT by t2 flushes WAL, so if a crash happens at the point shown
we have an incomplete multi-part WAL operation. At this stage, lets
assume that no data writes by either t1 or t2 have made it to disk,
which is normal and likely.
On recovery, we apply all the changes in WAL up to the end of WAL. At
the last minute t2 sneaks in and gets committed. t1 was never committed,
so we never found out whether it would be a COMMIT NOWAIT or a COMMIT.
*Whichever* it will be the btree split is still incomplete and recovery
understands this and acts accordingly.
So the situation that sounds nasty is actually a normal situation
currently, so presents no threat because we already handle this
correctly. Multi-part operations seem good to me.
The idea that's
particularly bothering me at the moment is that after a system crash,
we might come back up in a state where a NOWAIT transaction appears
committed when its updates didn't all get to disk. "Database corrupt"
is a situation that threatens all your transactions...
Well, I've spent 2 hours thinking this through, gone pale, thought hard
and gone down a few blind alleys along the way.
The bottom line is that COMMIT NOWAIT isn't fully safe, and that's part
of the explicit non-full guarantee, written in big letters on the tin.
If a transaction commits and then we crash before we flush WAL, then the
transaction will be incomplete. As we define COMMIT now, this is broken
and I use that word accurately: If you use COMMIT NOWAIT, you risk data
loss *but* you have the choice of which transactions this applies to.
However, WAL *is* consistent and there is no danger of database
corruption. Simply put, this is not group commit.
So if you issue COMMIT NOWAIT and then crash, the transactions that were
marked committed need to be marked aborted in clog. So during recovery
we will need to keep track of which transactions are in progress, so we
can mark them explicitly aborted, rather than the current implicit
mechanism. This would need to be done carefully, since a clog page that
was created at transaction start may never have made it to disk by the
time of the crash. We must extend clog in that case, even if the WAL
record for the extension never made it to disk either, so that we can
record the aborted state of the Xids that exist somewhere on disk.
Keeping track of transactions in progress won't take long. It isn't
required at all during archive recovery, but that might form the basis
for later use as a Snapshot creation mechanism for read-only access
during startup.
It isn't possible for a normal non-readonly transaction to change data
that has been changed by a commit-nowait transaction, and then have the
normal transaction commit, yet without also implicitly committing the
commit-nowait transaction.
Hopefully, I've got that right?
New commit mode is available by explicit command, or as a default
setting that will be applied to all COMMITs, or both.

I dislike introducing new nonstandard syntax ("Oracle compatible" is not
standard). If we did this I'd vote for control via a GUC setting only;
I think that is more useful anyway, as an application can be made to run
with such a setting without invasive source code changes.
OK.
Having read through all of the above things again, ISTM that we should
make this functionality available by a new GUC commit_fsync_delay, which
must be set explicitly > 0 before this feature can be used at all. If I
confused Tom by using commit_delay, then I'll confuse others also; group
commit and deferred fsync are different techniques with different
robustness guarantees. When enabled it should have a clear message in
the log to show that some commits might be using commit_nowait.
I'd even welcome a more descriptive term that summed up the relaxed
transaction guarantee implied by the use of the deferred fsync
technique. Perhaps even a very explicit USERSET GUC:
transaction_guarantee = on (default) | off
--
Simon Riggs
EnterpriseDB http://www.enterprisedb.com
On Tue, Feb 27, 2007 at 10:49:32AM +0000, Simon Riggs wrote:
Having read through all of the above things again, ISTM that we should
make this functionality available by a new GUC commit_fsync_delay, which
must be set explicitly > 0 before this feature can be used at all.
[...]
Perhaps even a very explicit USERSET GUC:

transaction_guarantee = on (default) | off
So would you set commit_fsync_delay on a per-transaction basis? That
doesn't make much sense to me... I guess I'm not seeing how you would
explicitly mark transactions that you didn't want to fsync immediately.
--
Jim Nasby jim@nasby.net
EnterpriseDB http://enterprisedb.com 512.569.9461 (cell)
Simon,
One of the things I love about doing informal online user support in the
PostgreSQL community, and formal user support for Sun's customers, is the
almost-ironclad guarantee that if a user has a corrupt database or data loss,
one of three things is true:
a) they didn't apply some recommended PG update;
b) they have a bad disk controller or disk config;
c) they have bad ram.
It seriously narrows down the problem space to know that PostgreSQL does *not*
allow data loss if it's physically possible to prevent it.
Therefore, if we're going to arm a foot-gun as big as COMMIT NOWAIT for
PostgreSQL, I'd like to see the answers to two questions:
a) Please give some examples of performance gain on applications using COMMIT
NOWAIT. The performance gain needs to be substantial (like, 50% to 100%) to
justify a compromise like this.
b) Why this and not global temporary tables or queuing?
--
Josh Berkus
PostgreSQL @ Sun
San Francisco
"Jim C. Nasby" <jim@nasby.net> writes:
So would you set commit_fsync_delay on a per-transaction basis? That
doesn't make much sense to me... I guess I'm not seeing how you would
explicitly mark transactions that you didn't want to fsync immediately.
My assumption was that most of the time you'd want this behavior
per-session and so the existing mechanisms for setting a GUC variable
would work perfectly well. If you really want it per-transaction then
changing the variable on the fly is possible (perhaps using SET LOCAL).
You'll be issuing nonstandard commands either way...
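A per-transaction sketch along those lines (SET LOCAL is an existing
command; the parameter name is the hypothetical one proposed earlier in
this thread, and the table is made up):

```sql
BEGIN;
-- SET LOCAL reverts automatically at COMMIT/ROLLBACK
SET LOCAL transaction_guarantee = off;  -- hypothetical GUC from this proposal
INSERT INTO sensor_log (ts, reading) VALUES (now(), 17.3);
COMMIT;  -- would return before its WAL reaches disk
```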
regards, tom lane
On Tue, 2007-02-27 at 11:32 -0600, Jim C. Nasby wrote:
So would you set commit_fsync_delay on a per-transaction basis? That
doesn't make much sense to me... I guess I'm not seeing how you would
explicitly mark transactions that you didn't want to fsync immediately.
There are 2 GUCs that would control the behaviour here:
transaction_guarantee = on | off
Specifies whether following transaction commits will guarantee
WAL has been flushed prior to reporting commit. If no guarantee
is requested (=off), then data loss may result even after the
transaction has reported its COMMIT message.
USERSET, but listed in postgresql.conf where default = on
Set this at role, individual session or transaction level to
improve performance of non-critical user data. Use of this
setting does not interfere with the transaction_guarantee
that other transactions may choose. i.e. if somebody else
chooses to take risks with their data it will not affect
the transaction guarantees the server offers to you.
Can only be set off by a transaction if commit_fsync_delay
has been enabled. Use this parameter with care; if you find
yourself wanting to use this parameter all of the time you
should consult a psychiatrist or change open source databases.
commit_fsync_delay = 0...10000 microseconds (0 = off, default)
Controls how often the WALWriter issues an XLogFlush()
SIGHUP, so set once for each server, in postgresql.conf
This provides a maximum time window of potential data loss
in the event of a server crash for transactions that choose
transaction_guarantee = off. This parameter has no effect
on transactions that choose transaction_guarantee = on.
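Putting the two proposed parameters together, a sketch (all names and
behaviour are hypothetical, per this proposal; the table is made up):

```sql
-- postgresql.conf (SIGHUP):
--   commit_fsync_delay = 1000   -- WALWriter flushes roughly this often

-- A non-critical session opts out of the flush-on-commit guarantee:
SET transaction_guarantee = off;
BEGIN;
INSERT INTO clickstream_log (ts, url) VALUES (now(), '/home');
COMMIT;  -- reports success before its WAL is on disk
-- Sessions that keep transaction_guarantee = on are unaffected.
```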
--
Simon Riggs
EnterpriseDB http://www.enterprisedb.com
There are 2 GUCs that would control the behaviour here:
transaction_guarantee = on | off
has been enabled. Use this parameter with care; if you find
yourself wanting to use this parameter all of the time you
should consult a psychiatrist or change open source databases.
If you guarantee your customers that you won't lose a transaction that
has already been committed, you need at least synchronous replication
to a remote site. Since not many installations have that, I do find it
funny that people imply so much safety only by syncing the WAL.
Without sync replication, a "buffered WAL" as proposed only increases
the chance that you lose something. It certainly is no change from safe
haven to abysmal hell.
So I think the part after the semicolon can safely be dropped.
Many will be able to use it always, without changing to another db :-)
commit_fsync_delay = 0...10000 microseconds (0 = off, default)
Controls how often the WALWriter issues an XLogFlush()
SIGHUP, so set once for each server, in postgresql.conf
This provides a maximum time window of potential data loss
in the event of a server crash for transactions that choose
transaction_guarantee = off. This parameter has no effect
on transactions that choose transaction_guarantee = on.
The WAL sync method probably needs to be considered?
If the WAL is opened with open_datasync, how does that affect
performance, or do you omit the write and leave that to the WALWriter
also? You probably also want more wal_buffers in such a setup. It may
be better to trigger the WALWriter on wal_buffers fill-level instead of
an extra timing parameter?
It is imho great that you are working on this. I always thought it
impossible, because WAL (write ahead) implied to me that you are not
allowed to make any data/index page changes before the WAL is on disk.
Andreas