Proposal: Commit timestamp

Started by Jan Wieck about 19 years ago · 82 messages · hackers
#1Jan Wieck
JanWieck@Yahoo.com

For a future multimaster replication system, I will need a couple of
features in the PostgreSQL server itself. I will submit separate
proposals per feature so that discussions can be kept focused on one
feature per thread.

For conflict resolution purposes in an asynchronous multimaster system,
the "last update" definition often comes into play. For this to work,
the system must provide a monotonically increasing timestamp taken at
the commit of a transaction. During replication, the replication process
must be able to provide the remote node's timestamp so that the
replicated data will be "as of the time it was written on the remote
node", and not the current local time of the replica, which is by
definition of "asynchronous" later.

To provide this data, I would like to add another "log" directory,
pg_tslog. The files in this directory will be similar to the clog, but
contain arrays of timestamptz values. On commit, the current system time
will be taken. As long as this time is lower or equal to the last taken
time in this PostgreSQL instance, the value will be increased by one
microsecond. The resulting time will be added to the commit WAL record
and written into the pg_tslog file.
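
The increment rule above can be sketched as follows. This is only an illustration in Python, not server code: the class name `CommitClock` and the injectable `now_us` clock source are hypothetical, and timestamps are modelled as integer microseconds.

```python
import time


class CommitClock:
    """Hands out strictly increasing commit timestamps (integer microseconds)."""

    def __init__(self, now_us=None):
        # now_us is a hypothetical injectable clock source, useful for testing.
        self._now_us = now_us or (lambda: time.time_ns() // 1_000)
        self._last = 0  # last timestamp handed out by this instance

    def next_commit_timestamp(self):
        now = self._now_us()
        if now <= self._last:
            # Clock stalled or stepped backwards: move one microsecond
            # past the last value instead of trusting the system time.
            now = self._last + 1
        self._last = now
        return now
```

With a clock that returns 100, 100, 90 and 250, the sketch hands out 100, 101, 102 and 250, so the sequence stays monotonic even when the system time does not.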

If a per database configurable tslog_priority is given, the timestamp
will be truncated to milliseconds and the increment logic is done on
milliseconds. The priority is added to the timestamp. This guarantees
that no two timestamps for commits will ever be exactly identical, even
across different servers.
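
A rough sketch of the priority variant, again in Python and only as an illustration. It assumes the priority occupies the sub-millisecond digits (0 to 999) of a microsecond timestamp; the function name and the range check are assumptions, not part of the proposal.

```python
def prioritized_timestamp(now_us, last_us, priority):
    """Return a commit timestamp in microseconds whose sub-millisecond
    digits carry the node priority (assumed range 0..999)."""
    if not 0 <= priority < 1000:
        raise ValueError("priority must fit in the sub-millisecond digits")
    now_ms = now_us // 1000    # truncate the current time to milliseconds
    last_ms = last_us // 1000  # millisecond part of the last timestamp issued
    if now_ms <= last_ms:
        # The increment logic works on milliseconds in this mode.
        now_ms = last_ms + 1
    return now_ms * 1000 + priority
```

Two nodes that commit in the same millisecond still produce different values as long as their priorities differ, which is what makes the result usable as a cluster-wide unique ID.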

The COMMIT syntax will get extended to

COMMIT [TRANSACTION] [WITH TIMESTAMP <timestamptz>];

The extension is limited to superusers and will override the normally
generated commit timestamp. This will be used to give the replicating
transaction on the replica the exact same timestamp it got on the
originating master node.

The pg_tslog segments will be purged like the clog segments, after all
transactions belonging to them have been stamped frozen. A frozen xid by
definition has a timestamp of epoch. To ensure a system using this
timestamp feature has enough time to perform its work, a new GUC
variable defining an interval will prevent vacuum from freezing xid's
that are younger than that.

A function get_commit_timestamp(xid) returning timestamptz will return
the commit time of a transaction as recorded by this feature.

Comments, changes, additions?

Jan

--
#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me. #
#================================================== JanWieck@Yahoo.com #

#2Neil Conway
neilc@samurai.com
In reply to: Jan Wieck (#1)
Re: Proposal: Commit timestamp

On Thu, 2007-01-25 at 18:16 -0500, Jan Wieck wrote:

For conflict resolution purposes in an asynchronous multimaster system,
the "last update" definition often comes into play. For this to work,
the system must provide a monotonically increasing timestamp taken at
the commit of a transaction.

Do you really need an actual timestamptz derived from the system clock,
or would a monotonically increasing 64-bit counter be sufficient? (The
assumption that the system clock is monotonically increasing seems
pretty fragile, in the presence of manual system clock changes, ntpd,
etc.)

Comments, changes, additions?

Would this feature have any use beyond the specific project/algorithm
you have in mind?

-Neil

#3Tom Lane
tgl@sss.pgh.pa.us
In reply to: Jan Wieck (#1)
Re: Proposal: Commit timestamp

Jan Wieck <JanWieck@Yahoo.com> writes:

To provide this data, I would like to add another "log" directory,
pg_tslog. The files in this directory will be similar to the clog, but
contain arrays of timestamptz values.

Why should everybody be made to pay this overhead?

The COMMIT syntax will get extended to
COMMIT [TRANSACTION] [WITH TIMESTAMP <timestamptz>];
The extension is limited to superusers and will override the normally
generated commit timestamp. This will be used to give the replicating
transaction on the replica the exact same timestamp it got on the
originating master node.

I'm not convinced you've even thought this through. If you do that then
you have no guarantee of commit timestamp monotonicity on the slave
(if it has either multi masters or any locally generated transactions).
Since this is supposedly for a multi-master system, that seems a rather
fatal objection --- no node in the system will actually have commit
timestamp monotonicity. What are you hoping to accomplish with this?

regards, tom lane

#4Jan Wieck
JanWieck@Yahoo.com
In reply to: Neil Conway (#2)
Re: Proposal: Commit timestamp

On 1/25/2007 6:47 PM, Neil Conway wrote:

On Thu, 2007-01-25 at 18:16 -0500, Jan Wieck wrote:

For conflict resolution purposes in an asynchronous multimaster system,
the "last update" definition often comes into play. For this to work,
the system must provide a monotonically increasing timestamp taken at
the commit of a transaction.

Do you really need an actual timestamptz derived from the system clock,
or would a monotonically increasing 64-bit counter be sufficient? (The
assumption that the system clock is monotonically increasing seems
pretty fragile, in the presence of manual system clock changes, ntpd,
etc.)

Yes, I do need it to be a timestamp, and one assumption is that all
servers in the multimaster cluster are ntp synchronized. The reason is
that this is for asynchronous multimaster (in my case). Two sequences
running on separate systems don't tell which was the "last update" on a
timeline. This conflict resolution method alone is of course completely
inadequate.

Comments, changes, additions?

Would this feature have any use beyond the specific project/algorithm
you have in mind?

The tablelog project on pgfoundry currently uses the transaction's start
time but would be very delighted to have the commit time available instead.

Jan

--
#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me. #
#================================================== JanWieck@Yahoo.com #

#5Tom Lane
tgl@sss.pgh.pa.us
In reply to: Jan Wieck (#4)
Re: Proposal: Commit timestamp

Jan Wieck <JanWieck@Yahoo.com> writes:

On 1/25/2007 6:47 PM, Neil Conway wrote:

Would this feature have any use beyond the specific project/algorithm
you have in mind?

The tablelog project on pgfoundry currently uses the transaction's start
time but would be very delighted to have the commit time available instead.

BTW, it's not clear to me why you need a new log area for this. (We
don't log transaction start time anywhere, so certainly tablelog's needs
would not include it.) Commit timestamps are available from WAL commit
records in a crash-and-restart scenario, so wouldn't that be enough?

regards, tom lane

#6Jan Wieck
JanWieck@Yahoo.com
In reply to: Tom Lane (#3)
Re: Proposal: Commit timestamp

On 1/25/2007 6:49 PM, Tom Lane wrote:

Jan Wieck <JanWieck@Yahoo.com> writes:

To provide this data, I would like to add another "log" directory,
pg_tslog. The files in this directory will be similar to the clog, but
contain arrays of timestamptz values.

Why should everybody be made to pay this overhead?

It could be made an initdb time option. If you intend to use a product
that requires this feature, you will be willing to pay that price.

The COMMIT syntax will get extended to
COMMIT [TRANSACTION] [WITH TIMESTAMP <timestamptz>];
The extension is limited to superusers and will override the normally
generated commit timestamp. This will be used to give the replicating
transaction on the replica the exact same timestamp it got on the
originating master node.

I'm not convinced you've even thought this through. If you do that then
you have no guarantee of commit timestamp monotonicity on the slave
(if it has either multi masters or any locally generated transactions).
Since this is supposedly for a multi-master system, that seems a rather
fatal objection --- no node in the system will actually have commit
timestamp monotonicity. What are you hoping to accomplish with this?

Maybe I wasn't clear enough about this. If the commit timestamps on the
local machine are guaranteed to increase at least by one millisecond
(okay that limits the system to a sustained 1000 commits per second
before it really seems to run ahead of time), then no two commits on the
same instance will ever have the same timestamp. If furthermore each
instance in a cluster has a distinct priority (the microsecond part
added to the millisecond-truncated timestamp), each commit timestamp
could even act as a globally unique ID. It does require that all the
nodes in the cluster are configured with a distinct priority.

What I hope to accomplish with this is a very easy, commit time based
"last update wins" conflict resolution for data fields of the overwrite
nature.

The replication system I have in mind will have another field type of
the balance nature, where it will never communicate the current value
but only deltas that get applied regardless of the two timestamps.
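
To make the two field types concrete, here is a minimal Python sketch. The names `Versioned`, `resolve_overwrite` and `apply_balance_delta` are hypothetical, and it assumes commit timestamps are unique across the cluster, as described above.

```python
from dataclasses import dataclass


@dataclass
class Versioned:
    value: object
    commit_ts: int  # commit timestamp of the transaction that wrote the value


def resolve_overwrite(local, incoming):
    """Overwrite-type field: last update wins, decided by commit timestamp."""
    return incoming if incoming.commit_ts > local.commit_ts else local


def apply_balance_delta(local_balance, delta):
    """Balance-type field: apply the delta regardless of any timestamps."""
    return local_balance + delta
```

Because the decision depends only on the recorded commit timestamps, two nodes that see the same pair of versions in opposite order should arrive at the same winner.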

Jan

--
#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me. #
#================================================== JanWieck@Yahoo.com #

#7Jan Wieck
JanWieck@Yahoo.com
In reply to: Tom Lane (#5)
Re: Proposal: Commit timestamp

On 1/25/2007 7:41 PM, Tom Lane wrote:

Jan Wieck <JanWieck@Yahoo.com> writes:

On 1/25/2007 6:47 PM, Neil Conway wrote:

Would this feature have any use beyond the specific project/algorithm
you have in mind?

The tablelog project on pgfoundry currently uses the transaction's start
time but would be very delighted to have the commit time available instead.

BTW, it's not clear to me why you need a new log area for this. (We
don't log transaction start time anywhere, so certainly tablelog's needs
would not include it.) Commit timestamps are available from WAL commit
records in a crash-and-restart scenario, so wouldn't that be enough?

First, I need the timestamp of the original transaction that caused the
data to change, which can be a remote or a local transaction. So the
timestamp currently recorded in the WAL commit record is useless and the
commit record has to be extended by one more timestamp.

Second, I don't think that an API scanning for WAL commit records by xid
would be efficient enough to satisfy the needs of a timestamp based
conflict resolution system, which would have to retrieve the timestamp
for every row's xmin that it is about to update in order to determine if
the old or the new values should be used.

Third, keeping the timestamp information in the WAL only would require
to keep the WAL segments around until they are older than the admin
chosen minimum freeze age. I hope you don't want to force that penalty
on everyone who intends to use multimaster replication.

Jan

--
#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me. #
#================================================== JanWieck@Yahoo.com #

#8Richard Troy
rtroy@ScienceTools.com
In reply to: Jan Wieck (#1)
Re: Proposal: Commit timestamp

On Thu, 25 Jan 2007, Jan Wieck wrote:

For a future multimaster replication system, I will need a couple of
features in the PostgreSQL server itself. I will submit separate
proposals per feature so that discussions can be kept focused on one
feature per thread.

Hmm... "will need" ... Have you prototyped this system yet? ISTM you can
prototype your proposal using "external" components so you can work out
the kinks first.

Richard

--
Richard Troy, Chief Scientist
Science Tools Corporation
510-924-1363 or 202-747-1263
rtroy@ScienceTools.com, http://ScienceTools.com/

#9Jan Wieck
JanWieck@Yahoo.com
In reply to: Richard Troy (#8)
Re: Proposal: Commit timestamp

On 1/25/2007 8:42 PM, Richard Troy wrote:

On Thu, 25 Jan 2007, Jan Wieck wrote:

For a future multimaster replication system, I will need a couple of
features in the PostgreSQL server itself. I will submit separate
proposals per feature so that discussions can be kept focused on one
feature per thread.

Hmm... "will need" ... Have you prototyped this system yet? ISTM you can
prototype your proposal using "external" components so you can work out
the kinks first.

These details are pretty drilled down and are needed with the described
functionality. And I will not make the same mistake as with Slony-I
again and develop things, that require backend support, as totally
external (look at the catalog corruption mess I created there and you
know what I'm talking about).

Jan

--
#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me. #
#================================================== JanWieck@Yahoo.com #

#10Bruce Momjian
bruce@momjian.us
In reply to: Jan Wieck (#6)
Re: Proposal: Commit timestamp

Jan Wieck wrote:

On 1/25/2007 6:49 PM, Tom Lane wrote:

Jan Wieck <JanWieck@Yahoo.com> writes:

To provide this data, I would like to add another "log" directory,
pg_tslog. The files in this directory will be similar to the clog, but
contain arrays of timestamptz values.

Why should everybody be made to pay this overhead?

It could be made an initdb time option. If you intend to use a product
that requires this feature, you will be willing to pay that price.

That is going to cut your usage by like 80%. There must be a better
way.

--
Bruce Momjian bruce@momjian.us
EnterpriseDB http://www.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +

#11Jan Wieck
JanWieck@Yahoo.com
In reply to: Bruce Momjian (#10)
Re: Proposal: Commit timestamp

On 1/25/2007 11:41 PM, Bruce Momjian wrote:

Jan Wieck wrote:

On 1/25/2007 6:49 PM, Tom Lane wrote:

Jan Wieck <JanWieck@Yahoo.com> writes:

To provide this data, I would like to add another "log" directory,
pg_tslog. The files in this directory will be similar to the clog, but
contain arrays of timestamptz values.

Why should everybody be made to pay this overhead?

It could be made an initdb time option. If you intend to use a product
that requires this feature, you will be willing to pay that price.

That is going to cut your usage by like 80%. There must be a better
way.

I'd love to.

But it is a datum that needs to be collected at the moment where
basically the clog entry is made ... I don't think any external module
can do that ever.

You know how long I've been in and out and back into replication again.
The one thing that pops up again and again in all the scenarios is "what
the heck was the commit order?". Now the pure commit order for a single
node could certainly be recorded from a sequence, but that doesn't cover
the multi-node environment I am after. That's why I want it to be a
timestamp with a few fudged bits at the end. If you look at what I've
described, you will notice that as long as all node priorities are
unique, this timestamp will be a globally unique ID in a somewhat
ascending order along a timeline. That is what replication people are
looking for.

Tom fears that the overhead is significant, which I do understand and
frankly, wonder myself about (actually I don't even have a vague
estimate). I really think we should make this thing an initdb option and
decide later if it's on or off by default. Probably we can implement it
even in a way that one can turn it on/off and a postmaster restart plus
waiting the desired freeze-delay would do.

What I know for certain is that no async replication system can ever do
without the commit timestamp information. Using the transaction start
time or even the single statement's timeofday will only lead to
inconsistencies all over the place (I haven't been absent from the
mailing lists for the past couple of months hiding in my closet ... I've
been experimenting and trying to get around all these issues - in my
closet). Slony-I can survive without that information because everything
happens on one node and we record snapshot information for later abusal.
But look at what cost we are dealing with this rather trivial issue. All
we need to know is the serializable commit order. And we have to issue
queries that eventually might exceed address space limits?

Jan

--
#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me. #
#================================================== JanWieck@Yahoo.com #

#12Naz Gassiep
naz@mira.net
In reply to: Jan Wieck (#1)
Re: Proposal: Commit timestamp

I would be *very* concerned that system time is not a guaranteed
monotonic entity. Surely a counter or other internally managed mechanism
would be a better solution.

Furthermore, what would be the ramifications of master and slave system
times being out of sync?

Finally what if system time is rolled forward a few minutes as part of a
correction and there were transactions completed in that time? There is
a chance, albeit small, that two transactions will have the same
timestamp. More importantly, this will throw all kinds of issues in when
the slave sees transactions in the future. Even with regular NTP syncs,
drift can cause a clock to be rolled forward a few milliseconds,
possibly resulting in duplicate transaction IDs.

In summary, I don't think the use of system time has any place in
PostgreSQL's internal consistency mechanisms, it is too unreliable an
environment property. Why can't a counter be used for this instead?

- Naz.

Jan Wieck wrote:


For a future multimaster replication system, I will need a couple of
features in the PostgreSQL server itself. I will submit separate
proposals per feature so that discussions can be kept focused on one
feature per thread.

For conflict resolution purposes in an asynchronous multimaster
system, the "last update" definition often comes into play. For this
to work, the system must provide a monotonically increasing timestamp
taken at the commit of a transaction. During replication, the
replication process must be able to provide the remote node's timestamp
so that the replicated data will be "as of the time it was written on
the remote node", and not the current local time of the replica, which
is by definition of "asynchronous" later.

To provide this data, I would like to add another "log" directory,
pg_tslog. The files in this directory will be similar to the clog, but
contain arrays of timestamptz values. On commit, the current system
time will be taken. As long as this time is lower or equal to the last
taken time in this PostgreSQL instance, the value will be increased by
one microsecond. The resulting time will be added to the commit WAL
record and written into the pg_tslog file.

If a per database configurable tslog_priority is given, the timestamp
will be truncated to milliseconds and the increment logic is done on
milliseconds. The priority is added to the timestamp. This guarantees
that no two timestamps for commits will ever be exactly identical,
even across different servers.

The COMMIT syntax will get extended to

COMMIT [TRANSACTION] [WITH TIMESTAMP <timestamptz>];

The extension is limited to superusers and will override the normally
generated commit timestamp. This will be used to give the replicating
transaction on the replica the exact same timestamp it got on the
originating master node.

The pg_tslog segments will be purged like the clog segments, after all
transactions belonging to them have been stamped frozen. A frozen xid
by definition has a timestamp of epoch. To ensure a system using this
timestamp feature has enough time to perform its work, a new GUC
variable defining an interval will prevent vacuum from freezing xid's
that are younger than that.

A function get_commit_timestamp(xid) returning timestamptz will return
the commit time of a transaction as recorded by this feature.

Comments, changes, additions?

Jan

#13Markus Wanner
markus@bluegap.ch
In reply to: Jan Wieck (#6)
Re: Proposal: Commit timestamp

Hi,

Jan Wieck wrote:

The replication system I have in mind will have another field type of
the balance nature, where it will never communicate the current value
but only deltas that get applied regardless of the two timestamps.

I'd favor a more generally usable conflict resolution function
interface, on top of which you can implement both, the "last update
wins" as well as the "balance" conflict resolution type.

Passing the last common ancestor and the two conflicting heads to the
conflict resolution function (CRF) should be enough. That would easily
allow to implement the "balance" type (as you can calculate both
deltas). And if you want to rely on something as arbitrary as a
timestamp, you'd simply have to add a timestamp column to your table and
let the CRF decide upon that.

This would allow pretty much any type of conflict resolution, for
example: higher priority cleanup transactions, which change lots of
tuples and should better not be aborted later on. Those could be
implemented by adding a priority column and having the CRF respect that
one, too.

To find the last common ancestor tuple, transaction ids and MVCC are
enough. You wouldn't need to add timestamps. You'd only have to make
sure VACUUM doesn't delete tuples you still need.
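
As a rough illustration of such an interface, here are two conflict resolution functions in Python. The signature (ancestor, head_a, head_b), the function names and the `updated_at` column are assumptions made for the example, not an existing API.

```python
def balance_crf(ancestor, head_a, head_b):
    """'Balance' type: apply both deltas, each computed against the
    last common ancestor, so neither update is lost."""
    return ancestor + (head_a - ancestor) + (head_b - ancestor)


def last_update_crf(ancestor, head_a, head_b, ts_column="updated_at"):
    """'Last update wins' type: rows are dicts here, and the winner is the
    head with the larger value in a user-maintained timestamp column."""
    return head_a if head_a[ts_column] >= head_b[ts_column] else head_b
```

For example, with an ancestor balance of 100 and conflicting heads of 120 and 90, `balance_crf` yields 110, which reflects both the +20 and the -10 change.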

Regards

Markus

#14Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Jan Wieck (#11)
Re: Proposal: Commit timestamp

Jan Wieck wrote:

But it is a datum that needs to be collected at the moment where
basically the clog entry is made ... I don't think any external module
can do that ever.

How atomic does it need to be? External modules can register callbacks
that get called right after the clog update and removing the xid from
MyProc entry. That's about as close to making the clog entry you can get.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

#15Theo Schlossnagle
jesus@omniti.com
In reply to: Naz Gassiep (#12)
Re: Proposal: Commit timestamp

Jan, et al.,

On Jan 26, 2007, at 2:37 AM, Naz Gassiep wrote:

I would be *very* concerned that system time is not a guaranteed
monotonic entity. Surely a counter or other internally managed
mechanism would be a better solution.

As you should be concerned. Looking on my desk through the last few
issues in IEEE Transactions on Parallel and Distributed Systems, I
see no time synch stuff for clusters of machines that is actually
based on time. Almost all rely on something like a Lamport timestamp
or some relaxation thereof. A few are based off a tree based pulse.
Using actual times is fraught with problems and is typically
inappropriate for cluster synchronization needs.

Furthermore, what would be the ramifications of master and slave
system times being out of sync?

I'm much more concerned with the overall approach. The algorithm for
replication should be published in theoretic style with a thorough
analysis of its assumptions and a proof of correctness based on those
assumptions. Databases and replication therein are definitely
technologies that aren't "off-the-cuff," and they need rigorous academic
discussion and acceptance before they will get adopted. People
generally will not adopt technologies to store mission critical data
until they are confident that it will both work as designed and work
as implemented -- the second is far less important as the weaknesses
there are simply bugs.

I'm not implying that this rigorous dissection of replication design
hasn't happened, but I didn't see it referenced anywhere in this
thread. Can you point me to it? I've reviewed many of these papers
and would like to better understand what you are aiming at.

Best regards,

Theo Schlossnagle

#16Jan Wieck
JanWieck@Yahoo.com
In reply to: Naz Gassiep (#12)
Re: Proposal: Commit timestamp

On 1/26/2007 2:37 AM, Naz Gassiep wrote:

I would be *very* concerned that system time is not a guaranteed
monotonic entity. Surely a counter or other internally managed mechanism
would be a better solution.

Such a counter has only "local" relevance. How do you plan to compare
the two separate counters on different machines to tell which
transaction happened last?

Even if the system clock isn't monotonically increasing, the described
increment system guarantees the timestamp used to appear so. Granted,
this system will not work too well on a platform that doesn't allow to
slew the system clock.

Furthermore, what would be the ramifications of master and slave system
times being out of sync?

The origin of a transaction must scan all tuples it updates and make
sure that the timestamp it uses for commit appears in the future with
respect to them.
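
One way to read that rule, sketched in Python under the assumption that all timestamps are integers in the same unit (the function name is hypothetical):

```python
def commit_timestamp_after(updated_tuple_timestamps, local_now):
    """Return a commit timestamp that is later than the timestamp of
    every tuple this transaction is about to overwrite."""
    # With no updated tuples, fall back to a value that leaves local_now as is.
    newest_seen = max(updated_tuple_timestamps, default=local_now - 1)
    return max(local_now, newest_seen + 1)
```

If the local clock lags behind a tuple written elsewhere, the commit timestamp is pushed just past that tuple's timestamp; otherwise the local time is used unchanged.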

Finally what if system time is rolled forward a few minutes as part of a
correction and there were transactions completed in that time? There is
a chance, albeit small, that two transactions will have the same
timestamp. More importantly, this will throw all kinds of issues in when
the slave sees transactions in the future. Even with regular NTP syncs,
drift can cause a clock to be rolled forward a few milliseconds,
possibly resulting in duplicate transaction IDs.

In summary, I don't think the use of system time has any place in
PostgreSQL's internal consistency mechanisms, it is too unreliable an
environment property. Why can't a counter be used for this instead?

This is nothing used for PostgreSQL's consistency. It is a vehicle
intended to be used to synchronize the "last update wins" decision
process of an asynchronous multimaster system. If not with a timestamp,
how would you make sure that the replication processes of two different
nodes will come to the same conclusion as to which update was last?
Especially considering that the replication might take place hours after
the original transaction happened.

Jan

--
#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me. #
#================================================== JanWieck@Yahoo.com #

#17Simon Riggs
simon@2ndQuadrant.com
In reply to: Jan Wieck (#1)
Re: Proposal: Commit timestamp

On Thu, 2007-01-25 at 18:16 -0500, Jan Wieck wrote:

To provide this data, I would like to add another "log" directory,
pg_tslog. The files in this directory will be similar to the clog, but
contain arrays of timestamptz values. On commit, the current system time
will be taken. As long as this time is lower or equal to the last taken
time in this PostgreSQL instance, the value will be increased by one
microsecond. The resulting time will be added to the commit WAL record
and written into the pg_tslog file.

A transaction time table/log has other uses as well, so it's fairly
interesting to have this.

COMMIT [TRANSACTION] [WITH TIMESTAMP <timestamptz>];

The extension is limited to superusers and will override the normally
generated commit timestamp.

I don't think it's acceptable to override the normal timestamp. That
could lead to non monotonic time values which could screw up PITR. My
view is that you still need PITR even when you are using replication,
because the former provides recoverability and the latter provides
availability.

--
Simon Riggs
EnterpriseDB http://www.enterprisedb.com

#18Jan Wieck
JanWieck@Yahoo.com
In reply to: Simon Riggs (#17)
Re: Proposal: Commit timestamp

On 1/26/2007 8:26 AM, Simon Riggs wrote:

On Thu, 2007-01-25 at 18:16 -0500, Jan Wieck wrote:

To provide this data, I would like to add another "log" directory,
pg_tslog. The files in this directory will be similar to the clog, but
contain arrays of timestamptz values. On commit, the current system time
will be taken. As long as this time is lower or equal to the last taken
time in this PostgreSQL instance, the value will be increased by one
microsecond. The resulting time will be added to the commit WAL record
and written into the pg_tslog file.

A transaction time table/log has other uses as well, so it's fairly
interesting to have this.

COMMIT [TRANSACTION] [WITH TIMESTAMP <timestamptz>];

The extension is limited to superusers and will override the normally
generated commit timestamp.

I don't think it's acceptable to override the normal timestamp. That
could lead to non monotonic time values which could screw up PITR. My
view is that you still need PITR even when you are using replication,
because the former provides recoverability and the latter provides
availability.

Without that it is rendered useless for conflict resolution purposes.

The timestamp used does not necessarily have much to do with the real
time at commit. Although I'd like it to be as close as possible. This
timestamp marks the age of the new datum in an update. Since the
replication is asynchronous, the update on the remote systems will
happen later, but the timestamp recorded with that datum must be the
timestamp of the original transaction, not the current time when it is
replicated remotely. All we have to determine that is the xmin in the
row's tuple header, so that xmin must resolve to the original
transaction's timestamp.

Jan

--
#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me. #
#================================================== JanWieck@Yahoo.com #

#19Stephen Frost
sfrost@snowman.net
In reply to: Jan Wieck (#16)
Re: Proposal: Commit timestamp

* Jan Wieck (JanWieck@Yahoo.com) wrote:

On 1/26/2007 2:37 AM, Naz Gassiep wrote:

I would be *very* concerned that system time is not a guaranteed
monotonic entity. Surely a counter or other internally managed mechanism
would be a better solution.

Such a counter has only "local" relevance. How do you plan to compare
the two separate counters on different machines to tell which
transaction happened last?

I'd also suggest you look into Lamport timestamps... Trusting the
system clock just isn't practical, even with NTP. I've developed
(albeit relatively small) systems using Lamport timestamps and would be
happy to talk about it offlist. I've probably got some code I could
share as well.
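
For readers unfamiliar with the idea, a Lamport clock can be sketched in a few lines of Python. This is the textbook scheme, not code from any of the systems mentioned in this thread; the class and method names are illustrative.

```python
class LamportClock:
    """Logical clock: a counter that orders events without wall-clock time."""

    def __init__(self):
        self.time = 0

    def tick(self):
        """Local event (for example a commit): advance the counter."""
        self.time += 1
        return self.time

    def receive(self, remote_time):
        """On receiving a message stamped remote_time, jump past it so the
        receive event is ordered after the send event."""
        self.time = max(self.time, remote_time) + 1
        return self.time
```

Note that such a counter orders causally related events only; it says nothing about which of two unrelated updates happened later in real time.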

Thanks,

Stephen

#20Andrew Dunstan
andrew@dunslane.net
In reply to: Stephen Frost (#19)
Re: Proposal: Commit timestamp

Stephen Frost wrote:

I'd also suggest you look into Lamport timestamps... Trusting the
system clock just isn't practical, even with NTP. I've developed
(albeit relatively small) systems using Lamport timestamps and would be
happy to talk about it offlist. I've probably got some code I could
share as well.

that looks like what Oracle RAC uses:
http://www.lc.leidenuniv.nl/awcourse/oracle/rac.920/a96597/coord.htm

cheers

andrew

#21Jan Wieck
JanWieck@Yahoo.com
In reply to: Stephen Frost (#19)
#22Bruce Momjian
bruce@momjian.us
In reply to: Jan Wieck (#21)
#23Jan Wieck
JanWieck@Yahoo.com
In reply to: Bruce Momjian (#22)
#24Jim Nasby
Jim.Nasby@BlueTreble.com
In reply to: Jan Wieck (#1)
#25Jan Wieck
JanWieck@Yahoo.com
In reply to: Jim Nasby (#24)
#26Theo Schlossnagle
jesus@omniti.com
In reply to: Jan Wieck (#25)
#27Jan Wieck
JanWieck@Yahoo.com
In reply to: Theo Schlossnagle (#26)
#28Theo Schlossnagle
jesus@omniti.com
In reply to: Jan Wieck (#27)
#29Jan Wieck
JanWieck@Yahoo.com
In reply to: Theo Schlossnagle (#28)
#30Theo Schlossnagle
jesus@omniti.com
In reply to: Jan Wieck (#29)
#31Bruce Momjian
bruce@momjian.us
In reply to: Jan Wieck (#29)
#32Jan Wieck
JanWieck@Yahoo.com
In reply to: Bruce Momjian (#31)
#33Peter Eisentraut
peter_e@gmx.net
In reply to: Jan Wieck (#32)
#34Jan Wieck
JanWieck@Yahoo.com
In reply to: Peter Eisentraut (#33)
#35Theo Schlossnagle
jesus@omniti.com
In reply to: Jan Wieck (#34)
#36Bruce Momjian
bruce@momjian.us
In reply to: Theo Schlossnagle (#35)
#37Jan Wieck
JanWieck@Yahoo.com
In reply to: Theo Schlossnagle (#35)
#38Theo Schlossnagle
jesus@omniti.com
In reply to: Jan Wieck (#37)
#39Zeugswetter Andreas SB SD
ZeugswetterA@spardat.at
In reply to: Theo Schlossnagle (#35)
#40Andrew Sullivan
ajs@crankycanuck.ca
In reply to: Jan Wieck (#37)
#41Markus Wanner
markus@bluegap.ch
In reply to: Theo Schlossnagle (#38)
#42Zeugswetter Andreas SB SD
ZeugswetterA@spardat.at
In reply to: Markus Wanner (#41)
#43Markus Wanner
markus@bluegap.ch
In reply to: Zeugswetter Andreas SB SD (#42)
#44Jan Wieck
JanWieck@Yahoo.com
In reply to: Markus Wanner (#43)
#45Jim Nasby
Jim.Nasby@BlueTreble.com
In reply to: Zeugswetter Andreas SB SD (#39)
#46Markus Wanner
markus@bluegap.ch
In reply to: Jan Wieck (#44)
#47José Orlando Pereira
jop@lsd.di.uminho.pt
In reply to: Bruce Momjian (#31)
#48Jan Wieck
JanWieck@Yahoo.com
In reply to: Markus Wanner (#46)
#49Markus Wanner
markus@bluegap.ch
In reply to: Jan Wieck (#48)
#50Richard Troy
rtroy@ScienceTools.com
In reply to: Markus Wanner (#49)
#51Jan Wieck
JanWieck@Yahoo.com
In reply to: Markus Wanner (#49)
#52Markus Wanner
markus@bluegap.ch
In reply to: Jan Wieck (#51)
#53Jan Wieck
JanWieck@Yahoo.com
In reply to: Richard Troy (#50)
#54Jan Wieck
JanWieck@Yahoo.com
In reply to: Markus Wanner (#52)
#55Bruce Momjian
bruce@momjian.us
In reply to: Jan Wieck (#53)
#56Jan Wieck
JanWieck@Yahoo.com
In reply to: Bruce Momjian (#55)
#57Bruce Momjian
bruce@momjian.us
In reply to: Jan Wieck (#56)
#58Zeugswetter Andreas SB SD
ZeugswetterA@spardat.at
In reply to: Jan Wieck (#54)
#59Jan Wieck
JanWieck@Yahoo.com
In reply to: Bruce Momjian (#57)
#60Bruce Momjian
bruce@momjian.us
In reply to: Jan Wieck (#59)
#61Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Bruce Momjian (#60)
#62Bruce Momjian
bruce@momjian.us
In reply to: Alvaro Herrera (#61)
#63Jan Wieck
JanWieck@Yahoo.com
In reply to: Bruce Momjian (#62)
#64Bruce Momjian
bruce@momjian.us
In reply to: Jan Wieck (#63)
#65Joshua D. Drake
jd@commandprompt.com
In reply to: Jan Wieck (#63)
#66Bruce Momjian
bruce@momjian.us
In reply to: Joshua D. Drake (#65)
#67Richard Troy
rtroy@ScienceTools.com
In reply to: Joshua D. Drake (#65)
#68Jan Wieck
JanWieck@Yahoo.com
In reply to: Richard Troy (#67)
#69Jan Wieck
JanWieck@Yahoo.com
In reply to: José Orlando Pereira (#47)
#70Kris Jurka
books@ejurka.com
In reply to: Richard Troy (#67)
#71J. Andrew Rogers
jrogers@neopolitan.com
In reply to: Richard Troy (#67)
#72Andrew Hammond
andrew.george.hammond@gmail.com
In reply to: Bruce Momjian (#57)
#73Andrew Dunstan
andrew@dunslane.net
In reply to: Andrew Hammond (#72)
#74Richard Troy
rtroy@ScienceTools.com
In reply to: Jan Wieck (#68)
#75Jan Wieck
JanWieck@Yahoo.com
In reply to: Richard Troy (#74)
#76Jan Wieck
JanWieck@Yahoo.com
In reply to: Andrew Hammond (#72)
#77Richard Troy
rtroy@ScienceTools.com
In reply to: Andrew Dunstan (#73)
#78Richard Troy
rtroy@ScienceTools.com
In reply to: Jan Wieck (#75)
#79Andrew Dunstan
andrew@dunslane.net
In reply to: Richard Troy (#77)
#80Andrew Dunstan
andrew@dunslane.net
In reply to: Andrew Dunstan (#79)
#81Jan Wieck
JanWieck@Yahoo.com
In reply to: Andrew Dunstan (#79)
#82José Orlando Pereira
jop@lsd.di.uminho.pt
In reply to: Jan Wieck (#69)