Re: Causal reads take II

Started by Ants Aasmaalmost 9 years ago4 messages

ants.aasma@eesti.ee

almost 9 years ago

On Tue, Jan 3, 2017 at 3:43 AM, Thomas Munro
<thomas.munro@enterprisedb.com> wrote:

Here is a new version of my "causal reads" patch (see the earlier
thread from the 9.6 development cycle[1]), which provides a way to
avoid stale reads when load balancing with streaming replication.

Thanks for working on this. It will let us do something a lot of
people have been asking for.

Long term, I think it would be pretty cool if we could develop a set
of features that give you distributed sequential consistency on top of
streaming replication. Something like (this | causality-tokens) +
SERIALIZABLE-DEFERRABLE-on-standbys[3] +
distributed-dirty-read-prevention[4].

Is it necessary that causal writes wait for replication before making
the transaction visible on the master? I'm asking because the per tx
variable wait time between logging commit record and making
transaction visible makes it really hard to provide matching
visibility order on master and standby. In CSN based snapshot
discussions we came to the conclusion that to make standby visibility
order match master while still allowing for async transactions to
become visible before they are durable we need to make the commit
sequence a vector clock and transmit extra visibility ordering
information to standby's. Having one more level of delay between wal
logging of commit and making it visible would make the problem even
worse.

One other thing that might be an issue for some users is that this
patch only ensures that clients observe forwards progress of database
state after a writing transaction. With two consecutive read only
transactions that go to different servers a client could still observe
database state going backwards. It seems that fixing that would
require either keeping some per client state or a global agreement on
what snapshots are safe to provide, both of which you tried to avoid
for this feature.

Regards,
Ants Aasma

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Import Notes

Reply to msg id not found: 94911361.170321.1483407849924@RIAReference msg id not found: 94911361.170321.1483407849924@RIA

Thomas Munro

thomas.munro@enterprisedb.com

almost 9 years ago

In reply to: Ants Aasma (#1)

On Thu, Jan 19, 2017 at 8:11 PM, Ants Aasma <ants.aasma@eesti.ee> wrote:

On Tue, Jan 3, 2017 at 3:43 AM, Thomas Munro
<thomas.munro@enterprisedb.com> wrote:

Long term, I think it would be pretty cool if we could develop a set
of features that give you distributed sequential consistency on top of
streaming replication. Something like (this | causality-tokens) +
SERIALIZABLE-DEFERRABLE-on-standbys[3] +
distributed-dirty-read-prevention[4].

Is it necessary that causal writes wait for replication before making
the transaction visible on the master? I'm asking because the per tx
variable wait time between logging commit record and making
transaction visible makes it really hard to provide matching
visibility order on master and standby.

Yeah, that does seem problematic. Even with async replication or no
replication, isn't there already a race in CommitTransaction() where
two backends could reach RecordTransactionCommit() in one order but
ProcArrayEndTransaction() in the other order? AFAICS using
synchronous replication in one of the transactions just makes it more
likely you'll experience such a visibility difference between the DO
and REDO histories (!), by making RecordTransactionCommit() wait.
Nothing prevents you getting a snapshot that can see t2 but not t1 in
the DO history, while someone doing PITR or querying an asynchronous
standby gets a snapshot that can see t1 but not t2 because those
replay the REDO history.

In CSN based snapshot
discussions we came to the conclusion that to make standby visibility
order match master while still allowing for async transactions to
become visible before they are durable we need to make the commit
sequence a vector clock and transmit extra visibility ordering
information to standby's. Having one more level of delay between wal
logging of commit and making it visible would make the problem even
worse.

I'd like to read that... could you please point me at the right bit of
that discussion?

One other thing that might be an issue for some users is that this
patch only ensures that clients observe forwards progress of database
state after a writing transaction. With two consecutive read only
transactions that go to different servers a client could still observe
database state going backwards.

True. This patch is about "read your writes", not, erm, "read your
reads". That may indeed be problematic for some users. It's not a
very satisfying answer but I guess you could run a dummy write query
on the primary every time you switch between standbys, or before
telling any other client to run read-only queries after you have done
so, in order to convert your "r r" sequence into a "r w r" sequence...

It seems that fixing that would
require either keeping some per client state or a global agreement on
what snapshots are safe to provide, both of which you tried to avoid
for this feature.

Agreed. You briefly mentioned this problem in the context of pairs of
read-only transactions a while ago[1]/messages/by-id/CA+CSw_u4Vy5FSbjVc7qms6PuZL7QV90+onBEtK9PFqOsNj0Uhw@mail.gmail.com. As you said then, it does seem
plausible to do that with a token system that gives clients the last
commit LSN from the snapshot used by a read only query, so that you
can ask another standby to make sure that LSN has been replayed before
running another read-only transaction. This could be handled
explicitly by a motivated client that is talking to multiple nodes. A
more general problem is client A telling client B to go and run
queries and expecting B to see all transactions that A has seen; it
now has to pass the LSN along with that communication, or rely on some
kind of magic proxy that sees all transactions, or a radically
different system with a GTM.

[1]: /messages/by-id/CA+CSw_u4Vy5FSbjVc7qms6PuZL7QV90+onBEtK9PFqOsNj0Uhw@mail.gmail.com

--
Thomas Munro
http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Ants Aasma

ants.aasma@eesti.ee

almost 9 years ago

In reply to: Ants Aasma (#1)

On Thu, Jan 19, 2017 at 2:22 PM, Thomas Munro
<thomas.munro@enterprisedb.com> wrote:

On Thu, Jan 19, 2017 at 8:11 PM, Ants Aasma <ants.aasma@eesti.ee> wrote:

On Tue, Jan 3, 2017 at 3:43 AM, Thomas Munro
<thomas.munro@enterprisedb.com> wrote:

Long term, I think it would be pretty cool if we could develop a set
of features that give you distributed sequential consistency on top of
streaming replication. Something like (this | causality-tokens) +
SERIALIZABLE-DEFERRABLE-on-standbys[3] +
distributed-dirty-read-prevention[4].

Is it necessary that causal writes wait for replication before making
the transaction visible on the master? I'm asking because the per tx
variable wait time between logging commit record and making
transaction visible makes it really hard to provide matching
visibility order on master and standby.

Yeah, that does seem problematic. Even with async replication or no
replication, isn't there already a race in CommitTransaction() where
two backends could reach RecordTransactionCommit() in one order but
ProcArrayEndTransaction() in the other order? AFAICS using
synchronous replication in one of the transactions just makes it more
likely you'll experience such a visibility difference between the DO
and REDO histories (!), by making RecordTransactionCommit() wait.
Nothing prevents you getting a snapshot that can see t2 but not t1 in
the DO history, while someone doing PITR or querying an asynchronous
standby gets a snapshot that can see t1 but not t2 because those
replay the REDO history.

Yes there is a race even with all transactions having the same
synchronization level. But nobody will mind if we some day fix that
race. :) With different synchronization levels it is much trickier to
fix as either async commits must wait behind sync commits before
becoming visible, return without becoming visible or visibility order
must differ from commit record LSN order. The first makes the async
commit feature useless, second seems a no-go for semantic reasons,
third requires extra information sent to standby's so they know the
actual commit order.

In CSN based snapshot
discussions we came to the conclusion that to make standby visibility
order match master while still allowing for async transactions to
become visible before they are durable we need to make the commit
sequence a vector clock and transmit extra visibility ordering
information to standby's. Having one more level of delay between wal
logging of commit and making it visible would make the problem even
worse.

I'd like to read that... could you please point me at the right bit of
that discussion?

Some of that discussion was face to face at pgconf.eu, some of it is
here: /messages/by-id/CA+CSw_vbt=CwLuOgR7gXdpnho_Y4Cz7X97+o_bH-RFo7keNO8Q@mail.gmail.com

Let me know if you have any questions.

It seems that fixing that would
require either keeping some per client state or a global agreement on
what snapshots are safe to provide, both of which you tried to avoid
for this feature.

Agreed. You briefly mentioned this problem in the context of pairs of
read-only transactions a while ago[1]. As you said then, it does seem
plausible to do that with a token system that gives clients the last
commit LSN from the snapshot used by a read only query, so that you
can ask another standby to make sure that LSN has been replayed before
running another read-only transaction. This could be handled
explicitly by a motivated client that is talking to multiple nodes. A
more general problem is client A telling client B to go and run
queries and expecting B to see all transactions that A has seen; it
now has to pass the LSN along with that communication, or rely on some
kind of magic proxy that sees all transactions, or a radically
different system with a GTM.

If/when we do CSN based snapshots, adding a GTM could be relatively
straightforward. It's basically not all that far from what Spanner is
doing by using a timestamp as the snapshot. But this is all relatively
independent of this patch.

Regards,
Ants Aasma

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Import Notes

Reply to msg id not found: 601802605.472749.1484828563165@RIAReference msg id not found: 94911361.170321.1483407849924@RIA

Thomas Munro

thomas.munro@enterprisedb.com

almost 9 years ago

In reply to: Ants Aasma (#3)

On Fri, Jan 20, 2017 at 3:01 AM, Ants Aasma <ants.aasma@eesti.ee> wrote:

Yes there is a race even with all transactions having the same
synchronization level. But nobody will mind if we some day fix that
race. :)

We really should fix that!

With different synchronization levels it is much trickier to
fix as either async commits must wait behind sync commits before
becoming visible, return without becoming visible or visibility order
must differ from commit record LSN order. The first makes the async
commit feature useless, second seems a no-go for semantic reasons,
third requires extra information sent to standby's so they know the
actual commit order.

Thought experiment:

1. Log commit and make visible atomically (so DO and REDO agree on
visibility order).
2. Introduce flag 'not yet visible' to commit record for sync rep commits.
3. Introduce a new log record 'make all invisible commits up to LSN X
visible', which is inserted when enough sync rep standbys reply. Log
this + make visible on primary atomically (again, so DO and REDO agree
on visibility order).
4. Teach GetSnapshotData to deal with this using <insert magic here>.

Now standby and primary agree on visibility order of async and sync
transactions, and no standby will allow you to see transactions that
the primary doesn't yet consider to be durable (ie flushed on a quorum
of standbys etc). But... sync rep has to flush xlog twice on primary,
and standby has to wait to make things visible, and remote_apply would
either need to be changed or supplemented with a new level
remote_apply_and_visible, and it's not obvious how to actually do
atomic visibility + logging (I heard ProcArrayLock is kinda hot...).
Hmm. Doesn't sound too popular...

In CSN based snapshot
discussions we came to the conclusion that to make standby visibility
order match master while still allowing for async transactions to
become visible before they are durable we need to make the commit
sequence a vector clock and transmit extra visibility ordering
information to standby's. Having one more level of delay between wal
logging of commit and making it visible would make the problem even
worse.

I'd like to read that... could you please point me at the right bit of
that discussion?

Some of that discussion was face to face at pgconf.eu, some of it is
here: /messages/by-id/CA+CSw_vbt=CwLuOgR7gXdpnho_Y4Cz7X97+o_bH-RFo7keNO8Q@mail.gmail.com

Let me know if you have any questions.

Thanks! That may take me some time...

--
Thomas Munro
http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers