BDR - triggers on receiving node?

Started by Peter Mogensenabout 11 years ago11 messagesgeneral
Jump to latest
#1Peter Mogensen
apm@one.com

Hi,

Traditionally it hadn't made much sense to fire a trigger on a receiving
replica node (slave) - for many reasons, including it being read-only.

But with BDRs multi parter, partial replication and the possibility that
some tables are either actually or logically local to a single node,
there's scenarios where this makes sense.

Is it possible?

Say ... I have a table in a BDR replicated database with an "ON UPDATE"
trigger. - and that trigger wants to locally find out the local
txid_snapshot_xmin() when a change was applied to the local node.
Not the upstream txid of the transaction which actually made the change
originally (which only makes sense on that node) - but on the local node
receiving the change via BDR LLSR.

Can that be done?

/Peter

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#2Craig Ringer
craig@2ndquadrant.com
In reply to: Peter Mogensen (#1)
Re: BDR - triggers on receiving node?

On 25 March 2015 at 19:15, Peter Mogensen <apm@one.com> wrote:

Hi,

Traditionally it hadn't made much sense to fire a trigger on a receiving
replica node (slave) - for many reasons, including it being read-only.

But with BDRs multi parter, partial replication and the possibility that
some tables are either actually or logically local to a single node,
there's scenarios where this makes sense.

Is it possible?

It might be possible to fire FOR EACH ROW triggers, though this is not
currently supported. BDR does its row changes at a lower level than SQL, so
it doesn't invoke INSERT, UPDATE or DELETE statements directly. That helps
with the efficiently of apply, but also means you can't easily fire
triggers on those changes simply by creating ENABLE REPLICA or ENABLE
ALWAYS triggers and using session_replication_role = 'replica'.

It is not practical to fire FOR EACH STATEMENT triggers, because BDR has no
idea what upstream statement caused which change.

Most likely BDR would need to look for ENABLE REPLICA / ENABLE ALWAYS
triggers and fire them directly. Or a new pseudo-trigger callback procedure
for row-apply could be added, somewhat similar to the existing
functionality for conflict handlers.

Say ... I have a table in a BDR replicated database with an "ON UPDATE"
trigger. - and that trigger wants to locally find out the local
txid_snapshot_xmin() when a change was applied to the local node.

Why would you want to do that? Just out of interest?

Not the upstream txid of the transaction which actually made the change
originally (which only makes sense on that node) - but on the local node
receiving the change via BDR LLSR.

Can that be done?

At this time I think you'd have to patch the BDR apply worker to do what
you wanted.

There are already a few areas where downstream filtering or apply hooks are
of interest, so this might be an area that can be enhanced in future.

--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

#3Peter Mogensen
apm@one.com
In reply to: Craig Ringer (#2)
Re: BDR - triggers on receiving node?

On 2015-03-25 12:32, Craig Ringer wrote:

On 25 March 2015 at 19:15, Peter Mogensen <apm@one.com> wrote:

Say ... I have a table in a BDR replicated database with an "ON UPDATE"
trigger. - and that trigger wants to locally find out the local
txid_snapshot_xmin() when a change was applied to the local node.

Why would you want to do that? Just out of interest?

Because I have a bunch of SELECT queries to the local node, which
includes the txid_current() in their result, and I want to be able to
recognize a result which was obtained prior to a change reaching the
local node.

/Peter

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#4Craig Ringer
craig@2ndquadrant.com
In reply to: Peter Mogensen (#3)
Re: BDR - triggers on receiving node?

On 25 March 2015 at 20:14, Peter Mogensen <apm@one.com> wrote:

On 2015-03-25 12:32, Craig Ringer wrote:

On 25 March 2015 at 19:15, Peter Mogensen <apm@one.com> wrote:

Say ... I have a table in a BDR replicated database with an "ON UPDATE"

trigger. - and that trigger wants to locally find out the local
txid_snapshot_xmin() when a change was applied to the local node.

Why would you want to do that? Just out of interest?

Because I have a bunch of SELECT queries to the local node, which includes
the txid_current() in their result, and I want to be able to recognize a
result which was obtained prior to a change reaching the local node.

So you're trying to do cache invalidation / cache verification of some form?

I'm interested in what you're looking at doing, but would probably need a
more complete description to offer much input.

--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

#5Craig Ringer
craig@2ndquadrant.com
In reply to: Peter Mogensen (#1)
Re: BDR - triggers on receiving node?

On 26 March 2015 at 15:57, Peter Mogensen <apm@one.com> wrote:

In standard Postgres one-way replication you can get the
txid_snapshot_xmin() of the master on the slave. But with BDR, all nodes
have their own txids. So the above scheme doesn't work unless you can get
the txid which applied the change to the local node. ... I can see that
applying BDR replication consumes transactions, so I assumed this value
exists ... and if only there was a way to get it. I you could get it in a
trigger, then you could maintain the invalidation event queue locally for
each node.

I see what you're getting at. You want to prevent stale data from being
reinsterted into a cache by a read from an asynchronous replica after it's
been invalidated and purged by a write on the master. Your application
provides cross-node co-ordination and can set a horizon of validity to
prevent such caching events.

There are a few things you could work with in BDR that may be useful.

First, BDR adds commit timestamps for transactions, giving you wall-clock
time commit information. It's used for last-update-wins conflict
resolution. Commit timestamp information is accessible from SQL using:

pg_get_transaction_committime(xid)

pg_get_latest_transaction_committime_data()

(Note that these are the BDR definitions. The versions committed to
PostgreSQL 9.5 have different names and some interface changes, so
applications using these functions may need changes when updated to run on
PostgreSQL 9.5+bdr when released, though BDR will probably add
backwards-compat wrappers.)

Second, PostgreSQL keeps track of the node-wide log-sequence number (LSN).
The replay position of a node is available on the upstream in
pg_catalog.pg_stat_replication. The last-replayed LSN of an upstream is
kept track of by the BDR extension and recorded as a replication
identifier, but the SQL-level interface to replication identifiers,
pg_catalog.pg_replication_identifier_progress. This view is only accessible
by the superuser at the moment. You can also access the upstream commit's
LSN at the C level from the BDR apply worker during transaction replay,
though. The LSN provides strict ordering for a node.

I'd start by looking into whether commit timestamps can meet your needs.

--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

#6Craig Ringer
craig@2ndquadrant.com
In reply to: Peter Mogensen (#1)
Re: BDR - triggers on receiving node?

(Please reply-to-all to keep the thread on pgsql-general)

On 26 March 2015 at 18:32, Peter Mogensen <apm@one.com> wrote:

On 2015-03-26 10:14, Craig Ringer wrote:

I see what you're getting at. You want to prevent stale data from being
reinsterted into a cache by a read from an asynchronous replica after it's
been invalidated and purged by a write on the master. Your application
provides cross-node co-ordination and can set a horizon of validity to
prevent such caching events.

Ohh... I get why you think of asynchronous replicas.
Yes. the system would also prevent that if the txids are global.
However that's not the goal.
The goal is only to prevent local race conditions wrt. a local node.

If that's the case then BDR shouldn't make any difference.

Earlier you were speaking of (presumably not synchronous) streaming
replicas, and writes on the master vs reads from the replica, and a way you
avoid caching stale data from the read-replica using the application's
cache manager to co-ordinate visibility horizons. It sounded like you were
looking for a way to translate that into something that could work with BDR.

Did I misunderstand?

I've seen in production how stale values can end up in the cache due to
race condition like:

1) Client tries to read from the cache and gets a cache miss causing it to
query the database. The resulting value should now be cached.
2) The value is changed in the database, causing an invalidation event.
3) The invalidation event is applied to the cache (being a NOP since it
was not there).
4) Now the client manages to apply it's read value to the cache, but now
it's an old stale value.

One way to handle that is to SELECT ... FOR SHARE in your reads, then
update the cache and not commit until the cache has been updated.

A concurrent UPDATE can then not proceed until the cache write has been
completed.

That could be tricky to apply in practice though, and prone to deadlocks
caused by lock-ordering issues.

If it's purely local you can continue to use your current approach without
issues. If not, then commit timestamps may offer some assistance when
combined with accurate time sync.

--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

#7Peter Mogensen
apm@one.com
In reply to: Craig Ringer (#6)
Re: BDR - triggers on receiving node?

On 2015-03-26 11:57, Craig Ringer wrote:

If that's the case then BDR shouldn't make any difference.

It does. Because now with BDR you can't compare txid_current() as saved
on the master with txid_snapshot_xmin() as read by the replica.
If however, you could save the txid associated with the application of
the BDR replication on the target, you would again be able to compare.

Earlier you were speaking of (presumably not synchronous) streaming
replicas, and writes on the master vs reads from the replica, and a
way you avoid caching stale data from the read-replica using the
application's cache manager to co-ordinate visibility horizons. It
sounded like you were looking for a way to translate that into
something that could work with BDR. Did I misunderstand?

No. I think you understood, but the goal of using txid from the master
was not to co-ordinate with the master. Only to match changes on the
slave with reads on the slave.
The extra property of txid's being the same on the master was not used.
The problem when using BDR is that the "slave" has it's own txid
sequence and I can't get the txid of all transactions doing changes, -
since some of them are via BDR and doesn't cause triggers.

One way to handle that is to SELECT ... FOR SHARE in your reads, then
update the cache and not commit until the cache has been updated.

A concurrent UPDATE can then not proceed until the cache write has been
completed.

Yes, that would serialize the selects...
It's very complicated to integrate into the cache however, since it
doesn't really know about how the values to cache are retrieved.

/Peter

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#8Craig Ringer
craig@2ndquadrant.com
In reply to: Peter Mogensen (#7)
Re: BDR - triggers on receiving node?

On 26 March 2015 at 19:08, Peter Mogensen <apm@one.com> wrote:

On 2015-03-26 11:57, Craig Ringer wrote:

If that's the case then BDR shouldn't make any difference.

It does. Because now with BDR you can't compare txid_current() as saved on
the master with txid_snapshot_xmin() as read by the replica.
If however, you could save the txid associated with the application of the
BDR replication on the target, you would again be able to compare.

My comment was with regards to it being on the local node. A master and
synchronous replica isn't a local-node to local-node scenario.

Earlier you were speaking of (presumably not synchronous) streaming

replicas, and writes on the master vs reads from the replica, and a way you
avoid caching stale data from the read-replica using the application's
cache manager to co-ordinate visibility horizons. It sounded like you were
looking for a way to translate that into something that could work with
BDR. Did I misunderstand?

No. I think you understood, but the goal of using txid from the master was
not to co-ordinate with the master. Only to match changes on the slave with
reads on the slave.
The extra property of txid's being the same on the master was not used.
The problem when using BDR is that the "slave" has it's own txid sequence
and I can't get the txid of all transactions doing changes, - since some of
them are via BDR and doesn't cause triggers.

Right. So that's where I think you need to look into commit timestamps as
an alternative, as outlined in prior mail.

Alternately you could look at firing a C callback in bdr_apply.c when a row
is replayed.

As previously mentioned it's likely to be possible to add per-row apply
callbacks that can be written in plpgsql or other function languages, but
no such feature currently exists in BDR.

Firing real "FOR EACH ROW ... ENABLE REPLICA" triggers may also be
possible, but I haven't done any significant looking into it, and I'd be
concerned about the correctness of doing so for row triggers but not
statement triggers.

One issue is that it'd probably have to be able to restrict writes to
unlogged tables or tables that aren't part of a replication set. If it
could write to replicated tables it'd be doing so in the context of the
apply worker, so the writes would not get replicated to other nodes. That'd
mean the triggers would be creating rows only on one node - and that's a
fast track to data divergence that can cause replication stalls and so on.

At this point I think commit timestamps are likely to be your best bet, and
certainly what you should start looking into first.

--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

#9Andres Freund
andres@anarazel.de
In reply to: Craig Ringer (#8)
Re: BDR - triggers on receiving node?

On 2015-03-26 19:56:23 +0800, Craig Ringer wrote:

Right. So that's where I think you need to look into commit timestamps as
an alternative, as outlined in prior mail.

I've only quickly skimmed this thread, but it sounds to me you just
could create a logical slot in the "standby" and do the cache
invalidation via that.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

In reply to: Craig Ringer (#8)
Re: BDR - triggers on receiving node?

On 2015-03-26 12:56, Craig Ringer wrote:

My comment was with regards to it being on the local node. A master
and synchronous replica isn't a local-node to local-node scenario.

No. But all I'm exploiting is that change events to the local node see
the same logical clock as SELECT statements to the same node.

As previously mentioned it's likely to be possible to add per-row apply
callbacks that can be written in plpgsql or other function languages, but
no such feature currently exists in BDR.

I'll put it on my wish list then :)

Firing real "FOR EACH ROW ... ENABLE REPLICA" triggers may also be
possible, but I haven't done any significant looking into it, and I'd be
concerned about the correctness of doing so for row triggers but not
statement triggers.

That too :)

One issue is that it'd probably have to be able to restrict writes to
unlogged tables or tables that aren't part of a replication set. If it
could write to replicated tables it'd be doing so in the context of the
apply worker, so the writes would not get replicated to other nodes. That'd
mean the triggers would be creating rows only on one node - and that's a
fast track to data divergence that can cause replication stalls and so on.

For the use case at hand, that's not a problem. The trigger would only
need to update a fully local state. - like a table not part of the
replication set and not related to any other tables. - like an FIFO
queue of invalidation events.

/Peter

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

In reply to: Craig Ringer (#8)
Re: BDR - triggers on receiving node?

On 2015-03-26 12:56, Craig Ringer wrote:

At this point I think commit timestamps are likely to be your best
bet, and certainly what you should start looking into first.

I've thought about this, but it seems that since these timestamps are
made on the node doing the change and you have no way on knowing if one
node in the cluster is far behind in it's replication to you, then you
could make a select on the local node being much newer in timestamp than
the transaction changing the value - which would only arrive later (and
making pg_get_latest_transaction_committime_data() go backwards).

But I think I might have found a way to make this work with BDR:

It seems that BDR translates the xmin column of tables from the sending
node to the target node txid values.

So - instead of having a local FIFO table on each node, I could actually
still (like with single-master replication) have a global FIFO table of
invalidation events, being inserted into at the node actually making the
change.

When that invalidation event reaches a node the row gets a local xmin
value which can be compared to txid_snapshot_xmin() kept along every select.

So if I'm correct in observing that the xmin column gives me the a local
txid of the transaction inserting the invalidation event - even if it
was originally inserted on another node, then I think it'll work.

/Peter

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general