BDR, wal segment has been removed, is it possible move forward?

Started by milist ujangover 8 years ago6 messagesgeneral
Jump to latest
#1milist ujang
ujang.milist@gmail.com

Hi all,

due to space issue and high volume transaction, some wal segments removed
from pg_xlog on bdr environment.

warning log at node1 saying "requested WAL segment ..... has already been
removed" following Connection reset by peer.

log at node2 :
Sending replication command: START_REPLICATION SLOT
...
...
XX000: data stream ended

I had played streams and goldengate (oracle product) , that at capture side
we can move forward to certain sequence (archivedlog/redolog - wal segment
in postgres).

So, is it possible to move forward to read recent wal segment in bdr
environment? (assume data reconciliation will be done manually).

--
regards

ujang jaenudin | DBA Consultant (Freelancer)
http://ora62.wordpress.com
http://id.linkedin.com/pub/ujang-jaenudin/12/64/bab

#2Craig Ringer
craig@2ndquadrant.com
In reply to: milist ujang (#1)
Re: BDR, wal segment has been removed, is it possible move forward?

On 6 September 2017 at 01:52, milist ujang <ujang.milist@gmail.com> wrote:

Hi all,

due to space issue and high volume transaction, some wal segments removed
from pg_xlog on bdr environment.

What, you deleted them?

I had played streams and goldengate (oracle product) , that at capture
side we can move forward to certain sequence (archivedlog/redolog - wal
segment in postgres).

So, is it possible to move forward to read recent wal segment in bdr
environment? (assume data reconciliation will be done manually).

BDR can, see bdr.skip_changes_upto .

But PostgreSQL's logical decoding requires a contiguous WAL stream to
maintain a valid catalog_xmin and restart_lsn, so it'll still fail to
advance. So your replication from that node is broken, and you have to part
the node then rejoin. You'll need to manually recover any diverged data.

Don't go in and randomly delete things in the postgres data directory, or
things will break.

The BDR manual warns of the importance of disk space monitoring...

--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

#3milist ujang
ujang.milist@gmail.com
In reply to: Craig Ringer (#2)
Re: BDR, wal segment has been removed, is it possible move forward?

Hi Craig

On Wed, Sep 6, 2017 at 7:21 AM, Craig Ringer <craig@2ndquadrant.com> wrote:

BDR can, see bdr.skip_changes_upto .

Unluckily my bdr is 0.9.3

But PostgreSQL's logical decoding requires a contiguous WAL stream to
maintain a valid catalog_xmin and restart_lsn, so it'll still fail to
advance. So your replication from that node is broken, and you have to part
the node then rejoin. You'll need to manually recover any diverged data.

Yup, I willing to reconcile data manualy via dblink or else,

is it still possible to move wal segment in 0.9.3?

I've played these at dev env to simulate, but no luck:

# select * from pg_replication_identifier_progress;
local_id | external_id | remote_lsn | local_lsn
----------+----------------------------------------+------------+-----------
1 | bdr_6461744703437035137_1_34424_30406_ | 0/4288950 | 0/3007588
(1 row)

# SELECT
pg_replication_identifier_setup_replaying_from('bdr_6461744703437035137_1_34424_30406_');
pg_replication_identifier_setup_replaying_from
------------------------------------------------

(1 row)

# SELECT
pg_replication_identifier_setup_tx_origin('0/4288960','2017-09-06');
pg_replication_identifier_setup_tx_origin
-------------------------------------------

(1 row)

# select * from pg_replication_identifier_progress;
local_id | external_id | remote_lsn | local_lsn
----------+----------------------------------------+------------+-----------
1 | bdr_6461744703437035137_1_34424_30406_ | 0/4288950 | 0/3007588
(1 row)

--
regards

ujang jaenudin | DBA Consultant (Freelancer)
http://ora62.wordpress.com
http://id.linkedin.com/pub/ujang-jaenudin/12/64/bab

#4Craig Ringer
craig@2ndquadrant.com
In reply to: milist ujang (#3)
Re: BDR, wal segment has been removed, is it possible move forward?

On 6 September 2017 at 08:47, milist ujang <ujang.milist@gmail.com> wrote:

Hi Craig

On Wed, Sep 6, 2017 at 7:21 AM, Craig Ringer <craig@2ndquadrant.com>
wrote:

BDR can, see bdr.skip_changes_upto .

Unluckily my bdr is 0.9.3

But PostgreSQL's logical decoding requires a contiguous WAL stream to
maintain a valid catalog_xmin and restart_lsn, so it'll still fail to
advance. So your replication from that node is broken, and you have to part
the node then rejoin. You'll need to manually recover any diverged data.

Yup, I willing to reconcile data manualy via dblink or else,

is it still possible to move wal segment in 0.9.3?

Well, you can skip changes on the logical decoding stream manually in 0.9.3
using the underlying postgres functions. But it won't help you because what
you broke when you deleted the WAL segments wasn't BDR, it was postgres
logical decoding.

It *requires* a contiguous stream of WAL. It cannot recover if you do not
have that. And you cannot really reset it.

You could drop and re-create the replication slot, I guess. But your nodes
would be hopelessly out of sync and need manual resync (with data
replication disabled) of one node vs another.

--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

#5milist ujang
ujang.milist@gmail.com
In reply to: Craig Ringer (#4)
Re: BDR, wal segment has been removed, is it possible move forward?

Hi Craig,

On Wed, Sep 6, 2017 at 4:07 PM, Craig Ringer <craig@2ndquadrant.com> wrote:

You could drop and re-create the replication slot, I guess. But your nodes
would be hopelessly out of sync and need manual resync (with data
replication disabled) of one node vs another.

Thanks for pointing to replication slot.
I Simulate the similar situation in dev env by remove the wal segment on
node1, when node2 keep inserting into a table, now it perfectly can move
forward to latest wal segment, but the difference situation is at
node_status.

In production node_status is i in node1 but r in node2, where on my dev
both nodes keep r , even I waited to let it may change quite long.

can I safely update the node_status directy on bdr.bdr_nodes?

--
regards

ujang jaenudin | DBA Consultant (Freelancer)
http://ora62.wordpress.com
http://id.linkedin.com/pub/ujang-jaenudin/12/64/bab

#6Craig Ringer
craig@2ndquadrant.com
In reply to: milist ujang (#5)
Re: BDR, wal segment has been removed, is it possible move forward?

On 7 September 2017 at 21:16, milist ujang <ujang.milist@gmail.com> wrote:

Hi Craig,

On Wed, Sep 6, 2017 at 4:07 PM, Craig Ringer <craig@2ndquadrant.com>
wrote:

You could drop and re-create the replication slot, I guess. But your
nodes would be hopelessly out of sync and need manual resync (with data
replication disabled) of one node vs another.

Thanks for pointing to replication slot.
I Simulate the similar situation in dev env by remove the wal segment on
node1, when node2 keep inserting into a table, now it perfectly can move
forward to latest wal segment, but the difference situation is at
node_status.

In production node_status is i in node1

there's a known bug in bdr1 where sometimes the node status doesn't update
from 'i' after joining.

can I safely update the node_status directy on bdr.bdr_nodes?

Usually not. In this one specific case where a node is known to be fully
joined and online, but its status is stuck at 'i', yes.

--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services