bdr appears to be trying to replicate to itself

Started by Cj B over 10 years ago, 5 messages, general
#1 Cj B
blackc2004@gmail.com

I noticed a very strange issue starting about 20 days ago and my pg_xlog has just been filling up since then.

HOST A:

```
postgres=# select * from pg_replication_slots;
                slot_name                | plugin | slot_type | datoid |   database    | active | xmin | catalog_xmin | restart_lsn
-----------------------------------------+--------+-----------+--------+---------------+--------+------+--------------+-------------
 bdr_16385_6188730679935789649_1_16385__ | bdr    | logical   |  16385 | database_name | f      |      |     28174997 | 25/7677F300
 bdr_16385_6188733518371128845_2_16385__ | bdr    | logical   |  16385 | database_name | t      |      |     38613316 | 41/7B7EDF80

select bdr.bdr_get_local_nodeid();
     bdr_get_local_nodeid
-------------------------------
 (6188730679935789649,1,16385)

SELECT slot_name, database, active, pg_xlog_location_diff(pg_current_xlog_insert_location(), restart_lsn) AS retained_bytes FROM pg_replication_slots WHERE plugin = 'bdr';
                slot_name                |   database    | active | retained_bytes
-----------------------------------------+---------------+--------+----------------
 bdr_16385_6188730679935789649_1_16385__ | database_name | f      |   120353015152
 bdr_16385_6188733518371128845_2_16385__ | database_name | t      |           2288
(2 rows)
```

HOST B:

```
select * from pg_replication_slots;
                slot_name                | plugin | slot_type | datoid |   database    | active | xmin | catalog_xmin | restart_lsn
-----------------------------------------+--------+-----------+--------+---------------+--------+------+--------------+-------------
 bdr_16385_6188730679935789649_1_16385__ | bdr    | logical   |  16385 | database_name | t      |      |      3499719 | 3/B53F00A8

select bdr.bdr_get_local_nodeid();
     bdr_get_local_nodeid
-------------------------------
 (6188733518371128845,2,16385)

SELECT slot_name, database, active, pg_xlog_location_diff(pg_current_xlog_insert_location(), restart_lsn) AS retained_bytes FROM pg_replication_slots WHERE plugin = 'bdr';
                slot_name                |   database    | active | retained_bytes
-----------------------------------------+---------------+--------+----------------
 bdr_16385_6188730679935789649_1_16385__ | database_name | t      |          68736
```

So it almost looks like HOST A is trying to replicate to itself: one of the slots in its pg_replication_slots carries HOST A's own node ID, that slot is inactive, and it is retaining about 120 GB of WAL, which is what keeps filling up pg_xlog.
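That comparison between slot names and the local node ID can be sketched in Python. This is only an illustration: the slot name layout `bdr_<dboid>_<sysid>_<timeline>_<dboid>__` is an assumption inferred from the output above, and the helper names `slot_peer` and `self_referencing_slots` are hypothetical, not part of BDR.

```python
import re

# Assumed layout, inferred from the slot names shown above:
#   bdr_<dboid>_<sysid>_<timeline>_<dboid>__<optional name>
SLOT_RE = re.compile(r"^bdr_(\d+)_(\d+)_(\d+)_(\d+)__(.*)$")


def slot_peer(slot_name):
    """Return (sysid, timeline, dboid) encoded in a slot name, or None."""
    m = SLOT_RE.match(slot_name)
    if m is None:
        return None
    return (int(m.group(2)), int(m.group(3)), int(m.group(4)))


def self_referencing_slots(slot_names, local_nodeid):
    """List slots whose encoded node ID equals the local node ID."""
    return [s for s in slot_names if slot_peer(s) == tuple(local_nodeid)]


# Values copied from the HOST A and HOST B output above.
host_a_local = (6188730679935789649, 1, 16385)
host_a_slots = [
    "bdr_16385_6188730679935789649_1_16385__",
    "bdr_16385_6188733518371128845_2_16385__",
]
host_b_local = (6188733518371128845, 2, 16385)
host_b_slots = ["bdr_16385_6188730679935789649_1_16385__"]

print("HOST A:", self_referencing_slots(host_a_slots, host_a_local))
print("HOST B:", self_referencing_slots(host_b_slots, host_b_local))
```

With the values above, only HOST A reports a slot that names itself, and it is the inactive one.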

This server has been set up since mid-May and I haven't added any new nodes, but I did upgrade just this past weekend, before I noticed there was a problem:
dpkg -l | grep '^ii' | grep postgre
ii  pgdg-keyring                   2014.1               all    keyring for apt.postgresql.org
ii  postgresql-bdr-9.4             9.4.5-1trusty        amd64  object-relational SQL database, version 9.4 server
ii  postgresql-bdr-9.4-bdr-plugin  0.9.3-1trusty        amd64  BDR Plugin for PostgreSQL-BDR 9.4
ii  postgresql-bdr-client-9.4      9.4.5-1trusty        amd64  front-end programs for PostgreSQL-BDR 9.4
ii  postgresql-bdr-contrib-9.4     9.4.5-1trusty        amd64  additional facilities for PostgreSQL
ii  postgresql-client-common       170.pgdg14.04+1      all    manager for multiple PostgreSQL client versions
ii  postgresql-common              170.pgdg14.04+1      all    PostgreSQL database-cluster manager
ii  postgresql-contrib             9.4+170.pgdg14.04+1  all    additional facilities for PostgreSQL (supported version)

Before the upgrade I was on postgresql-bdr-client-9.4 9.4.4-1trusty.

Can anyone help me fix this? I'm running out of HD space.

Thanks!

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#2 Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Cj B (#1)
Re: bdr appears to be trying to replicate to itself

Cj B wrote:

> I noticed a very strange issue starting about 20 days ago and my pg_xlog has just been filling up since then.

For reference, this was also asked on github, and answered there.
See https://github.com/2ndQuadrant/bdr/issues/143

--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#3 Cj B
blackc2004@gmail.com
In reply to: Alvaro Herrera (#2)
Re: bdr appears to be trying to replicate to itself

Hi,

Yes, I posted on GitHub because I wasn't sure where to post. I'm posting here because I'm not clear about the answer given there: "Drop the slot bdr_16385_6188730679935789649_1_16385__ on the first host."

Does this just mean running the following?
select pg_drop_replication_slot('bdr_16385_6188730679935789649_1_16385__');

What impact will this have?

Thanks
Cj B

#4 Craig Ringer
craig@2ndquadrant.com
In reply to: Cj B (#3)
Re: bdr appears to be trying to replicate to itself

On 17 November 2015 at 00:33, Cj B <blackc2004@gmail.com> wrote:

> select pg_drop_replication_slot('bdr_16385_6188730679935789649_1_16385__')

Correct.

> What impact will this have?

If the slot is unused, it'll allow the WAL that's being held by the slot to
be removed. It'll also unpin the catalog xmin to allow autovacuum to clean
up dead tuples in the catalogs.
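
To get a feel for how much WAL that would release, the retained_bytes figures reported for HOST A earlier in the thread can be converted to a readable size. A small sketch in Python; the helper name `to_gib` is hypothetical and the byte counts are copied from the first message:

```python
def to_gib(num_bytes):
    """Convert a byte count to GiB (1 GiB = 2**30 bytes)."""
    return num_bytes / 2**30


# retained_bytes per slot, as reported on HOST A in message #1.
retained = {
    "bdr_16385_6188730679935789649_1_16385__": 120353015152,
    "bdr_16385_6188733518371128845_2_16385__": 2288,
}

for slot, nbytes in retained.items():
    print(f"{slot}: {to_gib(nbytes):.1f} GiB retained")
```

On those figures the inactive slot accounts for roughly 112 GiB, which is essentially all of the WAL being held in pg_xlog.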

This doesn't explain how the system got into this state. For that it'd
really be necessary to see the steps taken during setup. BDR tries to
protect against attempts to replicate-from-self. Presumably there's an
oversight in those checks. If you're able to reproduce this state I'd like
to hear details on how.

--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

#5 Cj B
blackc2004@gmail.com
In reply to: Craig Ringer (#4)
Re: bdr appears to be trying to replicate to itself

Craig Ringer wrote:

> This doesn't explain how the system got into this state. For that it'd really be necessary to see the steps taken during setup. BDR tries to protect against attempts to replicate-from-self. Presumably there's an oversight in those checks. If you're able to reproduce this state I'd like to hear details on how.

Thanks for the help; that seems to have done it. I'm not sure how it got into this state either. I still have the commands that I ran during setup, but I doubt they'll help much. What's strange is that everything had been working fine since May, and then on Oct 29th it appears to have started retaining WAL in pg_xlog, so something must have happened, but sadly I don't know what.

I’ll keep an eye on it to see if it happens again.