BDR: ALTER statement hanging

Started by Selim Tuviover 10 years ago11 messagesgeneral
Jump to latest
#1Selim Tuvi
stuvi@ilm.com

Hi, I am running a three node BDR cluster. BDR version is 0.9.3. Postgres version is 9.4.5.

With 0.9.2, I used to be able to issue ALTER statements using psql and it would go through. This time it is just hanging. The statement is this:

alter table pts alter column shot drop not null;

I also tried to add a column but that hangs as well:

alter table pts add column shot_tmp text;

Thanks
-Selim

#2Selim Tuvi
stuvi@ilm.com
In reply to: Selim Tuvi (#1)
Re: ALTER statement hanging

And I tried running the same statement on another node, while one node was running it and I got the following:

ERROR: database is locked against ddl by another node
HINT: Node (6223770712502831127,1,16389) in the cluster is already performing DDL

Terminating the statement in one node and running it on another results in a hang as well.

-Selim

________________________________
From: Selim Tuvi
Sent: Thursday, December 03, 2015 4:03 PM
To: pgsql-general@postgresql.org
Subject: BDR: ALTER statement hanging

Hi, I am running a three node BDR cluster. BDR version is 0.9.3. Postgres version is 9.4.5.

With 0.9.2, I used to be able to issue ALTER statements using psql and it would go through. This time it is just hanging. The statement is this:

alter table pts alter column shot drop not null;

I also tried to add a column but that hangs as well:

alter table pts add column shot_tmp text;

Thanks
-Selim

#3Selim Tuvi
stuvi@ilm.com
In reply to: Selim Tuvi (#2)
Re: ALTER statement hanging

I stopped the other two nodes and restarted the instance and pg_locks shows the following.

deliver=# select * from pg_locks;
locktype | database | relation | page | tuple | virtualxid | transactionid | classid | objid | objsubid | virtualtransaction | pid | mode | granted |
fastpath
------------+----------+----------+------+-------+------------+---------------+---------+-------+----------+--------------------+-------+-----------------+---------+-
---------
relation | 533136 | 11189 | | | | | | | | 5/68 | 10229 | AccessShareLock | t |
t
virtualxid | | | | | 5/68 | | | | | 5/68 | 10229 | ExclusiveLock | t |
t
(2 rows)

-Selim

________________________________
From: pgsql-general-owner@postgresql.org [pgsql-general-owner@postgresql.org] on behalf of Selim Tuvi [stuvi@ilm.com]
Sent: Thursday, December 03, 2015 4:31 PM
To: pgsql-general@postgresql.org
Subject: Re: [GENERAL] ALTER statement hanging

And I tried running the same statement on another node, while one node was running it and I got the following:

ERROR: database is locked against ddl by another node
HINT: Node (6223770712502831127,1,16389) in the cluster is already performing DDL

Terminating the statement in one node and running it on another results in a hang as well.

-Selim

________________________________
From: Selim Tuvi
Sent: Thursday, December 03, 2015 4:03 PM
To: pgsql-general@postgresql.org
Subject: BDR: ALTER statement hanging

Hi, I am running a three node BDR cluster. BDR version is 0.9.3. Postgres version is 9.4.5.

With 0.9.2, I used to be able to issue ALTER statements using psql and it would go through. This time it is just hanging. The statement is this:

alter table pts alter column shot drop not null;

I also tried to add a column but that hangs as well:

alter table pts add column shot_tmp text;

Thanks
-Selim

#4Andreas Kretschmer
akretschmer@spamfence.net
In reply to: Selim Tuvi (#1)
Re: BDR: ALTER statement hanging

Selim Tuvi <stuvi@ilm.com> wrote:

Hi, I am running a three node BDR cluster. BDR version is 0.9.3. Postgres
version is 9.4.5.

With 0.9.2, I used to be able to issue ALTER statements using psql and it would
go through. This time it is just hanging. The statement is this:

for ddl-commands all nodes MUST be active in replication, so have you
checked that in pg_replication_slots?

Andreas
--
Really, I'm not out to destroy Microsoft. That will just be a completely
unintentional side effect. (Linus Torvalds)
"If I was god, I would recompile penguin with --enable-fly." (unknown)
Kaufbach, Saxony, Germany, Europe. N 51.05082�, E 13.56889�

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#5Selim Tuvi
stuvi@ilm.com
In reply to: Andreas Kretschmer (#4)
Re: BDR: ALTER statement hanging

Yes they seem to be active:

deliver=# select * from pg_replication_slots;
slot_name | plugin | slot_type | datoid | database | active | xmin | catalog_xmin | restart_lsn
------------------------------------------+--------+-----------+--------+----------+--------+------+--------------+-------------
bdr_533136_6223770712502831127_1_16389__ | bdr | logical | 533136 | deliver | t | | 182302 | 0/9C8A5598
bdr_533136_6223800735012265413_1_16389__ | bdr | logical | 533136 | deliver | t | | 182302 | 0/9C8A5598
(2 rows)

Although when I look at bdr.bdr_nodes I see the status as still initializing for the other two nodes, I don't know if that could cause this problem:

deliver=# select * from bdr.bdr_nodes;
node_sysid | node_timeline | node_dboid | node_status | node_name | node_local_dsn
| node_init_from_dsn
---------------------+---------------+------------+-------------+-------------------------------------+---------------------------------------------------------------
-----------------------------------------+------------------------------------------------------------------------------------------------------
6212648563684174798 | 1 | 533136 | r | pe-deliverdb-sf-01v | host=pe-deliverdb-sf-01v port=5432 dbname=deliver user=deliver_admin password=xxxxx |
6223770712502831127 | 1 | 16389 | i | pe-deliverdb-sing-01v | host=pe-deliverdb-sing-01v port=5432 dbname=deliver user=deliver_admin password=xxxxx | host=pe-deliverdb-sf-01v port=5432 dbname=deliver user=deliver_admin password=xxxxx
6223800735012265413 | 1 | 16389 | i | pe-deliverdb-lon-01v | host=pe-deliverdb-lon-01v port=5432 dbname=deliver user=deliver_admin password=xxxxx | host=pe-deliverdb-sf-01v port=5432 dbname=deliver user=deliver_admin password=xxxxx

-Selim

________________________________________
From: pgsql-general-owner@postgresql.org [pgsql-general-owner@postgresql.org] on behalf of Andreas Kretschmer [akretschmer@spamfence.net]
Sent: Thursday, December 03, 2015 10:49 PM
To: pgsql-general@postgresql.org
Subject: Re: [GENERAL] BDR: ALTER statement hanging

Selim Tuvi <stuvi@ilm.com> wrote:

Hi, I am running a three node BDR cluster. BDR version is 0.9.3. Postgres
version is 9.4.5.

With 0.9.2, I used to be able to issue ALTER statements using psql and it would
go through. This time it is just hanging. The statement is this:

for ddl-commands all nodes MUST be active in replication, so have you
checked that in pg_replication_slots?

Andreas
--
Really, I'm not out to destroy Microsoft. That will just be a completely
unintentional side effect. (Linus Torvalds)
"If I was god, I would recompile penguin with --enable-fly." (unknown)
Kaufbach, Saxony, Germany, Europe. N 51.05082°, E 13.56889°

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#6Andreas Kretschmer
andreas@a-kretschmer.de
In reply to: Selim Tuvi (#5)
Re: BDR: ALTER statement hanging

Selim Tuvi <stuvi@ilm.com> hat am 4. Dezember 2015 um 18:46 geschrieben:

Yes they seem to be active:

deliver=# select * from pg_replication_slots;
slot_name | plugin | slot_type | datoid |
database | active | xmin | catalog_xmin | restart_lsn
------------------------------------------+--------+-----------+--------+----------+--------+------+--------------+-------------
bdr_533136_6223770712502831127_1_16389__ | bdr | logical | 533136 |
deliver | t | | 182302 | 0/9C8A5598
bdr_533136_6223800735012265413_1_16389__ | bdr | logical | 533136 |
deliver | t | | 182302 | 0/9C8A5598
(2 rows)

Although when I look at bdr.bdr_nodes I see the status as still initializing
for the other two nodes, I don't know if that could cause this problem:

deliver=# select * from bdr.bdr_nodes;
node_sysid | node_timeline | node_dboid | node_status |
node_name |
node_local_dsn
|
node_init_from_dsn
---------------------+---------------+------------+-------------+-------------------------------------+---------------------------------------------------------------
-----------------------------------------+------------------------------------------------------------------------------------------------------
6212648563684174798 | 1 | 533136 | r |
pe-deliverdb-sf-01v | host=pe-deliverdb-sf-01v port=5432 dbname=deliver
user=deliver_admin password=xxxxx |
6223770712502831127 | 1 | 16389 | i |
pe-deliverdb-sing-01v | host=pe-deliverdb-sing-01v port=5432 dbname=deliver
user=deliver_admin password=xxxxx | host=pe-deliverdb-sf-01v port=5432
dbname=deliver user=deliver_admin password=xxxxx
6223800735012265413 | 1 | 16389 | i |
pe-deliverdb-lon-01v | host=pe-deliverdb-lon-01v port=5432 dbname=deliver
user=deliver_admin password=xxxxx | host=pe-deliverdb-sf-01v port=5432
dbname=deliver user=deliver_admin password=xxxxx

-Selim

I think, the state 'i' is the main reason for your problem, because of: "i-
Joining: The node is doing initial slot creation or an initial dump and load".

But i can't tell you why this nodes are in this state.

Regards, Andreas

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#7Sylvain MARECHAL
marechal.sylvain2@gmail.com
In reply to: Andreas Kretschmer (#6)
Re: BDR: ALTER statement hanging

Le 04/12/2015 18:59, Andreas Kretschmer a écrit :

I think, the state 'i' is the main reason for your problem, because of: "i-
Joining: The node is doing initial slot creation or an initial dump and load".

But i can't tell you why this nodes are in this state.

Regards, Andreas

Did-you check the bdr.bdr_connections table?
It should have as many lines as the bdr.bdr_nodes tables.

Sylvain

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#8Selim Tuvi
stuvi@ilm.com
In reply to: Andreas Kretschmer (#6)
Re: BDR: ALTER statement hanging

Thanks, I removed the other nodes from bdr.bdr_nodes table, deleted all the bdr_connections and pg_replication_identifier entries, dropped the pg_replication_slots restarted the instance and then trying the ALTER statement resulted in:

ERROR: No peer nodes or peer node count unknown, cannot acquire DDL lock
HINT: BDR is probably still starting up, wait a while

The only way I could issue the statement is run the following to convert the node to a standalone instance:

BEGIN;
SET LOCAL bdr.permit_unsafe_ddl_commands = true;
SET LOCAL bdr.skip_ddl_locking = true;
security label for 'bdr' on database deliver is '{"bdr": false}';
COMMIT;

I am still puzzled as to why the bdr_nodes node_status was reporting "i" when there were no errors in the logs.

-Selim

________________________________________
From: pgsql-general-owner@postgresql.org [pgsql-general-owner@postgresql.org] on behalf of Andreas Kretschmer [andreas@a-kretschmer.de]
Sent: Friday, December 04, 2015 9:59 AM
To: pgsql-general@postgresql.org
Subject: Re: [GENERAL] BDR: ALTER statement hanging

Selim Tuvi <stuvi@ilm.com> hat am 4. Dezember 2015 um 18:46 geschrieben:

Yes they seem to be active:

deliver=# select * from pg_replication_slots;
slot_name | plugin | slot_type | datoid |
database | active | xmin | catalog_xmin | restart_lsn
------------------------------------------+--------+-----------+--------+----------+--------+------+--------------+-------------
bdr_533136_6223770712502831127_1_16389__ | bdr | logical | 533136 |
deliver | t | | 182302 | 0/9C8A5598
bdr_533136_6223800735012265413_1_16389__ | bdr | logical | 533136 |
deliver | t | | 182302 | 0/9C8A5598
(2 rows)

Although when I look at bdr.bdr_nodes I see the status as still initializing
for the other two nodes, I don't know if that could cause this problem:

deliver=# select * from bdr.bdr_nodes;
node_sysid | node_timeline | node_dboid | node_status |
node_name |
node_local_dsn
|
node_init_from_dsn
---------------------+---------------+------------+-------------+-------------------------------------+---------------------------------------------------------------
-----------------------------------------+------------------------------------------------------------------------------------------------------
6212648563684174798 | 1 | 533136 | r |
pe-deliverdb-sf-01v | host=pe-deliverdb-sf-01v port=5432 dbname=deliver
user=deliver_admin password=xxxxx |
6223770712502831127 | 1 | 16389 | i |
pe-deliverdb-sing-01v | host=pe-deliverdb-sing-01v port=5432 dbname=deliver
user=deliver_admin password=xxxxx | host=pe-deliverdb-sf-01v port=5432
dbname=deliver user=deliver_admin password=xxxxx
6223800735012265413 | 1 | 16389 | i |
pe-deliverdb-lon-01v | host=pe-deliverdb-lon-01v port=5432 dbname=deliver
user=deliver_admin password=xxxxx | host=pe-deliverdb-sf-01v port=5432
dbname=deliver user=deliver_admin password=xxxxx

-Selim

I think, the state 'i' is the main reason for your problem, because of: "i-
Joining: The node is doing initial slot creation or an initial dump and load".

But i can't tell you why this nodes are in this state.

Regards, Andreas

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#9Selim Tuvi
stuvi@ilm.com
In reply to: Sylvain MARECHAL (#7)
Re: BDR: ALTER statement hanging

Yes, bdr_connections had the same number of rows:

deliver=# select * from bdr.bdr_connections;
conn_sysid | conn_timeline | conn_dboid | conn_origin_sysid | conn_origin_timeline | conn_origin_dboid | conn_is_unidirectional |
conn_dsn | conn_apply_delay | conn_replication_sets
---------------------+---------------+------------+-------------------+----------------------+-------------------+------------------------+---------------------------
-----------------------------------------------------------------------------+------------------+-----------------------
6212648563684174798 | 1 | 533136 | 0 | 0 | 0 | f | host=pe-deliverdb-sf-01v port=5432 dbname=deliver user=deliver_admin password=xxxxx | | {default}
6223770712502831127 | 1 | 16389 | 0 | 0 | 0 | f | host=pe-deliverdb-sing-01v port=5432 dbname=deliver user=deliver_admin password=xxxxx | | {default}
6223800735012265413 | 1 | 16389 | 0 | 0 | 0 | f | host=pe-deliverdb-lon-01v port=5432 dbname=deliver user=deliver_admin password=xxxxx | | {default}
(3 rows)

One other thing I noticed is that the conn_dboid is the same for two of the nodes. Is that normal?

-Selim

________________________________________
From: pgsql-general-owner@postgresql.org [pgsql-general-owner@postgresql.org] on behalf of Sylvain MARECHAL [marechal.sylvain2@gmail.com]
Sent: Friday, December 04, 2015 10:14 AM
To: pgsql-general@postgresql.org
Subject: Re: [GENERAL] BDR: ALTER statement hanging

Le 04/12/2015 18:59, Andreas Kretschmer a écrit :

I think, the state 'i' is the main reason for your problem, because of: "i-
Joining: The node is doing initial slot creation or an initial dump and load".

But i can't tell you why this nodes are in this state.

Regards, Andreas

Did-you check the bdr.bdr_connections table?
It should have as many lines as the bdr.bdr_nodes tables.

Sylvain

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#10Craig Ringer
craig@2ndquadrant.com
In reply to: Selim Tuvi (#9)
Re: BDR: ALTER statement hanging

​If you're not sure what's going on on a node, look at its logs.

The background worker API and PostgreSQL's lack of autonomous transactions
makes it quite challenging for BDR workers to capture logs and expose them
to users at the SQL level. So always, if in doubt, examine the log files.

#11Selim Tuvi
stuvi@ilm.com
In reply to: Craig Ringer (#10)
Re: BDR: ALTER statement hanging

Thanks Craig, the problem was that (if I remember correctly) there were absolutely no errors or warnings logged when I issued the ALTER statement. Everything seemed to operate normally except that the execution never completed. Even the fact that the node_status was set to 'i' didn't result in any log messages and the replication was working as it should.

-Selim

________________________________
From: Craig Ringer [craig@2ndquadrant.com]
Sent: Sunday, December 06, 2015 7:05 PM
To: Selim Tuvi
Cc: Sylvain MARECHAL; pgsql-general@postgresql.org
Subject: Re: [GENERAL] BDR: ALTER statement hanging

​If you're not sure what's going on on a node, look at its logs.

The background worker API and PostgreSQL's lack of autonomous transactions makes it quite challenging for BDR workers to capture logs and expose them to users at the SQL level. So always, if in doubt, examine the log files.