Alter subscription..SET - NOTICE message is coming for table which is already removed
Hi,
While performing - Alter subscription..SET , I found that NOTICE
message is coming duplicate next time , which is not needed anymore.
X cluster=
create 100 tables
create publication ( create publication pub for all tables;)
Y cluster=
create 100 tables
create subscription ( create subscription sub connection
'dbname=postgres user=centos host=localhost ) publication pub;
X cluster =
create 1 more table (create table r(n int));
create publication for this above table only (create publication pub1
for table r;)
Y cluster=
create table r - create table r(n int);
alter publication -alter subscription sub set publication pub1 refresh;
in the notice message -> table 'r' added / 100 tables removed from the
subscription
postgres=# alter subscription sub set publication pub1 refresh;
NOTICE: added subscription for table public.r
NOTICE: removed subscription for table public.t1
NOTICE: removed subscription for table public.t2
NOTICE: removed subscription for table public.t3
NOTICE: removed subscription for table public.t4
NOTICE: removed subscription for table public.t5
NOTICE: removed subscription for table public.t6
--
--
--
ALTER SUBSCRIPTION
now again fire the same sql query
postgres=# alter subscription sub set publication pub1 refresh;
NOTICE: removed subscription for table public.t78
ALTER SUBSCRIPTION
This notice message should not come as t.78 is already removed from
earlier same command.
--
regards,tushar
EnterpriseDB https://www.enterprisedb.com/
The Enterprise PostgreSQL Company
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Thu, May 25, 2017 at 6:08 AM, tushar <tushar.ahuja@enterprisedb.com> wrote:
Hi,
While performing - Alter subscription..SET , I found that NOTICE message is
coming duplicate next time , which is not needed anymore.X cluster=
create 100 tables
create publication ( create publication pub for all tables;)Y cluster=
create 100 tables
create subscription ( create subscription sub connection 'dbname=postgres
user=centos host=localhost ) publication pub;X cluster =
create 1 more table (create table r(n int));
create publication for this above table only (create publication pub1 for
table r;)Y cluster=
create table r - create table r(n int);
alter publication -alter subscription sub set publication pub1 refresh;in the notice message -> table 'r' added / 100 tables removed from the
subscription
I think you did ALTER SUBSCRIPTION while table sync for 100 tables is
running, right?
postgres=# alter subscription sub set publication pub1 refresh;
NOTICE: added subscription for table public.r
NOTICE: removed subscription for table public.t1
NOTICE: removed subscription for table public.t2
NOTICE: removed subscription for table public.t3
NOTICE: removed subscription for table public.t4
NOTICE: removed subscription for table public.t5
NOTICE: removed subscription for table public.t6
--
--
--
ALTER SUBSCRIPTIONnow again fire the same sql query
postgres=# alter subscription sub set publication pub1 refresh;
NOTICE: removed subscription for table public.t78
ALTER SUBSCRIPTIONThis notice message should not come as t.78 is already removed from
earlier same command.
I'm investigating the cause of this issue but it seems to me that
first ALTER SUBSCRIPTION surely removed t78 record from
pg_subscription_rel but table sync worker inserts new record of t78
when updating its state. SetSubscriptionRelState inserts a new record
if the table doesn't exist otherwise updates the record but we can
divide it into two different functions and call appropriate function
in each place.
Regards,
--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 05/25/2017 04:40 PM, Masahiko Sawada wrote:
I think you did ALTER SUBSCRIPTION while table sync for 100 tables is
running, right?
Yes, i didn't wait too much while executing the commands.
--
regards,tushar
EnterpriseDB https://www.enterprisedb.com/
The Enterprise PostgreSQL Company
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 05/25/2017 03:38 PM, tushar wrote:
While performing - Alter subscription..SET , I found that NOTICE
message is coming duplicate next time , which is not needed anymore.
There is an another example - where i am getting "ERROR: subscription
table 16435 in subscription 16684 does not exist" in standby log file
2017-05-25 13:51:48.825 BST [32138] NOTICE: removed subscription for
table public.t96
2017-05-25 13:51:48.825 BST [32138] NOTICE: removed subscription for
table public.t97
2017-05-25 13:51:48.826 BST [32138] NOTICE: removed subscription for
table public.t98
2017-05-25 13:51:48.826 BST [32138] NOTICE: removed subscription for
table public.t99
2017-05-25 13:51:48.826 BST [32138] NOTICE: removed subscription for
table public.t100
2017-05-25 13:51:48.827 BST [32138] LOG: duration: 35.404 ms statement:
alter subscription c1 set publication p1 refresh;
2017-05-25 13:51:49.192 BST [32347] LOG: starting logical replication
worker for subscription "c1"
2017-05-25 13:51:49.198 BST [32368] LOG: logical replication sync for
subscription c1, table t16 started
2017-05-25 13:51:49.198 BST [32368] ERROR: subscription table 16429 in
subscription 16684 does not exist
2017-05-25 13:51:49.199 BST [32347] LOG: starting logical replication
worker for subscription "c1"
2017-05-25 13:51:49.200 BST [32065] LOG: worker process: logical
replication worker for subscription 16684 sync 16429 (PID 32368) exited
with exit code 1
2017-05-25 13:51:49.204 BST [32369] LOG: logical replication sync for
subscription c1, table t17 started
2017-05-25 13:51:49.204 BST [32369] ERROR: subscription table 16432 in
subscription 16684 does not exist
2017-05-25 13:51:49.205 BST [32347] LOG: starting logical replication
worker for subscription "c1"
2017-05-25 13:51:49.205 BST [32065] LOG: worker process: logical
replication worker for subscription 16684 sync 16432 (PID 32369) exited
with exit code 1
2017-05-25 13:51:49.209 BST [32370] LOG: logical replication sync for
subscription c1, table t18 started
2017-05-25 13:51:49.209 BST [32370] ERROR: subscription table 16435 in
subscription 16684 does not exist
2017-05-25 13:51:49.210 BST [32347] LOG: starting logical replication
worker for subscription "c1"
2017-05-25 13:51:49.210 BST [32065] LOG: worker process: logical
replication worker for subscription 16684 sync 16435 (PID 32370) exited
with exit code 1
2017-05-25 13:51:49.213 BST [32371] LOG: logical replication sync for
subscription c1, table t19 started
2017-05-25 13:51:49.213 BST [32371] ERROR: subscription table 16438 in
subscription 16684 does not exist
2017-05-25 13:51:49.214 BST [32347] LOG: starting logical replication
worker for subscription "c1"
Steps to reproduce -
X cluster ->
create 100 tables , publish all tables (create publication pub for table
t1,t2,t2..........t100;)
create one more table (create table t101(n int), create publication ,
publish only that table (create publication p1 for table t101;)
Y Cluster ->
create subscription (create subscription c1 connection 'host=localhost
port=5432 user=centos ' publication pub;
alter subscription c1 set publication p1 refresh;
alter subscription c1 set publication pub refresh;
alter subscription c1 set publication p1 refresh;
check the log file.
--
regards,tushar
EnterpriseDB https://www.enterprisedb.com/
The Enterprise PostgreSQL Company
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Thu, May 25, 2017 at 8:15 PM, tushar <tushar.ahuja@enterprisedb.com> wrote:
On 05/25/2017 04:40 PM, Masahiko Sawada wrote:
I think you did ALTER SUBSCRIPTION while table sync for 100 tables is
running, right?Yes, i didn't wait too much while executing the commands.
I think it's better to add this to open items so that it doesn't get
missed. Barring any objections I'll add it.
Regards,
--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Tue, May 30, 2017 at 10:12 AM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
On Thu, May 25, 2017 at 8:15 PM, tushar <tushar.ahuja@enterprisedb.com> wrote:
On 05/25/2017 04:40 PM, Masahiko Sawada wrote:
I think you did ALTER SUBSCRIPTION while table sync for 100 tables is
running, right?Yes, i didn't wait too much while executing the commands.
I think it's better to add this to open items so that it doesn't get
missed. Barring any objections I'll add it.
Added.
Regards,
--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Thu, May 25, 2017 at 9:54 PM, tushar <tushar.ahuja@enterprisedb.com> wrote:
On 05/25/2017 03:38 PM, tushar wrote:
While performing - Alter subscription..SET , I found that NOTICE message
is coming duplicate next time , which is not needed anymore.There is an another example - where i am getting "ERROR: subscription table
16435 in subscription 16684 does not exist" in standby log file2017-05-25 13:51:48.825 BST [32138] NOTICE: removed subscription for table
public.t96
2017-05-25 13:51:48.825 BST [32138] NOTICE: removed subscription for table
public.t97
2017-05-25 13:51:48.826 BST [32138] NOTICE: removed subscription for table
public.t98
2017-05-25 13:51:48.826 BST [32138] NOTICE: removed subscription for table
public.t99
2017-05-25 13:51:48.826 BST [32138] NOTICE: removed subscription for table
public.t100
2017-05-25 13:51:48.827 BST [32138] LOG: duration: 35.404 ms statement:
alter subscription c1 set publication p1 refresh;
2017-05-25 13:51:49.192 BST [32347] LOG: starting logical replication
worker for subscription "c1"
2017-05-25 13:51:49.198 BST [32368] LOG: logical replication sync for
subscription c1, table t16 started
2017-05-25 13:51:49.198 BST [32368] ERROR: subscription table 16429 in
subscription 16684 does not exist
2017-05-25 13:51:49.199 BST [32347] LOG: starting logical replication
worker for subscription "c1"
2017-05-25 13:51:49.200 BST [32065] LOG: worker process: logical
replication worker for subscription 16684 sync 16429 (PID 32368) exited with
exit code 1
2017-05-25 13:51:49.204 BST [32369] LOG: logical replication sync for
subscription c1, table t17 started
2017-05-25 13:51:49.204 BST [32369] ERROR: subscription table 16432 in
subscription 16684 does not exist
2017-05-25 13:51:49.205 BST [32347] LOG: starting logical replication
worker for subscription "c1"
2017-05-25 13:51:49.205 BST [32065] LOG: worker process: logical
replication worker for subscription 16684 sync 16432 (PID 32369) exited with
exit code 1
2017-05-25 13:51:49.209 BST [32370] LOG: logical replication sync for
subscription c1, table t18 started
2017-05-25 13:51:49.209 BST [32370] ERROR: subscription table 16435 in
subscription 16684 does not exist
2017-05-25 13:51:49.210 BST [32347] LOG: starting logical replication
worker for subscription "c1"
2017-05-25 13:51:49.210 BST [32065] LOG: worker process: logical
replication worker for subscription 16684 sync 16435 (PID 32370) exited with
exit code 1
2017-05-25 13:51:49.213 BST [32371] LOG: logical replication sync for
subscription c1, table t19 started
2017-05-25 13:51:49.213 BST [32371] ERROR: subscription table 16438 in
subscription 16684 does not exist
2017-05-25 13:51:49.214 BST [32347] LOG: starting logical replication
worker for subscription "c1"Steps to reproduce -
X cluster ->
create 100 tables , publish all tables (create publication pub for table
t1,t2,t2..........t100;)
create one more table (create table t101(n int), create publication ,
publish only that table (create publication p1 for table t101;)Y Cluster ->
create subscription (create subscription c1 connection 'host=localhost
port=5432 user=centos ' publication pub;
alter subscription c1 set publication p1 refresh;
alter subscription c1 set publication pub refresh;
alter subscription c1 set publication p1 refresh;check the log file.
Thanks.
I think this cause is that the relation status entry could be deleted
by ALTER SUBSCRIPTION REFRESH before corresponding table sync worker
starting. Attached patch fixes issues reported on this thread so far.
However there is one more problem here; if the relation status entry
is deleted while corresponding table sync worker is waiting to be
changed its status, the table sync worker can be orphaned in waiting
status. In this case, should table sync worker check the relation
status and exits if the relation status record gets removed? Or should
ALTER SUBSCRIPTION update status of table sync worker to UNKNOWN?
Regards,
--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
Attachments:
fix_ALTER_SUB_REFRESH_and_table_sync.patchapplication/octet-stream; name=fix_ALTER_SUB_REFRESH_and_table_sync.patchDownload
diff --git a/src/backend/catalog/pg_subscription.c b/src/backend/catalog/pg_subscription.c
index ab5f371..695b70a 100644
--- a/src/backend/catalog/pg_subscription.c
+++ b/src/backend/catalog/pg_subscription.c
@@ -230,10 +230,12 @@ textarray_to_stringlist(ArrayType *textarray)
* The insert-or-update logic in this function is not concurrency safe so it
* might raise an error in rare circumstances. But if we took a stronger lock
* such as ShareRowExclusiveLock, we would risk more deadlocks.
+ * If insert_ok is true and the record for given table doesn't exist we insert
+ * new record.
*/
Oid
SetSubscriptionRelState(Oid subid, Oid relid, char state,
- XLogRecPtr sublsn)
+ XLogRecPtr sublsn, bool insert_ok)
{
Relation rel;
HeapTuple tup;
@@ -249,10 +251,10 @@ SetSubscriptionRelState(Oid subid, Oid relid, char state,
ObjectIdGetDatum(subid));
/*
- * If the record for given table does not exist yet create new record,
- * otherwise update the existing one.
+ * If the record for given table does not exist yet and insertion is requested
+ * create new record, otherwise update the existing one.
*/
- if (!HeapTupleIsValid(tup))
+ if (!HeapTupleIsValid(tup) && insert_ok)
{
/* Form the tuple. */
memset(values, 0, sizeof(values));
@@ -272,7 +274,7 @@ SetSubscriptionRelState(Oid subid, Oid relid, char state,
heap_freetuple(tup);
}
- else
+ else if (HeapTupleIsValid(tup))
{
bool replaces[Natts_pg_subscription_rel];
diff --git a/src/backend/commands/subscriptioncmds.c b/src/backend/commands/subscriptioncmds.c
index 86eb31d..f6d8e83 100644
--- a/src/backend/commands/subscriptioncmds.c
+++ b/src/backend/commands/subscriptioncmds.c
@@ -435,8 +435,9 @@ CreateSubscription(CreateSubscriptionStmt *stmt, bool isTopLevel)
CheckSubscriptionRelkind(get_rel_relkind(relid),
rv->schemaname, rv->relname);
+ /* Insert new entry if entry doesn't exist, otherwise update it */
SetSubscriptionRelState(subid, relid, table_state,
- InvalidXLogRecPtr);
+ InvalidXLogRecPtr, true);
}
ereport(NOTICE,
@@ -557,9 +558,10 @@ AlterSubscription_refresh(Subscription *sub, bool copy_data)
if (!bsearch(&relid, subrel_local_oids,
list_length(subrel_states), sizeof(Oid), oid_cmp))
{
+ /* Insert entry if the entry doesn't exist, otherwise update it */
SetSubscriptionRelState(sub->oid, relid,
copy_data ? SUBREL_STATE_INIT : SUBREL_STATE_READY,
- InvalidXLogRecPtr);
+ InvalidXLogRecPtr, true);
ereport(NOTICE,
(errmsg("added subscription for table %s.%s",
quote_identifier(rv->schemaname),
diff --git a/src/backend/replication/logical/tablesync.c b/src/backend/replication/logical/tablesync.c
index fe45fb8..6280c95 100644
--- a/src/backend/replication/logical/tablesync.c
+++ b/src/backend/replication/logical/tablesync.c
@@ -228,10 +228,12 @@ process_syncing_tables_for_sync(XLogRecPtr current_lsn)
SpinLockRelease(&MyLogicalRepWorker->relmutex);
+ /* Just update the subscription relation state */
SetSubscriptionRelState(MyLogicalRepWorker->subid,
MyLogicalRepWorker->relid,
MyLogicalRepWorker->relstate,
- MyLogicalRepWorker->relstate_lsn);
+ MyLogicalRepWorker->relstate_lsn,
+ false);
walrcv_endstreaming(wrconn, &tli);
finish_sync_worker();
@@ -360,9 +362,10 @@ process_syncing_tables_for_apply(XLogRecPtr current_lsn)
StartTransactionCommand();
started_tx = true;
}
+ /* Just update the subscription relation state */
SetSubscriptionRelState(MyLogicalRepWorker->subid,
rstate->relid, rstate->state,
- rstate->lsn);
+ rstate->lsn, false);
}
}
else
@@ -763,7 +766,7 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
StartTransactionCommand();
relstate = GetSubscriptionRelState(MyLogicalRepWorker->subid,
MyLogicalRepWorker->relid,
- &relstate_lsn, false);
+ &relstate_lsn, true);
CommitTransactionCommand();
SpinLockAcquire(&MyLogicalRepWorker->relmutex);
@@ -808,7 +811,8 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
SetSubscriptionRelState(MyLogicalRepWorker->subid,
MyLogicalRepWorker->relid,
MyLogicalRepWorker->relstate,
- MyLogicalRepWorker->relstate_lsn);
+ MyLogicalRepWorker->relstate_lsn,
+ true);
CommitTransactionCommand();
pgstat_report_stat(false);
@@ -882,11 +886,12 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
MyLogicalRepWorker->relstate);
if (MyLogicalRepWorker->relstate != SUBREL_STATE_CATCHUP)
{
- /* Update the new state. */
+ /* Update the new state, not insert new entry */
SetSubscriptionRelState(MyLogicalRepWorker->subid,
MyLogicalRepWorker->relid,
MyLogicalRepWorker->relstate,
- MyLogicalRepWorker->relstate_lsn);
+ MyLogicalRepWorker->relstate_lsn,
+ false);
finish_sync_worker();
}
break;
@@ -896,6 +901,14 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
/* Nothing to do here but finish. */
finish_sync_worker();
break;
+ case SUBREL_STATE_UNKNOWN:
+ ereport(LOG,
+ (errmsg("relation status of subscription table %u in subscription %u "
+ "might have been removed before starting",
+ MyLogicalRepWorker->relid,
+ MyLogicalRepWorker->subid)));
+ finish_sync_worker();
+ break;
default:
elog(ERROR, "unknown relation state \"%c\"",
MyLogicalRepWorker->relstate);
diff --git a/src/include/catalog/pg_subscription_rel.h b/src/include/catalog/pg_subscription_rel.h
index 391f96b..dbc2d16 100644
--- a/src/include/catalog/pg_subscription_rel.h
+++ b/src/include/catalog/pg_subscription_rel.h
@@ -71,7 +71,7 @@ typedef struct SubscriptionRelState
} SubscriptionRelState;
extern Oid SetSubscriptionRelState(Oid subid, Oid relid, char state,
- XLogRecPtr sublsn);
+ XLogRecPtr sublsn, bool insert_ok);
extern char GetSubscriptionRelState(Oid subid, Oid relid,
XLogRecPtr *sublsn, bool missing_ok);
extern void RemoveSubscriptionRel(Oid subid, Oid relid);
On 5/30/17 13:25, Masahiko Sawada wrote:
I think this cause is that the relation status entry could be deleted
by ALTER SUBSCRIPTION REFRESH before corresponding table sync worker
starting. Attached patch fixes issues reported on this thread so far.
This looks like a reasonable change, but it conflicts with the change
discussed on "logical replication - still unstable after all these
months". I think I'll deal with that one first.
--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Wed, May 31, 2017 at 10:15:50PM -0400, Peter Eisentraut wrote:
On 5/30/17 13:25, Masahiko Sawada wrote:
I think this cause is that the relation status entry could be deleted
by ALTER SUBSCRIPTION REFRESH before corresponding table sync worker
starting. Attached patch fixes issues reported on this thread so far.This looks like a reasonable change, but it conflicts with the change
discussed on "logical replication - still unstable after all these
months". I think I'll deal with that one first.
The above-described topic is currently a PostgreSQL 10 open item. You own
this open item. Please observe the policy on open item ownership and send a
status update within three calendar days of this message. Include a date for
your subsequent status update.
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 6/3/17 01:04, Noah Misch wrote:
On Wed, May 31, 2017 at 10:15:50PM -0400, Peter Eisentraut wrote:
On 5/30/17 13:25, Masahiko Sawada wrote:
I think this cause is that the relation status entry could be deleted
by ALTER SUBSCRIPTION REFRESH before corresponding table sync worker
starting. Attached patch fixes issues reported on this thread so far.This looks like a reasonable change, but it conflicts with the change
discussed on "logical replication - still unstable after all these
months". I think I'll deal with that one first.The above-described topic is currently a PostgreSQL 10 open item. You own
this open item. Please observe the policy on open item ownership and send a
status update within three calendar days of this message. Include a date for
your subsequent status update.
I'm working on this now and will report back tomorrow.
--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 5/30/17 13:25, Masahiko Sawada wrote:
I think this cause is that the relation status entry could be deleted
by ALTER SUBSCRIPTION REFRESH before corresponding table sync worker
starting. Attached patch fixes issues reported on this thread so far.
I have committed the part of the patch that changes the
SetSubscriptionRelState() calls.
I think there was a mistake in your patch, in that the calls in
LogicalRepSyncTableStart() used true once and false once. I think all
the calls in tablesync.c should be the same.
(If you look at the patch again, notice that I have changed the
insert_ok argument to update_only, so true and false are flipped.)
I'm not convinced about the change to the GetSubscriptionRelState()
argument. In the examples given, no tables are removed from any
publications, so I don't see how the claimed situation can happen. I
would like to see more reproducible examples.
Right now, if the subscription rel state disappears before the sync
worker starts, the error kills the sync worker, so things should
continue working correctly. Perhaps the error message isn't the best.
--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Thu, Jun 8, 2017 at 5:36 AM, Peter Eisentraut
<peter.eisentraut@2ndquadrant.com> wrote:
On 5/30/17 13:25, Masahiko Sawada wrote:
I think this cause is that the relation status entry could be deleted
by ALTER SUBSCRIPTION REFRESH before corresponding table sync worker
starting. Attached patch fixes issues reported on this thread so far.I have committed the part of the patch that changes the
SetSubscriptionRelState() calls.
Thank you!
I think there was a mistake in your patch, in that the calls in
LogicalRepSyncTableStart() used true once and false once. I think all
the calls in tablesync.c should be the same.
Yes, you're right.
(If you look at the patch again, notice that I have changed the
insert_ok argument to update_only, so true and false are flipped.)
Okay.
I'm not convinced about the change to the GetSubscriptionRelState()
argument. In the examples given, no tables are removed from any
publications, so I don't see how the claimed situation can happen. I
would like to see more reproducible examples.
In process_syncing_tables_for_apply(), apply worker gets the list of
all non-ready tables and tries to launch table sync workers
accordingly. But after got the list but before launch workers these
tables can be removed from publication, so launched table sync worker
cannot found its rel state from pg_subscription_rel. It completely
depends on timing and it happens rarely.
The reproduction step is provided by tushar but I could reproduced it
with following step.
X cluster ->
=# select 'create table t' || generate_series(1,100) || '(c
int);';\gexec -- create 100 tables
=# create table t (c int); -- create one more table
=# create publication all_pub for all tables;
=# create publication one_pub for table t;
Y Cluster ->
(create the same 101 tables as well)
=# create subscription hoge_sub connection 'host=localhost port=5432'
publication one_pub;
=# alter subscription hoge_sub set publication all_pub; select
pg_sleep(1); alter subscription hoge_sub set publication one_pub;
*Error occurs here*
Right now, if the subscription rel state disappears before the sync
worker starts, the error kills the sync worker, so things should
continue working correctly. Perhaps the error message isn't the best.
The change to GetSubscriptionRelState in that patch solves the error
message problem you mentioned. Returning SUBREL_STATE_UNKNOWN by
GetSubscriptionRelState means that the subscription rel state could
not found at the time. So we can emit the error with appropriate
message.
Regards,
--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Tue, Jun 06, 2017 at 03:21:34PM -0400, Peter Eisentraut wrote:
On 6/3/17 01:04, Noah Misch wrote:
On Wed, May 31, 2017 at 10:15:50PM -0400, Peter Eisentraut wrote:
On 5/30/17 13:25, Masahiko Sawada wrote:
I think this cause is that the relation status entry could be deleted
by ALTER SUBSCRIPTION REFRESH before corresponding table sync worker
starting. Attached patch fixes issues reported on this thread so far.This looks like a reasonable change, but it conflicts with the change
discussed on "logical replication - still unstable after all these
months". I think I'll deal with that one first.The above-described topic is currently a PostgreSQL 10 open item. You own
this open item. Please observe the policy on open item ownership and send a
status update within three calendar days of this message. Include a date for
your subsequent status update.I'm working on this now and will report back tomorrow.
IMMEDIATE ATTENTION REQUIRED. This PostgreSQL 10 open item is past due for
your status update. Please reacquaint yourself with the policy on open item
ownership[1]/messages/by-id/20170404140717.GA2675809@tornado.leadboat.com and then reply immediately. If I do not hear from you by
2017-06-10 07:00 UTC, I will transfer this item to release management team
ownership without further notice.
[1]: /messages/by-id/20170404140717.GA2675809@tornado.leadboat.com
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 6/8/17 03:54, Masahiko Sawada wrote:
The reproduction step is provided by tushar but I could reproduced it
with following step.X cluster ->
=# select 'create table t' || generate_series(1,100) || '(c
int);';\gexec -- create 100 tables
=# create table t (c int); -- create one more table
=# create publication all_pub for all tables;
=# create publication one_pub for table t;Y Cluster ->
(create the same 101 tables as well)
=# create subscription hoge_sub connection 'host=localhost port=5432'
publication one_pub;
=# alter subscription hoge_sub set publication all_pub; select
pg_sleep(1); alter subscription hoge_sub set publication one_pub;
*Error occurs here*
Thanks for that explanation. I have committed the rest of your changes.
I didn't add the log message, just exit silently, since it doesn't seem
necessary.
--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 5/30/17 13:25, Masahiko Sawada wrote:
However there is one more problem here; if the relation status entry
is deleted while corresponding table sync worker is waiting to be
changed its status, the table sync worker can be orphaned in waiting
status. In this case, should table sync worker check the relation
status and exits if the relation status record gets removed? Or should
ALTER SUBSCRIPTION update status of table sync worker to UNKNOWN?
I think this would be fixed with the attached patch.
--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachments:
0001-Stop-table-sync-workers-when-subscription-relation-e.patchtext/plain; charset=UTF-8; name=0001-Stop-table-sync-workers-when-subscription-relation-e.patch; x-mac-creator=0; x-mac-type=0Download
From f34ed996af298d731551695da562d17be5d6b248 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut <peter_e@gmx.net>
Date: Fri, 9 Jun 2017 09:47:52 -0400
Subject: [PATCH] Stop table sync workers when subscription relation entry is
removed
When a table sync worker is in waiting state and the subscription table
entry is removed because of a concurrent subscription refresh, the
worker could be left orphaned. To avoid that, explicitly stop the
worker when the pg_subscription_rel entry is removed.
Reported-by: Masahiko Sawada <sawada.mshk@gmail.com>
---
src/backend/commands/subscriptioncmds.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/src/backend/commands/subscriptioncmds.c b/src/backend/commands/subscriptioncmds.c
index 49737a9042..8ec8742480 100644
--- a/src/backend/commands/subscriptioncmds.c
+++ b/src/backend/commands/subscriptioncmds.c
@@ -600,6 +600,8 @@ AlterSubscription_refresh(Subscription *sub, bool copy_data)
RemoveSubscriptionRel(sub->oid, relid);
+ logicalrep_worker_stop(sub->oid, relid);
+
namespace = get_namespace_name(get_rel_namespace(relid));
ereport(NOTICE,
(errmsg("removed subscription for table %s.%s",
--
2.13.1
On 6/9/17 02:07, Noah Misch wrote:
On Tue, Jun 06, 2017 at 03:21:34PM -0400, Peter Eisentraut wrote:
On 6/3/17 01:04, Noah Misch wrote:
On Wed, May 31, 2017 at 10:15:50PM -0400, Peter Eisentraut wrote:
On 5/30/17 13:25, Masahiko Sawada wrote:
I think this cause is that the relation status entry could be deleted
by ALTER SUBSCRIPTION REFRESH before corresponding table sync worker
starting. Attached patch fixes issues reported on this thread so far.This looks like a reasonable change, but it conflicts with the change
discussed on "logical replication - still unstable after all these
months". I think I'll deal with that one first.The above-described topic is currently a PostgreSQL 10 open item. You own
this open item. Please observe the policy on open item ownership and send a
status update within three calendar days of this message. Include a date for
your subsequent status update.I'm working on this now and will report back tomorrow.
IMMEDIATE ATTENTION REQUIRED.
Work is ongoing and progress is being made. I will check back in on Monday.
--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Fri, Jun 9, 2017 at 10:50 PM, Peter Eisentraut
<peter.eisentraut@2ndquadrant.com> wrote:
On 5/30/17 13:25, Masahiko Sawada wrote:
However there is one more problem here; if the relation status entry
is deleted while corresponding table sync worker is waiting to be
changed its status, the table sync worker can be orphaned in waiting
status. In this case, should table sync worker check the relation
status and exits if the relation status record gets removed? Or should
ALTER SUBSCRIPTION update status of table sync worker to UNKNOWN?I think this would be fixed with the attached patch.
Thank you for the patch. The patch fixes this issue but it takes a
long time to done ALTER SUBSCRIPTION SET PUBLICATION when
max_sync_workers_per_subscription is set high value. Because the
removing entry from pg_subscription_rel and launching a new table sync
worker select a subscription relation state in the same order, the
former doesn't catch up with latter.
For example in my environment, when I test the following step with
max_sync_workers_per_subscription = 15, all table sync workers were
launched once and then killed. How about removing the entry from
pg_subscription_rel in the inverse order?
X cluster ->
=# select 'create table t' || generate_series(1,100) || '(c
int);';\gexec -- create 100 tables
=# create table t (c int); -- create one more table
=# create publication all_pub for all tables;
=# create publication one_pub for table t;
Y Cluster ->
(create the same 101 tables as well)
=# create subscription hoge_sub connection 'host=localhost port=5432'
publication all_pub;
=# alter subscription hoge_sub set publication one_pub; -- execute
this during synchronizing the table, it takes a long time
Regards,
--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 6/11/17 21:40, Masahiko Sawada wrote:
Thank you for the patch. The patch fixes this issue but it takes a
long time to done ALTER SUBSCRIPTION SET PUBLICATION when
max_sync_workers_per_subscription is set high value. Because the
removing entry from pg_subscription_rel and launching a new table sync
worker select a subscription relation state in the same order, the
former doesn't catch up with latter.
For example in my environment, when I test the following step with
max_sync_workers_per_subscription = 15, all table sync workers were
launched once and then killed. How about removing the entry from
pg_subscription_rel in the inverse order?
I have committed the patch as is. Optimizations might be possible, but
let's keep in mind that the use case of changing the subscription right
after it was created is a pretty marginal case to begin with.
--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Mon, Jun 12, 2017 at 9:56 PM, Peter Eisentraut
<peter.eisentraut@2ndquadrant.com> wrote:
On 6/11/17 21:40, Masahiko Sawada wrote:
Thank you for the patch. The patch fixes this issue but it takes a
long time to done ALTER SUBSCRIPTION SET PUBLICATION when
max_sync_workers_per_subscription is set high value. Because the
removing entry from pg_subscription_rel and launching a new table sync
worker select a subscription relation state in the same order, the
former doesn't catch up with latter.
For example in my environment, when I test the following step with
max_sync_workers_per_subscription = 15, all table sync workers were
launched once and then killed. How about removing the entry from
pg_subscription_rel in the inverse order?I have committed the patch as is. Optimizations might be possible, but
let's keep in mind that the use case of changing the subscription right
after it was created is a pretty marginal case to begin with.
Thank you for committing the patch. Yes, I understood.
Regards,
--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers