Interval for launching the table sync worker

Started by Masahiko Sawadaabout 9 years ago65 messageshackers
Jump to latest
#1Masahiko Sawada
sawada.mshk@gmail.com

Hi all,

While testing table sync worker for logical replication I noticed that
if the table sync worker of logical replication failed to insert the
data for whatever reason, the table sync worker process exits with
error. And then the main apply worker launches the table sync worker
again soon without interval. This routine is executed at very high
frequency without interval.

Should we do put a interval (wal_retrieve_interval or make a new GUC
parameter?) for launching the table sync worker?

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#2Kyotaro Horiguchi
horikyota.ntt@gmail.com
In reply to: Masahiko Sawada (#1)
Re: Interval for launching the table sync worker

I was thinking the same.

At Thu, 6 Apr 2017 11:33:22 +0900, Masahiko Sawada <sawada.mshk@gmail.com> wrote in <CAD21AoDCnyRJDUY=ESVVe68AukvOP2dFomTeBFpAd1TiFbjsGg@mail.gmail.com>

Hi all,

While testing table sync worker for logical replication I noticed that
if the table sync worker of logical replication failed to insert the
data for whatever reason, the table sync worker process exits with
error. And then the main apply worker launches the table sync worker
again soon without interval. This routine is executed at very high
frequency without interval.

Should we do put a interval (wal_retrieve_interval or make a new GUC
parameter?) for launching the table sync worker?

After introducing encoding conversion, untranslatable characters
in a published table causes this situation. Interval can reduce
the frequence of reconnecting, but I think that walreciever
should refrain from reconnecting on unrecoverable(or repeating)
error in walsender.

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#3Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Kyotaro Horiguchi (#2)
Re: Interval for launching the table sync worker

On Thu, Apr 6, 2017 at 3:45 PM, Kyotaro HORIGUCHI
<horiguchi.kyotaro@lab.ntt.co.jp> wrote:

I was thinking the same.

At Thu, 6 Apr 2017 11:33:22 +0900, Masahiko Sawada <sawada.mshk@gmail.com> wrote in <CAD21AoDCnyRJDUY=ESVVe68AukvOP2dFomTeBFpAd1TiFbjsGg@mail.gmail.com>

Hi all,

While testing table sync worker for logical replication I noticed that
if the table sync worker of logical replication failed to insert the
data for whatever reason, the table sync worker process exits with
error. And then the main apply worker launches the table sync worker
again soon without interval. This routine is executed at very high
frequency without interval.

Should we do put a interval (wal_retrieve_interval or make a new GUC
parameter?) for launching the table sync worker?

After introducing encoding conversion, untranslatable characters
in a published table causes this situation.

I think it's better to make a new GUC parameter for the table sync
worker. Having multiple behaviors in wal_retrieve_retry_interval is
not good idea. Thought?

Interval can reduce
the frequence of reconnecting, but I think that walreciever
should refrain from reconnecting on unrecoverable(or repeating)
error in walsender.

Yeah, that's also considerable issue.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#4Kyotaro Horiguchi
horikyota.ntt@gmail.com
In reply to: Masahiko Sawada (#3)
Re: Interval for launching the table sync worker

At Thu, 6 Apr 2017 16:15:33 +0900, Masahiko Sawada <sawada.mshk@gmail.com> wrote in <CAD21AoCrcwi3SwKKOW_efwW0finxyycLgsbw09n5uy2sxneO_A@mail.gmail.com>

On Thu, Apr 6, 2017 at 3:45 PM, Kyotaro HORIGUCHI
<horiguchi.kyotaro@lab.ntt.co.jp> wrote:

I was thinking the same.

At Thu, 6 Apr 2017 11:33:22 +0900, Masahiko Sawada <sawada.mshk@gmail.com> wrote in <CAD21AoDCnyRJDUY=ESVVe68AukvOP2dFomTeBFpAd1TiFbjsGg@mail.gmail.com>

Hi all,

While testing table sync worker for logical replication I noticed that
if the table sync worker of logical replication failed to insert the
data for whatever reason, the table sync worker process exits with
error. And then the main apply worker launches the table sync worker
again soon without interval. This routine is executed at very high
frequency without interval.

Should we do put a interval (wal_retrieve_interval or make a new GUC
parameter?) for launching the table sync worker?

After introducing encoding conversion, untranslatable characters
in a published table causes this situation.

I think it's better to make a new GUC parameter for the table sync
worker. Having multiple behaviors in wal_retrieve_retry_interval is
not good idea. Thought?

I prefer subscription option than GUC. Something like following.

CREATE SUBSCRIPTION s1 CONNECTION 'blah'
PUBLICATION p1 WITH (noreconnect = true);

Stored in pg_subscription?

Interval can reduce
the frequence of reconnecting, but I think that walreciever
should refrain from reconnecting on unrecoverable(or repeating)
error in walsender.

Yeah, that's also considerable issue.

But not to do now.

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#5Kyotaro Horiguchi
horikyota.ntt@gmail.com
In reply to: Kyotaro Horiguchi (#4)
Re: Interval for launching the table sync worker

At Thu, 06 Apr 2017 17:02:14 +0900 (Tokyo Standard Time), Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp> wrote in <20170406.170214.263553093.horiguchi.kyotaro@lab.ntt.co.jp>

At Thu, 6 Apr 2017 16:15:33 +0900, Masahiko Sawada <sawada.mshk@gmail.com> wrote in <CAD21AoCrcwi3SwKKOW_efwW0finxyycLgsbw09n5uy2sxneO_A@mail.gmail.com>

On Thu, Apr 6, 2017 at 3:45 PM, Kyotaro HORIGUCHI
<horiguchi.kyotaro@lab.ntt.co.jp> wrote:

I was thinking the same.

At Thu, 6 Apr 2017 11:33:22 +0900, Masahiko Sawada <sawada.mshk@gmail.com> wrote in <CAD21AoDCnyRJDUY=ESVVe68AukvOP2dFomTeBFpAd1TiFbjsGg@mail.gmail.com>

Hi all,

While testing table sync worker for logical replication I noticed that
if the table sync worker of logical replication failed to insert the
data for whatever reason, the table sync worker process exits with
error. And then the main apply worker launches the table sync worker
again soon without interval. This routine is executed at very high
frequency without interval.

Should we do put a interval (wal_retrieve_interval or make a new GUC
parameter?) for launching the table sync worker?

Hmm. I was playing with something wrong. Now I see the invervals
5 seconds. No problem.

After introducing encoding conversion, untranslatable characters
in a published table causes this situation.

I think it's better to make a new GUC parameter for the table sync
worker. Having multiple behaviors in wal_retrieve_retry_interval is
not good idea. Thought?

So, this is working, sorry.

I prefer subscription option than GUC. Something like following.

CREATE SUBSCRIPTION s1 CONNECTION 'blah'
PUBLICATION p1 WITH (noreconnect = true);

Stored in pg_subscription?

Interval can reduce
the frequence of reconnecting, but I think that walreciever
should refrain from reconnecting on unrecoverable(or repeating)
error in walsender.

Yeah, that's also considerable issue.

But not to do now.

--
Kyotaro Horiguchi
NTT Open Source Software Center

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#6Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Kyotaro Horiguchi (#5)
Re: Interval for launching the table sync worker

On Thu, Apr 6, 2017 at 9:06 PM, Kyotaro HORIGUCHI
<horiguchi.kyotaro@lab.ntt.co.jp> wrote:

At Thu, 06 Apr 2017 17:02:14 +0900 (Tokyo Standard Time), Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp> wrote in <20170406.170214.263553093.horiguchi.kyotaro@lab.ntt.co.jp>

At Thu, 6 Apr 2017 16:15:33 +0900, Masahiko Sawada <sawada.mshk@gmail.com> wrote in <CAD21AoCrcwi3SwKKOW_efwW0finxyycLgsbw09n5uy2sxneO_A@mail.gmail.com>

On Thu, Apr 6, 2017 at 3:45 PM, Kyotaro HORIGUCHI
<horiguchi.kyotaro@lab.ntt.co.jp> wrote:

I was thinking the same.

At Thu, 6 Apr 2017 11:33:22 +0900, Masahiko Sawada <sawada.mshk@gmail.com> wrote in <CAD21AoDCnyRJDUY=ESVVe68AukvOP2dFomTeBFpAd1TiFbjsGg@mail.gmail.com>

Hi all,

While testing table sync worker for logical replication I noticed that
if the table sync worker of logical replication failed to insert the
data for whatever reason, the table sync worker process exits with
error. And then the main apply worker launches the table sync worker
again soon without interval. This routine is executed at very high
frequency without interval.

Should we do put a interval (wal_retrieve_interval or make a new GUC
parameter?) for launching the table sync worker?

Hmm. I was playing with something wrong. Now I see the invervals
5 seconds. No problem.

Yeah, this issue happens only in the table sync worker.

After introducing encoding conversion, untranslatable characters
in a published table causes this situation.

I think it's better to make a new GUC parameter for the table sync
worker. Having multiple behaviors in wal_retrieve_retry_interval is
not good idea. Thought?

So, this is working, sorry.

I prefer subscription option than GUC. Something like following.

CREATE SUBSCRIPTION s1 CONNECTION 'blah'
PUBLICATION p1 WITH (noreconnect = true);

Stored in pg_subscription?

Interval can reduce
the frequence of reconnecting, but I think that walreciever
should refrain from reconnecting on unrecoverable(or repeating)
error in walsender.

Yeah, that's also considerable issue.

But not to do now.

I've added this as an open item, and sent a patch for this.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#7Petr Jelinek
petr@2ndquadrant.com
In reply to: Masahiko Sawada (#6)
Re: Interval for launching the table sync worker

On 06/04/17 16:44, Masahiko Sawada wrote:

On Thu, Apr 6, 2017 at 9:06 PM, Kyotaro HORIGUCHI
<horiguchi.kyotaro@lab.ntt.co.jp> wrote:

At Thu, 06 Apr 2017 17:02:14 +0900 (Tokyo Standard Time), Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp> wrote in <20170406.170214.263553093.horiguchi.kyotaro@lab.ntt.co.jp>

At Thu, 6 Apr 2017 16:15:33 +0900, Masahiko Sawada <sawada.mshk@gmail.com> wrote in <CAD21AoCrcwi3SwKKOW_efwW0finxyycLgsbw09n5uy2sxneO_A@mail.gmail.com>

On Thu, Apr 6, 2017 at 3:45 PM, Kyotaro HORIGUCHI
<horiguchi.kyotaro@lab.ntt.co.jp> wrote:

I was thinking the same.

At Thu, 6 Apr 2017 11:33:22 +0900, Masahiko Sawada <sawada.mshk@gmail.com> wrote in <CAD21AoDCnyRJDUY=ESVVe68AukvOP2dFomTeBFpAd1TiFbjsGg@mail.gmail.com>

Hi all,

While testing table sync worker for logical replication I noticed that
if the table sync worker of logical replication failed to insert the
data for whatever reason, the table sync worker process exits with
error. And then the main apply worker launches the table sync worker
again soon without interval. This routine is executed at very high
frequency without interval.

Should we do put a interval (wal_retrieve_interval or make a new GUC
parameter?) for launching the table sync worker?

Hmm. I was playing with something wrong. Now I see the invervals
5 seconds. No problem.

Yeah, this issue happens only in the table sync worker.

After introducing encoding conversion, untranslatable characters
in a published table causes this situation.

I think it's better to make a new GUC parameter for the table sync
worker. Having multiple behaviors in wal_retrieve_retry_interval is
not good idea. Thought?

So, this is working, sorry.

I prefer subscription option than GUC. Something like following.

CREATE SUBSCRIPTION s1 CONNECTION 'blah'
PUBLICATION p1 WITH (noreconnect = true);

Stored in pg_subscription?

I don't think that's a very good solution, you'd lose replication on
every network glitch, upstream server restart, etc.

Interval can reduce
the frequence of reconnecting, but I think that walreciever
should refrain from reconnecting on unrecoverable(or repeating)
error in walsender.

Yeah, that's also considerable issue.

But not to do now.

I've added this as an open item, and sent a patch for this.

I am not exactly sure what's the open item from this thread. To use the
wal_retrieve_interval to limit table sync restarts?

--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#8Kyotaro Horiguchi
horikyota.ntt@gmail.com
In reply to: Petr Jelinek (#7)
Re: Interval for launching the table sync worker

At Thu, 6 Apr 2017 18:42:37 +0200, Petr Jelinek <petr.jelinek@2ndquadrant.com> wrote in <8c7afb37-be73-c6bd-80bc-e87522f0461a@2ndquadrant.com>

On 06/04/17 16:44, Masahiko Sawada wrote:

On Thu, Apr 6, 2017 at 9:06 PM, Kyotaro HORIGUCHI
<horiguchi.kyotaro@lab.ntt.co.jp> wrote:

I prefer subscription option than GUC. Something like following.

CREATE SUBSCRIPTION s1 CONNECTION 'blah'
PUBLICATION p1 WITH (noreconnect = true);

Stored in pg_subscription?

I don't think that's a very good solution, you'd lose replication on
every network glitch, upstream server restart, etc.

Yes, you're right. This would work if apply worker distinguishes
permanent error. But it is overkill so far.

I've added this as an open item, and sent a patch for this.

I am not exactly sure what's the open item from this thread. To use the
wal_retrieve_interval to limit table sync restarts?

It's not me. I also don't think this critical.

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#9Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Kyotaro Horiguchi (#8)
Re: Interval for launching the table sync worker

On Fri, Apr 7, 2017 at 1:23 PM, Kyotaro HORIGUCHI
<horiguchi.kyotaro@lab.ntt.co.jp> wrote:

At Thu, 6 Apr 2017 18:42:37 +0200, Petr Jelinek <petr.jelinek@2ndquadrant.com> wrote in <8c7afb37-be73-c6bd-80bc-e87522f0461a@2ndquadrant.com>

On 06/04/17 16:44, Masahiko Sawada wrote:

On Thu, Apr 6, 2017 at 9:06 PM, Kyotaro HORIGUCHI
<horiguchi.kyotaro@lab.ntt.co.jp> wrote:

I prefer subscription option than GUC. Something like following.

CREATE SUBSCRIPTION s1 CONNECTION 'blah'
PUBLICATION p1 WITH (noreconnect = true);

Stored in pg_subscription?

I don't think that's a very good solution, you'd lose replication on
every network glitch, upstream server restart, etc.

Yes, you're right. This would work if apply worker distinguishes
permanent error. But it is overkill so far.

I've added this as an open item, and sent a patch for this.

I am not exactly sure what's the open item from this thread. To use the
wal_retrieve_interval to limit table sync restarts?

It's not me. I also don't think this critical.

Thank you for the comment.

It's not critical but it could be problem. So I thought we should fix
it before the PostgreSQL 10 release. If it's not appropriate as an
open item I'll remove it.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#10Peter Eisentraut
peter_e@gmx.net
In reply to: Masahiko Sawada (#9)
Re: Interval for launching the table sync worker

On 4/7/17 01:10, Masahiko Sawada wrote:

It's not critical but it could be problem. So I thought we should fix
it before the PostgreSQL 10 release. If it's not appropriate as an
open item I'll remove it.

You wrote that you "sent" a patch, but I don't see a patch anywhere.

I think a nonintrusive patch for this could be considered.

--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#11Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Peter Eisentraut (#10)
Re: Interval for launching the table sync worker

On Sat, Apr 8, 2017 at 8:13 AM, Peter Eisentraut
<peter.eisentraut@2ndquadrant.com> wrote:

On 4/7/17 01:10, Masahiko Sawada wrote:

It's not critical but it could be problem. So I thought we should fix
it before the PostgreSQL 10 release. If it's not appropriate as an
open item I'll remove it.

You wrote that you "sent" a patch, but I don't see a patch anywhere.

I think a nonintrusive patch for this could be considered.

Oops, I made a mistake. I'll send a patch tomorrow.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#12Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Masahiko Sawada (#11)
Re: Interval for launching the table sync worker

On Sun, Apr 9, 2017 at 6:25 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Sat, Apr 8, 2017 at 8:13 AM, Peter Eisentraut
<peter.eisentraut@2ndquadrant.com> wrote:

On 4/7/17 01:10, Masahiko Sawada wrote:

It's not critical but it could be problem. So I thought we should fix
it before the PostgreSQL 10 release. If it's not appropriate as an
open item I'll remove it.

You wrote that you "sent" a patch, but I don't see a patch anywhere.

I think a nonintrusive patch for this could be considered.

Oops, I made a mistake. I'll send a patch tomorrow.

I've attached the patch. This patch introduces GUC parameter
table_sync_retry_interval which controls the interval of launching the
table sync worker process.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Attachments:

table_sync_retry_interval.patchapplication/octet-stream; name=table_sync_retry_interval.patchDownload+46-5
#13Petr Jelinek
petr@2ndquadrant.com
In reply to: Masahiko Sawada (#12)
Re: Interval for launching the table sync worker

On 10/04/17 11:02, Masahiko Sawada wrote:

On Sun, Apr 9, 2017 at 6:25 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Sat, Apr 8, 2017 at 8:13 AM, Peter Eisentraut
<peter.eisentraut@2ndquadrant.com> wrote:

On 4/7/17 01:10, Masahiko Sawada wrote:

It's not critical but it could be problem. So I thought we should fix
it before the PostgreSQL 10 release. If it's not appropriate as an
open item I'll remove it.

You wrote that you "sent" a patch, but I don't see a patch anywhere.

I think a nonintrusive patch for this could be considered.

Oops, I made a mistake. I'll send a patch tomorrow.

I've attached the patch. This patch introduces GUC parameter
table_sync_retry_interval which controls the interval of launching the
table sync worker process.

I don't think solution is quite this simple. This will cause all table
sync workers to be delayed which means concurrency will suffer and the
initial sync of all tables will take much longer especially if there is
little data. We need a way to either detect if we are launching same
worker that was already launched before, or alternatively if we are
launching crashed worker and only then apply the delay.

--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#14Peter Eisentraut
peter_e@gmx.net
In reply to: Petr Jelinek (#13)
Re: Interval for launching the table sync worker

On 4/10/17 08:10, Petr Jelinek wrote:

I don't think solution is quite this simple. This will cause all table
sync workers to be delayed which means concurrency will suffer and the
initial sync of all tables will take much longer especially if there is
little data. We need a way to either detect if we are launching same
worker that was already launched before, or alternatively if we are
launching crashed worker and only then apply the delay.

Perhaps instead of a global last_start_time, we store a per relation
last_start_time in SubscriptionRelState?

--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#15Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Peter Eisentraut (#14)
Re: Interval for launching the table sync worker

On Wed, Apr 12, 2017 at 1:28 PM, Peter Eisentraut
<peter.eisentraut@2ndquadrant.com> wrote:

On 4/10/17 08:10, Petr Jelinek wrote:

I don't think solution is quite this simple. This will cause all table
sync workers to be delayed which means concurrency will suffer and the
initial sync of all tables will take much longer especially if there is
little data. We need a way to either detect if we are launching same
worker that was already launched before, or alternatively if we are
launching crashed worker and only then apply the delay.

Perhaps instead of a global last_start_time, we store a per relation
last_start_time in SubscriptionRelState?

I was thinking the same. But a problem is that the list of
SubscriptionRelState is refreshed whenever the syncing table state
becomes invalid (table_state_valid = false). I guess we need to
improve these logic including GetSubscriptionNotReadyRelations().

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#16Peter Eisentraut
peter_e@gmx.net
In reply to: Masahiko Sawada (#15)
Re: Interval for launching the table sync worker

On 4/12/17 00:48, Masahiko Sawada wrote:

On Wed, Apr 12, 2017 at 1:28 PM, Peter Eisentraut

Perhaps instead of a global last_start_time, we store a per relation
last_start_time in SubscriptionRelState?

I was thinking the same. But a problem is that the list of
SubscriptionRelState is refreshed whenever the syncing table state
becomes invalid (table_state_valid = false). I guess we need to
improve these logic including GetSubscriptionNotReadyRelations().

The table states are invalidated on a syscache callback from
pg_subscription_rel, which happens roughly speaking when a table
finishes the initial sync. So if we're worried about failing tablesync
workers relaunching to quickly, this would only be a problem if a
tablesync of another table finishes right in that restart window. That
doesn't seem a terrible issue to me.

--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#17Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Peter Eisentraut (#16)
Re: Interval for launching the table sync worker

On Wed, Apr 12, 2017 at 11:46 PM, Peter Eisentraut
<peter.eisentraut@2ndquadrant.com> wrote:

On 4/12/17 00:48, Masahiko Sawada wrote:

On Wed, Apr 12, 2017 at 1:28 PM, Peter Eisentraut

Perhaps instead of a global last_start_time, we store a per relation
last_start_time in SubscriptionRelState?

I was thinking the same. But a problem is that the list of
SubscriptionRelState is refreshed whenever the syncing table state
becomes invalid (table_state_valid = false). I guess we need to
improve these logic including GetSubscriptionNotReadyRelations().

The table states are invalidated on a syscache callback from
pg_subscription_rel, which happens roughly speaking when a table
finishes the initial sync. So if we're worried about failing tablesync
workers relaunching to quickly, this would only be a problem if a
tablesync of another table finishes right in that restart window. That
doesn't seem a terrible issue to me.

I think the table states are invalidated whenever the table sync
worker starts, because the table sync worker updates its status of
pg_subscription_rel and commits it before starting actual copy. So we
cannot rely on that. I thought we can store last_start_time into
pg_subscription_rel but it might be overkill. I'm now thinking to
change GetSubscriptionNotReadyRealtions so that last_start_time in
SubscriptionRelState is taken over to new list.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#18Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Masahiko Sawada (#17)
Re: Interval for launching the table sync worker

On Thu, Apr 13, 2017 at 11:53 AM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Wed, Apr 12, 2017 at 11:46 PM, Peter Eisentraut
<peter.eisentraut@2ndquadrant.com> wrote:

On 4/12/17 00:48, Masahiko Sawada wrote:

On Wed, Apr 12, 2017 at 1:28 PM, Peter Eisentraut

Perhaps instead of a global last_start_time, we store a per relation
last_start_time in SubscriptionRelState?

I was thinking the same. But a problem is that the list of
SubscriptionRelState is refreshed whenever the syncing table state
becomes invalid (table_state_valid = false). I guess we need to
improve these logic including GetSubscriptionNotReadyRelations().

The table states are invalidated on a syscache callback from
pg_subscription_rel, which happens roughly speaking when a table
finishes the initial sync. So if we're worried about failing tablesync
workers relaunching to quickly, this would only be a problem if a
tablesync of another table finishes right in that restart window. That
doesn't seem a terrible issue to me.

I think the table states are invalidated whenever the table sync
worker starts, because the table sync worker updates its status of
pg_subscription_rel and commits it before starting actual copy. So we
cannot rely on that. I thought we can store last_start_time into
pg_subscription_rel but it might be overkill. I'm now thinking to
change GetSubscriptionNotReadyRealtions so that last_start_time in
SubscriptionRelState is taken over to new list.

Attached the latest patch. It didn't actually necessary to change
GetSubscriptionNotReadyRelations. I just changed the logic refreshing
the sync table state list.
Please give me feedback.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Attachments:

table_sync_retry_interval_v2.patchapplication/octet-stream; name=table_sync_retry_interval_v2.patchDownload+83-8
#19Kyotaro Horiguchi
horikyota.ntt@gmail.com
In reply to: Masahiko Sawada (#17)
Re: Interval for launching the table sync worker

I confused sync and apply workers.
sync worker failure at start causes immediate retries.

At Thu, 13 Apr 2017 11:53:27 +0900, Masahiko Sawada <sawada.mshk@gmail.com> wrote in <CAD21AoCR6eHgk0vaHShjO4Bre_VDKjHUbL9EuWHaUgRPSPPyVQ@mail.gmail.com>

On Wed, Apr 12, 2017 at 11:46 PM, Peter Eisentraut
<peter.eisentraut@2ndquadrant.com> wrote:

On 4/12/17 00:48, Masahiko Sawada wrote:

On Wed, Apr 12, 2017 at 1:28 PM, Peter Eisentraut

Perhaps instead of a global last_start_time, we store a per relation
last_start_time in SubscriptionRelState?

I was thinking the same. But a problem is that the list of
SubscriptionRelState is refreshed whenever the syncing table state
becomes invalid (table_state_valid = false). I guess we need to
improve these logic including GetSubscriptionNotReadyRelations().

The table states are invalidated on a syscache callback from
pg_subscription_rel, which happens roughly speaking when a table
finishes the initial sync. So if we're worried about failing tablesync
workers relaunching to quickly, this would only be a problem if a
tablesync of another table finishes right in that restart window. That
doesn't seem a terrible issue to me.

I think the table states are invalidated whenever the table sync
worker starts, because the table sync worker updates its status of
pg_subscription_rel and commits it before starting actual copy. So we
cannot rely on that. I thought we can store last_start_time into
pg_subscription_rel but it might be overkill. I'm now thinking to
change GetSubscriptionNotReadyRealtions so that last_start_time in
SubscriptionRelState is taken over to new list.

This resolves the problem but, if I understand correctly, the
many pallocs in process_syncing_tables_for_apply() is working on
ApplyContext and the context is reset before the next visit here
(in LogicalRepApplyLoop).

Although this is not a problem of this patch, this is a problem
generally.

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#20Kyotaro Horiguchi
horikyota.ntt@gmail.com
In reply to: Kyotaro Horiguchi (#19)
Re: Interval for launching the table sync worker

Ouch! I replied to wrong mail.

At Thu, 13 Apr 2017 19:55:04 +0900 (Tokyo Standard Time), Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp> wrote in <20170413.195504.89348773.horiguchi.kyotaro@lab.ntt.co.jp>

I confused sync and apply workers.
sync worker failure at start causes immediate retries.

At Thu, 13 Apr 2017 11:53:27 +0900, Masahiko Sawada <sawada.mshk@gmail.com> wrote in <CAD21AoCR6eHgk0vaHShjO4Bre_VDKjHUbL9EuWHaUgRPSPPyVQ@mail.gmail.com>

On Wed, Apr 12, 2017 at 11:46 PM, Peter Eisentraut
<peter.eisentraut@2ndquadrant.com> wrote:

On 4/12/17 00:48, Masahiko Sawada wrote:

On Wed, Apr 12, 2017 at 1:28 PM, Peter Eisentraut

Perhaps instead of a global last_start_time, we store a per relation
last_start_time in SubscriptionRelState?

I was thinking the same. But a problem is that the list of
SubscriptionRelState is refreshed whenever the syncing table state
becomes invalid (table_state_valid = false). I guess we need to
improve these logic including GetSubscriptionNotReadyRelations().

The table states are invalidated on a syscache callback from
pg_subscription_rel, which happens roughly speaking when a table
finishes the initial sync. So if we're worried about failing tablesync
workers relaunching to quickly, this would only be a problem if a
tablesync of another table finishes right in that restart window. That
doesn't seem a terrible issue to me.

I think the table states are invalidated whenever the table sync
worker starts, because the table sync worker updates its status of
pg_subscription_rel and commits it before starting actual copy. So we
cannot rely on that. I thought we can store last_start_time into
pg_subscription_rel but it might be overkill. I'm now thinking to
change GetSubscriptionNotReadyRealtions so that last_start_time in
SubscriptionRelState is taken over to new list.

The right target of "This" below is found at the following URL.

/messages/by-id/CAD21AoBt_XUdppddFak661_LBM2t3CfK52aLKHG+ekd7SkzLmg@mail.gmail.com

This resolves the problem but, if I understand correctly, the
many pallocs in process_syncing_tables_for_apply() is working on
ApplyContext and the context is reset before the next visit here
(in LogicalRepApplyLoop).

Although this is not a problem of this patch, this is a problem
generally.

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#21Petr Jelinek
petr@2ndquadrant.com
In reply to: Kyotaro Horiguchi (#20)
#22Petr Jelinek
petr@2ndquadrant.com
In reply to: Masahiko Sawada (#18)
#23Kyotaro Horiguchi
horikyota.ntt@gmail.com
In reply to: Petr Jelinek (#21)
#24Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Petr Jelinek (#22)
#25Petr Jelinek
petr@2ndquadrant.com
In reply to: Masahiko Sawada (#24)
#26Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Petr Jelinek (#25)
#27Peter Eisentraut
peter_e@gmx.net
In reply to: Masahiko Sawada (#18)
#28Peter Eisentraut
peter_e@gmx.net
In reply to: Petr Jelinek (#22)
#29Noah Misch
noah@leadboat.com
In reply to: Masahiko Sawada (#1)
#30Peter Eisentraut
peter_e@gmx.net
In reply to: Masahiko Sawada (#18)
#31Petr Jelinek
petr@2ndquadrant.com
In reply to: Peter Eisentraut (#30)
#32Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Peter Eisentraut (#30)
#33Peter Eisentraut
peter_e@gmx.net
In reply to: Petr Jelinek (#31)
#34Petr Jelinek
petr@2ndquadrant.com
In reply to: Peter Eisentraut (#33)
#35Kyotaro Horiguchi
horikyota.ntt@gmail.com
In reply to: Petr Jelinek (#34)
#36Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Kyotaro Horiguchi (#35)
#37Petr Jelinek
petr@2ndquadrant.com
In reply to: Masahiko Sawada (#36)
#38Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Petr Jelinek (#37)
#39Petr Jelinek
petr@2ndquadrant.com
In reply to: Masahiko Sawada (#38)
#40Noah Misch
noah@leadboat.com
In reply to: Noah Misch (#29)
#41Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Petr Jelinek (#39)
#42Petr Jelinek
petr@2ndquadrant.com
In reply to: Masahiko Sawada (#41)
#43Peter Eisentraut
peter_e@gmx.net
In reply to: Noah Misch (#40)
#44Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Petr Jelinek (#42)
#45Kyotaro Horiguchi
horikyota.ntt@gmail.com
In reply to: Masahiko Sawada (#41)
#46Petr Jelinek
petr@2ndquadrant.com
In reply to: Masahiko Sawada (#44)
#47Petr Jelinek
petr@2ndquadrant.com
In reply to: Kyotaro Horiguchi (#45)
#48Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Kyotaro Horiguchi (#45)
#49Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Masahiko Sawada (#48)
#50Kyotaro Horiguchi
horikyota.ntt@gmail.com
In reply to: Masahiko Sawada (#49)
#51Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Kyotaro Horiguchi (#50)
#52Petr Jelinek
petr@2ndquadrant.com
In reply to: Masahiko Sawada (#51)
#53Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Petr Jelinek (#52)
#54Kyotaro Horiguchi
horikyota.ntt@gmail.com
In reply to: Masahiko Sawada (#51)
#55Peter Eisentraut
peter_e@gmx.net
In reply to: Petr Jelinek (#46)
#56Peter Eisentraut
peter_e@gmx.net
In reply to: Kyotaro Horiguchi (#54)
#57Peter Eisentraut
peter_e@gmx.net
In reply to: Peter Eisentraut (#43)
#58Petr Jelinek
petr@2ndquadrant.com
In reply to: Peter Eisentraut (#56)
#59Peter Eisentraut
peter_e@gmx.net
In reply to: Petr Jelinek (#58)
#60Petr Jelinek
petr@2ndquadrant.com
In reply to: Peter Eisentraut (#59)
#61Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Peter Eisentraut (#59)
#62Kyotaro Horiguchi
horikyota.ntt@gmail.com
In reply to: Masahiko Sawada (#61)
#63Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Kyotaro Horiguchi (#62)
#64Peter Eisentraut
peter_e@gmx.net
In reply to: Petr Jelinek (#60)
#65Peter Eisentraut
peter_e@gmx.net
In reply to: Masahiko Sawada (#61)