Interval for launching the table sync worker

Started by Masahiko Sawadaabout 9 years ago65 messageshackers

sawada.mshk@gmail.com

about 9 years ago

Hi all,

While testing table sync worker for logical replication I noticed that
if the table sync worker of logical replication failed to insert the
data for whatever reason, the table sync worker process exits with
error. And then the main apply worker launches the table sync worker
again soon without interval. This routine is executed at very high
frequency without interval.

Should we do put a interval (wal_retrieve_interval or make a new GUC
parameter?) for launching the table sync worker?

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Kyotaro Horiguchi

horikyota.ntt@gmail.com

about 9 years ago

In reply to: Masahiko Sawada (#1)

Re: Interval for launching the table sync worker

I was thinking the same.

At Thu, 6 Apr 2017 11:33:22 +0900, Masahiko Sawada <sawada.mshk@gmail.com> wrote in <CAD21AoDCnyRJDUY=ESVVe68AukvOP2dFomTeBFpAd1TiFbjsGg@mail.gmail.com>

Hi all,

While testing table sync worker for logical replication I noticed that
if the table sync worker of logical replication failed to insert the
data for whatever reason, the table sync worker process exits with
error. And then the main apply worker launches the table sync worker
again soon without interval. This routine is executed at very high
frequency without interval.

Should we do put a interval (wal_retrieve_interval or make a new GUC
parameter?) for launching the table sync worker?

After introducing encoding conversion, untranslatable characters
in a published table causes this situation. Interval can reduce
the frequence of reconnecting, but I think that walreciever
should refrain from reconnecting on unrecoverable(or repeating)
error in walsender.

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Masahiko Sawada

sawada.mshk@gmail.com

about 9 years ago

In reply to: Kyotaro Horiguchi (#2)

Re: Interval for launching the table sync worker

On Thu, Apr 6, 2017 at 3:45 PM, Kyotaro HORIGUCHI
<horiguchi.kyotaro@lab.ntt.co.jp> wrote:

I was thinking the same.

At Thu, 6 Apr 2017 11:33:22 +0900, Masahiko Sawada <sawada.mshk@gmail.com> wrote in <CAD21AoDCnyRJDUY=ESVVe68AukvOP2dFomTeBFpAd1TiFbjsGg@mail.gmail.com>

Hi all,

While testing table sync worker for logical replication I noticed that
if the table sync worker of logical replication failed to insert the
data for whatever reason, the table sync worker process exits with
error. And then the main apply worker launches the table sync worker
again soon without interval. This routine is executed at very high
frequency without interval.

Should we do put a interval (wal_retrieve_interval or make a new GUC
parameter?) for launching the table sync worker?

After introducing encoding conversion, untranslatable characters
in a published table causes this situation.

I think it's better to make a new GUC parameter for the table sync
worker. Having multiple behaviors in wal_retrieve_retry_interval is
not good idea. Thought?

Interval can reduce
the frequence of reconnecting, but I think that walreciever
should refrain from reconnecting on unrecoverable(or repeating)
error in walsender.

Yeah, that's also considerable issue.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Kyotaro Horiguchi

horikyota.ntt@gmail.com

about 9 years ago

In reply to: Masahiko Sawada (#3)

Re: Interval for launching the table sync worker

At Thu, 6 Apr 2017 16:15:33 +0900, Masahiko Sawada <sawada.mshk@gmail.com> wrote in <CAD21AoCrcwi3SwKKOW_efwW0finxyycLgsbw09n5uy2sxneO_A@mail.gmail.com>

On Thu, Apr 6, 2017 at 3:45 PM, Kyotaro HORIGUCHI
<horiguchi.kyotaro@lab.ntt.co.jp> wrote:

I was thinking the same.

At Thu, 6 Apr 2017 11:33:22 +0900, Masahiko Sawada <sawada.mshk@gmail.com> wrote in <CAD21AoDCnyRJDUY=ESVVe68AukvOP2dFomTeBFpAd1TiFbjsGg@mail.gmail.com>

Hi all,

While testing table sync worker for logical replication I noticed that
if the table sync worker of logical replication failed to insert the
data for whatever reason, the table sync worker process exits with
error. And then the main apply worker launches the table sync worker
again soon without interval. This routine is executed at very high
frequency without interval.

Should we do put a interval (wal_retrieve_interval or make a new GUC
parameter?) for launching the table sync worker?

After introducing encoding conversion, untranslatable characters
in a published table causes this situation.

I think it's better to make a new GUC parameter for the table sync
worker. Having multiple behaviors in wal_retrieve_retry_interval is
not good idea. Thought?

I prefer subscription option than GUC. Something like following.

CREATE SUBSCRIPTION s1 CONNECTION 'blah'
PUBLICATION p1 WITH (noreconnect = true);

Stored in pg_subscription?

Interval can reduce
the frequence of reconnecting, but I think that walreciever
should refrain from reconnecting on unrecoverable(or repeating)
error in walsender.

Yeah, that's also considerable issue.

But not to do now.

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Kyotaro Horiguchi

horikyota.ntt@gmail.com

about 9 years ago

In reply to: Kyotaro Horiguchi (#4)

Re: Interval for launching the table sync worker

At Thu, 06 Apr 2017 17:02:14 +0900 (Tokyo Standard Time), Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp> wrote in <20170406.170214.263553093.horiguchi.kyotaro@lab.ntt.co.jp>

At Thu, 6 Apr 2017 16:15:33 +0900, Masahiko Sawada <sawada.mshk@gmail.com> wrote in <CAD21AoCrcwi3SwKKOW_efwW0finxyycLgsbw09n5uy2sxneO_A@mail.gmail.com>

On Thu, Apr 6, 2017 at 3:45 PM, Kyotaro HORIGUCHI
<horiguchi.kyotaro@lab.ntt.co.jp> wrote:

I was thinking the same.

At Thu, 6 Apr 2017 11:33:22 +0900, Masahiko Sawada <sawada.mshk@gmail.com> wrote in <CAD21AoDCnyRJDUY=ESVVe68AukvOP2dFomTeBFpAd1TiFbjsGg@mail.gmail.com>

Hi all,

While testing table sync worker for logical replication I noticed that
if the table sync worker of logical replication failed to insert the
data for whatever reason, the table sync worker process exits with
error. And then the main apply worker launches the table sync worker
again soon without interval. This routine is executed at very high
frequency without interval.

Should we do put a interval (wal_retrieve_interval or make a new GUC
parameter?) for launching the table sync worker?

Hmm. I was playing with something wrong. Now I see the invervals
5 seconds. No problem.

After introducing encoding conversion, untranslatable characters
in a published table causes this situation.

I think it's better to make a new GUC parameter for the table sync
worker. Having multiple behaviors in wal_retrieve_retry_interval is
not good idea. Thought?

So, this is working, sorry.

I prefer subscription option than GUC. Something like following.

CREATE SUBSCRIPTION s1 CONNECTION 'blah'
PUBLICATION p1 WITH (noreconnect = true);

Stored in pg_subscription?

Interval can reduce
the frequence of reconnecting, but I think that walreciever
should refrain from reconnecting on unrecoverable(or repeating)
error in walsender.

Yeah, that's also considerable issue.

But not to do now.

--
Kyotaro Horiguchi
NTT Open Source Software Center

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Masahiko Sawada

sawada.mshk@gmail.com

about 9 years ago

In reply to: Kyotaro Horiguchi (#5)

Re: Interval for launching the table sync worker

On Thu, Apr 6, 2017 at 9:06 PM, Kyotaro HORIGUCHI
<horiguchi.kyotaro@lab.ntt.co.jp> wrote:

At Thu, 06 Apr 2017 17:02:14 +0900 (Tokyo Standard Time), Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp> wrote in <20170406.170214.263553093.horiguchi.kyotaro@lab.ntt.co.jp>

At Thu, 6 Apr 2017 16:15:33 +0900, Masahiko Sawada <sawada.mshk@gmail.com> wrote in <CAD21AoCrcwi3SwKKOW_efwW0finxyycLgsbw09n5uy2sxneO_A@mail.gmail.com>

On Thu, Apr 6, 2017 at 3:45 PM, Kyotaro HORIGUCHI
<horiguchi.kyotaro@lab.ntt.co.jp> wrote:

I was thinking the same.

At Thu, 6 Apr 2017 11:33:22 +0900, Masahiko Sawada <sawada.mshk@gmail.com> wrote in <CAD21AoDCnyRJDUY=ESVVe68AukvOP2dFomTeBFpAd1TiFbjsGg@mail.gmail.com>

Hi all,

While testing table sync worker for logical replication I noticed that
if the table sync worker of logical replication failed to insert the
data for whatever reason, the table sync worker process exits with
error. And then the main apply worker launches the table sync worker
again soon without interval. This routine is executed at very high
frequency without interval.

Should we do put a interval (wal_retrieve_interval or make a new GUC
parameter?) for launching the table sync worker?

Hmm. I was playing with something wrong. Now I see the invervals
5 seconds. No problem.

Yeah, this issue happens only in the table sync worker.

After introducing encoding conversion, untranslatable characters
in a published table causes this situation.

I think it's better to make a new GUC parameter for the table sync
worker. Having multiple behaviors in wal_retrieve_retry_interval is
not good idea. Thought?

So, this is working, sorry.

I prefer subscription option than GUC. Something like following.

CREATE SUBSCRIPTION s1 CONNECTION 'blah'
PUBLICATION p1 WITH (noreconnect = true);

Stored in pg_subscription?

Interval can reduce
the frequence of reconnecting, but I think that walreciever
should refrain from reconnecting on unrecoverable(or repeating)
error in walsender.

Yeah, that's also considerable issue.

But not to do now.

I've added this as an open item, and sent a patch for this.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Petr Jelinek

petr@2ndquadrant.com

about 9 years ago

In reply to: Masahiko Sawada (#6)

Re: Interval for launching the table sync worker

On 06/04/17 16:44, Masahiko Sawada wrote:

On Thu, Apr 6, 2017 at 9:06 PM, Kyotaro HORIGUCHI
<horiguchi.kyotaro@lab.ntt.co.jp> wrote:

At Thu, 06 Apr 2017 17:02:14 +0900 (Tokyo Standard Time), Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp> wrote in <20170406.170214.263553093.horiguchi.kyotaro@lab.ntt.co.jp>

At Thu, 6 Apr 2017 16:15:33 +0900, Masahiko Sawada <sawada.mshk@gmail.com> wrote in <CAD21AoCrcwi3SwKKOW_efwW0finxyycLgsbw09n5uy2sxneO_A@mail.gmail.com>

On Thu, Apr 6, 2017 at 3:45 PM, Kyotaro HORIGUCHI
<horiguchi.kyotaro@lab.ntt.co.jp> wrote:

I was thinking the same.

At Thu, 6 Apr 2017 11:33:22 +0900, Masahiko Sawada <sawada.mshk@gmail.com> wrote in <CAD21AoDCnyRJDUY=ESVVe68AukvOP2dFomTeBFpAd1TiFbjsGg@mail.gmail.com>

Hi all,

While testing table sync worker for logical replication I noticed that
if the table sync worker of logical replication failed to insert the
data for whatever reason, the table sync worker process exits with
error. And then the main apply worker launches the table sync worker
again soon without interval. This routine is executed at very high
frequency without interval.

Should we do put a interval (wal_retrieve_interval or make a new GUC
parameter?) for launching the table sync worker?

Hmm. I was playing with something wrong. Now I see the invervals
5 seconds. No problem.

Yeah, this issue happens only in the table sync worker.

After introducing encoding conversion, untranslatable characters
in a published table causes this situation.

I think it's better to make a new GUC parameter for the table sync
worker. Having multiple behaviors in wal_retrieve_retry_interval is
not good idea. Thought?

So, this is working, sorry.

I prefer subscription option than GUC. Something like following.

CREATE SUBSCRIPTION s1 CONNECTION 'blah'
PUBLICATION p1 WITH (noreconnect = true);

Stored in pg_subscription?

I don't think that's a very good solution, you'd lose replication on
every network glitch, upstream server restart, etc.

Interval can reduce
the frequence of reconnecting, but I think that walreciever
should refrain from reconnecting on unrecoverable(or repeating)
error in walsender.

Yeah, that's also considerable issue.

But not to do now.

I've added this as an open item, and sent a patch for this.

I am not exactly sure what's the open item from this thread. To use the
wal_retrieve_interval to limit table sync restarts?

--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Kyotaro Horiguchi

horikyota.ntt@gmail.com

about 9 years ago

In reply to: Petr Jelinek (#7)

Re: Interval for launching the table sync worker

At Thu, 6 Apr 2017 18:42:37 +0200, Petr Jelinek <petr.jelinek@2ndquadrant.com> wrote in <8c7afb37-be73-c6bd-80bc-e87522f0461a@2ndquadrant.com>

On 06/04/17 16:44, Masahiko Sawada wrote:

On Thu, Apr 6, 2017 at 9:06 PM, Kyotaro HORIGUCHI
<horiguchi.kyotaro@lab.ntt.co.jp> wrote:

I prefer subscription option than GUC. Something like following.

CREATE SUBSCRIPTION s1 CONNECTION 'blah'
PUBLICATION p1 WITH (noreconnect = true);

Stored in pg_subscription?

I don't think that's a very good solution, you'd lose replication on
every network glitch, upstream server restart, etc.

Yes, you're right. This would work if apply worker distinguishes
permanent error. But it is overkill so far.

I've added this as an open item, and sent a patch for this.

I am not exactly sure what's the open item from this thread. To use the
wal_retrieve_interval to limit table sync restarts?

It's not me. I also don't think this critical.

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Masahiko Sawada

sawada.mshk@gmail.com

about 9 years ago

In reply to: Kyotaro Horiguchi (#8)

Re: Interval for launching the table sync worker

On Fri, Apr 7, 2017 at 1:23 PM, Kyotaro HORIGUCHI
<horiguchi.kyotaro@lab.ntt.co.jp> wrote:

At Thu, 6 Apr 2017 18:42:37 +0200, Petr Jelinek <petr.jelinek@2ndquadrant.com> wrote in <8c7afb37-be73-c6bd-80bc-e87522f0461a@2ndquadrant.com>

On 06/04/17 16:44, Masahiko Sawada wrote:

On Thu, Apr 6, 2017 at 9:06 PM, Kyotaro HORIGUCHI
<horiguchi.kyotaro@lab.ntt.co.jp> wrote:

I prefer subscription option than GUC. Something like following.

CREATE SUBSCRIPTION s1 CONNECTION 'blah'
PUBLICATION p1 WITH (noreconnect = true);

Stored in pg_subscription?

I don't think that's a very good solution, you'd lose replication on
every network glitch, upstream server restart, etc.

Yes, you're right. This would work if apply worker distinguishes
permanent error. But it is overkill so far.

I've added this as an open item, and sent a patch for this.

I am not exactly sure what's the open item from this thread. To use the
wal_retrieve_interval to limit table sync restarts?

It's not me. I also don't think this critical.

Thank you for the comment.

It's not critical but it could be problem. So I thought we should fix
it before the PostgreSQL 10 release. If it's not appropriate as an
open item I'll remove it.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#10

Peter Eisentraut

peter_e@gmx.net

about 9 years ago

In reply to: Masahiko Sawada (#9)

Re: Interval for launching the table sync worker

On 4/7/17 01:10, Masahiko Sawada wrote:

It's not critical but it could be problem. So I thought we should fix
it before the PostgreSQL 10 release. If it's not appropriate as an
open item I'll remove it.

You wrote that you "sent" a patch, but I don't see a patch anywhere.

I think a nonintrusive patch for this could be considered.

--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#11

Masahiko Sawada

sawada.mshk@gmail.com

about 9 years ago

In reply to: Peter Eisentraut (#10)

Re: Interval for launching the table sync worker

On Sat, Apr 8, 2017 at 8:13 AM, Peter Eisentraut
<peter.eisentraut@2ndquadrant.com> wrote:

On 4/7/17 01:10, Masahiko Sawada wrote:

It's not critical but it could be problem. So I thought we should fix
it before the PostgreSQL 10 release. If it's not appropriate as an
open item I'll remove it.

You wrote that you "sent" a patch, but I don't see a patch anywhere.

I think a nonintrusive patch for this could be considered.

Oops, I made a mistake. I'll send a patch tomorrow.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#12

Masahiko Sawada

sawada.mshk@gmail.com

about 9 years ago

In reply to: Masahiko Sawada (#11)

Re: Interval for launching the table sync worker

On Sun, Apr 9, 2017 at 6:25 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Sat, Apr 8, 2017 at 8:13 AM, Peter Eisentraut
<peter.eisentraut@2ndquadrant.com> wrote:

On 4/7/17 01:10, Masahiko Sawada wrote:

It's not critical but it could be problem. So I thought we should fix
it before the PostgreSQL 10 release. If it's not appropriate as an
open item I'll remove it.

You wrote that you "sent" a patch, but I don't see a patch anywhere.

I think a nonintrusive patch for this could be considered.

Oops, I made a mistake. I'll send a patch tomorrow.

I've attached the patch. This patch introduces GUC parameter
table_sync_retry_interval which controls the interval of launching the
table sync worker process.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

#13

Petr Jelinek

petr@2ndquadrant.com

about 9 years ago

In reply to: Masahiko Sawada (#12)

Re: Interval for launching the table sync worker

On 10/04/17 11:02, Masahiko Sawada wrote:

On Sun, Apr 9, 2017 at 6:25 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Sat, Apr 8, 2017 at 8:13 AM, Peter Eisentraut
<peter.eisentraut@2ndquadrant.com> wrote:

On 4/7/17 01:10, Masahiko Sawada wrote:

It's not critical but it could be problem. So I thought we should fix
it before the PostgreSQL 10 release. If it's not appropriate as an
open item I'll remove it.

You wrote that you "sent" a patch, but I don't see a patch anywhere.

I think a nonintrusive patch for this could be considered.

Oops, I made a mistake. I'll send a patch tomorrow.

I've attached the patch. This patch introduces GUC parameter
table_sync_retry_interval which controls the interval of launching the
table sync worker process.

I don't think solution is quite this simple. This will cause all table
sync workers to be delayed which means concurrency will suffer and the
initial sync of all tables will take much longer especially if there is
little data. We need a way to either detect if we are launching same
worker that was already launched before, or alternatively if we are
launching crashed worker and only then apply the delay.

--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#14

Peter Eisentraut

peter_e@gmx.net

about 9 years ago

In reply to: Petr Jelinek (#13)

Re: Interval for launching the table sync worker

On 4/10/17 08:10, Petr Jelinek wrote:

I don't think solution is quite this simple. This will cause all table
sync workers to be delayed which means concurrency will suffer and the
initial sync of all tables will take much longer especially if there is
little data. We need a way to either detect if we are launching same
worker that was already launched before, or alternatively if we are
launching crashed worker and only then apply the delay.

Perhaps instead of a global last_start_time, we store a per relation
last_start_time in SubscriptionRelState?

--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#15

Masahiko Sawada

sawada.mshk@gmail.com

about 9 years ago

In reply to: Peter Eisentraut (#14)

Re: Interval for launching the table sync worker

On Wed, Apr 12, 2017 at 1:28 PM, Peter Eisentraut
<peter.eisentraut@2ndquadrant.com> wrote:

On 4/10/17 08:10, Petr Jelinek wrote:

I don't think solution is quite this simple. This will cause all table
sync workers to be delayed which means concurrency will suffer and the
initial sync of all tables will take much longer especially if there is
little data. We need a way to either detect if we are launching same
worker that was already launched before, or alternatively if we are
launching crashed worker and only then apply the delay.

Perhaps instead of a global last_start_time, we store a per relation
last_start_time in SubscriptionRelState?

I was thinking the same. But a problem is that the list of
SubscriptionRelState is refreshed whenever the syncing table state
becomes invalid (table_state_valid = false). I guess we need to
improve these logic including GetSubscriptionNotReadyRelations().

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#16

Peter Eisentraut

peter_e@gmx.net

about 9 years ago

In reply to: Masahiko Sawada (#15)

Re: Interval for launching the table sync worker

On 4/12/17 00:48, Masahiko Sawada wrote:

On Wed, Apr 12, 2017 at 1:28 PM, Peter Eisentraut

Perhaps instead of a global last_start_time, we store a per relation
last_start_time in SubscriptionRelState?

I was thinking the same. But a problem is that the list of
SubscriptionRelState is refreshed whenever the syncing table state
becomes invalid (table_state_valid = false). I guess we need to
improve these logic including GetSubscriptionNotReadyRelations().

The table states are invalidated on a syscache callback from
pg_subscription_rel, which happens roughly speaking when a table
finishes the initial sync. So if we're worried about failing tablesync
workers relaunching to quickly, this would only be a problem if a
tablesync of another table finishes right in that restart window. That
doesn't seem a terrible issue to me.

--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#17

Masahiko Sawada

sawada.mshk@gmail.com

about 9 years ago

In reply to: Peter Eisentraut (#16)

Re: Interval for launching the table sync worker

On Wed, Apr 12, 2017 at 11:46 PM, Peter Eisentraut
<peter.eisentraut@2ndquadrant.com> wrote:

On 4/12/17 00:48, Masahiko Sawada wrote:

On Wed, Apr 12, 2017 at 1:28 PM, Peter Eisentraut

Perhaps instead of a global last_start_time, we store a per relation
last_start_time in SubscriptionRelState?

I was thinking the same. But a problem is that the list of
SubscriptionRelState is refreshed whenever the syncing table state
becomes invalid (table_state_valid = false). I guess we need to
improve these logic including GetSubscriptionNotReadyRelations().

The table states are invalidated on a syscache callback from
pg_subscription_rel, which happens roughly speaking when a table
finishes the initial sync. So if we're worried about failing tablesync
workers relaunching to quickly, this would only be a problem if a
tablesync of another table finishes right in that restart window. That
doesn't seem a terrible issue to me.

I think the table states are invalidated whenever the table sync
worker starts, because the table sync worker updates its status of
pg_subscription_rel and commits it before starting actual copy. So we
cannot rely on that. I thought we can store last_start_time into
pg_subscription_rel but it might be overkill. I'm now thinking to
change GetSubscriptionNotReadyRealtions so that last_start_time in
SubscriptionRelState is taken over to new list.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#18

Masahiko Sawada

sawada.mshk@gmail.com

about 9 years ago

In reply to: Masahiko Sawada (#17)

Re: Interval for launching the table sync worker

On Thu, Apr 13, 2017 at 11:53 AM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Wed, Apr 12, 2017 at 11:46 PM, Peter Eisentraut
<peter.eisentraut@2ndquadrant.com> wrote:

On 4/12/17 00:48, Masahiko Sawada wrote:

On Wed, Apr 12, 2017 at 1:28 PM, Peter Eisentraut

Perhaps instead of a global last_start_time, we store a per relation
last_start_time in SubscriptionRelState?

I was thinking the same. But a problem is that the list of
SubscriptionRelState is refreshed whenever the syncing table state
becomes invalid (table_state_valid = false). I guess we need to
improve these logic including GetSubscriptionNotReadyRelations().

The table states are invalidated on a syscache callback from
pg_subscription_rel, which happens roughly speaking when a table
finishes the initial sync. So if we're worried about failing tablesync
workers relaunching to quickly, this would only be a problem if a
tablesync of another table finishes right in that restart window. That
doesn't seem a terrible issue to me.

I think the table states are invalidated whenever the table sync
worker starts, because the table sync worker updates its status of
pg_subscription_rel and commits it before starting actual copy. So we
cannot rely on that. I thought we can store last_start_time into
pg_subscription_rel but it might be overkill. I'm now thinking to
change GetSubscriptionNotReadyRealtions so that last_start_time in
SubscriptionRelState is taken over to new list.

Attached the latest patch. It didn't actually necessary to change
GetSubscriptionNotReadyRelations. I just changed the logic refreshing
the sync table state list.
Please give me feedback.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

#19

Kyotaro Horiguchi

horikyota.ntt@gmail.com

about 9 years ago

In reply to: Masahiko Sawada (#17)

Re: Interval for launching the table sync worker

I confused sync and apply workers.
sync worker failure at start causes immediate retries.

At Thu, 13 Apr 2017 11:53:27 +0900, Masahiko Sawada <sawada.mshk@gmail.com> wrote in <CAD21AoCR6eHgk0vaHShjO4Bre_VDKjHUbL9EuWHaUgRPSPPyVQ@mail.gmail.com>

On Wed, Apr 12, 2017 at 11:46 PM, Peter Eisentraut
<peter.eisentraut@2ndquadrant.com> wrote:

On 4/12/17 00:48, Masahiko Sawada wrote:

On Wed, Apr 12, 2017 at 1:28 PM, Peter Eisentraut

Perhaps instead of a global last_start_time, we store a per relation
last_start_time in SubscriptionRelState?

I was thinking the same. But a problem is that the list of
SubscriptionRelState is refreshed whenever the syncing table state
becomes invalid (table_state_valid = false). I guess we need to
improve these logic including GetSubscriptionNotReadyRelations().

The table states are invalidated on a syscache callback from
pg_subscription_rel, which happens roughly speaking when a table
finishes the initial sync. So if we're worried about failing tablesync
workers relaunching to quickly, this would only be a problem if a
tablesync of another table finishes right in that restart window. That
doesn't seem a terrible issue to me.

I think the table states are invalidated whenever the table sync
worker starts, because the table sync worker updates its status of
pg_subscription_rel and commits it before starting actual copy. So we
cannot rely on that. I thought we can store last_start_time into
pg_subscription_rel but it might be overkill. I'm now thinking to
change GetSubscriptionNotReadyRealtions so that last_start_time in
SubscriptionRelState is taken over to new list.

This resolves the problem but, if I understand correctly, the
many pallocs in process_syncing_tables_for_apply() is working on
ApplyContext and the context is reset before the next visit here
(in LogicalRepApplyLoop).

Although this is not a problem of this patch, this is a problem
generally.

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#20

Kyotaro Horiguchi

horikyota.ntt@gmail.com

about 9 years ago

In reply to: Kyotaro Horiguchi (#19)

Re: Interval for launching the table sync worker

Ouch! I replied to wrong mail.

At Thu, 13 Apr 2017 19:55:04 +0900 (Tokyo Standard Time), Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp> wrote in <20170413.195504.89348773.horiguchi.kyotaro@lab.ntt.co.jp>

I confused sync and apply workers.
sync worker failure at start causes immediate retries.

At Thu, 13 Apr 2017 11:53:27 +0900, Masahiko Sawada <sawada.mshk@gmail.com> wrote in <CAD21AoCR6eHgk0vaHShjO4Bre_VDKjHUbL9EuWHaUgRPSPPyVQ@mail.gmail.com>

On Wed, Apr 12, 2017 at 11:46 PM, Peter Eisentraut
<peter.eisentraut@2ndquadrant.com> wrote:

On 4/12/17 00:48, Masahiko Sawada wrote:

On Wed, Apr 12, 2017 at 1:28 PM, Peter Eisentraut

Perhaps instead of a global last_start_time, we store a per relation
last_start_time in SubscriptionRelState?

I was thinking the same. But a problem is that the list of
SubscriptionRelState is refreshed whenever the syncing table state
becomes invalid (table_state_valid = false). I guess we need to
improve these logic including GetSubscriptionNotReadyRelations().

The table states are invalidated on a syscache callback from
pg_subscription_rel, which happens roughly speaking when a table
finishes the initial sync. So if we're worried about failing tablesync
workers relaunching to quickly, this would only be a problem if a
tablesync of another table finishes right in that restart window. That
doesn't seem a terrible issue to me.

I think the table states are invalidated whenever the table sync
worker starts, because the table sync worker updates its status of
pg_subscription_rel and commits it before starting actual copy. So we
cannot rely on that. I thought we can store last_start_time into
pg_subscription_rel but it might be overkill. I'm now thinking to
change GetSubscriptionNotReadyRealtions so that last_start_time in
SubscriptionRelState is taken over to new list.

The right target of "This" below is found at the following URL.

/messages/by-id/CAD21AoBt_XUdppddFak661_LBM2t3CfK52aLKHG+ekd7SkzLmg@mail.gmail.com

This resolves the problem but, if I understand correctly, the
many pallocs in process_syncing_tables_for_apply() is working on
ApplyContext and the context is reset before the next visit here
(in LogicalRepApplyLoop).

Although this is not a problem of this patch, this is a problem
generally.

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#21

Petr Jelinek

petr@2ndquadrant.com

about 9 years ago

In reply to: Kyotaro Horiguchi (#20)

#22

Petr Jelinek

petr@2ndquadrant.com

about 9 years ago

In reply to: Masahiko Sawada (#18)

#23

Kyotaro Horiguchi

horikyota.ntt@gmail.com

about 9 years ago

In reply to: Petr Jelinek (#21)

#24

Masahiko Sawada

sawada.mshk@gmail.com

about 9 years ago

In reply to: Petr Jelinek (#22)

#25

Petr Jelinek

petr@2ndquadrant.com

about 9 years ago

In reply to: Masahiko Sawada (#24)

#26

Masahiko Sawada

sawada.mshk@gmail.com

about 9 years ago

In reply to: Petr Jelinek (#25)

#27

Peter Eisentraut

peter_e@gmx.net

about 9 years ago

In reply to: Masahiko Sawada (#18)

#28

Peter Eisentraut

peter_e@gmx.net

about 9 years ago

In reply to: Petr Jelinek (#22)

#29

Noah Misch

noah@leadboat.com

about 9 years ago

In reply to: Masahiko Sawada (#1)

#30

Peter Eisentraut

peter_e@gmx.net

about 9 years ago

In reply to: Masahiko Sawada (#18)

#31

Petr Jelinek

petr@2ndquadrant.com

about 9 years ago

In reply to: Peter Eisentraut (#30)

#32

Masahiko Sawada

sawada.mshk@gmail.com

about 9 years ago

In reply to: Peter Eisentraut (#30)

#33

Peter Eisentraut

peter_e@gmx.net

about 9 years ago

In reply to: Petr Jelinek (#31)

#34

Petr Jelinek

petr@2ndquadrant.com

about 9 years ago

In reply to: Peter Eisentraut (#33)

#35

Kyotaro Horiguchi

horikyota.ntt@gmail.com

about 9 years ago

In reply to: Petr Jelinek (#34)

#36

Masahiko Sawada

sawada.mshk@gmail.com

about 9 years ago

In reply to: Kyotaro Horiguchi (#35)

#37

Petr Jelinek

petr@2ndquadrant.com

about 9 years ago

In reply to: Masahiko Sawada (#36)

#38

Masahiko Sawada

sawada.mshk@gmail.com

about 9 years ago

In reply to: Petr Jelinek (#37)

#39

Petr Jelinek

petr@2ndquadrant.com

about 9 years ago

In reply to: Masahiko Sawada (#38)

#40

Noah Misch

noah@leadboat.com

about 9 years ago

In reply to: Noah Misch (#29)

#41

Masahiko Sawada

sawada.mshk@gmail.com

about 9 years ago

In reply to: Petr Jelinek (#39)

#42

Petr Jelinek

petr@2ndquadrant.com

about 9 years ago

In reply to: Masahiko Sawada (#41)

#43

Peter Eisentraut

peter_e@gmx.net

about 9 years ago

In reply to: Noah Misch (#40)

#44

Masahiko Sawada

sawada.mshk@gmail.com

about 9 years ago

In reply to: Petr Jelinek (#42)

#45

Kyotaro Horiguchi

horikyota.ntt@gmail.com

about 9 years ago

In reply to: Masahiko Sawada (#41)

#46

Petr Jelinek

petr@2ndquadrant.com

about 9 years ago

In reply to: Masahiko Sawada (#44)

#47

Petr Jelinek

petr@2ndquadrant.com

about 9 years ago

In reply to: Kyotaro Horiguchi (#45)

#48

Masahiko Sawada

sawada.mshk@gmail.com

about 9 years ago

In reply to: Kyotaro Horiguchi (#45)

#49

Masahiko Sawada

sawada.mshk@gmail.com

about 9 years ago

In reply to: Masahiko Sawada (#48)

#50

Kyotaro Horiguchi

horikyota.ntt@gmail.com

about 9 years ago

In reply to: Masahiko Sawada (#49)

#51

Masahiko Sawada

sawada.mshk@gmail.com

about 9 years ago

In reply to: Kyotaro Horiguchi (#50)

#52

Petr Jelinek

petr@2ndquadrant.com

about 9 years ago

In reply to: Masahiko Sawada (#51)

#53

Masahiko Sawada

sawada.mshk@gmail.com

about 9 years ago

In reply to: Petr Jelinek (#52)

#54

Kyotaro Horiguchi

horikyota.ntt@gmail.com

about 9 years ago

In reply to: Masahiko Sawada (#51)

#55

Peter Eisentraut

peter_e@gmx.net

about 9 years ago

In reply to: Petr Jelinek (#46)

#56

Peter Eisentraut

peter_e@gmx.net

about 9 years ago

In reply to: Kyotaro Horiguchi (#54)

#57

Peter Eisentraut

peter_e@gmx.net

about 9 years ago

In reply to: Peter Eisentraut (#43)

#58

Petr Jelinek

petr@2ndquadrant.com

about 9 years ago

In reply to: Peter Eisentraut (#56)

#59

Peter Eisentraut

peter_e@gmx.net

about 9 years ago

In reply to: Petr Jelinek (#58)

#60

Petr Jelinek

petr@2ndquadrant.com

about 9 years ago

In reply to: Peter Eisentraut (#59)

#61

Masahiko Sawada

sawada.mshk@gmail.com

about 9 years ago

In reply to: Peter Eisentraut (#59)

#62

Kyotaro Horiguchi

horikyota.ntt@gmail.com

about 9 years ago

In reply to: Masahiko Sawada (#61)

#63

Masahiko Sawada

sawada.mshk@gmail.com

about 9 years ago

In reply to: Kyotaro Horiguchi (#62)

#64

Peter Eisentraut

peter_e@gmx.net

about 9 years ago

In reply to: Petr Jelinek (#60)

#65

Peter Eisentraut

peter_e@gmx.net

about 9 years ago

In reply to: Masahiko Sawada (#61)

Interval for launching the table sync worker

Attachments:

Attachments: