Logical Decoding Failover

Started by Colin Morelliover 9 years ago9 messagesgeneral
Jump to latest
#1Colin Morelli
colin.morelli@gmail.com

Hey all,

I'm using logical decoding in my application to capture change streams and
ship them to Kafka. However, logical replication slots aren't included in
the WAL and thus don't make it to replicas. In the case of a failover, it's
not clear what (if anything) can be done to continue receiving a change
stream.

I noticed there was a failover slot patch that looks like it won't be
making it for 9.6 (if at all), but does anyone else have suggestions for
handling failover + logical decoding? Is there an answer, or is it limited
to being a very manual process for now?

Thanks in advance.

Best,
Colin

#2Venkata B Nagothi
nag1010@gmail.com
In reply to: Colin Morelli (#1)
Re: Logical Decoding Failover

On Sat, Aug 6, 2016 at 1:17 PM, Colin Morelli <colin.morelli@gmail.com>
wrote:

Hey all,

I'm using logical decoding in my application to capture change streams and
ship them to Kafka. However, logical replication slots aren't included in
the WAL and thus don't make it to replicas. In the case of a failover, it's
not clear what (if anything) can be done to continue receiving a change
stream.

I noticed there was a failover slot patch that looks like it won't be
making it for 9.6 (if at all), but does anyone else have suggestions for
handling failover + logical decoding? Is there an answer, or is it limited
to being a very manual process for now?

Yes, as of i know, It is a manual process for now. Another alternative you
can consider (if possible at all) would be a tool like PgPool-II which is
actually used for connection pooling, load balancing and automatic
failover. It may not make sense if you are do not wish to pool you
connections.

Regards,
Venkata B N

Fujitsu Australia

#3Colin Morelli
colin.morelli@gmail.com
In reply to: Venkata B Nagothi (#2)
Re: Logical Decoding Failover

Venkata,

Thanks for the reply. Unfortunately something like PgPool still won't
create the replication slots on all hosts, and record the LSN in a way that
is reusable on the secondary.

This sort of puts logical decoding at odds with HA, correct? In the case of
master failover, there's no way to: a) know where in the stream you read
to, or b) convert that to something that can be understood by a replica
that's about to be promoted.

Are there any scheduled improvements here either for 9.6 or 9.7?

Best,
Colin
On Sun, Aug 7, 2016 at 1:50 AM Venkata Balaji N <nag1010@gmail.com> wrote:

Show quoted text

On Sat, Aug 6, 2016 at 1:17 PM, Colin Morelli <colin.morelli@gmail.com>
wrote:

Hey all,

I'm using logical decoding in my application to capture change streams
and ship them to Kafka. However, logical replication slots aren't included
in the WAL and thus don't make it to replicas. In the case of a failover,
it's not clear what (if anything) can be done to continue receiving a
change stream.

I noticed there was a failover slot patch that looks like it won't be
making it for 9.6 (if at all), but does anyone else have suggestions for
handling failover + logical decoding? Is there an answer, or is it limited
to being a very manual process for now?

Yes, as of i know, It is a manual process for now. Another alternative you
can consider (if possible at all) would be a tool like PgPool-II which is
actually used for connection pooling, load balancing and automatic
failover. It may not make sense if you are do not wish to pool you
connections.

Regards,
Venkata B N

Fujitsu Australia

#4Venkata B Nagothi
nag1010@gmail.com
In reply to: Colin Morelli (#3)
Re: Logical Decoding Failover

On Sun, Aug 7, 2016 at 9:29 PM, Colin Morelli <colin.morelli@gmail.com>
wrote:

Venkata,

Thanks for the reply. Unfortunately something like PgPool still won't
create the replication slots on all hosts, and record the LSN in a way that
is reusable on the secondary.

Yes, thats correct, pgPool does not have anything to do with replication
slots. That is something which you need to manually configure.

This sort of puts logical decoding at odds with HA, correct? In the case
of master failover, there's no way to: a) know where in the stream you read
to, or b) convert that to something that can be understood by a replica
that's about to be promoted.

Can you please clarify, what did you exactly mean here ? are you referring
to cascading standbys or something ? Failover is something which
application has to do, to reconnect to the promoted standby. As far as
logical decoding is concerned, you need to have a replication slot
configured for the new master to ensure all the WALs which are not
replicated to standby are retained. After promotion, the standby database
becomes a standalone instance and the replication must be re-enabled. You
can automate the standby promotion using tools like pgpool-II or pacemaker.

I am trying to differentiate standby-promotion and failover mechanism here.

Are there any scheduled improvements here either for 9.6 or 9.7?

I am not 100% sure, if there is anything developed from an failover
mechanism perspective.

Regards,
Venkata B N

Fujitsu Australia

#5Colin Morelli
colin.morelli@gmail.com
In reply to: Venkata B Nagothi (#4)
Re: Logical Decoding Failover

Venkata,

No problem - thanks for replying. I'll try to clarify.

Say I have databases A and B, where A is primary and B is a replica. I have
an application that executes 3 transactions against A, resulting in WAL log
sequence numbers 1, 2, and 3. The WAL is successfully replicated to B.

I have a logical deciding application which is pulling the logical stream,
and reads LSN 1 and 2 before the primary (A) fails. I decide to fail over
to B, which has all data on it and things work great.

Now the logical decoding client connects to B (the new primary). The
replication slot doesn't exist. So, it creates it and starts streaming.
This is where the problem lies - as it would begin streaming from LSN 4
(anything after what has already been committed), because I have no way
(that I can find) of restoring my "progress" through the WAL on a the
replicas.

As a result, my application never sees the event at LSN 3. In fact, I'm not
even sure how I could manually do this.

Hopefully that makes more sense.

Best,
Colin
On Mon, Aug 8, 2016 at 10:47 PM Venkata Balaji N <nag1010@gmail.com> wrote:

Show quoted text

On Sun, Aug 7, 2016 at 9:29 PM, Colin Morelli <colin.morelli@gmail.com>
wrote:

Venkata,

Thanks for the reply. Unfortunately something like PgPool still won't
create the replication slots on all hosts, and record the LSN in a way that
is reusable on the secondary.

Yes, thats correct, pgPool does not have anything to do with replication
slots. That is something which you need to manually configure.

This sort of puts logical decoding at odds with HA, correct? In the case
of master failover, there's no way to: a) know where in the stream you read
to, or b) convert that to something that can be understood by a replica
that's about to be promoted.

Can you please clarify, what did you exactly mean here ? are you referring
to cascading standbys or something ? Failover is something which
application has to do, to reconnect to the promoted standby. As far as
logical decoding is concerned, you need to have a replication slot
configured for the new master to ensure all the WALs which are not
replicated to standby are retained. After promotion, the standby database
becomes a standalone instance and the replication must be re-enabled. You
can automate the standby promotion using tools like pgpool-II or pacemaker.

I am trying to differentiate standby-promotion and failover mechanism here.

Are there any scheduled improvements here either for 9.6 or 9.7?

I am not 100% sure, if there is anything developed from an failover
mechanism perspective.

Regards,
Venkata B N

Fujitsu Australia

#6Venkata B Nagothi
nag1010@gmail.com
In reply to: Colin Morelli (#5)
Re: Logical Decoding Failover

Now the logical decoding client connects to B (the new primary). The
replication slot doesn't exist. So, it creates it and starts streaming.
This is where the problem lies - as it would begin streaming from LSN 4
(anything after what has already been committed), because I have no way
(that I can find) of restoring my "progress" through the WAL on a the
replicas.

As a result, my application never sees the event at LSN 3. In fact, I'm
not even sure how I could manually do this.

Yes, that is the current limitation and as of i know, there is no such
patch being developed at the moment. I leave it to any of the developers to
respond. This cannot be done manually as the new master will not read
through from
LSN 3 (which is a thing of past unknown to new master) and replication slot
at standby is a new one with the latest starting from LSN 4 which, the
logical decoder does not know.

Hope that helps !

Regards,
Venkata B N

Fujitsu Australia

#7Simon Riggs
simon@2ndQuadrant.com
In reply to: Colin Morelli (#1)
Re: Logical Decoding Failover

On 6 August 2016 at 04:17, Colin Morelli <colin.morelli@gmail.com> wrote:

I'm using logical decoding in my application to capture change streams and
ship them to Kafka. However, logical replication slots aren't included in
the WAL and thus don't make it to replicas. In the case of a failover, it's
not clear what (if anything) can be done to continue receiving a change
stream.

I noticed there was a failover slot patch that looks like it won't be making
it for 9.6 (if at all), but does anyone else have suggestions for handling
failover + logical decoding? Is there an answer, or is it limited to being a
very manual process for now?

Hi Colin,

Yes, its a current issue.

A patch was submitted but not applied to 9.6 for this, so the issue
continues. As a result, 2Q support that as a custom solution for a
number of customers.

A different design will be submitted for the next release, 10.0.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#8Colin Morelli
colin.morelli@gmail.com
In reply to: Simon Riggs (#7)
Re: Logical Decoding Failover

Simon,

That's too bad. Logical decoding is incredibly powerful, but if it
ultimately comes down to picking between HA or logical decoding the choice
to use it becomes a lot more difficult.

Are you saying that logical decoding is receiving a different design in
10.0? (For clarify - is 10.0 the next scheduled release after 9.6 - so next
year?). If so, I'd greatly appreciate any more info you have on the new
design (even patches if you have them).

Best,
Colin

On Wed, Aug 10, 2016 at 12:53 AM Simon Riggs <simon@2ndquadrant.com> wrote:

Show quoted text

On 6 August 2016 at 04:17, Colin Morelli <colin.morelli@gmail.com> wrote:

I'm using logical decoding in my application to capture change streams

and

ship them to Kafka. However, logical replication slots aren't included in
the WAL and thus don't make it to replicas. In the case of a failover,

it's

not clear what (if anything) can be done to continue receiving a change
stream.

I noticed there was a failover slot patch that looks like it won't be

making

it for 9.6 (if at all), but does anyone else have suggestions for

handling

failover + logical decoding? Is there an answer, or is it limited to

being a

very manual process for now?

Hi Colin,

Yes, its a current issue.

A patch was submitted but not applied to 9.6 for this, so the issue
continues. As a result, 2Q support that as a custom solution for a
number of customers.

A different design will be submitted for the next release, 10.0.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#9Simon Riggs
simon@2ndQuadrant.com
In reply to: Colin Morelli (#8)
Re: Logical Decoding Failover

On 10 August 2016 at 09:03, Colin Morelli <colin.morelli@gmail.com> wrote:

That's too bad. Logical decoding is incredibly powerful, but if it
ultimately comes down to picking between HA or logical decoding the choice
to use it becomes a lot more difficult.

Agreed.

Are you saying that logical decoding is receiving a different design in
10.0? (For clarify - is 10.0 the next scheduled release after 9.6 - so next
year?). If so, I'd greatly appreciate any more info you have on the new
design (even patches if you have them).

A new design for the failover aspect of logical replication will be in 10.0.

Yes, 10.0 is the next release, due 2017.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general