Logical archiving

Started by Andrey Borodin · about 5 years ago · 9 messages
#1 Andrey Borodin <x4mmm@yandex-team.ru>

Hi all

I was discussing problems of CDC with the scientific community, and they asked this simple question: "So you have an efficient WAL archive on very cheap storage, why don't you have a logical archive too?"
This seems like a wild idea. But really, we have super expensive NVMe drives for the OLTP workload, and we use these devices to store a buffer for data to be dumped into the MapReduce\YT analytical system.
If OLAP cannot consume data fast enough, we run out of space due to the replication slot.
If we have an HA switchover, OLAP has a hole in the stream and has to resync data from scratch.

If we could just run an archive command like ```archive-tool wal-push 0000000900000F2C000000E1.logical``` with the contents of logical replication, this would be super cool for OLAP. I'd even prefer to avoid writing 0000000900000F2C000000E1.logical to disk at all, i.e. push the data over stdio or something like that.
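For illustration, a toy sketch of what such a stdio-fed logical archive push could look like. The function name, storage layout, and compression choice are all my assumptions, not an existing PostgreSQL or archive-tool interface:

```python
import gzip

def wal_push_logical(segment_name: str, decoded_changes: bytes,
                     archive_dir: str = "/archive/logical"):
    """Compress one chunk of decoded logical changes (fed over stdio
    rather than written to disk first) and report where it would land
    in the archive plus the compressed size; the upload itself is elided."""
    payload = gzip.compress(decoded_changes)
    return f"{archive_dir}/{segment_name}.gz", len(payload)
```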

What do you think?

Best regards, Andrey Borodin.

#2 Euler Taveira <euler.taveira@2ndquadrant.com>
In reply to: Andrey Borodin (#1)
Re: Logical archiving

On Fri, 4 Dec 2020 at 04:33, Andrey Borodin <x4mmm@yandex-team.ru> wrote:

I was discussing problems of CDC with scientific community and they asked
this simple question: "So you have efficient WAL archive on a very cheap
storage, why don't you have a logical archive too?"

WAL archive doesn't process data; it just copies from one location into
another one. However, "logical archive" must process data.

If we could just run archive command ```archive-tool wal-push
0000000900000F2C000000E1.logical``` with contents of logical replication -
this would be super cool for OLAP. I'd prefer even avoid writing
0000000900000F2C000000E1.logical to disk, i.e. push data on stdio or
something like that.

The most time consuming process is logical decoding, mainly due to long
running transactions. In order to minimize your issue, we should improve
the logical decoding mechanism. There was a discussion about allowing
logical decoding on the replica that would probably help your use case a
lot.

--
Euler Taveira http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#3 Andrey Borodin <x4mmm@yandex-team.ru>
In reply to: Euler Taveira (#2)
Re: Logical archiving

Hi Euler!

Thanks for your response.

On 4 Dec 2020, at 22:14, Euler Taveira <euler.taveira@2ndquadrant.com> wrote:

On Fri, 4 Dec 2020 at 04:33, Andrey Borodin <x4mmm@yandex-team.ru> wrote:

I was discussing problems of CDC with scientific community and they asked this simple question: "So you have efficient WAL archive on a very cheap storage, why don't you have a logical archive too?"

WAL archive doesn't process data; it just copies from one location into another one. However, "logical archive" must process data.

WAL archiving does process data: it does compression, encryption and digesting. Only a minimal, impractical setup will copy data as-is. However, I agree that all the processing is done outside Postgres.

If we could just run archive command ```archive-tool wal-push 0000000900000F2C000000E1.logical``` with contents of logical replication - this would be super cool for OLAP. I'd prefer even avoid writing 0000000900000F2C000000E1.logical to disk, i.e. push data on stdio or something like that.

The most time consuming process is logical decoding, mainly due to long running transactions.

Currently I do not experience problems with high CPU utilisation.

In order to minimize your issue, we should improve the logical decoding mechanism.

No, the issue I'm facing comes from the fact that the corner cases of failover are not solved properly for logical replication. Timelines, partial segments, archiving alongside streaming, starting from an arbitrary LSN (within available WAL), rewind, named restore points, cascade replication, etc. All these nice things are there for WAL and are missing for LR. I'm just trying to find the shortest path through this to make CDC (change data capture) work.

There was a discussion about allowing logical decoding on the replica that would probably help your use case a lot.

I will look there more closely, thanks! But it's only part of a solution.

Best regards, Andrey Borodin.

#4 Euler Taveira <euler.taveira@2ndquadrant.com>
In reply to: Andrey Borodin (#3)
Re: Logical archiving

On Fri, 4 Dec 2020 at 14:36, Andrey Borodin <x4mmm@yandex-team.ru> wrote:

The most time consuming process is logical decoding, mainly due to long running transactions.

Currently I do not experience problem of high CPU utilisation.

I'm wondering why the LSN isn't moving fast enough for your use case.

In order to minimize your issue, we should improve the logical decoding mechanism.
No, the issue I'm facing comes from the fact that corner cases of failover
are not solved properly for logical replication. Timelines, partial
segments, archiving along with streaming, starting from arbitrary LSN
(within available WAL), rewind, named restore points, cascade replication
etc etc. All these nice things are there for WAL and are missing for LR.
I'm just trying to find shortest path through this to make CDC(changed data
capture) work.

Craig started a thread a few days ago [1] that described some of these
issues and possible solutions [2]. The lack of HA with logical replication
reduces the number of solutions that could possibly use this technology.
Some of the facilities such as logical replication slots and replication
origin on failover-candidate subscribers should encourage users to adopt
such solutions.

[1]: /messages/by-id/CAGRY4nx0-ZVnFJV5749QCqwmqBMkjQpcFkYY56a9U6Vf+f7-7Q@mail.gmail.com
[2]: https://wiki.postgresql.org/wiki/Logical_replication_and_physical_standby_failover

--
Euler Taveira http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#5 Craig Ringer <craig.ringer@enterprisedb.com>
In reply to: Andrey Borodin (#1)
Re: Logical archiving

Reply follows inline. I addressed your last point first, so it's out of
order.

On Fri, 4 Dec 2020 at 15:33, Andrey Borodin <x4mmm@yandex-team.ru> wrote:

If OLAP cannot consume data fast enough - we are out of space due to repl slot.

There is a much simpler solution to this than logical PITR.

What we should be doing is teaching xlogreader how to invoke the
restore_command to fetch archived WALs for decoding.

Replication slots already have a WAL retention limit, but right now when
that limit is reached the slot is invalidated and becomes useless, it's
effectively dropped. Instead, if WAL archiving is enabled, we should leave
the slot as valid. If a consumer of the slot needs WAL that no longer
exists in pg_wal, we should have the walsender invoke the restore_command
to read the missing WAL segment, decode it, and remove it again.
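The fallback described above can be modelled in a few lines. Here pg_wal and the archive are just in-memory maps, and a dict lookup stands in for restore_command; none of these names are actual PostgreSQL internals, this is only the proposed control flow:

```python
def read_segment_for_slot(segment: str, pg_wal: dict, archive: dict) -> bytes:
    """Toy model of the proposed walsender behaviour: if the segment a
    slot needs has been recycled out of pg_wal, fetch it from the WAL
    archive instead of invalidating the slot."""
    if segment in pg_wal:
        return pg_wal[segment]
    if segment in archive:
        restored = archive[segment]  # restore_command would run here
        # ...decode the restored segment, then remove the local copy...
        return restored
    raise LookupError(f"{segment} is in neither pg_wal nor the archive")
```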

This would not be a technically difficult patch, and it's IMO one of the
more important ones for improving logical replication.

I was discussing problems of CDC with scientific community and they asked this simple question: "So you have efficient WAL archive on a very cheap storage, why don't you have a logical archive too?"

I've done work in this area, as has Petr (CC'd).

In short, logical archiving and PITR is very much desirable, but we're not
nearly ready for it yet and we're missing a lot of the foundations needed
to make it really useful.

IMO the strongest pre-requisite is that we need integrated DDL capture and
replication in Pg. While this could be implemented in the
publisher/subscriber logic for logical replication, it would make much more
sense (IMO) to make it easier to feed DDL events into any logical
replication output plugin.

pglogical3 (the closed one) has quite comprehensive DDL replication
support. Doing it is not simple though - there are plenty of complexities:

* Reliably identifying the target objects and mapping them to replication
set memberships for DML-replication
* Capturing, replicating and managing the search_path and other DDL
execution context (DateStyle and much more) reliably

- Each statement type needs specific logic to indicate whether it needs
DDL replication (and often filter functions since we have lots of sub-types
where some need replication and some don't)
- Handling DDL affecting global objects in pg_global correctly, like
those affecting roles, grants, database security labels etc. There's no one
right answer for this, it depends on the deployment and requires the user
to cooperate.
- Correct handling of transactions that mix DDL and DML (mostly only an
issue for multimaster).
- Identifying statements that target a mix of replicated and
non-replicated objects and handling them appropriately, including for
CASCADEs
- Gracefully handling DDL statements that mix TEMPORARY and persistent
targets. We can do this ok for DROPs but it still requires care. Anything
else gets messier.
- Lack of hooks into table rewrite operations and the extremely clumsy
and inefficient way logical decoding currently exposes decoding of the
temp-table data during decoding of rewrites means handling table-rewriting
DDL is difficult and impractical to do correctly. In pglogical we punt on
it entirely and refuse to permit DDL that would rewrite a table except
where we can prove it's reliant only on immutable inputs so we can discard
the upstream rewrite and rely on statement replication.
- As a consequence of the above, reliably determining whether a given
statement will cause a table rewrite.
- Handling re-entrant ProcessUtility_hook calls for ALTER TABLE etc.
- Handling TRUNCATE's pseudo-DDL pseudo-DML halfway state, doing
something sensible for truncate cascade.
- Probably more I've forgotten

If we don't handle these, then any logical change-log archives will become
largely useless as soon as there's any schema change.

So we kind of have to solve DDL replication first IMO.

Some consideration is also required for metadata management. Right now
relation and type metadata has session-lifetime, but you'd want to be able
to discard old logical change-stream archives and have the later ones still
be usable. So we'd need to define some kind of restartpoint where we repeat
the metadata, or we'd have to support externalizing the metadata so it can
be retained when the main change archives get aged out.
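One way to picture such restartpoints, as a rough sketch under my own assumptions about the record format: repeat the relation/type metadata into the stream every N changes, so any archive segment cut at a restartpoint is self-describing and everything before it can be aged out.

```python
def stream_with_restartpoints(changes, metadata, every=100):
    """Interleave a full relation/type metadata snapshot before every
    `every` changes, so an archive segment starting at a snapshot
    boundary can be decoded without older segments."""
    out = []
    for i, change in enumerate(changes):
        if i % every == 0:
            out.append(("metadata", dict(metadata)))
        out.append(("change", change))
    return out
```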

We'd also need to separate the existing apply worker into a "receiver" and
"apply/writer" part, so the wire-protocol handling isn't tightly coupled
with the actual change apply code, in order to make it possible to actually
consume those archives and apply them to the database. In pglogical3 we did
that by splitting them into two processes, connected by a shm_mq.
Originally the process split was optional and you could run a combined
receiver/writer process without the shm_mq if you wanted, but we quickly
found it difficult to reliably handle locking issues etc that way so the
writers all moved out-of-process.

That was done mainly to make it possible to support parallelism in logical
decoding apply. But we also have the intention of supporting an alternative
reader process that can ingest "logical archives" and send them to the
writer to apply them, as if they'd been received from the on-wire stream.
That's not implemented at this time though. It'd be useful for a number of
things:

* PITR-style logical replay and recovery
* Ability to pre-decode a txn once on the upstream then send the buffered
protocol-stream to multiple subscribers, saving on logical decoding and
reorder buffering overheads and write-multiplication costs
* ability to ingest change-streams generated by non-postgres sources so we
could support streaming foreign-data ingestion, streaming OLAP and data
warehousing, etc

To make logical PITR more useful we'd also want to be a bit more tolerant
of schema divergence, though that's not overly hard to do:

- fill defaults for downstream columns if no value is present for the
column in the upstream row and the downstream column is nullable or has a
default (I think built-in logical rep does this one already)
- ignore values for columns in upstream data if the downstream table
lacks the column and the upstream value is null
- optionally allow apply to be configured to ignore non-null data in
upstream columns that're missing on downstream
- optionally allow apply to be configured to drop rows on the floor if
the downstream table is missing
- policies for handling data conflicts like duplicate PKs

and we'd probably want ways to filter the apply data-stream to apply
changes for only a subset of tables, rows, etc at least in a later version.
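A minimal sketch of how those tolerance rules might combine when mapping one upstream row onto a diverged downstream table. The names and behaviour are illustrative, not pglogical's or built-in logical replication's actual implementation:

```python
def map_row_for_apply(upstream_row: dict, downstream_cols: list,
                      downstream_defaults: dict,
                      ignore_missing_nonnull: bool = False) -> dict:
    """Apply the divergence-tolerance rules above to a single row."""
    out = {}
    for col in downstream_cols:
        if col in upstream_row:
            out[col] = upstream_row[col]
        else:
            # fill the downstream default (or NULL) for a column
            # the upstream row doesn't carry
            out[col] = downstream_defaults.get(col)
    for col, val in upstream_row.items():
        if col not in downstream_cols and val is not None:
            # non-null data for a column missing downstream: error by
            # default, or drop it if the apply is configured to ignore
            if not ignore_missing_nonnull:
                raise ValueError(f"non-null value for missing column {col!r}")
    return out
```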

None of this is insurmountable. Most or all of the DDL replication support
and divergence-tolerance stuff is already done in production deployments
using pglogical3 and bdr3.

While I can't share the code, I am happy to share the experience I have
gained from my part in working on these things, as you've probably seen
recently with the wiki article I wrote on physical/logical failover interop.

You're free to take information like this and use it in wiki articles too.

Right now I won't be able to launch into writing big patches for these
things, but I'll do my best to share what I can and review things.

This seems like a wild idea. But really, we have a super expensive NVMe drives for OLTP workload. And use this devices to store buffer for data to be dumped into MapReduce\YT analytical system.

It's not a wild idea at all, as noted above.

In pglogical3 we already support streaming decoded WAL data to alternative
writer downstreams including RabbitMQ and Kafka via writer plugins.

#6 Craig Ringer <craig.ringer@enterprisedb.com>
In reply to: Craig Ringer (#5)
Re: Logical archiving

Actually CC'd Petr this time.

On Mon, 7 Dec 2020 at 11:05, Craig Ringer <craig.ringer@enterprisedb.com>
wrote:

Show quoted text


#7 Amit Kapila <amit.kapila16@gmail.com>
In reply to: Craig Ringer (#5)
Re: Logical archiving

On Mon, Dec 7, 2020 at 8:35 AM Craig Ringer
<craig.ringer@enterprisedb.com> wrote:

Reply follows inline. I addressed your last point first, so it's out of order.

On Fri, 4 Dec 2020 at 15:33, Andrey Borodin <x4mmm@yandex-team.ru> wrote

We'd also need to separate the existing apply worker into a "receiver" and "apply/writer" part, so the wire-protocol handling isn't tightly coupled with the actual change apply code, in order to make it possible to actually consume those archives and apply them to the database. In pglogical3 we did that by splitting them into two processes, connected by a shm_mq. Originally the process split was optional and you could run a combined receiver/writer process without the shm_mq if you wanted, but we quickly found it difficult to reliably handle locking issues etc that way so the writers all moved out-of-process.

That was done mainly to make it possible to support parallelism in logical decoding apply. But we also have the intention of supporting an alternative reader process that can ingest "logical archives" and send them to the writer to apply them, as if they'd been received from the on-wire stream. That's not implemented at this time though. It'd be useful for a number of things:

* PITR-style logical replay and recovery
* Ability to pre-decode a txn once on the upstream then send the buffered protocol-stream to multiple subscribers, saving on logical decoding and reorder buffering overheads and write-multiplication costs

I think doing parallel apply and ability to decode a txn once are
really good improvements independent of all the work you listed.
Thanks for sharing your knowledge.

--
With Regards,
Amit Kapila.

#8 Andrey Borodin <x4mmm@yandex-team.ru>
In reply to: Craig Ringer (#5)
Re: Logical archiving

Thanks Craig!
Probably I should better ask in your nearby thread about logical replication; it just seemed to me that logical archiving is a somewhat small, independent piece of functionality...

On 7 Dec 2020, at 08:05, Craig Ringer <craig.ringer@enterprisedb.com> wrote:

Reply follows inline. I addressed your last point first, so it's out of order.

On Fri, 4 Dec 2020 at 15:33, Andrey Borodin <x4mmm@yandex-team.ru> wrote

If OLAP cannot consume data fast enough - we are out of space due to repl slot.

There is a much simpler solution to this than logical PITR.

What we should be doing is teaching xlogreader how to invoke the restore_command to fetch archived WALs for decoding.

Replication slots already have a WAL retention limit, but right now when that limit is reached the slot is invalidated and becomes useless, it's effectively dropped. Instead, if WAL archiving is enabled, we should leave the slot as valid. If a consumer of the slot needs WAL that no longer exists in pg_wal, we should have the walsender invoke the restore_command to read the missing WAL segment, decode it, and remove it again.

This would not be a technically difficult patch, and it's IMO one of the more important ones for improving logical replication.

Currently we have restore_command in the regular config, not in recovery.conf, so it should probably not be a very big deal to implement this.

I was discussing problems of CDC with scientific community and they asked this simple question: "So you have efficient WAL archive on a very cheap storage, why don't you have a logical archive too?"

I've done work in this area, as has Petr (CC'd).

In short, logical archiving and PITR is very much desirable, but we're not nearly ready for it yet and we're missing a lot of the foundations needed to make it really useful.

IMO the strongest pre-requisite is that we need integrated DDL capture and replication in Pg. While this could be implemented in the publisher/subscriber logic for logical replication, it would make much more sense (IMO) to make it easier to feed DDL events into any logical replication output plugin.

pglogical3 (the closed one) has quite comprehensive DDL replication support. Doing it is not simple though - there are plenty of complexities:

* Reliably identifying the target objects and mapping them to replication set memberships for DML-replication
* Capturing, replicating and managing the search_path and other DDL execution context (DateStyle and much more) reliably
• Each statement type needs specific logic to indicate whether it needs DDL replication (and often filter functions since we have lots of sub-types where some need replication and some don't)
• Handling DDL affecting global objects in pg_global correctly, like those affecting roles, grants, database security labels etc. There's no one right answer for this, it depends on the deployment and requires the user to cooperate.
• Correct handling of transactions that mix DDL and DML (mostly only an issue for multimaster).
• Identifying statements that target a mix of replicated and non-replicated objects and handling them appropriately, including for CASCADEs
• Gracefully handling DDL statements that mix TEMPORARY and persistent targets. We can do this ok for DROPs but it still requires care. Anything else gets messier.
• Lack of hooks into table rewrite operations and the extremely clumsy and inefficient way logical decoding currently exposes decoding of the temp-table data during decoding of rewrites means handling table-rewriting DDL is difficult and impractical to do correctly. In pglogical we punt on it entirely and refuse to permit DDL that would rewrite a table except where we can prove it's reliant only on immutable inputs so we can discard the upstream rewrite and rely on statement replication.
• As a consequence of the above, reliably determining whether a given statement will cause a table rewrite.
• Handling re-entrant ProcessUtility_hook calls for ALTER TABLE etc.
• Handling TRUNCATE's pseudo-DDL pseudo-DML halfway state, doing something sensible for truncate cascade.
• Probably more I've forgotten

If we don't handle these, then any logical change-log archives will become largely useless as soon as there's any schema change.

So we kind of have to solve DDL replication first IMO.

Some consideration is also required for metadata management. Right now relation and type metadata has session-lifetime, but you'd want to be able to discard old logical change-stream archives and have the later ones still be usable. So we'd need to define some kind of restartpoint where we repeat the metadata, or we'd have to support externalizing the metadata so it can be retained when the main change archives get aged out.

We'd also need to separate the existing apply worker into a "receiver" and "apply/writer" part, so the wire-protocol handling isn't tightly coupled with the actual change apply code, in order to make it possible to actually consume those archives and apply them to the database. In pglogical3 we did that by splitting them into two processes, connected by a shm_mq. Originally the process split was optional and you could run a combined receiver/writer process without the shm_mq if you wanted, but we quickly found it difficult to reliably handle locking issues etc that way so the writers all moved out-of-process.

That was done mainly to make it possible to support parallelism in logical decoding apply. But we also have the intention of supporting an alternative reader process that can ingest "logical archives" and send them to the writer to apply them, as if they'd been received from the on-wire stream. That's not implemented at this time though. It'd be useful for a number of things:

* PITR-style logical replay and recovery
* Ability to pre-decode a txn once on the upstream then send the buffered protocol-stream to multiple subscribers, saving on logical decoding and reorder buffering overheads and write-multiplication costs
* ability to ingest change-streams generated by non-postgres sources so we could support streaming foreign-data ingestion, streaming OLAP and data warehousing, etc

To make logical PITR more useful we'd also want to be a bit more tolerant of schema divergence, though that's not overly hard to do:
• fill defaults for downstream columns if no value is present for the column in the upstream row and the downstream column is nullable or has a default (I think built-in logical rep does this one already)
• ignore values for columns in upstream data if the downstream table lacks the column and the upstream value is null
• optionally allow apply to be configured to ignore non-null data in upstream columns that're missing on downstream
• optionally allow apply to be configured to drop rows on the floor if the downstream table is missing
• policies for handling data conflicts like duplicate PKs
and we'd probably want ways to filter the apply data-stream to apply changes for only a subset of tables, rows, etc at least in a later version.

None of this is insurmountable. Most or all of the DDL replication support and divergence-tolerance stuff is already done in production deployments using pglogical3 and bdr3.

I really like this wording for the "divergence-tolerance" stuff; it captures the problems I want to solve. I believe it's somewhat orthogonal to the other issues.

While I can't share the code, I am happy to share the experience I have gained from my part in working on these things. As you've probably recently seen with the wiki article I wrote on physical/logical failover interop.

You're free to take information like this and use it in wiki articles too.

Right now I won't be able to launch into writing big patches for these things, but I'll do my best to share what I can and review things.

This seems like a wild idea. But really, we have a super expensive NVMe drives for OLTP workload. And use this devices to store buffer for data to be dumped into MapReduce\YT analytical system.

It's not a wild idea at all, as noted above.

In pglogical3 we already support streaming decoded WAL data to alternative writer downstreams including RabbitMQ and Kafka via writer plugins.

Yes, Yandex.Cloud Transfer Manager supports it too. But it has to be resynced after a physical failover. And the internal installation of YC has mandatory drills: a few times a month one datacenter is disconnected and failover happens for thousands of DBs.

Thank you for your input. I'll probably put some effort into loading missing WAL as a first step towards the bright future :)

Best regards, Andrey Borodin.

#9 Craig Ringer <craig.ringer@enterprisedb.com>
In reply to: Andrey Borodin (#8)
Re: Logical archiving

On Tue, 8 Dec 2020 at 17:54, Andrey Borodin <x4mmm@yandex-team.ru> wrote:

In pglogical3 we already support streaming decoded WAL data to alternative writer downstreams including RabbitMQ and Kafka via writer plugins.

Yes, Yandex.Cloud Transfer Manger supports it too. But it has to be resynced after physical failover. And internal installation of YC have mandatory drills: few times in a month one datacenter is disconnected and failover happens for thousands a DBS.

You'll want to look at
https://wiki.postgresql.org/wiki/Logical_replication_and_physical_standby_failover#All-logical-replication_HA
then.