Built-in plugin for logical decoding output

Started by Gregory Brail over 8 years ago · 58 messages · pgsql-hackers
#1 Gregory Brail
gregbrail@google.com

We have been working on a project that makes extensive use of logical
replication for use inside Apigee (which is a very small part of Google):

https://github.com/apigee-labs/transicator

In order to do this, we had to write our own logical replication plugin
because the supplied "test_decoding" plugin from the "contrib" directory
was hard for us to work with. Primarily:

1) It doesn't include all the fields that we need for Transicator (most
importantly we need the LSN and the 64-bit transaction ID).
2) It outputs a text format that is hard to parse.
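For reference, test_decoding's default output looks like the following; the type annotations and quoting interleaved with each value are what make it awkward to consume mechanically:

```
BEGIN 565
table public.data: INSERT: id[integer]:1 data[text]:'some text'
COMMIT 565
```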

I imagine that other users of logical decoding are facing similar problems.

Would the community support the development of another plugin that is
distributed as part of "contrib" that addresses these issues? I'd be happy
to submit a patch, or GitHub repo, or whatever works best as an example.
(Also, although Transicator uses protobuf, I'm happy to have it output a
simple binary format as well.)

As a side note, doing this would also help make logical decoding a useful
feature for customers of Amazon's and Google's built-in Postgres hosting
options. In those environments, there's no way to add a custom plugin to
Postgres, so anything not built into the product tends to be harder for
someone to consume.

If anyone is interested in looking more:

The plugin code is here:
https://github.com/apigee-labs/transicator/tree/master/pgoutput

and produces output defined by the "ChangePb" structure defined here:
https://github.com/apigee-labs/transicator/blob/master/common/transicator.proto

#2 Alvaro Hernandez
aht@ongres.com
In reply to: Gregory Brail (#1)
Re: Built-in plugin for logical decoding output

On 23/09/17 00:28, Gregory Brail wrote:

We have been working on a project that makes extensive use of logical
replication for use inside Apigee (which is a very small part of Google):

https://github.com/apigee-labs/transicator

In order to do this, we had to write our own logical replication
plugin because the supplied "test_decoding" plugin from the "contrib"
directory was hard for us to work with. Primarily:

1) It doesn't include all the fields that we need for Transicator
(most importantly we need the LSN and the 64-bit transaction ID).
2) It outputs a text format that is hard to parse.

I imagine that other users of logical decoding are facing similar
problems.

Would the community support the development of another plugin that is
distributed as part of "contrib" that addresses these issues? I'd be
happy to submit a patch, or GitHub repo, or whatever works best as an
example. (Also, although Transicator uses protobuf, I'm happy to have
it output a simple binary format as well.)

As a side note, doing this would also help making logical decoding a
useful feature for customers of Amazon and Google's built-in Postgres
hosting options. In those environments, there's no way to add a custom
plugin to Postgres, so anything not built in the product tends to be
harder for someone to consume.

If anyone is interested in looking more:

The plugin code is here:
https://github.com/apigee-labs/transicator/tree/master/pgoutput

and produces output defined by the "ChangePb" structure defined here:
https://github.com/apigee-labs/transicator/blob/master/common/transicator.proto

    How about using pgoutput, which is included by default in
PostgreSQL 10, as the basis for logical replication?

    Cheers,

    Álvaro

--

Alvaro Hernandez

-----------
OnGres

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#3 Gregory Brail
gregbrail@google.com
In reply to: Alvaro Hernandez (#2)
Re: Built-in plugin for logical decoding output

Thanks! I didn't read the version 10 RC docs carefully enough.

I found the code that generates the protocol message (it looks like it's
src/backend/replication/logical/proto.c). Are there docs somewhere on the
format, or is it just the code?

Also in lieu of the new snapshot mechanism for logical replication, which
might not work for us, we were using the transaction ID to calculate what
was committed in a client's snapshot and what they need to apply to their
own local database. That relied on the transaction ID, and we wanted to use
a 64-bit ID so that we could handle rollover. We ended up doing this:

https://github.com/apigee-labs/transicator/blob/2d5dc596a5f2f5e13967e0f1943f248d88eac1e7/pgoutput/transicator_output.c#L151

It looks to me like the new stuff only puts a 32-bit "xid" in there. Would
there be a way to include the epoch as well? (And yes, I realize it might
require a more detailed explanation which I'm happy to put together.)
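The idea is roughly the following (a simplified sketch of the epoch arithmetic, not our exact code; the names `epoch` and `next_xid` are illustrative stand-ins for the values PostgreSQL tracks in its checkpoint state, and `txid_current()` does something similar internally):

```c
#include <stdint.h>

/*
 * Simplified sketch: extend a 32-bit xid to 64 bits using the xid epoch.
 */
uint64_t
full_xid(uint32_t epoch, uint32_t next_xid, uint32_t xid)
{
    /*
     * An xid numerically above next_xid must have been assigned before
     * the counter last wrapped, so it belongs to the previous epoch.
     * (The server's own logic also handles the opposite edge case.)
     */
    if (xid > next_xid)
        epoch--;
    return ((uint64_t) epoch << 32) | xid;
}
```

With something like that, a client can order transaction IDs across wraparound without ambiguity.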

On Fri, Sep 22, 2017 at 4:01 PM, Alvaro Hernandez <aht@ongres.com> wrote:


On 23/09/17 00:28, Gregory Brail wrote:

We have been working on a project that makes extensive use of logical
replication for use inside Apigee (which is a very small part of Google):

https://github.com/apigee-labs/transicator

In order to do this, we had to write our own logical replication plugin
because the supplied "test_decoding" plugin from the "contrib" directory
was hard for us to work with. Primarily:

1) It doesn't include all the fields that we need for Transicator (most
importantly we need the LSN and the 64-bit transaction ID).
2) It outputs a text format that is hard to parse.

I imagine that other users of logical decoding are facing similar
problems.

Would the community support the development of another plugin that is
distributed as part of "contrib" that addresses these issues? I'd be happy
to submit a patch, or GitHub repo, or whatever works best as an example.
(Also, although Transicator uses protobuf, I'm happy to have it output a
simple binary format as well.)

As a side note, doing this would also help making logical decoding a
useful feature for customers of Amazon and Google's built-in Postgres
hosting options. In those environments, there's no way to add a custom
plugin to Postgres, so anything not built in the product tends to be harder
for someone to consume.

If anyone is interested in looking more:

The plugin code is here:
https://github.com/apigee-labs/transicator/tree/master/pgoutput

and produces output defined by the "ChangePb" structure defined here:
https://github.com/apigee-labs/transicator/blob/master/common/transicator.proto

How about using pgoutput, which is included by default in PostgreSQL
10, as the basis for logical replication?

Cheers,

Álvaro

--

Alvaro Hernandez

-----------
OnGres

#4 Andres Freund
andres@anarazel.de
In reply to: Gregory Brail (#3)
Re: Built-in plugin for logical decoding output

Hi,

On 2017-09-22 17:11:47 -0700, Gregory Brail wrote:

Also in lieu of the new snapshot mechanism for logical replication, which
might not work for us

This needs context...

, we were using the transaction ID to calculate what
was committed in a client's snapshot and what they need to apply to their
own local database. That relied on the transaction ID, and we wanted to use
a 64-bit ID so that we could handle rollover. We ended up doing this:

https://github.com/apigee-labs/transicator/blob/2d5dc596a5f2f5e13967e0f1943f248d88eac1e7/pgoutput/transicator_output.c#L151

It looks to me like the new stuff only puts a 32-bit "xid" in there. Would
there be a way to include the epoch as well? (And yes, I realize it might
require a more detailed explanation which I'm happy to put together.)

It'd be good to see some more detail here, indeed. Especially if you
could look at what pgoutput provides, and whether that's sufficient.
I'm not entirely sure how much we want to make pgoutput configurable, in
contrast to adding something that's intended to be very configurable at
the price of some performance and bandwidth...
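For reference, in the PG10 logical replication protocol, pgoutput's Begin message already carries a transaction's final LSN, commit timestamp, and 32-bit xid. A minimal sketch of decoding it from the raw CopyData payload (network byte order; the struct and function names here are illustrative, not part of any API):

```c
#include <stddef.h>
#include <stdint.h>

/* Read big-endian integers, as the replication protocol sends them. */
uint32_t rd32(const uint8_t *p)
{
    return ((uint32_t) p[0] << 24) | ((uint32_t) p[1] << 16) |
           ((uint32_t) p[2] << 8) | (uint32_t) p[3];
}

uint64_t rd64(const uint8_t *p)
{
    return ((uint64_t) rd32(p) << 32) | rd32(p + 4);
}

typedef struct
{
    uint64_t final_lsn;  /* end LSN of the transaction */
    uint64_t commit_ts;  /* microseconds since 2000-01-01 */
    uint32_t xid;        /* 32-bit transaction id */
} BeginMsg;

/* Returns 0 on success, -1 if the buffer is not a Begin ('B') message. */
int parse_begin(const uint8_t *buf, size_t len, BeginMsg *out)
{
    if (len < 21 || buf[0] != 'B')
        return -1;
    out->final_lsn = rd64(buf + 1);
    out->commit_ts = rd64(buf + 9);
    out->xid = rd32(buf + 17);
    return 0;
}
```

Note that the xid on the wire is only 32 bits, which is exactly the gap being discussed.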

Regards,

Andres


#5 Euler Taveira de Oliveira
euler@timbira.com.br
In reply to: Gregory Brail (#1)
Re: Built-in plugin for logical decoding output

2017-09-22 19:28 GMT-03:00 Gregory Brail <gregbrail@google.com>:

We have been working on a project that makes extensive use of logical
replication for use inside Apigee (which is a very small part of Google):

https://github.com/apigee-labs/transicator

In order to do this, we had to write our own logical replication plugin
because the supplied "test_decoding" plugin from the "contrib" directory was
hard for us to work with. Primarily:

test_decoding is a proof of concept to illustrate the logical decoding
potential. It is not intended for production. I developed wal2json [1]
for general use. It outputs changes in JSON. It was one of the first
logical decoding plugins.

1) It doesn't include all the fields that we need for Transicator (most
importantly we need the LSN and the 64-bit transaction ID).

wal2json includes both.

2) It outputs a text format that is hard to parse.

There are a lot of JSON parsers.
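For example, with the include-xids and include-lsn options enabled, wal2json (format version 1) emits one JSON document per transaction, roughly like this (the exact keys depend on the version and options chosen):

```json
{
  "xid": 563,
  "nextlsn": "0/16D3398",
  "change": [
    {
      "kind": "insert",
      "schema": "public",
      "table": "orders",
      "columnnames": ["id", "total"],
      "columntypes": ["integer", "numeric"],
      "columnvalues": [1, 9.99]
    }
  ]
}
```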

I imagine that other users of logical decoding are facing similar problems.

Would the community support the development of another plugin that is
distributed as part of "contrib" that addresses these issues? I'd be happy
to submit a patch, or GitHub repo, or whatever works best as an example.
(Also, although Transicator uses protobuf, I'm happy to have it output a
simple binary format as well.)

There was a prior discussion [2] and it was suggested that we have a
ready-for-production plugin in core (besides pgoutput). It was
suggested that I submit wal2json [1] for 11. I'm in the process of
cleaning up the code and hope to submit it to CF2.

[1]: https://github.com/eulerto/wal2json
[2]: /messages/by-id/CAHE3wggh6ucSCB+hnSK04xEQx75f3DTz0wPSjRMJFjum6pRrPQ@mail.gmail.com

--
Euler Taveira
Timbira - http://www.timbira.com.br/
PostgreSQL: Consulting, Development, 24x7 Support and Training


#6 Alvaro Hernandez
aht@ongres.com
In reply to: Euler Taveira de Oliveira (#5)
Re: Built-in plugin for logical decoding output

On 23/09/17 18:42, Euler Taveira wrote:

2017-09-22 19:28 GMT-03:00 Gregory Brail <gregbrail@google.com>:

We have been working on a project that makes extensive use of logical
replication for use inside Apigee (which is a very small part of Google):

https://github.com/apigee-labs/transicator

In order to do this, we had to write our own logical replication plugin
because the supplied "test_decoding" plugin from the "contrib" directory was
hard for us to work with. Primarily:

test_decoding is a proof of concept to illustrate the logical decoding
potential. It is not intended for production.

    However, AFAIK, AWS's DMS uses it for production purposes (see
http://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/CHAP_PostgreSQL.html).

I developed wal2json [1]
for general use. It outputs changes in JSON. It was one of the first
logical decoding plugins.

1) It doesn't include all the fields that we need for Transicator (most
importantly we need the LSN and the 64-bit transaction ID).

wal2json includes both.

2) It outputs a text format that is hard to parse.

There are a lot of JSON parsers.

I imagine that other users of logical decoding are facing similar problems.

Would the community support the development of another plugin that is
distributed as part of "contrib" that addresses these issues? I'd be happy
to submit a patch, or GitHub repo, or whatever works best as an example.
(Also, although Transicator uses protobuf, I'm happy to have it output a
simple binary format as well.)

There was a prior discussion [2] and it was suggested that we have a
ready-for-production plugin in core (besides pgoutput). It was
suggested that I submit wal2json [1] for 11. I'm in the process of
cleaning up the code and hope to submit it to CF2.

    I would be happy to see another logical decoding plugin in core
starting with 11. However, this also poses a bit of a challenge for
middleware implementors: you need to support one plugin for 9.4-9.5
(test_decoding), another for 10 (pgoutput), and maybe another for 11
onwards. The idea of asking users to install a binary plugin is very
unsexy, so these are the options available.

    However, having said that, and while json is a great output format
for interoperability, if there's a discussion on which plugin to include
next, I'd also favor one that has some more compact representation
format (or that supports several formats, not only json).

    Regards,

    Álvaro

--

Alvaro Hernandez

-----------
OnGres


#7 Euler Taveira de Oliveira
euler@timbira.com.br
In reply to: Alvaro Hernandez (#6)
Re: Built-in plugin for logical decoding output

2017-09-23 14:01 GMT-03:00 Alvaro Hernandez <aht@ongres.com>:

However, AFAIK, AWS's DMS uses it for production purposes (see
http://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/CHAP_PostgreSQL.html).

It seems a bad idea. AFAICS test_decoding was not designed to be a
ready-for-production plugin. It is just a proof of concept for logical
decoding.

I would be happy to see another logical decoding plugin into core
starting on 11. However, this also poses a bit of a challenge for middleware
implementors: you need to support one for 9.4-9.5 (test_decoding), another
for 10 (pgoutput) and maybe another for 11 onwards. The idea of asking users
to install a binary plugin is very unsexy, so these are the options
available.

wal2json works for 9.4+ (except for the WAL messages support I committed
a month ago). Since that boat has already sailed, we can arrange some
packages for 9.4-10 (an external project) and ask vendors to support the
backward-compatible plugin. The middleware implementor will have to
support this new plugin format. Since JSON is a widespread format, it is
easy to refactor the code to parse JSON.

However, having said that, and while json is a great output format for
interoperability, if there's a discussion on which plugin to include next,
I'd also favor one that has some more compact representation format (or that
supports several formats, not only json).

We could certainly extend pgoutput to support more than one format
(like pglogical did, AFAIR); however, we wouldn't reuse code (different
formats) and we would end up with a fat plugin. (I don't foresee a
plugin using different formats in the same connection; it is difficult
to coordinate a change like that having only one-way communication.)

--
Euler Taveira
Timbira - http://www.timbira.com.br/
PostgreSQL: Consulting, Development, 24x7 Support and Training


#8 Alvaro Hernandez
aht@ongres.com
In reply to: Euler Taveira de Oliveira (#7)
Re: Built-in plugin for logical decoding output

On 24/09/17 02:41, Euler Taveira wrote:

2017-09-23 14:01 GMT-03:00 Alvaro Hernandez <aht@ongres.com>:

However, AFAIK, AWS's DMS uses it for production purposes (see
http://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/CHAP_PostgreSQL.html).

It seems a bad idea. AFAICS test_decoding was not designed to be a
ready-for-production plugin. It is just a proof of concept for logical
decoding.

    Yes, this is what I heard and read.

    However, if DMS uses it for what I'd call production use, I assume
it is actually production quality. I bet they do enough testing, and
don't ship software to potentially millions of customers if it doesn't
work well. So... first, I'd consider this a sign of robustness.
Second... my hat's off to the plugin code ;)

I would be happy to see another logical decoding plugin into core
starting on 11. However, this also poses a bit of a challenge for middleware
implementors: you need to support one for 9.4-9.5 (test_decoding), another
for 10 (pgoutput) and maybe another for 11 onwards. The idea of asking users
to install a binary plugin is very unsexy, so these are the options
available.

wal2json works for 9.4+ (besides the WAL messages I committed a month
ago). Since this boat was already shipped we can arrange some packages
for 9.4-10 (an external project) and ask vendors to support the
backward-compatible plugin. The middleware implementor will have to
support this new plugin format. Being JSON a widespread format, it is
easier to refactor the code to parse JSON.

    I agree it's far better to parse JSON than the test_decoding output.
But asking any potential user to install a dynamic library from a third
party's website, which will need to be compiled for many potential
OSes/archs, or which may even be impossible when running in a managed
environment... is not a great experience. Unless PostgreSQL were to
backport a plugin and ship it in newer releases, if test_decoding is
good enough, I'd rather stick to it.

However, having said that, and while json is a great output format for
interoperability, if there's a discussion on which plugin to include next,
I'd also favor one that has some more compact representation format (or that
supports several formats, not only json).

We could certainly extend pgoutput to support more than one format
(like pglogical did AFAIR), however, we wouldn't reuse code (different
formats) and will have a fat plugin (I don't foresee a plugin using
different formats in the same connection. It is difficult to
coordinate a change like that having only one-way communication).

    I think pgoutput is what it is. Maybe instead of growing it, my
+1 would be to add a new plugin that, rather than being JSON-only, would
also support other formats, like an efficient binary serialization.

    Álvaro

--

Alvaro Hernandez

-----------
OnGres


#9 Craig Ringer
craig@2ndquadrant.com
In reply to: Gregory Brail (#1)
Re: Built-in plugin for logical decoding output

On 23 September 2017 at 06:28, Gregory Brail <gregbrail@google.com> wrote:

Would the community support the development of another plugin that is
distributed as part of "contrib" that addresses these issues?

Petr Jelinek and I tried just that with pglogical. Our submission was
knocked back with the complaint that there was no in-core user of the code,
and it couldn't be evaluated usefully without an in-core consumer/receiver.

It's possible we'd make more progress if we tried again now, since we could
probably write a test suite using the TAP test framework and a small
src/test/modules consumer. But now we'd probably instead get blocked with
the complaint that the output plugin used for logical replication should be
sufficient for any reasonable need. I anticipate that we'd have some
disagreements about what a reasonable need is, but ... *shrug*.

I personally think we _should_ have such a thing, and that it should be
separate to the logical replication plugin to allow us to evolve that
without worrying about out of core dependencies etc.

There's some common functionality that needs factoring out into the logical
decoding framework, like some sort of relation metadata cache, some concept
of "replication sets" or a set of tables to include/exclude, etc. Doing
that is non-trivial work, but it's unlikely that two plugins with similar
and overlapping implementations of such things would be accepted; in that
case I'd be firmly in the "no" camp too.

Code in Pg has a cost, and we do have to justify that cost when we drop
things in contrib/. It's not a free slush pile. So a solid argument does
need to be made for why having this module living in github/whatever isn't
good enough.

I'd be happy to submit a patch, or GitHub repo, or whatever works best as
an example. (Also, although Transicator uses protobuf, I'm happy to have it
output a simple binary format as well.)

PostgreSQL tends to be very, very conservative about dependencies and
favours not-invented-here rather heavily. Optional dependencies are
accepted sometimes when they can be neatly isolated to one portion of the
codebase and/or abstracted away, so it's not impossible you'd get
acceptance for something like protocol buffers. But there's pretty much
zero chance you'll get it as a hard dependency, you'll need a simple text
and binary protocol too.

At which point the question will arise, why aren't these 3 separate output
plugins? The text one, the binary one for in-core and the protobuf one to
be maintained out of core.

That's a pretty sensible question. The answer is that they'll all need to
share quite a bit of common infrastructure. But if that's infrastructure
all plugins need, shouldn't it be pushed "up" into the logical decoding
layer's supporting framework? Patches welcome for the next major release
cycle.

Thus, that's where I think you should actually start. Extract (and where
necessary generalize) key parts of your code that should be provided by
postgres itself, not implemented by each plugin. And submit it so all
plugins can share it and yours can be simpler. Eventually to the point
where output plugins are often simple format wrappers.

You might want to look at

* pglogical's output plugin; and
* bottled-water

for ideas about things that would benefit from shared infrastructure, and
ways to generalize it. I will be very happy to help there as time permits.

As a side note, doing this would also help making logical decoding a
useful feature for customers of Amazon and Google's built-in Postgres
hosting options.

Colour me totally unconvinced there. Either, or both, can simply bless
out-of-tree plugins as it is; after all, they can and do patch the core
server freely too.

It'd *help* encourage them both to pick the same plugin, but that's about
it. And only if the plugin could satisfy their various constraints about no
true superuser access, etc.

I guess I'm a bit frustrated, because *I tried this*, and where was anyone
from Google or Amazon then? But now there's a new home-invented plugin that
we should adopt, ignoring any of the existing ones. Why?

https://github.com/apigee-labs/transicator/tree/master/pgoutput

No README?

Why did this need to be invented, rather than using an existing plugin?

I don't mind, I mean, it's great that you're using the plugin
infrastructure and using postgres. I'm just curious what bottled-water,
pglogical, etc lacked, what made you go your own way?

--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

#10 Craig Ringer
craig@2ndquadrant.com
In reply to: Euler Taveira de Oliveira (#7)
Re: Built-in plugin for logical decoding output

On 24 September 2017 at 07:41, Euler Taveira <euler@timbira.com.br> wrote:

It is difficult to
coordinate a change like that having only one-way communication).

I really think we need to fix that at some point, such that:

* Downstream connections can send CopyData messages *up* the COPY BOTH
protocol, where they're passed to a hook on the output plugin; and

* Output plugins can hook the walsender's event loop (latch set, etc) and
send their own messages without being driven by a logical decoding event.

I wanted to do that some time ago but ran into some issues and time
constraints. Because of the need to support older versions I'm now
committed to an approach using direct libpq connections and function calls
instead, but it seems like a real shame to do that when the replication
protocol connection is *right there*...

--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

#11 Simon Riggs
simon@2ndQuadrant.com
In reply to: Craig Ringer (#9)
Re: Built-in plugin for logical decoding output

On 24 September 2017 at 15:15, Craig Ringer <craig@2ndquadrant.com> wrote:

On 23 September 2017 at 06:28, Gregory Brail <gregbrail@google.com> wrote:

Would the community support the development of another plugin that is
distributed as part of "contrib" that addresses these issues?

Petr Jelinek and I tried just that with pglogical. Our submission was
knocked back with the complaint that there was no in-core user of the code,
and it couldn't be evaluated usefully without an in-core consumer/receiver.

It's possible we'd make more progress if we tried again now, since we could
probably write a test suite using the TAP test framework and a small
src/test/modules consumer. But now we'd probably instead get blocked with
the complaint that the output plugin used for logical replication should be
sufficient for any reasonable need. I anticipate that we'd have some
disagreements about what a reasonable need is, but ... *shrug*.

I personally think we _should_ have such a thing, and that it should be
separate to the logical replication plugin to allow us to evolve that
without worrying about out of core dependencies etc.

We plan to submit the next evolution of the code in 2018, in time for
the early cycle of PG12.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


#12 Petr Jelinek
petr@2ndquadrant.com
In reply to: Alvaro Hernandez (#6)
Re: Built-in plugin for logical decoding output

On 23/09/17 19:01, Alvaro Hernandez wrote:

On 23/09/17 18:42, Euler Taveira wrote:

2017-09-22 19:28 GMT-03:00 Gregory Brail <gregbrail@google.com>:

We have been working on a project that makes extensive use of logical
replication for use inside Apigee (which is a very small part of
Google):

https://github.com/apigee-labs/transicator

In order to do this, we had to write our own logical replication plugin
because the supplied "test_decoding" plugin from the "contrib"
directory was
hard for us to work with. Primarily:

test_decoding is a proof of concept to illustrate the logical decoding
potential. It is not intended for production.

    However, AFAIK, AWS's DMS uses it for production purposes (see
http://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/CHAP_PostgreSQL.html).

I developed wal2json [1]
for general use. It outputs changes in JSON. It was one of the first
logical decoding plugins.

1) It doesn't include all the fields that we need for Transicator (most
importantly we need the LSN and the 64-bit transaction ID).

wal2json includes both.

2) It outputs a text format that is hard to parse.

There are a lot of JSON parsers.

I imagine that other users of logical decoding are facing similar
problems.

Would the community support the development of another plugin that is
distributed as part of "contrib" that addresses these issues? I'd be
happy
to submit a patch, or GitHub repo, or whatever works best as an example.
(Also, although Transicator uses protobuf, I'm happy to have it output a
simple binary format as well.)

There was a prior discussion [2] and it was suggested that we have a
ready-for-production plugin in core (besides pgoutput). It was
suggested that I submit wal2json [1] for 11. I'm in the process of
cleaning up the code and hope to submit it to CF2.

Thanks, I'll be happy to review that.

    I would be happy to see another logical decoding plugin into core
starting on 11. However, this also poses a bit of a challenge for
middleware implementors: you need to support one for 9.4-9.5
(test_decoding), another for 10 (pgoutput) and maybe another for 11
onwards. The idea of asking users to install a binary plugin is very
unsexy, so these are the options available.

Well, test_decoding is not meant for production use anyway, so there is
no need for middleware to support it. pgoutput is primarily used for
internal replication purposes, which is why we need something with more
interoperability in mind in the first place. The new plugin should still
support publications etc., though, IMHO.

    However, having said that, and while json is a great output format
for interoperability, if there's a discussion on which plugin to include
next, I'd also favor one that has some more compact representation
format (or that supports several formats, not only json).

JSON is indeed great for interoperability; if you want a more compact
format, use pgoutput, write something of your own, or do the conversion
to something else in your consumer. I don't think postgres needs to
provide 100 different formats out of the box when there is an API. The
JSON output does not have to be extremely chatty either, btw.

--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


#13 Alvaro Hernandez
aht@ongres.com
In reply to: Petr Jelinek (#12)
Re: Built-in plugin for logical decoding output

On 25/09/17 19:39, Petr Jelinek wrote:

Well, test_decoding is not meant for production use anyway, no need for
middleware to support it. The pgoutput is primarily used for internal
replication purposes, which is why we need something with more
interoperability in mind in the first place. The new plugin should still
support publications etc though IMHO.

    However, having said that, and while json is a great output format
for interoperability, if there's a discussion on which plugin to include
next, I'd also favor one that has some more compact representation
format (or that supports several formats, not only json).

JSON is indeed great for interoperability, if you want more compact
format, use either pgoutput or write something of your own or do
conversion to something else in your consumer. I don't think postgres
needs to provide 100 different formats out of the box when there is an
API. The JSON output does not have to be extremely chatty either btw.

    In my opinion, logical decoding plugins that don't come with core
are close to worthless (don't get me wrong):

- They are very unlikely to be installed in managed environments (an
area growing significantly).
- As with anything that is not in core, they raise concerns among users.
- Distribution and testing are non-trivial: many OS/arch combinations.

    Given the above, I believe having a general-purpose output plugin
in core is critical to the use of logical decoding. As for 9.4-9.6 there
is test_decoding, and given that AWS uses it in production, that's kind
of fine. For 10 there is at least pgoutput, which could be used (even
though it was meant for replication). But if a new plugin is to be
developed for 11+, a really general-purpose one, I'd say JSON is not a
good choice if it is the only output it supports. JSON is too verbose,
and replication, if anything, needs performance (it is network-heavy,
and serialization/deserialization is quite expensive). Why not, if one
and only one general-purpose plugin is to be developed for 11+, do
something that is indeed more general, i.e., that also supports
high-performance scenarios?

    Álvaro

--

Alvaro Hernandez

-----------
OnGres


#14 Andrew Dunstan
andrew@dunslane.net
In reply to: Alvaro Hernandez (#13)
Re: Built-in plugin for logical decoding output

A general-purpose, lower-bandwidth plugin might be one supporting Protocol
Buffers. The downside is that unlike JSON it's not self-contained; you
need the message definitions to interpret the stream, AIUI.
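For illustration, the base-128 varint encoding that gives Protocol Buffers much of its compactness can be sketched in a few lines of Python. This is standard protobuf wire-format behavior, not code from any of the plugins discussed here:

```python
def encode_varint(n: int) -> bytes:
    """Protobuf base-128 varint encoding for an unsigned integer:
    seven payload bits per byte, high bit set on all but the last."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)

# A 64-bit transaction id slightly above 2^32 fits in 5 varint bytes,
# versus 10 ASCII digits (plus a quoted key) in a JSON document.
print(encode_varint(300).hex())            # 'ac02'
print(len(encode_varint(4294968000)))      # 5
```

The compactness comes at the cost Andrew mentions: nothing in those bytes says which field they belong to or what type it is, so the consumer needs the `.proto` definitions.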

cheers

andrew

--
Andrew Dunstan https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


#15Gregory Brail
gregbrail@google.com
In reply to: Alvaro Hernandez (#13)
Re: Built-in plugin for logical decoding output

I'm encouraged that pgoutput exists and I'm sorry that I missed it before.
I think it's fine as a binary-only format. If someone can write a client
for the Postgres wire protocol as documented in Chapter 52 of the docs,
then they should have no trouble consuming the output from pgoutput.

However, I can't find any docs for the output format of pgoutput, which
will make it harder for people to consume. Is anyone working on docs? I
know that it's a painful process.
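For illustration, a parser for pgoutput's Begin message might look like the sketch below. The assumed layout (a 'B' byte, a 64-bit final LSN, a 64-bit commit timestamp in microseconds since the Postgres epoch of 2000-01-01, and a 32-bit xid) follows the logical replication protocol; treat this as a sketch, not a reference parser:

```python
import struct
from datetime import datetime, timedelta, timezone

# Timestamps in the replication stream are microseconds since
# 2000-01-01 00:00:00 UTC (the Postgres epoch).
PG_EPOCH = datetime(2000, 1, 1, tzinfo=timezone.utc)

def parse_begin(buf: bytes) -> dict:
    """Parse a pgoutput Begin message: 'B', final LSN, commit ts, xid."""
    if buf[0:1] != b'B':
        raise ValueError("not a Begin message")
    final_lsn, commit_ts, xid = struct.unpack('!QQI', buf[1:21])
    return {
        'final_lsn': final_lsn,
        'commit_time': PG_EPOCH + timedelta(microseconds=commit_ts),
        'xid': xid,
    }

# Example: a synthetic Begin message (not captured from a real stream)
msg = b'B' + struct.pack('!QQI', 0x16B3748, 0, 561)
parsed = parse_begin(msg)
print(parsed['xid'])   # 561
```

Note this is exactly the kind of detail (byte order, epoch, field widths) that format documentation would need to spell out for third-party consumers.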

I also think that a JSON-format (or configurable format) plugin would make
this part of PG much more usable and I'd encourage the community to come up
with one.

Finally, since there were some "why didn't you just" questions in the email
thread, let me write a little bit about what we were trying to do.

We have a set of data that represents the configuration of some of our
customers' systems. (This is for Apigee Edge, which is a software product
that represents a small part of Google Cloud, and which was developed long
before we joined Google.) We'd like to efficiently and reliably push
configuration changes down to our customers' systems, mostly to make it
possible for them to run parts of our software stack in their own data
centers, with limited or even unreliable network connectivity to the rest
of our services. Data replication is a great fit for this problem.

However, we want the downstream software components (the ones that our
customers run in their own data centers) to know when various things
change, we want those changes delivered in a consistent order, and we want
to be able to reliably receive them by having each consumer keep track of
where they currently are in the replication scheme. Logical replication is
a great fit for this because it enables us to build a list of all the
changes to this management data in a consistent order. Once we have that
list, it's fairly simple to persist it somewhere and let clients consume it
in various ways. (In our case, via an HTTP API that supports long polling.
Having all the clients consume a Kafka stream was not an option that we
wanted to consider.)

The difference between what we're trying to do and most solutions that use
logical replication is that we will have thousands or tens of thousands of
clients pulling a list of changes that originated in a single Postgres
database. That means that we need to index our own copy of the replication
output so that clients can efficiently get changes only to "their" data.
Furthermore, it means that we can't do things like create a unique
replication slot for each client. Instead, we have a smaller number of
servers that replicate from the master, and then those in turn give out
lists of changes to other clients.
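The fan-out scheme described above can be sketched as follows: one process consumes the replication stream, tags each change with a scope (e.g. the customer it belongs to) and a monotonically increasing sequence number, and clients poll for everything after the last sequence number they saw. All names here (`ChangeIndex`, the tenant ids) are illustrative, not Transicator's actual API:

```python
from collections import defaultdict

class ChangeIndex:
    """Toy in-memory index of replication changes, keyed by scope."""

    def __init__(self):
        self.changes = defaultdict(list)   # scope -> [(seq, payload)]
        self.seq = 0

    def append(self, scope: str, payload: dict) -> int:
        """Record one decoded change under the given scope."""
        self.seq += 1
        self.changes[scope].append((self.seq, payload))
        return self.seq

    def since(self, scope: str, last_seq: int):
        """Return the changes for a scope after the client's last seen seq."""
        return [(s, p) for s, p in self.changes[scope] if s > last_seq]

idx = ChangeIndex()
idx.append("tenant-a", {"op": "insert", "table": "apis"})
idx.append("tenant-b", {"op": "update", "table": "apps"})
idx.append("tenant-a", {"op": "delete", "table": "apis"})

# A tenant-a client that last saw seq 1 gets only the later change.
print(idx.since("tenant-a", 1))   # [(3, {'op': 'delete', 'table': 'apis'})]
```

Because clients track their own position, the server side needs only one replication slot per indexing server rather than one per client, which is the point of the architecture above.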


#16Joshua D. Drake
jd@commandprompt.com
In reply to: Gregory Brail (#15)
Re: Built-in plugin for logical decoding output

On 09/25/2017 09:59 AM, Gregory Brail wrote:

However, I can't find any docs for the output format of pgoutput, which
is going to make it less likely for people to be able to consume it. Is
anyone working on docs? I know that it's a painful process.

I also think that a JSON-format (or configurable format) plugin would
make this part of PG much more usable and I'd encourage the community to
come up with one.

https://github.com/ildus/decoder_json
https://github.com/posix4e/jsoncdc
https://github.com/leptonix/decoding-json
https://github.com/Aloomaio/psql-json-decoder

Thanks,

JD

--
Command Prompt, Inc. || http://the.postgres.company/ || @cmdpromptinc

PostgreSQL Centered full stack support, consulting and development.
Advocate: @amplifypostgres || Learn: https://pgconf.us
***** Unless otherwise stated, opinions are my own. *****


#17Alvaro Hernandez
aht@ongres.com
In reply to: Andrew Dunstan (#14)
Re: Built-in plugin for logical decoding output

On 25/09/17 19:56, Andrew Dunstan wrote:

A general purpose lower bandwidth plugin might be one supporting Protocol
Buffers. The downside is that unlike json it's not self-contained, you
need the message definitions to interpret the stream, AIUI.

    Sure. But that's just a matter of documenting them, or even better,
providing the .proto files, which are language-independent.

    There are also many other efficient serialization formats to
explore, some self-contained, some not.

    Álvaro

--

Alvaro Hernandez

-----------
OnGres


#18Gregory Brail
gregbrail@google.com
In reply to: Joshua D. Drake (#16)
Re: Built-in plugin for logical decoding output

Yes. I'm advocating something "built-in" to Postgres. Any or all of those
are likely a great starting point.

As for protobuf, I'm a big advocate -- it is easy to use, fast, extensible,
runs on lots of platforms, and produces very compact output. However, it
introduces a few dependencies to the build, and that may make it too
difficult for wide adoption within Postgres.


#19Andres Freund
andres@anarazel.de
In reply to: Alvaro Hernandez (#8)
Re: Built-in plugin for logical decoding output

On 2017-09-24 13:36:56 +0300, Alvaro Hernandez wrote:

However, if DMS uses it for what I'd call production use, I assume it is
actually production quality. I bet they do enough testing, and don't ship
software to potentially millions of customers if it doesn't work well. So...
first, I'd consider this a sign of robustness.

You've been in software for how long? ... ;) There's quite mixed
experiences with DMS.

FWIW, I don't think there's a huge problem w/ using test_decoding - the
output isn't pretty, but it's parseable. It's too verbose due to
repeating column & type names (which also slows things down), but...
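For illustration, a rough parser for a test_decoding change line might look like this. The sample line matches the shape test_decoding emits for a simple insert (`table <name>: <op>: col[type]:value ...`); the regexes are illustrative rather than bulletproof, and the repetition of every column's name and type on every row is exactly the verbosity being discussed:

```python
import re

# Matches e.g. "table public.data: INSERT: id[integer]:1 data[text]:'x'"
LINE = re.compile(r"table (\S+): (INSERT|UPDATE|DELETE): (.*)")
# Matches one "name[type]:value" column, value quoted or bare.
COL = re.compile(r"(\w+)\[([^\]]+)\]:('(?:[^']|'')*'|\S+)")

def parse_change(line: str):
    """Parse one test_decoding change line into table, op, and columns."""
    m = LINE.match(line)
    if not m:
        return None
    table, op, rest = m.groups()
    cols = {name: (typ, val) for name, typ, val in COL.findall(rest)}
    return {"table": table, "op": op, "columns": cols}

change = parse_change("table public.data: INSERT: id[integer]:1 data[text]:'x'")
print(change["columns"]["id"])   # ('integer', '1')
```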

Greetings,

Andres Freund


#20Andres Freund
andres@anarazel.de
In reply to: Andrew Dunstan (#14)
Re: Built-in plugin for logical decoding output

Hi,

On 2017-09-25 12:56:00 -0400, Andrew Dunstan wrote:

A general purpose lower bandwidth plugin might be one supporting Protocol
Buffers. The downside is that unlike json it's not self-contained, you
need the message definitions to interpret the stream, AIUI.

I think that makes it a non-starter for many purposes where you care
about bandwidth.

Greetings,

Andres Freund


#21Petr Jelinek
petr@2ndquadrant.com
In reply to: Alvaro Hernandez (#13)
#22Joshua D. Drake
jd@commandprompt.com
In reply to: Gregory Brail (#18)
#23Tom Lane
tgl@sss.pgh.pa.us
In reply to: Alvaro Hernandez (#13)
#24Joshua D. Drake
jd@commandprompt.com
In reply to: Petr Jelinek (#21)
#25Petr Jelinek
petr@2ndquadrant.com
In reply to: Tom Lane (#23)
#26Joshua D. Drake
jd@commandprompt.com
In reply to: Petr Jelinek (#25)
#27Andres Freund
andres@anarazel.de
In reply to: Joshua D. Drake (#26)
#28Andres Freund
andres@anarazel.de
In reply to: Petr Jelinek (#25)
#29Joshua D. Drake
jd@commandprompt.com
In reply to: Andres Freund (#27)
#30Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andres Freund (#28)
#31Andres Freund
andres@anarazel.de
In reply to: Tom Lane (#30)
#32Stephen Frost
sfrost@snowman.net
In reply to: Andres Freund (#28)
#33Alvaro Hernandez
aht@ongres.com
In reply to: Andres Freund (#19)
#34Alvaro Hernandez
aht@ongres.com
In reply to: Joshua D. Drake (#24)
#35Joshua D. Drake
jd@commandprompt.com
In reply to: Alvaro Hernandez (#34)
#36Andres Freund
andres@anarazel.de
In reply to: Alvaro Hernandez (#34)
#37Jignesh K. Shah
J.K.Shah@Sun.COM
In reply to: Joshua D. Drake (#35)
#38Magnus Hagander
magnus@hagander.net
In reply to: Alvaro Hernandez (#33)
#39Chris Browne
cbbrowne@acm.org
In reply to: Joshua D. Drake (#26)
#40Petr Jelinek
petr@2ndquadrant.com
In reply to: Joshua D. Drake (#29)
#41Craig Ringer
craig@2ndquadrant.com
In reply to: Andres Freund (#31)
#42Craig Ringer
craig@2ndquadrant.com
In reply to: Joshua D. Drake (#29)
#43Alvaro Hernandez
aht@ongres.com
In reply to: Magnus Hagander (#38)
#44Alvaro Hernandez
aht@ongres.com
In reply to: Jignesh K. Shah (#37)
#45Alvaro Hernandez
aht@ongres.com
In reply to: Andres Freund (#36)
#46Craig Ringer
craig@2ndquadrant.com
In reply to: Alvaro Hernandez (#45)
#47Alvaro Hernandez
aht@ongres.com
In reply to: Craig Ringer (#46)
#48Craig Ringer
craig@2ndquadrant.com
In reply to: Alvaro Hernandez (#47)
#49Magnus Hagander
magnus@hagander.net
In reply to: Alvaro Hernandez (#43)
#50Petr Jelinek
petr@2ndquadrant.com
In reply to: Alvaro Hernandez (#47)
#51Alvaro Hernandez
aht@ongres.com
In reply to: Petr Jelinek (#50)
#52Alvaro Hernandez
aht@ongres.com
In reply to: Craig Ringer (#48)
#53Magnus Hagander
magnus@hagander.net
In reply to: Alvaro Hernandez (#51)
In reply to: Alvaro Hernandez (#44)
#55Craig Ringer
craig@2ndquadrant.com
In reply to: Magnus Hagander (#53)
#56Alvaro Hernandez
aht@ongres.com
In reply to: Craig Ringer (#55)
#57Henry M
henrymanmail@gmail.com
In reply to: Alvaro Hernandez (#56)
#58Dave Cramer
pg@fastcrypt.com
In reply to: Henry M (#57)