Logical Decoding Callbacks

Started by Xavier Stevensabout 11 years ago8 messagesgeneral
Jump to latest
#1Xavier Stevens
xavier@simple.com

I'm trying to write a logical decoding plugin and I'm seeing some
interesting behavior around the startup callback. When I use psql and the
built-in SQL functions (like pg_logical_slot_peek_changes) to use my module
I see the startup_cb get called. I have written my own streaming
replication client and for some reason when I use my client the startup_cb
does not get called, but I can still see the changes just fine. The reason
why the startup_cb is particularly important in my case, is I'm trying to
set two dynamic OIDs for PostGIS geometry and geography types. I've tried
setting them both as globals and as part of the output_plugin_private data.

Any idea what's wrong here? Did I miss something in the streaming
replication documentation?

Thanks in advance.

Xavier

#2Michael Paquier
michael@paquier.xyz
In reply to: Xavier Stevens (#1)
Re: Logical Decoding Callbacks

On Tue, Feb 10, 2015 at 9:52 AM, Xavier Stevens wrote:

I'm trying to write a logical decoding plugin and I'm seeing some
interesting behavior around the startup callback. When I use psql and the
built-in SQL functions (like pg_logical_slot_peek_changes) to use my module
I see the startup_cb get called. I have written my own streaming replication
client and for some reason when I use my client the startup_cb does not get
called, but I can still see the changes just fine. The reason why the
startup_cb is particularly important in my case, is I'm trying to set two
dynamic OIDs for PostGIS geometry and geography types. I've tried setting
them both as globals and as part of the output_plugin_private data.

Any idea what's wrong here? Did I miss something in the streaming
replication documentation?

Not sure, I would imagine that your problem is in the manipulation of
the memory contexts related to the decoder plugin and the way your
variables are saved.

The documentation states that startup_cb is called each time a
replication slot is created or when a slot is asked to stream changes,
be it through the psql interface or using the replication protocol:
http://www.postgresql.org/docs/devel/static/logicaldecoding-output-plugin.html#LOGICALDECODING-OUTPUT-PLUGIN-CALLBACKS

Just in case, I have just done a quick test with pg_recvlogical on
latest HEAD of REL9_4_STABLE and the startup_cb gets called:
DEBUG: received replication command: IDENTIFY_SYSTEM
DEBUG: received replication command: START_REPLICATION SLOT "slot" LOGICAL 0/0
LOG: called startup_cb_wrapper
Note that on 9.5 (master HEAD at c619c23) I am seeing an assertion
failure. What is the server version you are using? I would imagine 9.4
but let's be sure.
Regards,
--
Michael

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#3Andres Freund
andres@anarazel.de
In reply to: Xavier Stevens (#1)
Re: Logical Decoding Callbacks

Hi,

On 2015-02-09 16:52:59 -0800, Xavier Stevens wrote:

I'm trying to write a logical decoding plugin and I'm seeing some
interesting behavior around the startup callback. When I use psql and the
built-in SQL functions (like pg_logical_slot_peek_changes) to use my module
I see the startup_cb get called. I have written my own streaming
replication client and for some reason when I use my client the startup_cb
does not get called, but I can still see the changes just fine. The reason
why the startup_cb is particularly important in my case, is I'm trying to
set two dynamic OIDs for PostGIS geometry and geography types. I've tried
setting them both as globals and as part of the output_plugin_private data.

Any idea what's wrong here? Did I miss something in the streaming
replication documentation?

Hm, that's odd. The startup callback should (and is, I used it just a
couple hours ago) definitely be called when using the streaming
replication protocol. You're saying that the callback is not called when
you do START_LOGICAL_REPLICATION ... ?

Can you reproduce the problem with test_decoding? Because that'd
actually crash if no startup callback were registered...

Can you show your code/a testcase?

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#4Andres Freund
andres@anarazel.de
In reply to: Michael Paquier (#2)
Re: Logical Decoding Callbacks

On 2015-02-10 11:01:08 +0900, Michael Paquier wrote:

Just in case, I have just done a quick test with pg_recvlogical on
latest HEAD of REL9_4_STABLE and the startup_cb gets called:
DEBUG: received replication command: IDENTIFY_SYSTEM
DEBUG: received replication command: START_REPLICATION SLOT "slot" LOGICAL 0/0
LOG: called startup_cb_wrapper
Note that on 9.5 (master HEAD at c619c23) I am seeing an assertion
failure. What is the server version you are using? I would imagine 9.4
but let's be sure.

Uh, which plugin and what assertion? What's the backtrace?

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#5Michael Paquier
michael@paquier.xyz
In reply to: Andres Freund (#4)
Re: Logical Decoding Callbacks

On Tue, Feb 10, 2015 at 5:59 PM, Andres Freund <andres@2ndquadrant.com>
wrote:

On 2015-02-10 11:01:08 +0900, Michael Paquier wrote:

Just in case, I have just done a quick test with pg_recvlogical on
latest HEAD of REL9_4_STABLE and the startup_cb gets called:
DEBUG: received replication command: IDENTIFY_SYSTEM
DEBUG: received replication command: START_REPLICATION SLOT "slot"

LOGICAL 0/0

LOG: called startup_cb_wrapper
Note that on 9.5 (master HEAD at c619c23) I am seeing an assertion
failure. What is the server version you are using? I would imagine 9.4
but let's be sure.

Uh, which plugin and what assertion? What's the backtrace?

I am raising that on a new thread on -hackers, that's not related to the
report here. Thanks for reminding.
--
Michael

#6Xavier Stevens
xavier@simple.com
In reply to: Michael Paquier (#5)
Re: Logical Decoding Callbacks

Sorry to raise the issue on startup_cb. I added a whole bunch of logging
statements and I was only running the section of code I wanted when the
startup callback had options.

This now gets me to the next issue I encounter. In my output plugin, I'm
trying to use the SPI interface to query about PostGIS OIDs in the startup
callback. Just calling SPI_connect() seems to be causing a segfault.

This is the last thing I see in the logs before the segfault occurs:

https://github.com/xstevens/decoderbufs/blob/master/src/decoderbufs.c#L151

On Tue, Feb 10, 2015 at 4:18 AM, Michael Paquier <michael.paquier@gmail.com>
wrote:

Show quoted text

On Tue, Feb 10, 2015 at 5:59 PM, Andres Freund <andres@2ndquadrant.com>
wrote:

On 2015-02-10 11:01:08 +0900, Michael Paquier wrote:

Just in case, I have just done a quick test with pg_recvlogical on
latest HEAD of REL9_4_STABLE and the startup_cb gets called:
DEBUG: received replication command: IDENTIFY_SYSTEM
DEBUG: received replication command: START_REPLICATION SLOT "slot"

LOGICAL 0/0

LOG: called startup_cb_wrapper
Note that on 9.5 (master HEAD at c619c23) I am seeing an assertion
failure. What is the server version you are using? I would imagine 9.4
but let's be sure.

Uh, which plugin and what assertion? What's the backtrace?

I am raising that on a new thread on -hackers, that's not related to the
report here. Thanks for reminding.
--
Michael

#7Andres Freund
andres@anarazel.de
In reply to: Xavier Stevens (#6)
Re: Logical Decoding Callbacks

On 2015-02-10 10:33:41 -0800, Xavier Stevens wrote:

Sorry to raise the issue on startup_cb. I added a whole bunch of logging
statements and I was only running the section of code I wanted when the
startup callback had options.

Heh.

Just to make sure: You can pass options via replication protocol too.

This now gets me to the next issue I encounter. In my output plugin, I'm
trying to use the SPI interface to query about PostGIS OIDs in the startup
callback. Just calling SPI_connect() seems to be causing a segfault.

This is the last thing I see in the logs before the segfault occurs:

https://github.com/xstevens/decoderbufs/blob/master/src/decoderbufs.c#L151

The problem likely is that in the startup callback you're neither
guaranteed to be in a transaction, nor to have a snapshot set up.

It'd generally be easier to analyze such problems if you provide a
backtrace (e.g. by enabling core files). Another generally very
adviseable thing to do when developing code running in the backend is to
enable assertions (you may already do that...).

You can lookup types much easier than that btw. C.f. TypenameGetTypid(typname)

But note that both that and what you do would possibly fail if there's
more than one geometry type around. You could either hardcode postgis'
schema name and use
namespaceId = LookupExplicitNamespace(schemaname, false);
typoid = GetSysCacheOid2(TYPENAMENSP, PointerGetDatum(typname), ObjectIdGetDatum(namespaceId));
if (typoid == InvalidOid)
elog(ERROR, "cache lookup failed for type %u", typoid);

or be a bit more complex and lookup the postgis' extension's schema
pg_extension.extnamespace first.

Anyway, to run full queries in the startup callback you're going to have
to do something like:

if (!IsTransactionState())
{
tx_started = true;
StartTransactionCommand();
}
PushActiveSnapshot(GetTransactionSnapshot());

/* do your stuff */

PopActiveSnapshot();
if (tx_started)
CommitTransactionCommand();

Note that the begin, change, commit callbacks *do* run with a
transaction and snapshot setup. But you can't run general SQL queries -
only catalog tables (or tables marked as such) are accessible.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#8Xavier Stevens
xavier@simple.com
In reply to: Andres Freund (#7)
Re: Logical Decoding Callbacks

There was no reason I needed to run full statements in this case. I just
didn't know I could get the type ids like that.

Thanks for all of your help Andres!

On Tue, Feb 10, 2015 at 1:23 PM, Andres Freund <andres@2ndquadrant.com>
wrote:

Show quoted text

On 2015-02-10 10:33:41 -0800, Xavier Stevens wrote:

Sorry to raise the issue on startup_cb. I added a whole bunch of logging
statements and I was only running the section of code I wanted when the
startup callback had options.

Heh.

Just to make sure: You can pass options via replication protocol too.

This now gets me to the next issue I encounter. In my output plugin, I'm
trying to use the SPI interface to query about PostGIS OIDs in the

startup

callback. Just calling SPI_connect() seems to be causing a segfault.

This is the last thing I see in the logs before the segfault occurs:

https://github.com/xstevens/decoderbufs/blob/master/src/decoderbufs.c#L151

The problem likely is that in the startup callback you're neither
guaranteed to be in a transaction, nor to have a snapshot set up.

It'd generally be easier to analyze such problems if you provide a
backtrace (e.g. by enabling core files). Another generally very
adviseable thing to do when developing code running in the backend is to
enable assertions (you may already do that...).

You can lookup types much easier than that btw. C.f.
TypenameGetTypid(typname)

But note that both that and what you do would possibly fail if there's
more than one geometry type around. You could either hardcode postgis'
schema name and use
namespaceId = LookupExplicitNamespace(schemaname, false);
typoid = GetSysCacheOid2(TYPENAMENSP, PointerGetDatum(typname),
ObjectIdGetDatum(namespaceId));
if (typoid == InvalidOid)
elog(ERROR, "cache lookup failed for type %u", typoid);

or be a bit more complex and lookup the postgis' extension's schema
pg_extension.extnamespace first.

Anyway, to run full queries in the startup callback you're going to have
to do something like:

if (!IsTransactionState())
{
tx_started = true;
StartTransactionCommand();
}
PushActiveSnapshot(GetTransactionSnapshot());

/* do your stuff */

PopActiveSnapshot();
if (tx_started)
CommitTransactionCommand();

Note that the begin, change, commit callbacks *do* run with a
transaction and snapshot setup. But you can't run general SQL queries -
only catalog tables (or tables marked as such) are accessible.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services