pg13: xlogreader API adjust
Hello
Per discussion in thread [1], I propose the following patch to give
another adjustment to the xlogreader API. This results in a small but
not insignificant net reduction of lines of code. What this patch does
is adjust the signature of these new xlogreader callbacks, making the
API simpler. The changes are:
* the segment_open callback installs the FD in xlogreader state itself,
instead of passing the FD back. This was suggested by Kyotaro
Horiguchi in that thread [2].
* We no longer pass segcxt to segment_open; it's in XLogReaderState,
which is already an argument.
* We no longer pass seg/segcxt to WALRead; instead, that function takes
them from XLogReaderState, which is already an argument.
(This means XLogSendPhysical has to drink more of the fake_xlogreader
kool-aid.)
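As a rough illustration of the first change in the list above, the following self-contained C sketch contrasts a callback that returns the descriptor with one that stores it in the reader state. The Mock* types and mock_* functions are hypothetical stand-ins, not the actual xlogreader definitions, and the descriptor values are faked rather than obtained from open().

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the reader structs. */
typedef struct MockOpenSegment
{
    int  ws_file;   /* descriptor of the open segment, -1 if none */
    long ws_segno;  /* number of the segment currently open */
} MockOpenSegment;

typedef struct MockReaderState
{
    MockOpenSegment seg;
} MockReaderState;

/* Old style: the callback hands the descriptor back to its caller,
 * which then has to store it somewhere itself. */
int
mock_segment_open_old(MockReaderState *state, long segno)
{
    (void) state;               /* state is not updated here */
    return 100 + (int) segno;   /* faked descriptor */
}

/* New style: the callback installs the descriptor in the reader state. */
void
mock_segment_open_new(MockReaderState *state, long segno)
{
    assert(state != NULL);
    state->seg.ws_file = 100 + (int) segno;   /* faked descriptor */
    state->seg.ws_segno = segno;
}

/* A reader routine under the new convention: it takes everything it
 * needs from the state, so no separate seg/segcxt arguments exist. */
int
mock_read(MockReaderState *state, long segno)
{
    if (state->seg.ws_file < 0 || state->seg.ws_segno != segno)
        mock_segment_open_new(state, segno);
    return state->seg.ws_file;
}
```

With the new style the caller has nothing to copy back after the callback returns, which is where the reduction in code would come from in a sketch like this.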
I claim the reason to do it now instead of pg14 is to make it simpler
for third-party xlogreader callers to adjust.
(Some might be thinking that I do this to avoid an API change later, but
my guts tell me that we'll adjust xlogreader again in pg14 for the
encryption stuff and other reasons, so.)
[1]: /messages/by-id/20200406025651.fpzdb5yyb7qyhqko@alap3.anarazel.de
[2]: /messages/by-id/20200508.114228.963995144765118400.horikyota.ntt@gmail.com
--
Álvaro Herrera Developer, https://www.PostgreSQL.org/
Attachments:
xlogreader-api-adjust.patch (text/x-diff; charset=us-ascii), +55 -80
At Mon, 11 May 2020 16:33:36 -0400, Alvaro Herrera <alvherre@2ndquadrant.com> wrote in
> Hello
> Per discussion in thread [1], I propose the following patch to give
> another adjustment to the xlogreader API. This results in a small but
> not insignificant net reduction of lines of code. What this patch does
> is adjust the signature of these new xlogreader callbacks, making the
> API simpler. The changes are:
> * the segment_open callback installs the FD in xlogreader state itself,
> instead of passing the FD back. This was suggested by Kyotaro
> Horiguchi in that thread [2].
> * We no longer pass segcxt to segment_open; it's in XLogReaderState,
> which is already an argument.
> * We no longer pass seg/segcxt to WALRead; instead, that function takes
> them from XLogReaderState, which is already an argument.
> (This means XLogSendPhysical has to drink more of the fake_xlogreader
> kool-aid.)
> I claim the reason to do it now instead of pg14 is to make it simpler
> for third-party xlogreader callers to adjust.
> (Some might be thinking that I do this to avoid an API change later, but
> my guts tell me that we'll adjust xlogreader again in pg14 for the
> encryption stuff and other reasons, so.)
> [1] /messages/by-id/20200406025651.fpzdb5yyb7qyhqko@alap3.anarazel.de
> [2] /messages/by-id/20200508.114228.963995144765118400.horikyota.ntt@gmail.com
The simplified interface of WALRead looks far better to me since it no
longer has needlessly duplicated parameters. I agree with the point
about third-party xlogreader callers, but I'm not sure about the
back-patching burden.
I'm not sure why wal_segment_open and WalSndSegmentOpen were modified
in different ways with respect to error handling of BasicOpenFile; I
prefer the WalSndSegmentOpen way. However, that difference doesn't
harm anything, so I'm fine with the current patch.
+ fake_xlogreader.seg = *sendSeg;
+ fake_xlogreader.segcxt = *sendCxt;
fake_xlogreader.seg is a different instance from *sendSeg. WALRead
modifies fake_xlogreader.seg but does not modify *sendSeg. Thus the
change doesn't persist. On the other hand WalSndSegmentOpen reads
*sendSeg, which is not under control of WALRead.
Maybe we had better make fake_xlogreader a global variable of
walsender.c that covers the current sendSeg and sendCxt.
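The copy problem described above can be reproduced in a few lines of plain C. This is only a sketch: MockSegment, MockReader, mock_wal_read and the descriptor value 42 are made up for illustration and do not correspond to the walsender code.

```c
#include <assert.h>

/* Hypothetical stand-ins for the segment and reader structs. */
typedef struct MockSegment { int fd; } MockSegment;
typedef struct MockReader  { MockSegment seg; } MockReader;

/* Stand-in for a read routine that opens a segment and records the
 * descriptor in the reader it was given. */
void
mock_wal_read(MockReader *reader)
{
    reader->seg.fd = 42;
}

/* Copy-in only: struct assignment copies by value, so whatever the
 * callee stores in fake.seg never reaches *send_seg. */
int
copy_in_only(MockSegment *send_seg)
{
    MockReader fake;

    fake.seg = *send_seg;
    mock_wal_read(&fake);
    return send_seg->fd;        /* still the value from before the call */
}

/* One possible remedy: a single long-lived reader whose own segment is
 * the only copy, so updates made by the callee persist. */
MockReader shared_reader = { { -1 } };

int
use_shared_reader(void)
{
    mock_wal_read(&shared_reader);
    return shared_reader.seg.fd;
}
```

In the first function the caller would reopen the segment on every call, because its own struct never learns about the descriptor; the second keeps one authoritative copy.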
regards.
--
Kyotaro Horiguchi
NTT Open Source Software Center
On 2020-May-12, Kyotaro Horiguchi wrote:
> I'm not sure why wal_segment_open and WalSndSegmentOpen were modified
> in different ways with respect to error handling of BasicOpenFile; I
> prefer the WalSndSegmentOpen way. However, that difference doesn't
> harm anything, so I'm fine with the current patch.
Yeah, I couldn't decide which style I liked the most. I used the one
you suggested.
> + fake_xlogreader.seg = *sendSeg;
> + fake_xlogreader.segcxt = *sendCxt;
> fake_xlogreader.seg is a different instance from *sendSeg. WALRead
> modifies fake_xlogreader.seg but does not modify *sendSeg. Thus the
> change doesn't persist. On the other hand WalSndSegmentOpen reads
> *sendSeg, which is not under control of WALRead.
> Maybe we had better make fake_xlogreader a global variable of
> walsender.c that covers the current sendSeg and sendCxt.
I tried that. I was about to leave it at just modifying physical
walsender (simple enough, and it passed tests), but I noticed that
WalSndErrorCleanup() would be a problem because we don't know if it's
physical or logical walsender. So in the end I added a global
'xlogreader' pointer in walsender.c -- logical walsender sets it to the
true xlogreader it has inside the logical decoding context, and physical
walsender sets it to its fake xlogreader. That seems to work nicely.
sendSeg/sendCxt are gone entirely. Logical walsender was doing
WALOpenSegmentInit() uselessly during InitWalSender(), since it was
using the separate sendSeg/sendCxt structs instead of the ones in its
xlogreader. (Some mysteries become clearer!)
It's slightly disquieting that the segment_close call in
WalSndErrorCleanup is not covered, but in any case this should work well
AFAICS. I think this is simpler to understand than formerly.
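The arrangement described above, one module-level pointer that either kind of walsender sets, might look roughly like the following C sketch. All names here (MockReader, MockDecodingContext, active_reader, the mock_* functions) are hypothetical, and the NULL guard in the cleanup routine is an assumption of this sketch rather than something stated in the message.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins; the real structs carry much more state. */
typedef struct MockReader { int fd; } MockReader;
typedef struct MockDecodingContext { MockReader reader; } MockDecodingContext;

/* Module-level pointer shared by both modes, plus the physical mode's
 * private reader. */
MockReader *active_reader = NULL;
MockReader  fake_reader = { -1 };

/* Physical mode points the shared pointer at its own fake reader. */
void
mock_start_physical(void)
{
    active_reader = &fake_reader;
}

/* Logical mode points it at the reader inside the decoding context. */
void
mock_start_logical(MockDecodingContext *ctx)
{
    assert(ctx != NULL);
    active_reader = &ctx->reader;
}

/* Cleanup does not need to know which mode is active.
 * Returns 1 if a segment was closed, 0 if there was nothing to close. */
int
mock_cleanup(void)
{
    if (active_reader == NULL || active_reader->fd < 0)
        return 0;
    active_reader->fd = -1;     /* stands in for closing the segment */
    return 1;
}
```

Because the cleanup path only dereferences the shared pointer, separate per-mode segment structs are no longer needed in this sketch.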
Now the only silliness remaining is the fact that different users of the
xlogreader interface are doing different things about the TLI.
Hopefully we can unify everything to something sensible one day .. but
that's not going to happen in pg13.
I'll get this pushed tomorrow, unless there are further objections.
--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachments:
0001-Adjust-walsender-usage-of-xlogreader-simplify-APIs.patch (text/x-diff; charset=us-ascii), +83 -106
Pushed. Thanks for the help!
--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Alvaro Herrera <alvherre@2ndquadrant.com> writes:
> Pushed. Thanks for the help!
This seems to have fixed bowerbird. Were you expecting that?
regards, tom lane
On 2020-May-13, Tom Lane wrote:
> Alvaro Herrera <alvherre@2ndquadrant.com> writes:
> > Pushed. Thanks for the help!
> This seems to have fixed bowerbird. Were you expecting that?
Hm, not really.
--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
I think I've discovered a problem with 850196b6. The following steps
can be used to trigger a segfault:
# wal_level = logical
psql postgres -c "create database testdb;"
psql testdb -c "select pg_create_logical_replication_slot('slot', 'test_decoding');"
psql "dbname=postgres replication=database" -c "START_REPLICATION SLOT slot LOGICAL 0/0;"
From a quick glance, I think the problem starts in
StartLogicalReplication() in walsender.c. The call to
CreateDecodingContext() may ERROR before xlogreader is initialized in
the next line, so the subsequent call to WalSndErrorCleanup()
segfaults when it attempts to access xlogreader.
Nathan
At Thu, 14 May 2020 01:03:48 +0000, "Bossart, Nathan" <bossartn@amazon.com> wrote in
> I think I've discovered a problem with 850196b6. The following steps
> can be used to trigger a segfault:
> # wal_level = logical
> psql postgres -c "create database testdb;"
> psql testdb -c "select pg_create_logical_replication_slot('slot', 'test_decoding');"
> psql "dbname=postgres replication=database" -c "START_REPLICATION SLOT slot LOGICAL 0/0;"
> From a quick glance, I think the problem starts in
> StartLogicalReplication() in walsender.c. The call to
> CreateDecodingContext() may ERROR before xlogreader is initialized in
> the next line, so the subsequent call to WalSndErrorCleanup()
> segfaults when it attempts to access xlogreader.
Good catch! That's not only for CreateDecodingContext. It can happen
anywhere in the query loop in PostgresMain() until xlogreader is
initialized. So it also happens, for example, when starting logical
replication using an invalidated slot. Checking xlogreader != NULL in
WalSndErrorCleanup is sufficient. It doesn't make an actual difference,
but the attached patch explicitly initializes the pointer to NULL.
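To make the failure ordering concrete, here is a small standalone C sketch that imitates an ERROR with setjmp/longjmp. It is an assumption-laden model, not the walsender code: MockReader, current_reader, error_context and the mock_* functions are all hypothetical names, and longjmp merely stands in for the server's error recovery.

```c
#include <assert.h>
#include <setjmp.h>
#include <stddef.h>

typedef struct MockReader { int fd; } MockReader;

/* Explicitly NULL until a replication command has set it. */
MockReader *current_reader = NULL;
int         segments_closed = 0;
static jmp_buf error_context;

/* May "ERROR" before returning, like a context-creation call. */
MockReader *
mock_create_context(int fail, MockReader *storage)
{
    if (fail)
        longjmp(error_context, 1);
    storage->fd = 7;            /* faked descriptor */
    return storage;
}

/* Error cleanup with the guard: skip the reader if it was never set. */
void
mock_error_cleanup(void)
{
    if (current_reader != NULL && current_reader->fd >= 0)
    {
        current_reader->fd = -1;    /* stands in for closing the segment */
        segments_closed++;
    }
}

/* Returns 0 on success, -1 if the simulated error path was taken. */
int
mock_start_replication(int fail)
{
    static MockReader storage;
    MockReader *ctx;

    if (setjmp(error_context) != 0)
    {
        mock_error_cleanup();   /* runs before current_reader is assigned */
        return -1;
    }
    ctx = mock_create_context(fail, &storage);
    current_reader = ctx;       /* only reached when no error occurred */
    return 0;
}
```

Without the NULL test in mock_error_cleanup, the failing call would dereference a null pointer, which mirrors the segfault reported above.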
regards.
--
Kyotaro Horiguchi
NTT Open Source Software Center
Attachments:
0001-Don-t-check-uninitialized-xlog-reader-state.patch (text/x-patch; charset=us-ascii), +2 -3
On Thu, May 14, 2020 at 02:12:25PM +0900, Kyotaro Horiguchi wrote:
> Good catch! That's not only for CreateDecodingContext. It can happen
> anywhere in the query loop in PostgresMain() until xlogreader is
> initialized. So it also happens, for example, when starting logical
> replication using an invalidated slot. Checking xlogreader != NULL in
> WalSndErrorCleanup is sufficient. It doesn't make an actual difference,
> but the attached patch explicitly initializes the pointer to NULL.
Alvaro, are you planning to look at that? Should we have an open item
for this matter?
--
Michael
On 2020-May-15, Michael Paquier wrote:
> On Thu, May 14, 2020 at 02:12:25PM +0900, Kyotaro Horiguchi wrote:
> > Good catch! That's not only for CreateDecodingContext. It can happen
> > anywhere in the query loop in PostgresMain() until xlogreader is
> > initialized. So it also happens, for example, when starting logical
> > replication using an invalidated slot. Checking xlogreader != NULL in
> > WalSndErrorCleanup is sufficient. It doesn't make an actual difference,
> > but the attached patch explicitly initializes the pointer to NULL.
> Alvaro, are you planning to look at that? Should we have an open item
> for this matter?
On it now. I'm trying to add a test for this (needs a small change to
PostgresNode->psql), but I'm probably doing something stupid on the Perl
side, because it doesn't detect things as well as I'd like. Still
trying, but I may be asked to vacate the office soon ...
--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachments:
crashtest.patch (text/x-diff; charset=us-ascii), +21 -3
At Fri, 15 May 2020 19:24:28 -0400, Alvaro Herrera <alvherre@2ndquadrant.com> wrote in
> On 2020-May-15, Michael Paquier wrote:
> > On Thu, May 14, 2020 at 02:12:25PM +0900, Kyotaro Horiguchi wrote:
> > > Good catch! That's not only for CreateDecodingContext. It can happen
> > > anywhere in the query loop in PostgresMain() until xlogreader is
> > > initialized. So it also happens, for example, when starting logical
> > > replication using an invalidated slot. Checking xlogreader != NULL in
> > > WalSndErrorCleanup is sufficient. It doesn't make an actual difference,
> > > but the attached patch explicitly initializes the pointer to NULL.
> > Alvaro, are you planning to look at that? Should we have an open item
> > for this matter?
> On it now. I'm trying to add a test for this (needs a small change to
> PostgresNode->psql), but I'm probably doing something stupid on the Perl
> side, because it doesn't detect things as well as I'd like. Still
> trying, but I may be asked to vacate the office soon ...
FWIW, I'm not sure which came first, this mail or commit 1d3743023e,
but I confirmed that the committed test in 006_logical_decoding.pl
causes a crash, and that the crash is fixed by the code change.
regards.
--
Kyotaro Horiguchi
NTT Open Source Software Center