Bug in Physical Replication Slots (at least 9.5)?

Started by Jonathon Nelson · over 9 years ago · 58 messages · hackers, bugs
#1 Jonathon Nelson <jdnelson@dyn.com>
hackers, bugs

We think we have discovered a bug in the physical replication slots
functionality in PostgreSQL 9.5.
We've seen the behavior across Operating Systems (CentOS-7 and openSUSE
LEAP 42.1), filesystems (ext4 and xfs), and versions (9.5.3 and 9.5.4). All
were on x86_64.

We notice that if we stop and then re-start the *standby*, upon restart it
will - sometimes - request a WAL file that the master no longer has.

First, the postgresql configuration differs only minimally from the stock
config:

Assume wal_keep_segments = 0.
Assume the use of physical replication slots.
Assume one master, one standby.

Lastly, we have observed the behavior "in the wild" at least twice and in
the lab a dozen or so times.

EXAMPLE #1 (logs are from the replica):

user=,db=,app=,client= DEBUG: creating and filling new WAL file
user=,db=,app=,client= DEBUG: done creating and filling new WAL file
user=,db=,app=,client= DEBUG: sending write 6/8B000000 flush 6/8A000000
apply 5/748425A0
user=,db=,app=,client= DEBUG: sending write 6/8B000000 flush 6/8B000000
apply 5/74843020
<control-c here>
user=,db=,app=,client= DEBUG: postmaster received signal 2
user=,db=,app=,client= LOG: received fast shutdown request
user=,db=,app=,client= LOG: aborting any active transactions

And, upon restart:

user=,db=,app=,client= LOG: restartpoint starting: xlog
user=,db=,app=,client= DEBUG: updated min recovery point to 6/67002390 on
timeline 1
user=,db=,app=,client= DEBUG: performing replication slot checkpoint
user=,db=,app=,client= DEBUG: updated min recovery point to 6/671768C0 on
timeline 1
user=,db=,app=,client= CONTEXT: writing block 589 of relation
base/13294/16501
user=,db=,app=,client= LOG: invalid magic number 0000 in log segment
00000001000000060000008B, offset 0
user=,db=,app=,client= DEBUG: switched WAL source from archive to stream
after failure
user=,db=,app=,client= LOG: started streaming WAL from primary at
6/8A000000 on timeline 1
user=,db=,app=,client= FATAL: could not receive data from WAL stream:
ERROR: requested WAL segment 00000001000000060000008A has already been
removed

A physical analysis shows that the WAL file 00000001000000060000008B is
100% zeroes (ASCII NUL).

Querying pg_replication_slots shows a restart_lsn that matches ….6/8B.

pg_controldata shows values like:
Minimum recovery ending location: 6/8Axxxxxx

How can the master show a position that is greater than the minimum
recovery ending location?
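To make the relationship between these two positions concrete, here is a small sketch (assuming timeline 1 and the default 16 MB WAL segment size of 9.5; illustrative only, not PostgreSQL source) of how an LSN maps to a WAL segment file name. It shows why a restart_lsn of 6/8B000000 corresponds to segment 00000001000000060000008B while a minimum recovery point of 6/8Axxxxxx still lies inside segment ...8A:

```python
# Sketch: map an LSN string as printed by PostgreSQL (e.g. "6/8B000000")
# to the WAL segment file name containing it. Assumes timeline 1 and the
# default 16 MB segment size.
WAL_SEG_SIZE = 16 * 1024 * 1024

def lsn_to_int(lsn):
    hi, lo = lsn.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)

def segment_name(lsn, timeline=1):
    segno = lsn_to_int(lsn) // WAL_SEG_SIZE
    # Names are TTTTTTTTXXXXXXXXYYYYYYYY: timeline, then the segment
    # number split into a high ("log") and low ("seg") part.
    segs_per_xlogid = 0x100000000 // WAL_SEG_SIZE  # 256 with 16 MB segments
    return "%08X%08X%08X" % (timeline,
                             segno // segs_per_xlogid,
                             segno % segs_per_xlogid)

print(segment_name("6/8B000000"))  # 00000001000000060000008B (restart_lsn)
print(segment_name("6/8A123456"))  # 00000001000000060000008A (min recovery)
```

So the slot asks for segment ...8B while recovery last applied a record inside ...8A, matching the log messages above.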

EXAMPLE #2:

Minimum recovery ending location: 19DD/73FFFFE0
Log segment 00000001000019DD00000073 was not available.
The restart LSN was 19DD/74000000.
The last few lines from pg_xlogdump 00000001000019DD00000073:

rmgr: Btree  len (rec/tot): 2/64, tx: 77257, lsn: 19DD/73FFFF60, prev 19DD/73FFFF20, desc: INSERT_LEAF off 132, blkref #0: rel 1663/16403/150017028 blk 1832
rmgr: Btree  len (rec/tot): 2/64, tx: 77257, lsn: 19DD/73FFFFA0, prev 19DD/73FFFF60, desc: INSERT_LEAF off 206, blkref #0: rel 1663/16403/150017028 blk 11709

If I'm understanding this properly, (0x73FFFFA0 - 0x73000000) is the offset of the first byte of the last record in this file, and the record length is 64 bytes, which places the first byte of the next record at offset 16777184 (0xFFFFE0) (logical position 0x73FFFFE0: this jibes with pg_controldata).

However, there are only 32 bytes of the file left:
0x73FFFFA0 - 0x73000000 + 64 = 16777184
16777216 - 16777184 = 32

Which means that the next record is in the WAL file
00000001000019DD00000074.
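The arithmetic above can be checked mechanically (assuming the default 16 MB segment size):

```python
# Verify the offsets: the last record pg_xlogdump shows starts at
# 19DD/73FFFFA0 and is 64 bytes long, so the next record would begin at
# 19DD/73FFFFE0, with only 32 bytes of the 16 MB segment remaining.
WAL_SEG_SIZE = 16 * 1024 * 1024  # 16777216

last_rec_off = 0x73FFFFA0 - 0x73000000  # offset of the last record
next_rec_off = last_rec_off + 64        # first byte after it

print(hex(next_rec_off))            # 0xffffe0, matching pg_controldata
print(WAL_SEG_SIZE - next_rec_off)  # 32 -- too little room for a record
```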

A possible theory:

Let us assume PG has applied 100% of the data in a given WAL file, and let's assume (as in this case) that the WAL file is 00000001000019DD00000073. When it starts up again, it uses the control data to start and says "the next record is at 19DD/73FFFFE0", which it truncates to 19DD/73000000. However, PG has *also* already told the master that it has fully received, written, and flushed all of the data for that WAL file, so the master has 19DD/74000000 as the start position (and has consequently removed the WAL file for 0x73). The relationship between pg_controldata and pg_replication_slots.restart_lsn seems to be very slightly (but importantly) at odds.

Could it be this part of the code?

From src/backend/replication/walreceiverfuncs.c in RequestXLogStreaming (as of a0aa358ca603d8189fe4be72f614cf7cf363d81a):

235     /*
236      * We always start at the beginning of the segment. That prevents a broken
237      * segment (i.e., with no records in the first half of a segment) from
238      * being created by XLOG streaming, which might cause trouble later on if
239      * the segment is e.g archived.
240      */
241     if (recptr % XLogSegSize != 0)
242         recptr -= recptr % XLogSegSize;
243

We start up at 19DD/73FFFFE0 (but there would not be enough room in that segment for any more records, so logically we'd have to go to 19DD/74000000). When we start WAL receiving, we truncate 0x73FFFFE0 to 0x73000000, a segment the master has already removed (and which - technically - we don't actually need?).
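The truncation described above can be sketched like this (a simplified model of the quoted C code, not the actual PostgreSQL implementation):

```python
# Simplified model of the rounding performed in RequestXLogStreaming:
# the requested start point is truncated down to the beginning of its
# 16 MB segment.
XLOG_SEG_SIZE = 16 * 1024 * 1024

def request_start(recptr):
    # mirrors: if (recptr % XLogSegSize != 0)
    #              recptr -= recptr % XLogSegSize;
    return recptr - recptr % XLOG_SEG_SIZE

restart = (0x19DD << 32) | 0x73FFFFE0  # 19DD/73FFFFE0
start = request_start(restart)         # rounded down to segment ...73,
                                       # which the master may have removed
print("%X/%08X" % (start >> 32, start & 0xFFFFFFFF))  # 19DD/73000000
```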

--
Jon Nelson
Dyn / Principal Software Engineer

#2 Jonathon Nelson <jdnelson@dyn.com>
In reply to: Jonathon Nelson (#1) · hackers, bugs
Re: Bug in Physical Replication Slots (at least 9.5)?

On Mon, Nov 28, 2016 at 1:39 PM, Jonathon Nelson <jdnelson@dyn.com> wrote:

We think we have discovered a bug in the physical replication slots
functionality in PostgreSQL 9.5.
We've seen the behavior across Operating Systems (CentOS-7 and openSUSE
LEAP 42.1), filesystems (ext4 and xfs), and versions (9.5.3 and 9.5.4). All
were on x86_64.

We notice that if we stop and then re-start the *standby*, upon restart it
will - sometimes - request a WAL file that the master no longer has.

I hate to largely re-quote my entire wall-of-text email/bug report, but
there were no responses to this (to be fair, it was at the end of a month
of US holidays, etc...).

Is there more information I should provide? Can I get this added to some
sort of official bug list (it doesn't have a bug number)?

Any help or advice here would be appreciated.


--
Jon Nelson
Dyn / Principal Software Engineer

#3 Kyotaro Horiguchi <horikyota.ntt@gmail.com>
In reply to: Jonathon Nelson (#2) · hackers, bugs
Re: Bug in Physical Replication Slots (at least 9.5)?

Hello. I added pgsql-hackers.

This occurs also on git master and back to 9.4.

At Fri, 13 Jan 2017 08:47:06 -0600, Jonathon Nelson <jdnelson@dyn.com> wrote in <CACJqAM1ydcZcd5DoCp+y5hkWto1ZeGW+Mj8UK7avqctbGJO8Bw@mail.gmail.com>

On Mon, Nov 28, 2016 at 1:39 PM, Jonathon Nelson <jdnelson@dyn.com> wrote:

First, the postgresql configuration differs only minimally from the stock
config:

Assume wal_keep_segments = 0.
Assume the use of physical replication slots.
Assume one master, one standby.

Lastly, we have observed the behavior "in the wild" at least twice and in
the lab a dozen or so times.

EXAMPLE #1 (logs are from the replica):

user=,db=,app=,client= DEBUG: creating and filling new WAL file
user=,db=,app=,client= DEBUG: done creating and filling new WAL file
user=,db=,app=,client= DEBUG: sending write 6/8B000000 flush 6/8A000000
apply 5/748425A0
user=,db=,app=,client= DEBUG: sending write 6/8B000000 flush 6/8B000000
apply 5/74843020
<control-c here>
user=,db=,app=,client= DEBUG: postmaster received signal 2
user=,db=,app=,client= LOG: received fast shutdown request
user=,db=,app=,client= LOG: aborting any active transactions

And, upon restart:

user=,db=,app=,client= LOG: restartpoint starting: xlog
user=,db=,app=,client= DEBUG: updated min recovery point to 6/67002390 on
timeline 1
user=,db=,app=,client= DEBUG: performing replication slot checkpoint
user=,db=,app=,client= DEBUG: updated min recovery point to 6/671768C0 on
timeline 1
user=,db=,app=,client= CONTEXT: writing block 589 of relation
base/13294/16501
user=,db=,app=,client= LOG: invalid magic number 0000 in log segment
00000001000000060000008B, offset 0
user=,db=,app=,client= DEBUG: switched WAL source from archive to stream
after failure
user=,db=,app=,client= LOG: started streaming WAL from primary at
6/8A000000 on timeline 1
user=,db=,app=,client= FATAL: could not receive data from WAL stream:
ERROR: requested WAL segment 00000001000000060000008A has already been
removed

I managed to reproduce this. A little tweak (the first attached patch) makes the standby kill itself as soon as walreceiver sees a contrecord at the beginning of a segment.

- M(aster): Create a master database with wal_keep_segments = 0 (default), log_min_messages = debug2.
- M: Create a physical replication slot.
- S(tandby): Set up a standby database.
- S: Edit recovery.conf to use the replication slot above, then start it.
- S: touch /tmp/hoge
- M: Run pgbench ...
- S: After a while, the standby stops.

LOG: #################### STOP THE SERVER

- M: Stop pgbench.
- M: Do 'checkpoint;' twice.
- S: rm /tmp/hoge
- S: Fails to catch up with the following error.

FATAL: could not receive data from WAL stream: ERROR: requested WAL segment 00000001000000000000002B has already been removed

This problem occurs when only the earlier part of a continued record has been replicated before the segment is removed on the master. In other words, the first half exists only on the standby, and the second half only on the master.

I believe that a continuation record cannot span three or more *segments* (is that right?), so keeping one spare segment would be enough. The attached second patch does this.
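The "one spare segment" idea above can be sketched as follows. This is a hypothetical model of the approach, not the actual patch; the function and its interface are invented for illustration:

```python
# Hypothetical sketch of "keep one spare segment": when computing the
# oldest segment a physical slot still needs, step back one extra segment
# so a record that continues across a boundary is not recycled while the
# standby holds only its first half.
WAL_SEG_SIZE = 16 * 1024 * 1024

def oldest_needed_segno(restart_lsn, spare=1):
    segno = restart_lsn // WAL_SEG_SIZE
    # restart_lsn may point just past a record that continues over the
    # segment boundary, so retain one earlier segment as well.
    return max(segno - spare, 0)

restart = (0x19DD << 32) | 0x74000000  # slot restart_lsn 19DD/74000000
print(hex(oldest_needed_segno(restart)))  # 0x19dd73: segment ...73 is kept
```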

Other possible measures might be,

- Allowing switching of the WAL source while reading a continuation
record. Currently ReadRecord assumes that a continuation record
can be read from a single source. But this needs refactoring
involving xlog.c, xlogreader.c and relatives.

- Delaying recycling of a segment until the last partial record on it
completes. This seems doable page-wise (coarse resolution)
but would cost additional reading of past xlog files (the page
header of past pages is required).

- Delaying write/flush feedback until the current record is
completed. walreceiver is not conscious of WAL record boundaries and
this might break synchronous replication.

Any thoughts?

=========================================

A physical analysis shows that the WAL file 00000001000000060000008B is
100% zeroes (ASCII NUL).

I suppose this is on the standby, so that segment file is the one onto which the next transferred record will be written.

The results of querying pg_replication_slots shows a restart_lsn that
matches ….6/8B.

It is the beginning of the next record to be replicated, as documented. In other words, just after the last transferred record (including padding).

Pg_controldata shows values like:
Minimum recovery ending location: 6/8Axxxxxx

It is the beginning of the last applied record.

How can the master show a position that is greater than the minimum
recovery ending location?

So it is natural that the former is larger than the latter.

EXAMPLE #2:

Minimum recovery ending location: 19DD/73FFFFE0
Log segment 00000001000019DD00000073 was not available.
The restart LSN was 19DD/74000000.
The last few lines from pg_xlogdump 00000001000019DD00000073:

rmgr: Btree len (rec/tot): 2/ 64, tx: 77257, lsn:
19DD/73FFFF60, prev 19DD/73FFFF20, desc: INSERT_LEAF off 132, blkref #0:
rel 1663/16403/150017028 blk 1832
rmgr: Btree len (rec/tot): 2/ 64, tx: 77257, lsn:
19DD/73FFFFA0, prev 19DD/73FFFF60, desc: INSERT_LEAF off 206, blkref #0:
rel 1663/16403/150017028 blk 11709

If I'm understanding this properly, (0x73FFFFA0 - 0x73000000) is the first
byte of the last record in this file, and the record length is 64 bytes
which places the first byte of the next record at: 16777184 (0xffffe0)
(logical position 0x73ffffe0: this jives with pg_controldata).

Maybe right. pg_xlogdump skips partial records.

However, there are only 32 bytes of file left:
0x73FFFFA0 - 0x73000000 + 64 -=> 16777184
16777216 - 16777184 -=> 32

Which means that the next record is in the WAL file
00000001000019DD00000074.

Maybe right.

A possible theory:

Let us assume PG has applied 100% of the data in a given WAL file, and
let’s assume (as in this case) that the WAL file is
00000001000019DD00000073. When it starts up again, it uses the control
data to start and say “The next record is at 19DD/0x73ffffe0" which it
truncates to 0x73000000. However, PG has *also* already told the master
that is has fully received, written, and flushed all of the data for that
WAL file, so the master has 0x74000000 as the start position (and has
consequently removed the WAL file for 0x73). The relationship between
pg_controldata and pg_replication_slots.restart_lsn seem to be very
slightly (but importantly) at odds.

Could it be this part of the code?

No, the code does the right thing. The problem is that a continuation record is assumed to come from a single WAL source (archive/pg_xlog or streaming), but here the continuation record is distributed across two sources.

From src/backend/replication/walreceiverfuncs.c in RequestXLogStreaming (as of a0aa358ca603d8189fe4be72f614cf7cf363d81a):

235     /*
236      * We always start at the beginning of the segment. That prevents a broken
237      * segment (i.e., with no records in the first half of a segment) from
238      * being created by XLOG streaming, which might cause trouble later on if
239      * the segment is e.g archived.
240      */
241     if (recptr % XLogSegSize != 0)
242         recptr -= recptr % XLogSegSize;
243

We start up with 19DD/0x73ffffe0 (but there would not be enough room in
that segment for any more records, so logically we'd have to go to
19DD/0x74000000). When we start WAL receiving, we truncate 0x73ffffe0 to
0x73000000, which the master has already removed (and - technically - we
don't actually need?).

0x73FFFFE0 and 0x73000000 are in the same segment. The current recovery mechanism requires reading the record that starts at 0x73FFFFE0; it is on the standby, and it is read.

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center

Attachments:

repl_border_bug_test.patch (text/x-patch; charset=us-ascii) +23 -0
keep_extra_one_seg_for_replslot.patch (text/x-patch; charset=us-ascii) +6 -1
#4 Kyotaro Horiguchi <horikyota.ntt@gmail.com>
In reply to: Kyotaro Horiguchi (#3) · hackers, bugs
Re: [BUGS] Bug in Physical Replication Slots (at least 9.5)?

Ouch! That is wrong.

It should decrement segno, not keep.


--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

#5 Michael Paquier <michael@paquier.xyz>
In reply to: Kyotaro Horiguchi (#3) · hackers, bugs
Re: Bug in Physical Replication Slots (at least 9.5)?

On Tue, Jan 17, 2017 at 7:36 PM, Kyotaro HORIGUCHI
<horiguchi.kyotaro@lab.ntt.co.jp> wrote:

I managed to reproduce this. A little tweak as the first patch
lets the standby to suicide as soon as walreceiver sees a
contrecord at the beginning of a segment.

Good idea.

I believe that a continuation record cannot be span over three or
more *segments* (is it right?), so keeping one spare segment
would be enough. The attached second patch does this.

I have to admit that I have not thought about this problem much yet (I bookmarked this report weeks ago, to be honest, as something to look at), but that does not look right to me. Couldn't a record span even more segments? Take a random string longer than 64MB, or even longer, for example.

Other possible measures might be,

- Allowing switching wal source while reading a continuation
record. Currently ReadRecord assumes that a continuation record
can be read from single source. But this needs refactoring
involving xlog.c, xlogreader.c and relatives.

This is scary thinking about back-branches.

- Delaying recycling a segment until the last partial record on it
completes. This seems doable in page-wise (coarse resolution)
but would cost additional reading of past xlog files (page
header of past pages is required).

Hm, yes. That looks like the least invasive way to go. At least that
looks more correct than the others.

- Delaying write/flush feedback until the current record is
completed. walreceiver is not conscious of a WAL record and
this might break synchronous replication.

Not sure about this one yet.
--
Michael


#6 Kyotaro Horiguchi <horikyota.ntt@gmail.com>
In reply to: Michael Paquier (#5) · hackers, bugs
Re: Bug in Physical Replication Slots (at least 9.5)?

Hello,

At Wed, 18 Jan 2017 12:34:51 +0900, Michael Paquier <michael.paquier@gmail.com> wrote in <CAB7nPqQytF2giE7FD-4oJJpPVwiKJrDQPc24hLNGThX01SbSmA@mail.gmail.com>

On Tue, Jan 17, 2017 at 7:36 PM, Kyotaro HORIGUCHI
<horiguchi.kyotaro@lab.ntt.co.jp> wrote:

I managed to reproduce this. A little tweak as the first patch
lets the standby to suicide as soon as walreceiver sees a
contrecord at the beginning of a segment.

Good idea.

Thanks. Fortunately(?), the problematic situation seems to happen at almost every segment boundary.

I believe that a continuation record cannot be span over three or
more *segments* (is it right?), so keeping one spare segment
would be enough. The attached second patch does this.

I have to admit that I did not think about this problem much yet (I
bookmarked this report weeks ago to be honest as something to look
at), but that does not look right to me. Couldn't a record be spawned
across even more segments? Take a random string longer than 64MB or
event longer for example.

I haven't looked closely at how a modification is split into WAL records, though. A tuple cannot be that long. As a simple test, I observed rechdr->xl_tot_len at the end of XLogRecordAssemble while inserting an approximately 400KB not-so-compressible string into a text column, but I saw a series of many records, each shorter than several thousand bytes.

Other possible measures might be,

- Allowing switching wal source while reading a continuation
record. Currently ReadRecord assumes that a continuation record
can be read from single source. But this needs refactoring
involving xlog.c, xlogreader.c and relatives.

This is scary thinking about back-branches.

Yes. It would no longer be a bug fix. (Or it becomes quite an ugly hack..)

- Delaying recycling a segment until the last partial record on it
completes. This seems doable in page-wise (coarse resolution)
but would cost additional reading of past xlog files (page
header of past pages is required).

Hm, yes. That looks like the least invasive way to go. At least that
looks more correct than the others.

The attached patch does that. Usually it reads page headers only on segment boundaries, but once a continuation record is found (or reading the next page header fails, meaning the first record on the first page of the next segment has not been replicated), it does so on every page boundary until a non-continuation page arrives.
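As a rough, hypothetical model of that page-wise rule (the function and the page map below are invented for illustration; real WAL pages mark this case with the XLP_FIRST_IS_CONTRECORD flag in the page header):

```python
# Hypothetical, much-simplified model: the "keep" pointer may advance
# past a page boundary only if the page there does not begin with the
# continuation of a record.
XLOG_BLCKSZ = 8192  # default WAL page size

def advance_keep(keep, flushed, page_starts_with_contrecord):
    ptr = keep - keep % XLOG_BLCKSZ
    # Walk page by page up to the flushed position, stopping before any
    # page whose first record continues from the previous page.
    while ptr + XLOG_BLCKSZ <= flushed:
        nxt = ptr + XLOG_BLCKSZ
        if page_starts_with_contrecord(nxt):
            break
        ptr = nxt
    return ptr

# Say the page at offset 3*8192 starts mid-record: keep stops one page
# earlier, so WAL holding the record's start cannot be recycled.
cont_pages = {3 * XLOG_BLCKSZ}
print(advance_keep(0, 5 * XLOG_BLCKSZ, lambda p: p in cont_pages))  # 16384
```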

I left debug output (at LOG level) in the attached file, emitted on every state change of the keep pointer. At least for pgbench, the cost seems negligible.

- Delaying write/flush feedback until the current record is
completed. walreceiver is not conscious of a WAL record and
this might break synchronous replication.

Not sure about this one yet.

I'm not sure either. :p

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center

Attachments:

retard_keeplogseg.patch (text/x-patch; charset=us-ascii) +101 -10
#7 Kyotaro Horiguchi <horikyota.ntt@gmail.com>
In reply to: Kyotaro Horiguchi (#6) · hackers, bugs
Re: Bug in Physical Replication Slots (at least 9.5)?

Hello,

At Thu, 19 Jan 2017 18:37:31 +0900 (Tokyo Standard Time), Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp> wrote in <20170119.183731.223893446.horiguchi.kyotaro@lab.ntt.co.jp>

- Delaying recycling a segment until the last partial record on it
completes. This seems doable page-wise (coarse resolution)
but would cost additional reads of past xlog files (the page
headers of past pages are required).

Hm, yes. That looks like the least invasive way to go. At least that
looks more correct than the others.

The attached patch does that. Usually it reads page headers only
on segment boundaries, but once a continuation record is found (or
the next page header fails to be read, that is, the first record on
the first page in the next segment has not been replicated), it
does so on every page boundary until a non-continuation
page comes.

I left debug output (at LOG level) in the attached patch, emitted on
every state change of the keep pointer. At least for pgbench, the
cost seems negligible.

I revised it. It became neater and less invasive.

- Removed the added keep member from struct WalSnd. It is never referenced
by other processes; it is a static variable now.

- Restore keepPtr from the replication slot on startup.

- Moved the main part to a more appropriate position.

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center

Attachments:

retard_keeplogseg_v2.patch (text/x-patch; charset=us-ascii, +100 -7)
#8Kyotaro Horiguchi
horikyota.ntt@gmail.com
In reply to: Kyotaro Horiguchi (#7)
hackersbugs
Re: [HACKERS] Bug in Physical Replication Slots (at least 9.5)?

Hello, I'll add the rebased version to the next CF.

At Fri, 20 Jan 2017 11:07:29 +0900 (Tokyo Standard Time), Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp> wrote in <20170120.110729.107284864.horiguchi.kyotaro@lab.ntt.co.jp>

- Delaying recycling a segment until the last partial record on it
completes. This seems doable page-wise (coarse resolution)
but would cost additional reads of past xlog files (the page
headers of past pages are required).

Hm, yes. That looks like the least invasive way to go. At least that
looks more correct than the others.

The attached patch does that. Usually it reads page headers only
on segment boundaries, but once a continuation record is found (or
the next page header fails to be read, that is, the first record on
the first page in the next segment has not been replicated), it
does so on every page boundary until a non-continuation
page comes.

I left debug output (at LOG level) in the attached patch, emitted on
every state change of the keep pointer. At least for pgbench, the
cost seems negligible.

I revised it. It became neater and less invasive.

- Removed the added keep member from struct WalSnd. It is never referenced
by other processes; it is a static variable now.

- Restore keepPtr from the replication slot on startup.

keepPtr is renamed to a more meaningful name restartLSN.

- Moved the main part to a more appropriate position.

- Removed the debug print code.

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center

Attachments:

0001-Fix-a-bug-of-physical-replication-slot.patch (text/x-patch; charset=us-ascii, +100 -8)
#9Fujii Masao
masao.fujii@gmail.com
In reply to: Kyotaro Horiguchi (#6)
hackersbugs
Re: Bug in Physical Replication Slots (at least 9.5)?

On Thu, Jan 19, 2017 at 6:37 PM, Kyotaro HORIGUCHI
<horiguchi.kyotaro@lab.ntt.co.jp> wrote:

Hello,

At Wed, 18 Jan 2017 12:34:51 +0900, Michael Paquier <michael.paquier@gmail.com> wrote in <CAB7nPqQytF2giE7FD-4oJJpPVwiKJrDQPc24hLNGThX01SbSmA@mail.gmail.com>

On Tue, Jan 17, 2017 at 7:36 PM, Kyotaro HORIGUCHI
<horiguchi.kyotaro@lab.ntt.co.jp> wrote:

I managed to reproduce this. A little tweak, added as the first patch,
makes the standby kill itself as soon as walreceiver sees a
contrecord at the beginning of a segment.

Good idea.

Thanks. Fortunately(?), the problematic situation seems to happen
at almost all segment boundary.

I believe that a continuation record cannot span three or
more *segments* (is that right?), so keeping one spare segment
would be enough. The attached second patch does this.

I have to admit that I have not thought about this problem much yet (I
bookmarked this report weeks ago, to be honest, as something to look
at), but that does not look right to me. Couldn't a record span
across even more segments? Take a random string longer than 64MB, or
even longer, for example.

Though I haven't looked closely at how a modification is split
into WAL records, a tuple cannot be that long. As a simple test, I
observed rechdr->xl_tot_len at the end of XLogRecordAssemble while
inserting an about-400KB, not-so-compressible string into a text
column, but I saw a series of many records, each shorter than
several thousand bytes.

Other possible measures might be,

- Allowing switching of the WAL source while reading a continuation
record. Currently ReadRecord assumes that a continuation record
can be read from a single source. But this needs refactoring
involving xlog.c, xlogreader.c and related code.

This is scary thinking about back-branches.

Yes. It would no longer be a bug fix. (Or it becomes a quite ugly hack..)

- Delaying recycling a segment until the last partial record on it
completes. This seems doable page-wise (coarse resolution)
but would cost additional reads of past xlog files (the page
headers of past pages are required).

Hm, yes. That looks like the least invasive way to go. At least that
looks more correct than the others.

The attached patch does that. Usually it reads page headers only
on segment boundaries, but once a continuation record is found (or
the next page header fails to be read, that is, the first record on
the first page in the next segment has not been replicated), it
does so on every page boundary until a non-continuation
page comes.

I'm afraid that many WAL segments would start with a continuation record
under a workload of short transactions (e.g., pgbench), which
would make restart_lsn lag behind very much. No?

The discussion on this thread just makes me think that restart_lsn should
indicate the replay location instead of flush location. This seems safer.
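For reference while following the LSNs in this thread: with the default 16MB segment size, an LSN maps to a segment file name as sketched below, which is how flush position 6/8A000000 in the original report corresponds to segment 00000001000000060000008A. A sketch assuming the default segment size (later versions make it configurable at initdb time):

```python
WAL_SEG_SIZE = 16 * 1024 * 1024                 # default segment size, assumed
SEGS_PER_XLOGID = 0x100000000 // WAL_SEG_SIZE   # 256 segments per "xlogid"

def lsn_to_segment_name(tli: int, lsn: str) -> str:
    """Map an LSN like '6/8A000000' to its WAL segment file name:
    8 hex digits of timeline, then segno split into high/low parts."""
    hi, lo = (int(part, 16) for part in lsn.split("/"))
    segno = ((hi << 32) | lo) // WAL_SEG_SIZE
    return "%08X%08X%08X" % (tli, segno // SEGS_PER_XLOGID,
                             segno % SEGS_PER_XLOGID)

print(lsn_to_segment_name(1, "6/8A000000"))  # 00000001000000060000008A
```

This makes the failure mode concrete: a restart_lsn of 6/8A000000 means the standby will re-request segment ...8A, so if restart_lsn advanced past the start of a continuation record, the needed earlier segment may already be recycled.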

Regards,

--
Fujii Masao

--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

#10Kyotaro Horiguchi
horikyota.ntt@gmail.com
In reply to: Fujii Masao (#9)
hackersbugs
Re: [HACKERS] Bug in Physical Replication Slots (at least 9.5)?

Thank you for the comment.

At Thu, 2 Feb 2017 01:26:03 +0900, Fujii Masao <masao.fujii@gmail.com> wrote in <CAHGQGwEET=QBA_jND=xhrXn+9ZreP4_qMBAqsBZg56beqxbveg@mail.gmail.com>

The attached patch does that. Usually it reads page headers only
on segment boundaries, but once a continuation record is found (or
the next page header fails to be read, that is, the first record on
the first page in the next segment has not been replicated), it
does so on every page boundary until a non-continuation
page comes.

I'm afraid that many WAL segments would start with a continuation record
under a workload of short transactions (e.g., pgbench), which
would make restart_lsn lag behind very much. No?

Agreed. So the patch tries to release the keep point at every page
boundary, but restart_lsn falls far behind if many contiguous pages
are CONTRECORD. Still, I think the chance of that situation persisting
for one or more whole segments is negligibly low. That said, there
*is* a possibility of false continuation, anyway.

The discussion on this thread just makes me think that restart_lsn should
indicate the replay location instead of flush location. This seems safer.

The standby restarts from minRecoveryPoint, which is a copy of
XLogCtl->replayEndRecPtr and is updated by
UpdateMinRecoveryPoint(). Meanwhile, applyPtr in reply messages is a
copy of XLogCtl->lastReplayedEndRecPtr, which is updated after the
update of the on-disk minRecoveryPoint. It seems safe from that
viewpoint.

On the other hand, apply is pausable. Records are copied and
flushed on the standby, so the segments on the master that have already
been sent can safely be removed even in that case. Despite that,
older segments on the master would be kept from being removed during
the pause. If applyPtr were used as restart_lsn, this could be
another problem, and one that is sure to happen.

I'm not sure how likely it is for several contiguous
segments to be full of continuation pages, but I think the needless
pg_wal flooding caused by an apply pause would be worse.

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center


#11Michael Paquier
michael@paquier.xyz
In reply to: Fujii Masao (#9)
hackersbugs
Re: [BUGS] Bug in Physical Replication Slots (at least 9.5)?

On Thu, Feb 2, 2017 at 1:26 AM, Fujii Masao <masao.fujii@gmail.com> wrote:

I'm afraid that many WAL segments would start with a continuation record
under a workload of short transactions (e.g., pgbench), which
would make restart_lsn lag behind very much. No?

I don't quite understand this argument. Even if there are many small
transactions, that would cause restart_lsn to just be late by one
segment, all the time.

The discussion on this thread just makes me think that restart_lsn should
indicate the replay location instead of flush location. This seems safer.

That would penalize WAL retention on the primary with standbys using
recovery_min_apply_delay and a slot for example...

We can attempt to address this problem in two ways. The proposed patch
(ugly, by the way, and there are two typos!) does it in the WAL sender by
not making restart_lsn jump to the next segment if a continuation
record is found. Or we could have the standby request the next
segment instead if the record it wants to replay from is at a boundary
and it locally has the beginning of the record, which it has
because it already confirmed to the primary that it flushed up to the
next segment. Not sure which fix is better though.
--
Michael


#12Kyotaro Horiguchi
horikyota.ntt@gmail.com
In reply to: Michael Paquier (#11)
hackersbugs
Re: [BUGS] Bug in Physical Replication Slots (at least 9.5)?

At Thu, 2 Feb 2017 15:34:33 +0900, Michael Paquier <michael.paquier@gmail.com> wrote in <CAB7nPqQ05G15JooRMEONgPkW0osot77yaFAUF9_6Q8G+v+2+xg@mail.gmail.com>

On Thu, Feb 2, 2017 at 1:26 AM, Fujii Masao <masao.fujii@gmail.com> wrote:

I'm afraid that many WAL segments would start with a continuation record
under a workload of short transactions (e.g., pgbench), which
would make restart_lsn lag behind very much. No?

I don't quite understand this argument. Even if there are many small
transactions, that would cause restart_lsn to just be late by one
segment, all the time.

The discussion on this thread just makes me think that restart_lsn should
indicate the replay location instead of flush location. This seems safer.

That would penalize WAL retention on the primary with standbys using
recovery_min_apply_delay and a slot for example...

We can attempt to address this problem two ways. The patch proposed
(ugly btw and there are two typos!) is doing it in the WAL sender by
not making restart_lsn jump to the next segment if a continuation
record is found.

Sorry for the ugliness. :p Anyway, the previous version was not the
latest. The attached one is the revised version. (Sorry, I
hadn't found the typos myself..)

Or we could have the standby request the next
segment instead if the record it wants to replay from is at a boundary
and it locally has the beginning of the record, which it has
because it already confirmed to the primary that it flushed up to the
next segment. Not sure which fix is better though.

We could do that, as I said, with some refactoring of ReadRecord
involving a reader plugin mechanism..

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center

Attachments:

0001-Fix-a-bug-of-physical-replication-slot.patch (text/x-patch; charset=us-ascii, +97 -8)
#13Venkata B Nagothi
nag1010@gmail.com
In reply to: Kyotaro Horiguchi (#3)
hackersbugs
Re: [BUGS] Bug in Physical Replication Slots (at least 9.5)?

On Tue, Jan 17, 2017 at 9:36 PM, Kyotaro HORIGUCHI <
horiguchi.kyotaro@lab.ntt.co.jp> wrote:

Hello. I added pgsql-hackers.

This occurs also on git master and back to 9.4.

At Fri, 13 Jan 2017 08:47:06 -0600, Jonathon Nelson <jdnelson@dyn.com> wrote in <CACJqAM1ydcZcd5DoCp+y5hkWto1ZeGW+Mj8UK7avqctbGJO8Bw@mail.gmail.com>

On Mon, Nov 28, 2016 at 1:39 PM, Jonathon Nelson <jdnelson@dyn.com> wrote:

First, the postgresql configuration differs only minimally from the stock config:

Assume wal_keep_segments = 0.
Assume the use of physical replication slots.
Assume one master, one standby.

Lastly, we have observed the behavior "in the wild" at least twice and in the lab a dozen or so times.

EXAMPLE #1 (logs are from the replica):

user=,db=,app=,client= DEBUG: creating and filling new WAL file
user=,db=,app=,client= DEBUG: done creating and filling new WAL file
user=,db=,app=,client= DEBUG: sending write 6/8B000000 flush 6/8A000000 apply 5/748425A0
user=,db=,app=,client= DEBUG: sending write 6/8B000000 flush 6/8B000000 apply 5/74843020
<control-c here>
user=,db=,app=,client= DEBUG: postmaster received signal 2
user=,db=,app=,client= LOG: received fast shutdown request
user=,db=,app=,client= LOG: aborting any active transactions

And, upon restart:

user=,db=,app=,client= LOG: restartpoint starting: xlog
user=,db=,app=,client= DEBUG: updated min recovery point to 6/67002390 on timeline 1
user=,db=,app=,client= DEBUG: performing replication slot checkpoint
user=,db=,app=,client= DEBUG: updated min recovery point to 6/671768C0 on timeline 1
user=,db=,app=,client= CONTEXT: writing block 589 of relation base/13294/16501
user=,db=,app=,client= LOG: invalid magic number 0000 in log segment 00000001000000060000008B, offset 0
user=,db=,app=,client= DEBUG: switched WAL source from archive to stream after failure
user=,db=,app=,client= LOG: started streaming WAL from primary at 6/8A000000 on timeline 1
user=,db=,app=,client= FATAL: could not receive data from WAL stream: ERROR: requested WAL segment 00000001000000060000008A has already been removed

I managed to reproduce this. A little tweak, added as the first patch,
makes the standby kill itself as soon as walreceiver sees a
contrecord at the beginning of a segment.

- M(aster): createdb as a master with wal_keep_segments = 0
(default), log_min_messages = debug2
- M: Create a physical repslot.
- S(tandby): Setup a standby database.
- S: Edit recovery.conf to use the replication slot above then
start it.
- S: touch /tmp/hoge
- M: Run pgbench ...
- S: After a while, the standby stops.

LOG: #################### STOP THE SERVER

- M: Stop pgbench.
- M: Do 'checkpoint;' twice.
- S: rm /tmp/hoge
- S: Fails to catch up with the following error.

FATAL: could not receive data from WAL stream: ERROR: requested WAL segment 00000001000000000000002B has already been removed

I have been testing and reviewing the latest patch,
"0001-Fix-a-bug-of-physical-replication-slot.patch", and I think I might
need some more clarification on this.

Before applying the patch, I tried reproducing the above error:

- I had master->standby in streaming replication
- Took the backup of master
- with a low max_wal_size and wal_keep_segments = 0
- Configured standby with recovery.conf
- Created replication slot on master
- Configured the replication slot on standby and started the standby
- I got the below error

2017-03-10 11:58:15.704 AEDT [478] LOG: invalid record length at 0/F2000140: wanted 24, got 0

2017-03-10 11:58:15.706 AEDT [481] LOG: started streaming WAL from primary at 0/F2000000 on timeline 1

2017-03-10 11:58:15.706 AEDT [481] FATAL: could not receive data from WAL stream: ERROR: requested WAL segment 0000000100000000000000F2 has already been removed

and I noticed that the file "0000000100000000000000F2" had been removed
from the master. This can be easily reproduced, and it occurs
irrespective of configuring replication slots.

As long as the file "0000000100000000000000F2" is available on the master,
the standby continues to stream WALs without any issues.

some more details -

Contents of the file "0000000100000000000000F2" on standby before
pg_stop_backup()

rmgr: Standby len (rec/tot): 24/ 50, tx: 0, lsn: 0/F2000028, prev 0/F1000098, desc: RUNNING_XACTS nextXid 638 latestCompletedXid 637 oldestRunningXid 638
rmgr: Standby len (rec/tot): 24/ 50, tx: 0, lsn: 0/F2000060, prev 0/F2000028, desc: RUNNING_XACTS nextXid 638 latestCompletedXid 637 oldestRunningXid 638
rmgr: XLOG len (rec/tot): 80/ 106, tx: 0, lsn: 0/F2000098, prev 0/F2000060, desc: CHECKPOINT_ONLINE redo 0/F2000060; tli 1; prev tli 1; fpw true; xid 0:638; oid 16487; multi 1; offset 0; oldest xid 544 in DB 1; oldest multi 1 in DB 1; oldest/newest commit timestamp xid: 0/0; oldest running xid 638; online
rmgr: Standby len (rec/tot): 24/ 50, tx: 0, lsn: 0/F2000108, prev 0/F2000098, desc: RUNNING_XACTS nextXid 638 latestCompletedXid 637 oldestRunningXid 638
pg_waldump: FATAL: error in WAL record at 0/F2000108: invalid record length at 0/F2000140: wanted 24, got 0

Contents of the file on master after pg_stop_backup()

rmgr: Standby len (rec/tot): 24/ 50, tx: 0, lsn: 0/F2000028, prev 0/F1000098, desc: RUNNING_XACTS nextXid 638 latestCompletedXid 637 oldestRunningXid 638
rmgr: Standby len (rec/tot): 24/ 50, tx: 0, lsn: 0/F2000060, prev 0/F2000028, desc: RUNNING_XACTS nextXid 638 latestCompletedXid 637 oldestRunningXid 638
rmgr: XLOG len (rec/tot): 80/ 106, tx: 0, lsn: 0/F2000098, prev 0/F2000060, desc: CHECKPOINT_ONLINE redo 0/F2000060; tli 1; prev tli 1; fpw true; xid 0:638; oid 16487; multi 1; offset 0; oldest xid 544 in DB 1; oldest multi 1 in DB 1; oldest/newest commit timestamp xid: 0/0; oldest running xid 638; online
rmgr: Standby len (rec/tot): 24/ 50, tx: 0, lsn: 0/F2000108, prev 0/F2000098, desc: RUNNING_XACTS nextXid 638 latestCompletedXid 637 oldestRunningXid 638
rmgr: Heap2 len (rec/tot): 8/ 7735, tx: 0, lsn: 0/F2000140, prev 0/F2000108, desc: CLEAN remxid 620, blkref #0: rel 1663/13179/2619 blk 2 FPW
rmgr: Heap2 len (rec/tot): 8/ 6863, tx: 0, lsn: 0/F2001F78, prev 0/F2000140, desc: CLEAN remxid 620, blkref #0: rel 1663/13179/2840 blk 0 FPW
rmgr: Standby len (rec/tot): 24/ 50, tx: 0, lsn: 0/F2003A60, prev 0/F2001F78, desc: RUNNING_XACTS nextXid 638 latestCompletedXid 637 oldestRunningXid 638
rmgr: Standby len (rec/tot): 24/ 50, tx: 0, lsn: 0/F2003A98, prev 0/F2003A60, desc: RUNNING_XACTS nextXid 638 latestCompletedXid 637 oldestRunningXid 638
rmgr: XLOG len (rec/tot): 80/ 106, tx: 0, lsn: 0/F2003AD0, prev 0/F2003A98, desc: CHECKPOINT_ONLINE redo 0/F2003A98; tli 1; prev tli 1; fpw true; xid 0:638; oid 16487; multi 1; offset 0; oldest xid 544 in DB 1; oldest multi 1 in DB 1; oldest/newest commit timestamp xid: 0/0; oldest running xid 638; online
rmgr: Standby len (rec/tot): 24/ 50, tx: 0, lsn: 0/F2003B40, prev 0/F2003AD0, desc: RUNNING_XACTS nextXid 638 latestCompletedXid 637 oldestRunningXid 638
rmgr: XLOG len (rec/tot): 8/ 34, tx: 0, lsn: 0/F2003B78, prev 0/F2003B40, desc: BACKUP_END 0/F2000060
rmgr: XLOG len (rec/tot): 0/ 24, tx: 0, lsn: 0/F2003BA0, prev 0/F2003B78, desc: SWITCH

If the scenario I created to reproduce the error is correct, then applying
the patch is not making a difference.

I think I need help building a specific test case that will reproduce
the specific bug related to physical replication slots, as reported.

I will continue reviewing the patch and follow up with any comments.

Regards,
Venkata B N

Database Consultant

#14Kyotaro Horiguchi
horikyota.ntt@gmail.com
In reply to: Venkata B Nagothi (#13)
hackersbugs
Re: [BUGS] Bug in Physical Replication Slots (at least 9.5)?

Hello,

At Mon, 13 Mar 2017 11:06:00 +1100, Venkata B Nagothi <nag1010@gmail.com> wrote in <CAEyp7J-4MmVwGoZSwvaSULZC80JDD_tL-9KsNiqF17+bNqiSBg@mail.gmail.com>

On Tue, Jan 17, 2017 at 9:36 PM, Kyotaro HORIGUCHI <
horiguchi.kyotaro@lab.ntt.co.jp> wrote:

I managed to reproduce this. A little tweak, added as the first patch,
makes the standby kill itself as soon as walreceiver sees a
contrecord at the beginning of a segment.

- M(aster): createdb as a master with wal_keep_segments = 0
(default), log_min_messages = debug2
- M: Create a physical repslot.
- S(tandby): Setup a standby database.
- S: Edit recovery.conf to use the replication slot above then
start it.
- S: touch /tmp/hoge
- M: Run pgbench ...
- S: After a while, the standby stops.

LOG: #################### STOP THE SERVER

- M: Stop pgbench.
- M: Do 'checkpoint;' twice.
- S: rm /tmp/hoge
- S: Fails to catch up with the following error.

FATAL: could not receive data from WAL stream: ERROR: requested WAL segment 00000001000000000000002B has already been removed

I have been testing and reviewing the latest patch,
"0001-Fix-a-bug-of-physical-replication-slot.patch", and I think I might
need some more clarification on this.

Before applying the patch, I tried reproducing the above error:

- I had master->standby in streaming replication
- Took the backup of master
- with a low max_wal_size and wal_keep_segments = 0
- Configured standby with recovery.conf
- Created replication slot on master
- Configured the replication slot on standby and started the standby

I suppose the "configure" means primary_slot_name in recovery.conf.

- I got the below error

2017-03-10 11:58:15.704 AEDT [478] LOG: invalid record length at 0/F2000140: wanted 24, got 0

2017-03-10 11:58:15.706 AEDT [481] LOG: started streaming WAL from primary at 0/F2000000 on timeline 1

2017-03-10 11:58:15.706 AEDT [481] FATAL: could not receive data from WAL stream: ERROR: requested WAL segment 0000000100000000000000F2 has already been removed

Maybe you created the slot on the master in non-reserve (default) mode
and paused for some minutes after taking the backup and before
starting the standby. In that case the slot on the master doesn't keep
WAL segments unless the standby is connected, so a couple of
checkpoints can blow away the first segment required by the
standby. This is quite reasonable behavior. The following steps
make this more certain.

- Took the backup of master
- with a low max_wal_size = 2 and wal_keep_segments = 0
- Configured standby with recovery.conf
- Created replication slot on master

+ - SELECT pg_switch_wal(); on master twice.
+ - checkpoint; on master twice.

- Configured the replication slot on standby and started the standby

Creating the slot with the following command (reserving WAL immediately) avoids this:

=# select pg_create_physical_replication_slot('s1', true);

and I noticed that the file "0000000100000000000000F2" had been removed
from the master. This can be easily reproduced, and it occurs
irrespective of configuring replication slots.

As long as the file "0000000100000000000000F2" is available on the master,
the standby continues to stream WALs without any issues.

...

If the scenario I created to reproduce the error is correct, then applying
the patch is not making a difference.

Yes, the patch does not cover this case. The patch covers the
case where the segment preceding the first segment required by the
standby was removed while holding the first part of a record that
continues into that first required segment. In your case, by
contrast, the segment at the standby's start point was simply
removed.

I think I need help building a specific test case that will reproduce
the specific bug related to physical replication slots, as reported.

I will continue reviewing the patch and follow up with any comments.

Thanks a lot!

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center


#15Kyotaro Horiguchi
horikyota.ntt@gmail.com
In reply to: Kyotaro Horiguchi (#14)
hackersbugs
Re: [HACKERS] Bug in Physical Replication Slots (at least 9.5)?

This conflicts with 6912acc (replication lag tracker) so just
rebased on a6f22e8.

At Fri, 17 Mar 2017 16:48:27 +0900 (Tokyo Standard Time), Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp> wrote in <20170317.164827.46663014.horiguchi.kyotaro@lab.ntt.co.jp>

Hello,

At Mon, 13 Mar 2017 11:06:00 +1100, Venkata B Nagothi <nag1010@gmail.com> wrote in <CAEyp7J-4MmVwGoZSwvaSULZC80JDD_tL-9KsNiqF17+bNqiSBg@mail.gmail.com>

On Tue, Jan 17, 2017 at 9:36 PM, Kyotaro HORIGUCHI <
horiguchi.kyotaro@lab.ntt.co.jp> wrote:

I managed to reproduce this. A little tweak, added as the first patch,
makes the standby kill itself as soon as walreceiver sees a
contrecord at the beginning of a segment.

- M(aster): createdb as a master with wal_keep_segments = 0
(default), log_min_messages = debug2
- M: Create a physical repslot.
- S(tandby): Setup a standby database.
- S: Edit recovery.conf to use the replication slot above then
start it.
- S: touch /tmp/hoge
- M: Run pgbench ...
- S: After a while, the standby stops.

LOG: #################### STOP THE SERVER

- M: Stop pgbench.
- M: Do 'checkpoint;' twice.
- S: rm /tmp/hoge
- S: Fails to catch up with the following error.

FATAL: could not receive data from WAL stream: ERROR: requested WAL segment 00000001000000000000002B has already been removed

I have been testing and reviewing the latest patch,
"0001-Fix-a-bug-of-physical-replication-slot.patch", and I think I might
need some more clarification on this.

Before applying the patch, I tried reproducing the above error:

- I had master->standby in streaming replication
- Took the backup of master
- with a low max_wal_size and wal_keep_segments = 0
- Configured standby with recovery.conf
- Created replication slot on master
- Configured the replication slot on standby and started the standby

I suppose the "configure" means primary_slot_name in recovery.conf.

- I got the below error

2017-03-10 11:58:15.704 AEDT [478] LOG: invalid record length at 0/F2000140: wanted 24, got 0

2017-03-10 11:58:15.706 AEDT [481] LOG: started streaming WAL from primary at 0/F2000000 on timeline 1

2017-03-10 11:58:15.706 AEDT [481] FATAL: could not receive data from WAL stream: ERROR: requested WAL segment 0000000100000000000000F2 has already been removed

Maybe you created the slot on the master in non-reserve (default) mode
and paused for some minutes after taking the backup and before
starting the standby. In that case the slot on the master doesn't keep
WAL segments unless the standby is connected, so a couple of
checkpoints can blow away the first segment required by the
standby. This is quite reasonable behavior. The following steps
make this more certain.

- Took the backup of master
- with a low max_wal_size = 2 and wal_keep_segments = 0
- Configured standby with recovery.conf
- Created replication slot on master

+ - SELECT pg_switch_wal(); on master twice.
+ - checkpoint; on master twice.

- Configured the replication slot on standby and started the standby

Creating the slot with the following command (reserving WAL immediately) avoids this:

=# select pg_create_physical_replication_slot('s1', true);

and I noticed that the file "0000000100000000000000F2" had been removed
from the master. This can be easily reproduced, and it occurs
irrespective of configuring replication slots.

As long as the file "0000000100000000000000F2" is available on the master,
the standby continues to stream WALs without any issues.

...

If the scenario I created to reproduce the error is correct, then applying
the patch is not making a difference.

Yes, the patch does not cover this case. The patch covers the
case where the segment preceding the first segment required by the
standby was removed while holding the first part of a record that
continues into that first required segment. In your case, by
contrast, the segment at the standby's start point was simply
removed.

I think I need help building a specific test case that will reproduce
the specific bug related to physical replication slots, as reported.

I will continue reviewing the patch and follow up with any comments.

Thanks a lot!

--
Kyotaro Horiguchi
NTT Open Source Software Center

Attachments:

0001-Fix-a-bug-of-physical-replication-slot_a6f22e8.patch (text/x-patch; charset=us-ascii, +97 -8)
#16Venkata B Nagothi
nag1010@gmail.com
In reply to: Kyotaro Horiguchi (#15)
hackersbugs
Re: [HACKERS] Bug in Physical Replication Slots (at least 9.5)?


On Tue, Mar 28, 2017 at 5:51 PM, Kyotaro HORIGUCHI <
horiguchi.kyotaro@lab.ntt.co.jp> wrote:

This conflicts with 6912acc (replication lag tracker) so just
rebased on a6f22e8.

I tried applying this patch to the latest master; it does not apply:

[dba@buildhost postgresql]$ git apply /data/postgresql-patches/9.5-ReplicationSlots-Bug-Patch/0001-Fix-a-bug-of-physical-replication-slot_a6f22e8.patch
/data/postgresql-patches/9.5-ReplicationSlots-Bug-Patch/0001-Fix-a-bug-of-physical-replication-slot_a6f22e8.patch:28: trailing whitespace.
/*
/data/postgresql-patches/9.5-ReplicationSlots-Bug-Patch/0001-Fix-a-bug-of-physical-replication-slot_a6f22e8.patch:29: trailing whitespace.
* This variable corresponds to restart_lsn in pg_replication_slots for a
/data/postgresql-patches/9.5-ReplicationSlots-Bug-Patch/0001-Fix-a-bug-of-physical-replication-slot_a6f22e8.patch:30: trailing whitespace.
* physical slot. This has a valid value only when it differs from the current
/data/postgresql-patches/9.5-ReplicationSlots-Bug-Patch/0001-Fix-a-bug-of-physical-replication-slot_a6f22e8.patch:31: trailing whitespace.
* flush pointer.
/data/postgresql-patches/9.5-ReplicationSlots-Bug-Patch/0001-Fix-a-bug-of-physical-replication-slot_a6f22e8.patch:32: trailing whitespace.
*/
error: patch failed: src/backend/replication/walsender.c:210
error: src/backend/replication/walsender.c: patch does not apply

Regards,

Venkata Balaji N
Database Consultant

#17Michael Paquier
michael@paquier.xyz
In reply to: Venkata B Nagothi (#16)
hackersbugs
Re: [BUGS] Bug in Physical Replication Slots (at least 9.5)?

On Thu, Mar 30, 2017 at 8:49 AM, Venkata B Nagothi <nag1010@gmail.com> wrote:

On Tue, Mar 28, 2017 at 5:51 PM, Kyotaro HORIGUCHI
<horiguchi.kyotaro@lab.ntt.co.jp> wrote:
I tried applying this patch to the latest master; it does not apply:

[dba@buildhost postgresql]$ git apply /data/postgresql-patches/9.5-ReplicationSlots-Bug-Patch/0001-Fix-a-bug-of-physical-replication-slot_a6f22e8.patch
/data/postgresql-patches/9.5-ReplicationSlots-Bug-Patch/0001-Fix-a-bug-of-physical-replication-slot_a6f22e8.patch:28: trailing whitespace.
/*
/data/postgresql-patches/9.5-ReplicationSlots-Bug-Patch/0001-Fix-a-bug-of-physical-replication-slot_a6f22e8.patch:29: trailing whitespace.
* This variable corresponds to restart_lsn in pg_replication_slots for a
/data/postgresql-patches/9.5-ReplicationSlots-Bug-Patch/0001-Fix-a-bug-of-physical-replication-slot_a6f22e8.patch:30: trailing whitespace.
* physical slot. This has a valid value only when it differs from the current
/data/postgresql-patches/9.5-ReplicationSlots-Bug-Patch/0001-Fix-a-bug-of-physical-replication-slot_a6f22e8.patch:31: trailing whitespace.
* flush pointer.
/data/postgresql-patches/9.5-ReplicationSlots-Bug-Patch/0001-Fix-a-bug-of-physical-replication-slot_a6f22e8.patch:32: trailing whitespace.
*/
error: patch failed: src/backend/replication/walsender.c:210
error: src/backend/replication/walsender.c: patch does not apply

git apply and git am can be very picky sometimes, so you may want to
fallback to patch -p1 if things don't work. In this case it does.
--
Michael


#18Venkata B Nagothi
nag1010@gmail.com
In reply to: Michael Paquier (#17)
hackersbugs
Re: [HACKERS] Bug in Physical Replication Slots (at least 9.5)?

On Thu, Mar 30, 2017 at 10:55 AM, Michael Paquier <michael.paquier@gmail.com> wrote:

On Thu, Mar 30, 2017 at 8:49 AM, Venkata B Nagothi <nag1010@gmail.com> wrote:

On Tue, Mar 28, 2017 at 5:51 PM, Kyotaro HORIGUCHI
<horiguchi.kyotaro@lab.ntt.co.jp> wrote:
I tried applying this patch to latest master, it is not getting applied

[dba@buildhost postgresql]$ git apply
/data/postgresql-patches/9.5-ReplicationSlots-Bug-Patch/

0001-Fix-a-bug-of-physical-replication-slot_a6f22e8.patch

/data/postgresql-patches/9.5-ReplicationSlots-Bug-Patch/

0001-Fix-a-bug-of-physical-replication-slot_a6f22e8.patch:28:

trailing whitespace.
/*
/data/postgresql-patches/9.5-ReplicationSlots-Bug-Patch/

0001-Fix-a-bug-of-physical-replication-slot_a6f22e8.patch:29:

trailing whitespace.
* This variable corresponds to restart_lsn in pg_replication_slots for a
/data/postgresql-patches/9.5-ReplicationSlots-Bug-Patch/

0001-Fix-a-bug-of-physical-replication-slot_a6f22e8.patch:30:

trailing whitespace.
* physical slot. This has a valid value only when it differs from the
current
/data/postgresql-patches/9.5-ReplicationSlots-Bug-Patch/

0001-Fix-a-bug-of-physical-replication-slot_a6f22e8.patch:31:

trailing whitespace.
* flush pointer.
/data/postgresql-patches/9.5-ReplicationSlots-Bug-Patch/

0001-Fix-a-bug-of-physical-replication-slot_a6f22e8.patch:32:

trailing whitespace.
*/
error: patch failed: src/backend/replication/walsender.c:210
error: src/backend/replication/walsender.c: patch does not apply

git apply and git am can be very picky sometimes, so you may want to
fallback to patch -p1 if things don't work. In this case it does.

patch -p1 seems to be working. Thanks !

Regards,

Venkata B N
Database Consultant

#19Tatsuo Ishii
t-ishii@sra.co.jp
In reply to: Michael Paquier (#17)
hackersbugs
Re: [BUGS] Bug in Physical Replication Slots (at least 9.5)?

On Thu, Mar 30, 2017 at 8:49 AM, Venkata B Nagothi <nag1010@gmail.com> wrote:

On Tue, Mar 28, 2017 at 5:51 PM, Kyotaro HORIGUCHI
<horiguchi.kyotaro@lab.ntt.co.jp> wrote:
I tried applying this patch to latest master, it is not getting applied

[... same "trailing whitespace" warnings and "patch does not apply" errors as quoted above ...]

git apply and git am can be very picky sometimes, so you may want to
fall back to patch -p1 if things don't work. In this case it does.

Committers will not apply patches that have trailing whitespace
issues, so the patch submitter needs to fix them anyway.

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp
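
[Editor's note: on Tatsuo's point, trailing whitespace can be caught before a patch is ever posted. The repo and file names in this sketch are hypothetical and not from the thread: `git diff --check` flags whitespace errors in uncommitted changes and exits non-zero when it finds any.]

```shell
# Sketch (hypothetical repo and file names): detect trailing whitespace
# before generating a patch, so reviewers never see the warnings.
set -eu
dir=$(mktemp -d)
cd "$dir"
git init -q .
git config user.email demo@example.com
git config user.name demo

printf 'int x;\n' > f.c
git add f.c
git commit -qm init

# Introduce a line with trailing whitespace.
printf 'int x;\nint y; \n' > f.c

# --check reports "f.c:2: trailing whitespace." and exits non-zero.
git diff --check || echo "whitespace problems found"
```

The same check runs against staged changes with `git diff --cached --check`, which is convenient in a pre-commit hook.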


#20Michael Paquier
michael@paquier.xyz
In reply to: Tatsuo Ishii (#19)
hackersbugs
Re: [HACKERS] Bug in Physical Replication Slots (at least 9.5)?

On Thu, Mar 30, 2017 at 9:12 AM, Tatsuo Ishii <ishii@sraoss.co.jp> wrote:

Committers will not apply patches which has trailing whitespace
issues. So the patch submitter needs to fix them anyway.

I cannot comment on that point (committers are free to pick up things
the way they want), but a patch failing to apply with the git commands
should not be an obstacle to review, as long as it can be applied
easily by other means and roughly respects GNU's diff format.
--
Michael


#21 Tatsuo Ishii <t-ishii@sra.co.jp>, in reply to Michael Paquier (#20)
#22 Kyotaro Horiguchi <horikyota.ntt@gmail.com>, in reply to Venkata B Nagothi (#18)
#23 Venkata B Nagothi <nag1010@gmail.com>, in reply to Kyotaro Horiguchi (#22)
#24 Kyotaro Horiguchi <horikyota.ntt@gmail.com>, in reply to Venkata B Nagothi (#23)
#25 Venkata B Nagothi <nag1010@gmail.com>, in reply to Kyotaro Horiguchi (#24)
#26 Venkata B Nagothi <nag1010@gmail.com>, in reply to Kyotaro Horiguchi (#14)
#27 Kyotaro Horiguchi <horikyota.ntt@gmail.com>, in reply to Venkata B Nagothi (#26)
#28 Venkata B Nagothi <nag1010@gmail.com>, in reply to Kyotaro Horiguchi (#27)
#29 Michael Paquier <michael@paquier.xyz>, in reply to Venkata B Nagothi (#28)
#30 Ryan Murphy <ryanfmurphy@gmail.com>, in reply to Michael Paquier (#29)
#31 Kyotaro Horiguchi <horikyota.ntt@gmail.com>, in reply to Michael Paquier (#29)
#32 Michael Paquier <michael@paquier.xyz>, in reply to Kyotaro Horiguchi (#31)
#33 Kyotaro Horiguchi <horikyota.ntt@gmail.com>, in reply to Michael Paquier (#32)
#34 Andres Freund <andres@anarazel.de>, in reply to Kyotaro Horiguchi (#33)
#35 Michael Paquier <michael@paquier.xyz>, in reply to Andres Freund (#34)
#36 Kyotaro Horiguchi <horikyota.ntt@gmail.com>, in reply to Michael Paquier (#35)
#37 Andres Freund <andres@anarazel.de>, in reply to Kyotaro Horiguchi (#36)
#38 Kyotaro Horiguchi <horikyota.ntt@gmail.com>, in reply to Andres Freund (#37)
#39 Michael Paquier <michael@paquier.xyz>, in reply to Kyotaro Horiguchi (#38)
#40 Kyotaro Horiguchi <horikyota.ntt@gmail.com>, in reply to Michael Paquier (#39)
#41 Michael Paquier <michael@paquier.xyz>, in reply to Kyotaro Horiguchi (#40)
#42 Michael Paquier <michael@paquier.xyz>, in reply to Michael Paquier (#41)
#43 Stephen Frost <sfrost@snowman.net>, in reply to Michael Paquier (#42)
#44 Michael Paquier <michael@paquier.xyz>, in reply to Stephen Frost (#43)
#45 Andres Freund <andres@anarazel.de>, in reply to Michael Paquier (#44)
#46 Kyotaro Horiguchi <horikyota.ntt@gmail.com>, in reply to Andres Freund (#45)
#47 Michael Paquier <michael@paquier.xyz>, in reply to Kyotaro Horiguchi (#46)
#48 Bruce Momjian <bruce@momjian.us>, in reply to Kyotaro Horiguchi (#6)
#49 Kyotaro Horiguchi <horikyota.ntt@gmail.com>, in reply to Bruce Momjian (#48)
#50 Kyotaro Horiguchi <horikyota.ntt@gmail.com>, in reply to Michael Paquier (#47)
#51 Kyotaro Horiguchi <horikyota.ntt@gmail.com>, in reply to Kyotaro Horiguchi (#50)
#52 Michael Paquier <michael@paquier.xyz>, in reply to Kyotaro Horiguchi (#51)
#53 Kyotaro Horiguchi <horikyota.ntt@gmail.com>, in reply to Michael Paquier (#52)
#54 Heikki Linnakangas <heikki.linnakangas@enterprisedb.com>, in reply to Kyotaro Horiguchi (#46)
#55 Kyotaro Horiguchi <horikyota.ntt@gmail.com>, in reply to Heikki Linnakangas (#54)
#56 Heikki Linnakangas <heikki.linnakangas@enterprisedb.com>, in reply to Kyotaro Horiguchi (#55)
#57 Heikki Linnakangas <heikki.linnakangas@enterprisedb.com>, in reply to Heikki Linnakangas (#56)
#58 Kyotaro Horiguchi <horikyota.ntt@gmail.com>, in reply to Heikki Linnakangas (#57)