archive status ".ready" files may be created too early

Started by Nathan Bossart · over 6 years ago · 128 messages · pgsql-hackers
#1 Nathan Bossart
nathandbossart@gmail.com

Hi hackers,

I believe I've uncovered a bug that may cause archive status ".ready"
files to be created too early, which in turn may cause an incorrect
version of the corresponding WAL segment to be archived.

The crux of the issue seems to be that XLogWrite() does not wait for
the entire record to be written to disk before creating the ".ready"
file. Instead, it just waits for the last page of the segment to be
written before notifying the archiver. If PostgreSQL crashes before
it is able to write the rest of the record, it will end up reusing the
".ready" segment at the end of crash recovery. In the meantime, the
archiver process may have already processed the old version of the
segment.
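
To make that ordering concrete, here is a toy model of the hazard. The names, the segment size, and the page-level arithmetic are invented for illustration; this is not PostgreSQL code:

```c
#include <stdbool.h>

#define SEG_PAGES 16    /* toy segment size, in pages */

/*
 * Returns true when the archiver would be notified (last page of a
 * segment written) while the record spilling into the next segment is
 * still only partially on disk.  The decision looks only at whether the
 * write position reached a segment boundary, never at whether the record
 * straddling that boundary is fully written.
 */
bool notified_too_early(int pages_written, int record_end_page)
{
    bool segment_filled  = pages_written > 0 && pages_written % SEG_PAGES == 0;
    bool record_complete = pages_written >= record_end_page;

    /* The hazard: only segment_filled is consulted. */
    return segment_filled && !record_complete;
}
```

In this model, a record ending on page 18 of a 16-page segment triggers notification as soon as page 16 is written, which is exactly the premature ".ready" described above.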

This issue seems to most often manifest as WAL corruption on standby
servers after the primary server has crashed because it ran out of
disk space. I have attached a proof-of-concept patch
(ready_file_fix.patch) that waits to create any ".ready" files until
closer to the end of XLogWrite(). The patch is incorrect for a few
reasons, but I hope it helps illustrate the problem. I have also
attached another patch (repro_helper.patch) to be used in conjunction
with the following steps to reproduce the issue:

initdb .
pg_ctl -D . -o "-c archive_mode=on -c archive_command='exit 0'" -l log.log start
pgbench -i -s 1000 postgres
psql postgres -c "SELECT pg_switch_wal();"

With just repro_helper.patch applied, these commands should produce
both of the following log statements:

PANIC: failing at inconvenient time
LOG: status file already exists for "000000010000000000000017"

With both patches applied, the commands will only produce the first
PANIC statement.

Another thing I am exploring is whether a crash in between writing the
last page of a segment and creating the ".ready" file could cause the
archiver process to skip processing it altogether. In the scenario I
mention earlier, the server seems to recreate the ".ready" file since
it rewrites a portion of the segment. However, if a WAL record fits
perfectly into the last section of the segment, I am not sure whether
the ".ready" file would be created after restart.

I am admittedly in the early stages of working on this problem, but I
thought it would be worth reporting to the community early on in case
anyone has any thoughts on or past experiences with this issue.

Nathan

Attachments:

repro_helper.patch (application/octet-stream, +15 −0)
ready_file_fix.patch (application/octet-stream, +10 −1)
#2 Kyotaro Horiguchi
horikyota.ntt@gmail.com
In reply to: Nathan Bossart (#1)
Re: archive status ".ready" files may be created too early

Hello.

At Thu, 12 Dec 2019 22:50:20 +0000, "Bossart, Nathan" <bossartn@amazon.com> wrote in

Hi hackers,

I believe I've uncovered a bug that may cause archive status ".ready"
files to be created too early, which in turn may cause an incorrect
version of the corresponding WAL segment to be archived.

The crux of the issue seems to be that XLogWrite() does not wait for
the entire record to be written to disk before creating the ".ready"
file. Instead, it just waits for the last page of the segment to be
written before notifying the archiver. If PostgreSQL crashes before
it is able to write the rest of the record, it will end up reusing the
".ready" segment at the end of crash recovery. In the meantime, the
archiver process may have already processed the old version of the
segment.

Yeah, that can happen if the server is restarted after the crash.

This issue seems to most often manifest as WAL corruption on standby
servers after the primary server has crashed because it ran out of
disk space.

In the first place, it's quite bad to set restart_after_crash to on,
or to simply restart a crashed master in a replication setup. The standby
can be inconsistent at the time of the master crash, so it should be fixed
using pg_rewind or recreated from a base backup.

Even without that archiving behavior, a standby may receive WAL bytes
inconsistent with the bytes written by the same master just before the
crash. This is not limited to segment boundaries: it can happen on every
block boundary, and could happen anywhere with more complicated steps.

What you are calling a "problem" seems to come from allowing the
restart_after_crash behavior. On the other hand, as recommended in the
documentation, archive_command can refuse to overwrite an existing copy
of the same segment, but we don't require that.

As a result, the patch doesn't seem to buy anything beyond setting up
and operating the cluster correctly.

Another thing I am exploring is whether a crash in between writing the
last page of a segment and creating the ".ready" file could cause the
archiver process to skip processing it altogether. In the scenario I
mention earlier, the server seems to recreate the ".ready" file since
it rewrites a portion of the segment. However, if a WAL record fits
perfectly into the last section of the segment, I am not sure whether
the ".ready" file would be created after restart.

Why would that segment need a ".ready" file after restart, given that
nothing more could be written to the old segment?

I am admittedly in the early stages of working on this problem, but I
thought it would be worth reporting to the community early on in case
anyone has any thoughts on or past experiences with this issue.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

#3 Nathan Bossart
nathandbossart@gmail.com
In reply to: Kyotaro Horiguchi (#2)
Re: archive status ".ready" files may be created too early

On 12/12/19, 8:08 PM, "Kyotaro Horiguchi" <horikyota.ntt@gmail.com> wrote:

At Thu, 12 Dec 2019 22:50:20 +0000, "Bossart, Nathan" <bossartn@amazon.com> wrote in

Hi hackers,

I believe I've uncovered a bug that may cause archive status ".ready"
files to be created too early, which in turn may cause an incorrect
version of the corresponding WAL segment to be archived.

The crux of the issue seems to be that XLogWrite() does not wait for
the entire record to be written to disk before creating the ".ready"
file. Instead, it just waits for the last page of the segment to be
written before notifying the archiver. If PostgreSQL crashes before
it is able to write the rest of the record, it will end up reusing the
".ready" segment at the end of crash recovery. In the meantime, the
archiver process may have already processed the old version of the
segment.

Yeah, that can happen if the server is restarted after the crash.

This issue seems to most often manifest as WAL corruption on standby
servers after the primary server has crashed because it ran out of
disk space.

In the first place, it's quite bad to set restart_after_crash to on,
or to simply restart a crashed master in a replication setup. The standby
can be inconsistent at the time of the master crash, so it should be fixed
using pg_rewind or recreated from a base backup.

Even without that archiving behavior, a standby may receive WAL bytes
inconsistent with the bytes written by the same master just before the
crash. This is not limited to segment boundaries: it can happen on every
block boundary, and could happen anywhere with more complicated steps.

What you are calling a "problem" seems to come from allowing the
restart_after_crash behavior. On the other hand, as recommended in the
documentation, archive_command can refuse to overwrite an existing copy
of the same segment, but we don't require that.

As a result, the patch doesn't seem to buy anything beyond setting up
and operating the cluster correctly.

Disregarding the behavior of standby servers for a minute, I think
that what I've described is still a problem for archiving. If the
segment is archived too early, point-in-time restores that require it
will fail. If the server refuses to overwrite existing archive files,
the archiver process may fail to process the "good" version of the
segment until someone takes action to fix it. I think this is
especially troubling for backup utilities like pgBackRest that check
the archive_status directory independently since it is difficult to
know if the segment is truly ".ready".

I've attached a slightly improved patch to show how this might be
fixed. I am curious what concerns there are about doing something
like it to prevent this scenario.

Another thing I am exploring is whether a crash in between writing the
last page of a segment and creating the ".ready" file could cause the
archiver process to skip processing it altogether. In the scenario I
mention earlier, the server seems to recreate the ".ready" file since
it rewrites a portion of the segment. However, if a WAL record fits
perfectly into the last section of the segment, I am not sure whether
the ".ready" file would be created after restart.

Why would that segment need a ".ready" file after restart, given that
nothing more could be written to the old segment?

If a ".ready" file is never created for a segment, I don't think it
will be archived.

Nathan

Attachments:

fix_ready_file_v2.patch (application/octet-stream, +13 −1)
#4 Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Kyotaro Horiguchi (#2)
Re: archive status ".ready" files may be created too early

On 2019-Dec-13, Kyotaro Horiguchi wrote:

At Thu, 12 Dec 2019 22:50:20 +0000, "Bossart, Nathan" <bossartn@amazon.com> wrote in

The crux of the issue seems to be that XLogWrite() does not wait for
the entire record to be written to disk before creating the ".ready"
file. Instead, it just waits for the last page of the segment to be
written before notifying the archiver. If PostgreSQL crashes before
it is able to write the rest of the record, it will end up reusing the
".ready" segment at the end of crash recovery. In the meantime, the
archiver process may have already processed the old version of the
segment.

Yeah, that can happen if the server is restarted after the crash.

... which is the normal way to run things, no?

servers after the primary server has crashed because it ran out of
disk space.

In the first place, it's quite bad to set restart_after_crash to on,
or to simply restart a crashed master in a replication setup.

Why is it bad? It's the default value.

The standby can be inconsistent at the time of the master crash, so it
should be fixed using pg_rewind or recreated from a base backup.

Surely the master will just come up and replay its WAL, and there should
be no inconsistency.

You seem to be thinking that a standby is promoted immediately on crash
of the master, but this is not a given.

--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#5 Kyotaro Horiguchi
horikyota.ntt@gmail.com
In reply to: Nathan Bossart (#3)
Re: archive status ".ready" files may be created too early

Uggg. I must apologize for my last bogus comment.

At Fri, 13 Dec 2019 21:24:36 +0000, "Bossart, Nathan" <bossartn@amazon.com> wrote in

On 12/12/19, 8:08 PM, "Kyotaro Horiguchi" <horikyota.ntt@gmail.com> wrote:

As a result, the patch doesn't seem to buy anything beyond setting up
and operating the cluster correctly.

Disregarding the behavior of standby servers for a minute, I think

I'm sorry. A continuation record split across a segment boundary
doesn't seem to harm replication. Please forget it.

that what I've described is still a problem for archiving. If the

Yeah, I think that happens, and it seems like a problem.

segment is archived too early, point-in-time restores that require it
will fail. If the server refuses to overwrite existing archive files,
the archiver process may fail to process the "good" version of the
segment until someone takes action to fix it. I think this is
especially troubling for backup utilities like pgBackRest that check
the archive_status directory independently since it is difficult to
know if the segment is truly ".ready".

I've attached a slightly improved patch to show how this might be
fixed. I am curious what concerns there are about doing something
like it to prevent this scenario.

Basically, I agree with the direction, where the .ready notification is
delayed until all requested WAL bytes are written out.

But I think I found a corner case where the patch doesn't work. As I
mentioned in another message, if the WAL buffer is full,
AdvanceXLInsertBuffer() calls XLogWrite() to write out the victim buffer
regardless of whether the last record on the page is the first half of a
continuation record. XLogWrite() can mark the segment as .ready even
with the patch.

Is that correct? And do you think the corner case is worth amending?

If so, we could also handle that case by marking the last segment as
.ready when XLogWrite() writes the first bytes of the next segment. (As
a further corner case, that still doesn't work if a continuation record
spans three or more segments.. but I don't think, or don't want to think,
that we need to consider that case..)

Thoughts?

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

#6 Kyotaro Horiguchi
horikyota.ntt@gmail.com
In reply to: Alvaro Herrera (#4)
Re: archive status ".ready" files may be created too early

Thank you Alvaro for the comment (on my comment).

At Fri, 13 Dec 2019 18:33:44 -0300, Alvaro Herrera <alvherre@2ndquadrant.com> wrote in

On 2019-Dec-13, Kyotaro Horiguchi wrote:

At Thu, 12 Dec 2019 22:50:20 +0000, "Bossart, Nathan" <bossartn@amazon.com> wrote in

The crux of the issue seems to be that XLogWrite() does not wait for
the entire record to be written to disk before creating the ".ready"
file. Instead, it just waits for the last page of the segment to be
written before notifying the archiver. If PostgreSQL crashes before
it is able to write the rest of the record, it will end up reusing the
".ready" segment at the end of crash recovery. In the meantime, the
archiver process may have already processed the old version of the
segment.

Yeah, that can happen if the server is restarted after the crash.

... which is the normal way to run things, no?

Yes. In older versions (< 10), the default value for wal_level was
minimal. In 10, the default for wal_level was changed to replica.
Still, I'm not sure restart_after_crash can be recommended for
streaming replication...

Why is it bad? It's the default value.

I reconsidered it more deeply and concluded that it doesn't harm
replication as I had thought.

A WAL-buffer overflow may write a partial continuation record, and it
can be flushed immediately. That made me think that a standby could
receive only the first half of a continuation record. Actually, that
write doesn't advance LogwrtResult.Flush, so a standby doesn't receive
a record split on a page boundary. (Cases where a crashed master is
reused as a new standby as-is might have contaminated my thinking..)

Sorry for the bogus comment. My conclusion here is that
restart_after_crash doesn't seem to harm the standby immediately.

The standby can be inconsistent at the time of the master crash, so it
should be fixed using pg_rewind or recreated from a base backup.

Surely the master will just come up and replay its WAL, and there should
be no inconsistency.

You seem to be thinking that a standby is promoted immediately on crash
of the master, but this is not a given.

Basically no, but my thinking might have been a bit muddled. Anyway,
returning to the proposal: XLogWrite() can be called when the WAL buffer
is full, and that write can reach into the last page of a segment. The
proposed patch doesn't work there, since that XLogWrite() call didn't
write the whole continuation record. But I'm not sure that corner case
is worth amending..

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

#7 Fujii Masao
masao.fujii@gmail.com
In reply to: Nathan Bossart (#1)
Re: archive status ".ready" files may be created too early

On Fri, Dec 13, 2019 at 7:50 AM Bossart, Nathan <bossartn@amazon.com> wrote:

Hi hackers,

I believe I've uncovered a bug that may cause archive status ".ready"
files to be created too early, which in turn may cause an incorrect
version of the corresponding WAL segment to be archived.

The crux of the issue seems to be that XLogWrite() does not wait for
the entire record to be written to disk before creating the ".ready"
file. Instead, it just waits for the last page of the segment to be
written before notifying the archiver. If PostgreSQL crashes before
it is able to write the rest of the record, it will end up reusing the
".ready" segment at the end of crash recovery. In the meantime, the
archiver process may have already processed the old version of the
segment.

Maybe I'm missing something... But since XLogWrite() seems to
call issue_xlog_fsync() before XLogArchiveNotifySeg(), ISTM that
this trouble shouldn't happen. No?

Regards,

--
Fujii Masao

#8 Kyotaro Horiguchi
horikyota.ntt@gmail.com
In reply to: Fujii Masao (#7)
Re: archive status ".ready" files may be created too early

Hello.

At Wed, 18 Dec 2019 14:10:04 +0900, Fujii Masao <masao.fujii@gmail.com> wrote in

On Fri, Dec 13, 2019 at 7:50 AM Bossart, Nathan <bossartn@amazon.com> wrote:

Hi hackers,

I believe I've uncovered a bug that may cause archive status ".ready"
files to be created too early, which in turn may cause an incorrect
version of the corresponding WAL segment to be archived.

The crux of the issue seems to be that XLogWrite() does not wait for
the entire record to be written to disk before creating the ".ready"
file. Instead, it just waits for the last page of the segment to be
written before notifying the archiver. If PostgreSQL crashes before
it is able to write the rest of the record, it will end up reusing the
".ready" segment at the end of crash recovery. In the meantime, the
archiver process may have already processed the old version of the
segment.

Maybe I'm missing something... But since XLogWrite() seems to
call issue_xlog_fsync() before XLogArchiveNotifySeg(), ISTM that
this trouble shouldn't happen. No?

The trouble happens after the synced file is archived. If the last
record in the archived segment was the first half of a continuation
record and the server crashes before writing the last half, crash
recovery stops just before that first half, and a different record can
be written over it. As a result, the archived version of the segment
ends up corrupt.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

#9 Nathan Bossart
nathandbossart@gmail.com
In reply to: Kyotaro Horiguchi (#8)
Re: archive status ".ready" files may be created too early

On 12/17/19, 2:26 AM, "Kyotaro Horiguchi" <horikyota.ntt@gmail.com> wrote:

But I think I found a corner case where the patch doesn't work. As I
mentioned in another message, if the WAL buffer is full,
AdvanceXLInsertBuffer() calls XLogWrite() to write out the victim buffer
regardless of whether the last record on the page is the first half of a
continuation record. XLogWrite() can mark the segment as .ready even
with the patch.

Is that correct? And do you think the corner case is worth amending?

I certainly think it is worth trying to prevent potential WAL archive
corruption in known corner cases. Your comment highlights a potential
shortcoming of my patch. AFAICT there is no guarantee that
XLogWrite() is called with a complete WAL record. Even if that
assumption is true at the moment, it might not hold up over time.

If so, we could also handle that case by marking the last segment as
.ready when XLogWrite() writes the first bytes of the next segment. (As
a further corner case, that still doesn't work if a continuation record
spans three or more segments.. but I don't think, or don't want to think,
that we need to consider that case..)

I'm working on a new version of the patch that will actually look at
the WAL page metadata to determine when it is safe to mark a segment
as ready for archival. It seems relatively easy to figure out whether
a page is the last one for the current WAL record.
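
For context, the WAL page header does expose the needed information: XLogPageHeaderData in xlog_internal.h carries the XLP_FIRST_IS_CONTRECORD flag in xlp_info and an xlp_rem_len field giving the number of bytes of a record continued from an earlier page. A simplified sketch of the kind of check being considered, using a toy struct rather than the real header:

```c
#define XLP_FIRST_IS_CONTRECORD 0x0001   /* value as in xlog_internal.h */

/* Toy stand-in mirroring just two fields of XLogPageHeaderData. */
typedef struct
{
    unsigned short xlp_info;             /* flag bits */
    unsigned int   xlp_rem_len;          /* remaining bytes of a spilled record */
} ToyPageHeader;

/*
 * Given the first page of a segment, decide whether the previous
 * segment's last record is fully on disk: either no record continues
 * onto this page, or all of its continuation bytes are present here.
 * (Records spanning three or more segments are ignored in this sketch.)
 */
int prior_record_complete(const ToyPageHeader *hdr, unsigned int bytes_on_page)
{
    if ((hdr->xlp_info & XLP_FIRST_IS_CONTRECORD) == 0)
        return 1;                        /* prior record ended at the boundary */
    return bytes_on_page >= hdr->xlp_rem_len;
}
```

As the thread goes on to note, cases like XLOG_SWITCH records make the real check considerably messier than this sketch suggests.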

Nathan

#10 Nathan Bossart
nathandbossart@gmail.com
In reply to: Nathan Bossart (#9)
Re: archive status ".ready" files may be created too early

On 12/18/19, 8:34 AM, "Bossart, Nathan" <bossartn@amazon.com> wrote:

On 12/17/19, 2:26 AM, "Kyotaro Horiguchi" <horikyota.ntt@gmail.com> wrote:

If so, we could also handle that case by marking the last segment as
.ready when XLogWrite() writes the first bytes of the next segment. (As
a further corner case, that still doesn't work if a continuation record
spans three or more segments.. but I don't think, or don't want to think,
that we need to consider that case..)

I'm working on a new version of the patch that will actually look at
the WAL page metadata to determine when it is safe to mark a segment
as ready for archival. It seems relatively easy to figure out whether
a page is the last one for the current WAL record.

I stand corrected. My attempts to add logic to check the WAL records
added quite a bit more complexity than seemed reasonable to maintain
in this code path. For example, I didn't anticipate things like
XLOG_SWITCH records.

I am still concerned about the corner case you noted, but I have yet
to find a practical way to handle it. You suggested waiting until
writing the first bytes of the next segment before marking a segment
as ready, but I'm not sure that fixes this problem either, and I
wonder if it could result in waiting arbitrarily long before creating
a ".ready" file in some cases. Perhaps I am misunderstanding your
suggestion.

Another thing I noticed is that any changes in this area could impact
archive_timeout. If we reset the archive_timeout timer when we mark
the segments ready, we could force WAL switches more often. If we do
not move the timer logic, we could be resetting it before the file is
ready for the archiver. However, these differences might be subtle
enough to be okay.

Nathan

#11 Kyotaro Horiguchi
horikyota.ntt@gmail.com
In reply to: Nathan Bossart (#10)
Re: archive status ".ready" files may be created too early

At Sat, 21 Dec 2019 01:18:24 +0000, "Bossart, Nathan" <bossartn@amazon.com> wrote in

On 12/18/19, 8:34 AM, "Bossart, Nathan" <bossartn@amazon.com> wrote:

On 12/17/19, 2:26 AM, "Kyotaro Horiguchi" <horikyota.ntt@gmail.com> wrote:

If so, we could also handle that case by marking the last segment as
.ready when XLogWrite() writes the first bytes of the next segment. (As
a further corner case, that still doesn't work if a continuation record
spans three or more segments.. but I don't think, or don't want to think,
that we need to consider that case..)

I'm working on a new version of the patch that will actually look at
the WAL page metadata to determine when it is safe to mark a segment
as ready for archival. It seems relatively easy to figure out whether
a page is the last one for the current WAL record.

I stand corrected. My attempts to add logic to check the WAL records
added quite a bit more complexity than seemed reasonable to maintain
in this code path. For example, I didn't anticipate things like
XLOG_SWITCH records.

I am still concerned about the corner case you noted, but I have yet
to find a practical way to handle it. You suggested waiting until
writing the first bytes of the next segment before marking a segment
as ready, but I'm not sure that fixes this problem either, and I
wonder if it could result in waiting arbitrarily long before creating
a ".ready" file in some cases. Perhaps I am misunderstanding your
suggestion.

Another thing I noticed is that any changes in this area could impact
archive_timeout. If we reset the archive_timeout timer when we mark
the segments ready, we could force WAL switches more often. If we do
not move the timer logic, we could be resetting it before the file is
ready for the archiver. However, these differences might be subtle
enough to be okay.

You're right. That doesn't seem to work. Another thing I had in mind
was giving XLogWrite() an additional flag so that AdvanceXLInsertBuffer()
can tell it not to mark .ready. XLogWrite() is otherwise called while a
complete record *is* being written, so even if AdvanceXLInsertBuffer()
inhibits marking .ready, the succeeding bytes come soon and we can mark
the old segment as .ready at that time.

..
+ * If record_write == false, we don't mark the last segment as .ready
+ * if the caller requested to write up to segment boundary.
..
  static void
- XLogWrite(XLogwrtRqst WriteRqst, bool flexible)
+ XLogWrite(XLogwrtRqst WriteRqst, bool flexible, bool record_write)

When XLogWrite() is called with record_write = false, we don't mark
.ready and don't advance lastSegSwitchTime/LSN. The next time
XLogWrite() is called with record_write = true, if lastSegSwitchLSN is
behind the latest segment boundary at or before LogwrtResult.Write, we
mark the skipped segments as .ready and update lastSegSwitchTime/LSN.

Does the above make sense?
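
That deferred-notification bookkeeping might be modeled roughly like this; the function name, the toy segment size, and the counter are illustrative assumptions, not the proposed patch:

```c
typedef unsigned long long XLogPtr;

#define SEG_BYTES 16ULL                 /* toy segment size in bytes */

static unsigned long long last_notified_seg = 0;

/*
 * Toy model of the record_write idea: writes triggered by WAL-buffer
 * eviction (record_write == false) defer the ".ready" marking, and the
 * next record-complete write notifies every boundary that was skipped.
 * Returns the number of segments newly marked ".ready" by this write.
 */
unsigned long long toy_xlog_write(XLogPtr write_upto, int record_write)
{
    unsigned long long seg, newly;

    if (!record_write)
        return 0;                       /* victim-buffer write: defer */

    seg = write_upto / SEG_BYTES;       /* segments wholly written out */
    newly = (seg > last_notified_seg) ? seg - last_notified_seg : 0;
    last_notified_seg = seg;            /* catch up on skipped boundaries */
    return newly;
}
```

In this model, an eviction-driven write that crosses a boundary marks nothing, and the following record-complete write picks up the skipped segment.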

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

#12 Nathan Bossart
nathandbossart@gmail.com
In reply to: Kyotaro Horiguchi (#11)
Re: archive status ".ready" files may be created too early

On 12/23/19, 6:09 PM, "Kyotaro Horiguchi" <horikyota.ntt@gmail.com> wrote:

You're right. That doesn't seem to work. Another thing I had in mind
was giving XLogWrite() an additional flag so that AdvanceXLInsertBuffer()
can tell it not to mark .ready. XLogWrite() is otherwise called while a
complete record *is* being written, so even if AdvanceXLInsertBuffer()
inhibits marking .ready, the succeeding bytes come soon and we can mark
the old segment as .ready at that time.

..
+ * If record_write == false, we don't mark the last segment as .ready
+ * if the caller requested to write up to segment boundary.
..
static void
- XLogWrite(XLogwrtRqst WriteRqst, bool flexible)
+ XLogWrite(XLogwrtRqst WriteRqst, bool flexible, bool record_write)

When XLogWrite() is called with record_write = false, we don't mark
.ready and don't advance lastSegSwitchTime/LSN. The next time
XLogWrite() is called with record_write = true, if lastSegSwitchLSN is
behind the latest segment boundary at or before LogwrtResult.Write, we
mark the skipped segments as .ready and update lastSegSwitchTime/LSN.

Thanks for the suggestion. I explored this proposal a bit today.
It looks like there are three places where XLogWrite() is called:
AdvanceXLInsertBuffer(), XLogFlush(), and XLogBackgroundFlush(). IIUC
while XLogFlush() generally seems to be used to write complete records
to disk, this might not be true for XLogBackgroundFlush(), and we're
reasonably sure we cannot make such an assumption for
AdvanceXLInsertBuffer(). Therefore, we would likely only set
record_write to true for XLogFlush() and for certain calls to
XLogBackgroundFlush() (e.g., flushing asynchronous commits).

I'm worried that this approach could be fragile and that we could end
up waiting an arbitrarily long time before marking segments as ready
for archival. Even if we pay very close attention to the latest
flushed LSN, it seems possible that a non-record_write call to
XLogWrite() advances things such that we avoid ever calling it with
record_write = true. For example, XLogBackgroundFlush() may have
flushed the completed blocks, which we cannot assume are complete
records. Then, XLogFlush() would skip calling XLogWrite() if
LogwrtResult.Flush is sufficiently far ahead. In this scenario, I
don't think we would mark any eligible segments as ".ready" until the
next call to XLogWrite() with record_write = true, which may never
happen.

The next approach I'm going to try is having the callers of
XLogWrite() manage marking segments ready. That might make it easier
to mitigate some of my concerns above, but I'm not tremendously
optimistic that this approach will fare any better.

Nathan

#13 Nathan Bossart
nathandbossart@gmail.com
In reply to: Nathan Bossart (#12)
Re: archive status ".ready" files may be created too early

Sorry for the long delay.

I've finally gotten to a new approach that I think is promising. My
previous attempts to fix this within XLogWrite() or within the
associated code paths all seemed to miss corner cases or to add far
too much complexity. The new proof-of-concept patch that I have
attached is much different. Instead of trying to adjust the ready-
for-archive logic in the XLogWrite() code paths, I propose relocating
the ready-for-archive logic to a separate process.

The v3 patch is a proof-of-concept patch that moves the ready-for-
archive logic to the WAL writer process. We mark files as ready-for-
archive when the WAL flush pointer has advanced beyond a known WAL
record boundary. In this patch, I am using the WAL insert location as
the known WAL record boundary. The main idea is that it should be
safe to archive a segment once we know the last WAL record for the WAL
segment, which may overflow into the following segment, has been
completely written to disk.
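
The core safety condition of this approach can be sketched abstractly; the names and toy LSN arithmetic below are assumptions for illustration, not the v3 patch itself:

```c
typedef unsigned long long XLogPtr;

#define SEG_BYTES 16ULL     /* toy segment size in bytes */

/*
 * Abstract sketch of the v3 idea: a background process may mark a
 * segment ".ready" only when the flush pointer has advanced past a
 * known record boundary at or beyond the segment's end.  Returns the
 * number of leading segments safe to archive: those lying wholly below
 * both the flushed position and the last known record boundary.
 */
unsigned long long archivable_segments(XLogPtr record_boundary, XLogPtr flushed)
{
    XLogPtr safe = (record_boundary < flushed) ? record_boundary : flushed;

    return safe / SEG_BYTES;
}
```

Taking the minimum of the two positions is what prevents a segment whose last record spills into its successor from being notified before the spilled bytes are flushed.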

There are many things missing from this proof-of-concept patch that
will need to be handled if this approach seems reasonable. For
example, I haven't looked into any adjustments needed for the
archive_timeout parameter, I haven't added a way to persist the
"latest segment marked ready-for-archive" through crashes, I haven't
tried reducing the frequency of retrieving the WAL locations, and I'm
not sure the WAL writer process is even the right location for this
logic. However, these remaining problems all seem tractable to me.

I would appreciate your feedback on whether you believe this approach
is worth pursuing.

Nathan

Attachments:

v3-0001-Avoid-marking-WAL-segments-as-ready-for-archive-t.patch (application/octet-stream, +43 −4)
#14 Ryo Matsumura (Fujitsu)
matsumura.ryo@fujitsu.com
In reply to: Nathan Bossart (#13)
RE: archive status ".ready" files may be created too early

On 2020-03-26 18:50:24, Bossart, Nathan <bossartn@amazon.com> wrote:

The v3 patch is a proof-of-concept patch that moves the ready-for-
archive logic to the WAL writer process. We mark files as ready-for-
archive when the WAL flush pointer has advanced beyond a known WAL
record boundary.

I like such a simple resolution, but I cannot agree with it.

1.
This patch makes wal_writer_delay have two meanings. For example, a
user setting the parameter to a larger value gets archived files later.

2.
Even if we create a new parameter, we and users cannot determine the
best value.

3.
PostgreSQL guarantees that if a database cluster was stopped smartly,
it flushed and archived all WAL records, as the following comment notes.

[xlog.c]
* If archiving is enabled, rotate the last XLOG file so that all the
* remaining records are archived (postmaster wakes up the archiver
* process one more time at the end of shutdown). The checkpoint
* record will go to the next XLOG file and won't be archived (yet).

Therefore, the idea may require shutdown synchronization between the
WAL writer and the archiver (pgarch). I cannot agree with that, because
shutting down the system is inherently complex, and the synchronization
makes it more complicated. Your idea trades promptness of notification
for simplicity, but I think the required synchronization may ruin that
merit.

4.
I found that the patch can miss a chance to notify, so we have to be
more careful. In the following case, the WAL writer notifies after a
little less than three times wal_writer_delay in the worst case. That
may not be acceptable depending on the value of wal_writer_delay, and
if we create a new parameter, we cannot explain this behavior to users.

Premise:
- Seg1 has already been notified.
- FlushedPtr is 0/2D00000 (= all WAL records are flushed).

-----
Step 1.
Backend-A updates InsertPtr to 0/2E00000, but does not
copy WAL record to buffer.

Step 2. (sleep)
WalWriter memorizes InsertPtr 0/2E00000 in a local variable
(LocalInsertPtr) and sleeps, because FlushedPtr has not passed
InsertPtr.

Step 3.
Backend-A copies WAL record to buffer.

Step 4.
Backend-B updates InsertPtr to 0/3100000,
copies their record to buffer, commits (flushes it by itself),
and updates FlushedPtr to 0/3100000.

Step 5.
WalWriter detects that FlushedPtr (0/3100000) has passed
LocalInsertPtr (0/2E00000), but it cannot notify Seg2, even
though Seg2 should be notified.

This is because WalWriter does not know which record crosses the
segment boundary.

Then, in the worst case, Seg2 is notified only after two more
sleeps spent checking whether FlushedPtr has passed InsertPtr again.

Step 6. (sleep)
WalWriter sleeps.

Step 7.
Backend-C inserts a WAL record, flushes it, and updates as follows:
InsertPtr --> 0/3200000
FlushedPtr --> 0/3200000

Step 8.
Backend-D updates InsertPtr to 0/3300000, but does not copy
its record to the buffer.

Step 9. (sleep)
WalWriter saves InsertPtr 0/3300000 into LocalInsertPtr
and sleeps because FlushedPtr is still at 0/3200000.

Step 10.
Backend-D copies its record.

Step 11.
Someone (Backend-X or WalWriter) flushes and updates FlushedPtr
to 0/3300000.

Step 12.
WalWriter detects that FlushedPtr (0/3300000) has reached
LocalInsertPtr (0/3300000) and notifies Seg2.
-----

I'm preparing a patch in which a backend inserting a WAL record that
crosses a segment boundary records its EndRecPtr, and whoever flushes
past it checks that EndRecPtr and notifies.

Regards
Ryo Matsumura

#15Nathan Bossart
nathandbossart@gmail.com
In reply to: Ryo Matsumura (Fujitsu) (#14)
Re: archive status ".ready" files may be created too early

On 5/28/20, 11:42 PM, "matsumura.ryo@fujitsu.com" <matsumura.ryo@fujitsu.com> wrote:

I'm preparing a patch in which a backend inserting a WAL record that
crosses a segment boundary records its EndRecPtr, and whoever flushes
past it checks that EndRecPtr and notifies.

Thank you for sharing your thoughts. I will be happy to take a look
at your patch.

Nathan

#16Ryo Matsumura (Fujitsu)
matsumura.ryo@fujitsu.com
In reply to: Nathan Bossart (#15)
RE: archive status ".ready" files may be created too early

On 5/28/20, 11:42 PM, "matsumura.ryo@fujitsu.com" <matsumura.ryo@fujitsu.com> wrote:

I'm preparing a patch in which a backend inserting a WAL record that
crosses a segment boundary records its EndRecPtr, and whoever flushes
past it checks that EndRecPtr and notifies.

I'm sorry for my slow work.

I attach a patch.
I also attach a simple targeted test for the primary side.

1. Description in primary side

[Basic problem]
A process flushing WAL records doesn't know whether the flushed RecPtr is
the EndRecPtr of a cross-segment-boundary WAL record, because only the
process inserting that record knows, and it never stores the information anywhere.

[Basic concept of the patch in primary]
A process inserting a cross-segment-boundary WAL record stores its EndRecPtr
(I call it the CrossBoundaryEndRecPtr) in a new structure in XLogCtl.
A flushing process creates .ready (below I call this just 'notifying') for
the segment preceding the one containing the CrossBoundaryEndRecPtr, and only
when its flushed RecPtr is equal to or greater than the CrossBoundaryEndRecPtr.

[Detail of implementation in primary]
* Structure of CrossBoundaryEndRecPtrs
The requirements for the structure are as follows:
- The system must remember multiple CrossBoundaryEndRecPtr values.
- A flushing process must be able to decide quickly whether to notify,
  using only the flushed RecPtr.

Therefore, I implemented the structure as an array (the CrossBoundaryEndRecPtr
array) shaped like the xlblocks array. Strictly, a length of
'xbuffers/wal_segment_size' would be enough, but I chose 'xbuffers' for
simplicity, which lets the flushing process use XLogRecPtrToBufIdx().
See also the definition of XLogCtl, XLOGShmemSize(), and XLOGShmemInit() in my patch.

* Action of inserting process
An inserting process stores its CrossBoundaryEndRecPtr into the array
element computed by XLogRecPtrToBufIdx() from that CrossBoundaryEndRecPtr.
If the WAL record crosses many segments, only the element for the last
segment, the one containing the EndRecPtr, is set; the others are not.
See also CopyXLogRecordToWAL() in my patch.

* Action of flushing process
The overview was already given above:
a flushing process creates .ready (that is, notifies) for the segment
preceding the one containing the CrossBoundaryEndRecPtr, and only when its
flushed RecPtr is equal to or greater than the CrossBoundaryEndRecPtr.

One additional detail: the flushing process may notify many segments if the
record crosses many segments, so it stores the latest notified segment number
in latestArchiveNotifiedSegNo in XLogCtl and notifies from
latestArchiveNotifiedSegNo + 1 through the flushing segment number - 1.

latestArchiveNotifiedSegNo is set to EndOfLog after the Startup process
exits its replay loop. A standby sets it at the same point (= before promoting).

Mutual exclusion for latestArchiveNotifiedSegNo is not required because all
accesses already occur inside the WALWriteLock critical section.

2. Description in standby side

* Who notifies?
walreceiver likewise doesn't know whether the flushed RecPtr is the EndRecPtr
of a cross-segment-boundary WAL record. On a standby, only the Startup process
has that information, because it is hidden in the WAL record itself and only
the Startup process reads and assembles WAL records.

* Action of Startup process
Therefore, I implemented it so that walreceiver never notifies and the
Startup process does. In detail, when the Startup process reads one
full-length WAL record, it notifies from the segment containing the head of
the record (ReadRecPtr) up to the segment before the one containing the
record's EndRecPtr.

We must also pay attention to timeline switches.
The last segment of the previous TimeLineID must be notified before switching.
This case is handled when RM_XLOG_ID is detected.

3. Other segment notifications
Two other segment notifications remain. They do not need to be fixed.

(1) Notifying for a partial segment
It does not need to be complete, so it is OK to notify without special consideration.

(2) Re-notifying
Currently, the Checkpointer notifies through XLogArchiveCheckDone().
It is a safety net for notification failures by a backend or the WAL writer.
A backend or the WAL writer doesn't retry a failed notification, but the
Checkpointer retries when it removes old segments: if notification fails, it
does not remove the segment, so the Checkpointer keeps retrying until the
notification succeeds. In this case too, we can simply notify without special
consideration, because the Checkpointer guarantees that all WAL records
included in the segment have already been flushed.

I would appreciate your review and comments.

Regards
Ryo Matsumura

Attachments:

- test_in_primary.sh
- bugfix_early_archiving_v1.0.patch (+132, -24)
#17Kyotaro Horiguchi
horikyota.ntt@gmail.com
In reply to: Ryo Matsumura (Fujitsu) (#16)
Re: archive status ".ready" files may be created too early

Hello, Matsumura-san.

I agree that the WAL writer is not the place to notify segments, and the
direction you suggested would work.

At Fri, 19 Jun 2020 10:18:34 +0000, "matsumura.ryo@fujitsu.com" <matsumura.ryo@fujitsu.com> wrote in

1. Description in primary side

[Basic problem]
A process flushing WAL records doesn't know whether the flushed RecPtr is
the EndRecPtr of a cross-segment-boundary WAL record, because only the
process inserting that record knows, and it never stores the information anywhere.

[Basic concept of the patch in primary]
A process inserting a cross-segment-boundary WAL record stores its EndRecPtr
(I call it the CrossBoundaryEndRecPtr) in a new structure in XLogCtl.
A flushing process creates .ready (below I call this just 'notifying') for
the segment preceding the one containing the CrossBoundaryEndRecPtr, and only
when its flushed RecPtr is equal to or greater than the CrossBoundaryEndRecPtr.

...

See also the definition of XLogCtl, XLOGShmemSize(), and XLOGShmemInit() in my patch.

I think we don't need most of that shmem stuff. XLogWrite is called
after the WAL buffer is filled up to the requested position. So when it
crosses a segment boundary, we know that all past cross-segment-boundary
records are stable. That means all we need to remember is the position
of the latest cross-boundary record.

* Action of inserting process
An inserting process stores its CrossBoundaryEndRecPtr into the array
element computed by XLogRecPtrToBufIdx() from that CrossBoundaryEndRecPtr.
If the WAL record crosses many segments, only the element for the last
segment, the one containing the EndRecPtr, is set; the others are not.
See also CopyXLogRecordToWAL() in my patch.

If we call XLogMarkEndRecPtrIfNeeded() there, the function is called
every time a record is written, which is mostly wasteful.
XLogInsertRecord already has a code block executed only at page
boundaries.

* Action of flushing process
The overview was already given above:
a flushing process creates .ready (that is, notifies) for the segment
preceding the one containing the CrossBoundaryEndRecPtr, and only when its
flushed RecPtr is equal to or greater than the CrossBoundaryEndRecPtr.

One additional detail: the flushing process may notify many segments if the
record crosses many segments, so it stores the latest notified segment number
in latestArchiveNotifiedSegNo in XLogCtl and notifies from
latestArchiveNotifiedSegNo + 1 through the flushing segment number - 1.

latestArchiveNotifiedSegNo is set to EndOfLog after the Startup process
exits its replay loop. A standby sets it at the same point (= before promoting).

Mutual exclusion for latestArchiveNotifiedSegNo is not required because all
accesses already occur inside the WALWriteLock critical section.

Looks reasonable.

2. Description in standby side

* Who notifies?
walreceiver likewise doesn't know whether the flushed RecPtr is the EndRecPtr
of a cross-segment-boundary WAL record. On a standby, only the Startup process
has that information, because it is hidden in the WAL record itself and only
the Startup process reads and assembles WAL records.

A standby doesn't write its own WAL records. Even if the primary sent an
immature record at a segment boundary, the standby would just promote to a
new TLI and start its own history; nothing breaks. It could be a problem if
a standby that crashed in the problematic way were started as-is as a
primary, but such a scenario is out of our scope.

Now we can identify the stable portion of the WAL stream. That is enough to
prevent walsender from sending data that could be overwritten
afterwards. GetReplicationTargetRecPtr() in the attached patch does that.

* Action of Startup process
Therefore, I implemented it so that walreceiver never notifies and the
Startup process does. In detail, when the Startup process reads one
full-length WAL record, it notifies from the segment containing the head of
the record (ReadRecPtr) up to the segment before the one containing the
record's EndRecPtr.

I don't like that archiving on standby relies on replay progress. We
should avoid that, and fortunately I think we don't need it.

We must also pay attention to timeline switches.
The last segment of the previous TimeLineID must be notified before switching.
This case is handled when RM_XLOG_ID is detected.

That segment is archived later, after being renamed with a ".partial"
suffix. We don't archive the last incomplete segment of the previous
timeline as-is.

3. Other segment notifications
Two other segment notifications remain. They do not need to be fixed.

(1) Notifying for a partial segment
It does not need to be complete, so it is OK to notify without special consideration.

(2) Re-notifying
Currently, the Checkpointer notifies through XLogArchiveCheckDone().
It is a safety net for notification failures by a backend or the WAL writer.
A backend or the WAL writer doesn't retry a failed notification, but the
Checkpointer retries when it removes old segments: if notification fails, it
does not remove the segment, so the Checkpointer keeps retrying until the
notification succeeds. In this case too, we can simply notify without special
consideration, because the Checkpointer guarantees that all WAL records
included in the segment have already been flushed.

So it can be simplified as the attached. Any thoughts?

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

Attachments:

- 0001-Avoid-to-archive-immature-records.patch (+131, -9)
#18Ryo Matsumura (Fujitsu)
matsumura.ryo@fujitsu.com
In reply to: Kyotaro Horiguchi (#17)
RE: archive status ".ready" files may be created too early

Hello, Horiguchi-san

Thank you for your comment and patch.

At Thursday, June 25, 2020 3:36 PM(JST), "Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>" wrote in

I think we don't need most of that shmem stuff. XLogWrite is called

I too wanted to avoid more shmem stuff, but other ideas need more
locking to exclude inserters and writers from each other.

after the WAL buffer is filled up to the requested position. So when it
crosses a segment boundary, we know that all past cross-segment-boundary
records are stable. That means all we need to remember is the position
of the latest cross-boundary record.

I cannot agree. In the following case, it may not work well:
- record-A and record-B (record-B being the newer one) are copied, and
- lastSegContRecStart/End points to record-B's position, and
- FlushPtr has advanced to the middle of record-A.

In that case, the writer should notify the segments before record-A,
but it notifies the ones before record-B. If the writer notifies
only when it has flushed the latest record completely, that works well.
But then the writer may never be able to notify any segment while
WAL records crossing segment boundaries are inserted continuously.

So I think we must remember the EndRecPtr of every such cross-segment-boundary record in the buffer.

If we call XLogMarkEndRecPtrIfNeeded() there, the function is called
every time a record is written, which is mostly wasteful.
XLogInsertRecord already has a code block executed only at page
boundaries.

I agree.
XLogMarkEndRecPtrIfNeeded() is moved into that code block, before
LogwrtRqst.Write is updated, to avoid a race with the writer.

Now we can identify the stable portion of the WAL stream. That is enough to
prevent walsender from sending data that could be overwritten
afterwards. GetReplicationTargetRecPtr() in the attached patch does that.

I hadn't noticed that.
I basically agree, but it is based on lastSegContRecStart/End.

So, first of all, we have to agree on what should be remembered.

Regards
Ryo Matsumura

#19Kyotaro Horiguchi
horikyota.ntt@gmail.com
In reply to: Ryo Matsumura (Fujitsu) (#18)
Re: archive status ".ready" files may be created too early

Hello, Matsumura-san.

At Mon, 6 Jul 2020 04:02:23 +0000, "matsumura.ryo@fujitsu.com" <matsumura.ryo@fujitsu.com> wrote in

Hello, Horiguchi-san

Thank you for your comment and patch.

At Thursday, June 25, 2020 3:36 PM(JST), "Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>" wrote in

I think we don't need most of that shmem stuff. XLogWrite is called

I too wanted to avoid more shmem stuff, but other ideas need more
locking to exclude inserters and writers from each other.

after the WAL buffer is filled up to the requested position. So when it
crosses a segment boundary, we know that all past cross-segment-boundary
records are stable. That means all we need to remember is the position
of the latest cross-boundary record.

I cannot agree. In the following case, it may not work well:
- record-A and record-B (record-B being the newer one) are copied, and
- lastSegContRecStart/End points to record-B's position, and
- FlushPtr has advanced to the middle of record-A.

IIUC, that means record-B is a cross-segment-border record and we have
flushed beyond record-B. In that case, crash recovery afterwards
can read the complete record-B and will finish recovery *after*
record-B. That's what we need here.

In that case, the writer should notify the segments before record-A,
but it notifies the ones before record-B. If the writer notifies

If you mean that NotifyStableSegments() notifies up to the segment
preceding the one where record-A is placed, that's wrong. The
issue here is that crash recovery sees an incomplete record at a
segment border, so it is sufficient that crash recovery can read the
last record by looking at pg_wal.

only when it has flushed the latest record completely, that works well.

It confirms that "lastSegContRecEnd < LogwrtResult.Flush", which means
the last record (B) is completely flushed out, doesn't it? So it works
well.

But then the writer may never be able to notify any segment while
WAL records crossing segment boundaries are inserted continuously.

No. As I mentioned in the previous mail, if we see a
cross-segment-boundary record, the previous cross-segment-boundary
record has been flushed completely, and the segment containing the
first half of it has already been flushed. I didn't do that, but we
can put an assertion in XLogInsertRecord like this:

 +      /* Remember the range of the record if it spans over segments */
 +      XLByteToSeg(StartPos, startseg, wal_segment_size);
 +      XLByteToPrevSeg(EndPos, endseg, wal_segment_size);
 +
 +      if (startseg != endseg)
 +      {
++          /* we shouldn't have a record spanning over three or more segments */
++          Assert(endseg == startseg + 1);
 +          SpinLockAcquire(&XLogCtl->info_lck);
 +          if (XLogCtl->lastSegContRecEnd < StartPos)
 +          {
 +              XLogCtl->lastSegContRecStart = StartPos;
 +              XLogCtl->lastSegContRecEnd = EndPos;

So I think we must remember the EndRecPtr of every such cross-segment-boundary record in the buffer.

If we call XLogMarkEndRecPtrIfNeeded() there, the function is called
every time a record is written, which is mostly wasteful.
XLogInsertRecord already has a code block executed only at page
boundaries.

I agree.
XLogMarkEndRecPtrIfNeeded() is moved into that code block, before
LogwrtRqst.Write is updated, to avoid a race with the writer.

Now we can identify the stable portion of the WAL stream. That is enough to
prevent walsender from sending data that could be overwritten
afterwards. GetReplicationTargetRecPtr() in the attached patch does that.

I hadn't noticed that.
I basically agree, but it is based on lastSegContRecStart/End.

So, first of all, we have to agree on what should be remembered.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

#20Ryo Matsumura (Fujitsu)
matsumura.ryo@fujitsu.com
In reply to: Kyotaro Horiguchi (#19)
RE: archive status ".ready" files may be created too early

Hello, Horiguchi-san

At Monday, July 6, 2020 05:13:40 +0000, "Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>" wrote in

after the WAL buffer is filled up to the requested position. So when it
crosses a segment boundary, we know that all past cross-segment-boundary
records are stable. That means all we need to remember is the position
of the latest cross-boundary record.

I cannot agree. In the following case, it may not work well:
- record-A and record-B (record-B being the newer one) are copied, and
- lastSegContRecStart/End points to record-B's position, and
- FlushPtr has advanced to the middle of record-A.

IIUC, that means record-B is a cross-segment-border record and we have
flushed beyond record-B. In that case, crash recovery afterwards
can read the complete record-B and will finish recovery *after*
record-B. That's what we need here.

I'm sorry I didn't explain enough.

Record-A and record-B are both cross-segment-border records:
record-A spans segments X and X+1, and
record-B spans segments X+2 and X+3.
If both records have been inserted into the WAL buffer, lastSegContRecStart/End points to record-B.
If a writer flushes up to the middle of segment X+1, NotifyStableSegments() allows the writer to notify segment X.
Is my understanding correct?

Regards
Ryo Matsumura

#21Kyotaro Horiguchi
horikyota.ntt@gmail.com
In reply to: Ryo Matsumura (Fujitsu) (#20)
#22Ryo Matsumura (Fujitsu)
matsumura.ryo@fujitsu.com
In reply to: Kyotaro Horiguchi (#21)
#23Michael Paquier
michael@paquier.xyz
In reply to: Ryo Matsumura (Fujitsu) (#22)
#24Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Ryo Matsumura (Fujitsu) (#20)
#25Kyotaro Horiguchi
horikyota.ntt@gmail.com
In reply to: Heikki Linnakangas (#24)
#26Anastasia Lubennikova
a.lubennikova@postgrespro.ru
In reply to: Kyotaro Horiguchi (#25)
#27Nathan Bossart
nathandbossart@gmail.com
In reply to: Anastasia Lubennikova (#26)
#28Kyotaro Horiguchi
horikyota.ntt@gmail.com
In reply to: Nathan Bossart (#27)
#29Kyotaro Horiguchi
horikyota.ntt@gmail.com
In reply to: Kyotaro Horiguchi (#28)
#30Nathan Bossart
nathandbossart@gmail.com
In reply to: Kyotaro Horiguchi (#29)
#31Kyotaro Horiguchi
horikyota.ntt@gmail.com
In reply to: Nathan Bossart (#30)
#32Kyotaro Horiguchi
horikyota.ntt@gmail.com
In reply to: Kyotaro Horiguchi (#29)
#33Andrey Borodin
amborodin@acm.org
In reply to: Kyotaro Horiguchi (#32)
#34Andrey Borodin
amborodin@acm.org
In reply to: Kyotaro Horiguchi (#32)
#35Nathan Bossart
nathandbossart@gmail.com
In reply to: Andrey Borodin (#34)
#36Nathan Bossart
nathandbossart@gmail.com
In reply to: Andrey Borodin (#33)
#37Kyotaro Horiguchi
horikyota.ntt@gmail.com
In reply to: Nathan Bossart (#35)
#38Nathan Bossart
nathandbossart@gmail.com
In reply to: Kyotaro Horiguchi (#37)
#39Nathan Bossart
nathandbossart@gmail.com
In reply to: Kyotaro Horiguchi (#37)
#40Nathan Bossart
nathandbossart@gmail.com
In reply to: Kyotaro Horiguchi (#37)
#41Nathan Bossart
nathandbossart@gmail.com
In reply to: Kyotaro Horiguchi (#37)
#42Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Nathan Bossart (#40)
#43Nathan Bossart
nathandbossart@gmail.com
In reply to: Alvaro Herrera (#42)
#44Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Nathan Bossart (#43)
#45Nathan Bossart
nathandbossart@gmail.com
In reply to: Alvaro Herrera (#44)
#46Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Nathan Bossart (#45)
#47Nathan Bossart
nathandbossart@gmail.com
In reply to: Alvaro Herrera (#46)
#48Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Nathan Bossart (#45)
#49Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Alvaro Herrera (#48)
#50Nathan Bossart
nathandbossart@gmail.com
In reply to: Alvaro Herrera (#48)
#51Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Nathan Bossart (#50)
#52Nathan Bossart
nathandbossart@gmail.com
In reply to: Alvaro Herrera (#51)
#53Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Nathan Bossart (#52)
#54Nathan Bossart
nathandbossart@gmail.com
In reply to: Alvaro Herrera (#53)
#55Kyotaro Horiguchi
horikyota.ntt@gmail.com
In reply to: Nathan Bossart (#54)
#56Nathan Bossart
nathandbossart@gmail.com
In reply to: Kyotaro Horiguchi (#55)
#57Kyotaro Horiguchi
horikyota.ntt@gmail.com
In reply to: Nathan Bossart (#56)
#58Kyotaro Horiguchi
horikyota.ntt@gmail.com
In reply to: Kyotaro Horiguchi (#57)
#59Nathan Bossart
nathandbossart@gmail.com
In reply to: Kyotaro Horiguchi (#57)
#60Nathan Bossart
nathandbossart@gmail.com
In reply to: Kyotaro Horiguchi (#58)
#61Kyotaro Horiguchi
horikyota.ntt@gmail.com
In reply to: Nathan Bossart (#60)
#62Nathan Bossart
nathandbossart@gmail.com
In reply to: Kyotaro Horiguchi (#61)
#63Kyotaro Horiguchi
horikyota.ntt@gmail.com
In reply to: Nathan Bossart (#62)
#64Nathan Bossart
nathandbossart@gmail.com
In reply to: Kyotaro Horiguchi (#63)
#65Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Nathan Bossart (#64)
#66Nathan Bossart
nathandbossart@gmail.com
In reply to: Alvaro Herrera (#65)
#67Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Nathan Bossart (#66)
#68Nathan Bossart
nathandbossart@gmail.com
In reply to: Alvaro Herrera (#67)
#69Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Nathan Bossart (#68)
#70Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Nathan Bossart (#64)
#71Nathan Bossart
nathandbossart@gmail.com
In reply to: Alvaro Herrera (#70)
#72Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Nathan Bossart (#71)
#73Nathan Bossart
nathandbossart@gmail.com
In reply to: Alvaro Herrera (#72)
#74Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Nathan Bossart (#73)
#75Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Alvaro Herrera (#74)
#76Nathan Bossart
nathandbossart@gmail.com
In reply to: Alvaro Herrera (#75)
#77Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Nathan Bossart (#76)
#78Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Alvaro Herrera (#77)
#79Nathan Bossart
nathandbossart@gmail.com
In reply to: Alvaro Herrera (#78)
#80Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Nathan Bossart (#79)
#81Nathan Bossart
nathandbossart@gmail.com
In reply to: Alvaro Herrera (#80)
#82Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Nathan Bossart (#81)
#83Nathan Bossart
nathandbossart@gmail.com
In reply to: Alvaro Herrera (#82)
#84Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Nathan Bossart (#83)
#85Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Alvaro Herrera (#84)
#86Robert Haas
robertmhaas@gmail.com
In reply to: Alvaro Herrera (#85)
#87Nathan Bossart
nathandbossart@gmail.com
In reply to: Robert Haas (#86)
#88Robert Haas
robertmhaas@gmail.com
In reply to: Nathan Bossart (#87)
#89Nathan Bossart
nathandbossart@gmail.com
In reply to: Robert Haas (#88)
#90Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Nathan Bossart (#87)
#91Robert Haas
robertmhaas@gmail.com
In reply to: Nathan Bossart (#89)
#92Robert Haas
robertmhaas@gmail.com
In reply to: Alvaro Herrera (#90)
#93Nathan Bossart
nathandbossart@gmail.com
In reply to: Robert Haas (#92)
#94Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Nathan Bossart (#93)
#95Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Alvaro Herrera (#94)
#96Nathan Bossart
nathandbossart@gmail.com
In reply to: Alvaro Herrera (#95)
#97Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Nathan Bossart (#96)
#98Nathan Bossart
nathandbossart@gmail.com
In reply to: Alvaro Herrera (#97)
#99Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Nathan Bossart (#98)
#100Nathan Bossart
nathandbossart@gmail.com
In reply to: Alvaro Herrera (#99)
#101Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Nathan Bossart (#100)
#102Nathan Bossart
nathandbossart@gmail.com
In reply to: Alvaro Herrera (#101)
#103Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Nathan Bossart (#102)
#104Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Alvaro Herrera (#103)
#105Nathan Bossart
nathandbossart@gmail.com
In reply to: Alvaro Herrera (#104)
#106Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Nathan Bossart (#105)
#107Nathan Bossart
nathandbossart@gmail.com
In reply to: Alvaro Herrera (#106)
#108Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Nathan Bossart (#107)
#109Fujii Masao
masao.fujii@gmail.com
In reply to: Alvaro Herrera (#106)
#110Nathan Bossart
nathandbossart@gmail.com
In reply to: Fujii Masao (#109)
#111Andres Freund
andres@anarazel.de
In reply to: Alvaro Herrera (#106)
#112Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Andres Freund (#111)
#113Nathan Bossart
nathandbossart@gmail.com
In reply to: Alvaro Herrera (#112)
#114Andres Freund
andres@anarazel.de
In reply to: Alvaro Herrera (#112)
#115Nathan Bossart
nathandbossart@gmail.com
In reply to: Andres Freund (#114)
#116Nathan Bossart
nathandbossart@gmail.com
In reply to: Andres Freund (#114)
#117Andres Freund
andres@anarazel.de
In reply to: Nathan Bossart (#115)
#118Nathan Bossart
nathandbossart@gmail.com
In reply to: Andres Freund (#117)
#119Andres Freund
andres@anarazel.de
In reply to: Nathan Bossart (#118)
#120Nathan Bossart
nathandbossart@gmail.com
In reply to: Andres Freund (#119)
#121Andres Freund
andres@anarazel.de
In reply to: Nathan Bossart (#120)
#122Nathan Bossart
nathandbossart@gmail.com
In reply to: Andres Freund (#121)
#123Andres Freund
andres@anarazel.de
In reply to: Nathan Bossart (#122)
#124Nathan Bossart
nathandbossart@gmail.com
In reply to: Andres Freund (#123)
#125Andres Freund
andres@anarazel.de
In reply to: Nathan Bossart (#124)
#126Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Andres Freund (#125)
#127Michael Paquier
michael@paquier.xyz
In reply to: Alvaro Herrera (#126)
#128Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Michael Paquier (#127)