Shared WAL archive between master and standby: WALs not always identical

adrian.klaver@aklaver.com

about 9 years ago

In reply to: Sasa Vilic (#1)

Re: Shared WAL archive between master and standby: WALs not always identical

On 02/27/2017 04:40 PM, Sasa Vilic wrote:

Hallo,

I am trying to setup shared WAL archive between master and standby.
Standby is synchronously streaming from master and both servers run with
archive_mode = always. The ideas is that when promoting standby to
master we would not missed WALs.

I seem to be missing the point of duplicating your effort.

You are doing this, correct?:

Master WAL --> WAL archive <--
|
Master stream --> Standby --> |

I can't see how the Standby contributes anything to the archive that it
does not already have from the Master?

My problem is that sometimes WAL uploaded from master and from slave are
not 100% identical. In most cases they are but occasionally they are
not. I have written small script that ensures that upload is free of
race condition and I log md5 sum of each WAL. Aren't WALs from master
and standby supposed to be identical? After all, standby is just
consuming WAL that it is receiving from master ...

Or do you have any better suggestion on how to achieve continuous
incremental backup?

Thanks in advance

--
Adrian Klaver
adrian.klaver@aklaver.com

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

https://www.postgresql.org/docs/9.6/static/warm-standby.html#CONTINUOUS-ARCHIVING-IN-STANDBY

david.g.johnston@gmail.com

about 9 years ago

In reply to: Sasa Vilic (#1)

Re: Shared WAL archive between master and standby: WALs not always identical

On Mon, Feb 27, 2017 at 5:40 PM, Sasa Vilic <sasavilic@gmail.com> wrote:

Aren't WALs from master and standby supposed to be identical?

This would seem unwise to assume on its face and at least one piece of
documentation directly mentions that it is false:

"""
When continuous WAL archiving is used in a standby, there are two different
scenarios: the WAL archive can be shared between the primary and the
standby, or the standby can have its own WAL archive. When the standby has
its own WAL archive, set archive_mode to always, and the standby will call
the archive command for every WAL segment it receives, whether it's by
restoring from the archive or by streaming replication. *The shared archive
can be handled similarly, but the archive_command must test if the file
being archived exists already, and if the existing file has identical
contents*. This requires more care in the archive_command, as it must be
careful to not overwrite an existing file with different contents, but
return success if the exactly same file is archived twice. And all that
must be done free of race conditions, if two servers attempt to archive the
same file at the same time.
"""

The contents of both must match with respect to the data files but there
are likely things that go into the master WAL stream solely for the purpose
of communicating with a standby - and possibly some standby concepts that
would be unique to the standby's WAL - that would cause them to differ.
Not familiar enough to quickly list examples of what those might be. But
IIUC the system seems designed around master->slave replication and doesn't
support slave daisy-chains.

David J.

sasavilic@gmail.com

about 9 years ago

In reply to: Adrian Klaver (#2)

Re: Shared WAL archive between master and standby: WALs not always identical

Master is streaming directly to standby. Both master and standby are
pushing WALs to archive.

My point is that in case that master crashed completely (and we failover to
standby) and wal archiver on master didn't push everything to wal archive,
we would still have a wal pushed from slave. Therefore there is no
interruption in WAL stream.

Regards,
Sasa

On 28 February 2017 at 01:57, Adrian Klaver <adrian.klaver@aklaver.com>
wrote:

Show quoted text

On 02/27/2017 04:40 PM, Sasa Vilic wrote:

Hallo,

I am trying to setup shared WAL archive between master and standby.
Standby is synchronously streaming from master and both servers run with
archive_mode = always. The ideas is that when promoting standby to
master we would not missed WALs.

I seem to be missing the point of duplicating your effort.

You are doing this, correct?:

Master WAL --> WAL archive <--
|
Master stream --> Standby --> |

I can't see how the Standby contributes anything to the archive that it
does not already have from the Master?

My problem is that sometimes WAL uploaded from master and from slave are
not 100% identical. In most cases they are but occasionally they are
not. I have written small script that ensures that upload is free of
race condition and I log md5 sum of each WAL. Aren't WALs from master
and standby supposed to be identical? After all, standby is just
consuming WAL that it is receiving from master ...

Or do you have any better suggestion on how to achieve continuous
incremental backup?

Thanks in advance

--
Adrian Klaver
adrian.klaver@aklaver.com

sasavilic@gmail.com

about 9 years ago

In reply to: David G. Johnston (#3)

Re: Shared WAL archive between master and standby: WALs not always identical

Hi David,

thanks for the answer. I read this in documentation but here there is a
corner case that I am not sure how to handle:
"""
This requires more care in the archive_command, as it must be careful to
not overwrite an existing file with different contents, *but return success
if the exactly same file is archived twice.*
"""
But what I am supposed to do when content differs? Still return success and
ignore or return error? If I return error, wouldn't that prevent wal
archiver slave from pushing further WALs?

Regards,
Sasa

On 28 February 2017 at 02:10, David G. Johnston <david.g.johnston@gmail.com>
wrote:

Show quoted text

On Mon, Feb 27, 2017 at 5:40 PM, Sasa Vilic <sasavilic@gmail.com> wrote:

Aren't WALs from master and standby supposed to be identical?

This would seem unwise to assume on its face and at least one piece of
documentation directly mentions that it is false:

https://www.postgresql.org/docs/9.6/static/warm-standby.
html#CONTINUOUS-ARCHIVING-IN-STANDBY

"""
When continuous WAL archiving is used in a standby, there are two
different scenarios: the WAL archive can be shared between the primary and
the standby, or the standby can have its own WAL archive. When the standby
has its own WAL archive, set archive_mode to always, and the standby will
call the archive command for every WAL segment it receives, whether it's by
restoring from the archive or by streaming replication. *The shared
archive can be handled similarly, but the archive_command must test if the
file being archived exists already, and if the existing file has identical
contents*. This requires more care in the archive_command, as it must be
careful to not overwrite an existing file with different contents, but
return success if the exactly same file is archived twice. And all that
must be done free of race conditions, if two servers attempt to archive the
same file at the same time.
"""

The contents of both must match with respect to the data files but there
are likely things that go into the master WAL stream solely for the purpose
of communicating with a standby - and possibly some standby concepts that
would be unique to the standby's WAL - that would cause them to differ.
Not familiar enough to quickly list examples of what those might be. But
IIUC the system seems designed around master->slave replication and doesn't
support slave daisy-chains.

David J.

sasavilic@gmail.com

about 9 years ago

In reply to: Sasa Vilic (#5)

Re: Shared WAL archive between master and standby: WALs not always identical

And also this:

"""
If archive_mode is set to on, the archiver is not enabled during recovery
or standby mode. If the standby server is promoted, it will start archiving
after the promotion, but will not archive any WAL it did not generate
itself. To get a complete series of WAL files in the archive, you must
ensure that all WAL is archived, before it reaches the standby
"""

If I understand it correctly, WAL started on master but finished on standby
(after failover) will still not be (old master is presumably dead and new
master must wait for next WAL segment). Of course, the next WAL segment
will be sent by new master, but we are going to miss exactly this one WAL
segment during which failover occurred and thus introduce interruption in
our WAL stream. Am I right?

Regards,
Sasa

On 28 February 2017 at 02:33, Sasa Vilic <sasavilic@gmail.com> wrote:

Show quoted text

Hi David,

thanks for the answer. I read this in documentation but here there is a
corner case that I am not sure how to handle:
"""
This requires more care in the archive_command, as it must be careful to
not overwrite an existing file with different contents, *but return
success if the exactly same file is archived twice.*
"""
But what I am supposed to do when content differs? Still return success
and ignore or return error? If I return error, wouldn't that prevent wal
archiver slave from pushing further WALs?

Regards,
Sasa

On 28 February 2017 at 02:10, David G. Johnston <
david.g.johnston@gmail.com> wrote:

On Mon, Feb 27, 2017 at 5:40 PM, Sasa Vilic <sasavilic@gmail.com> wrote:

Aren't WALs from master and standby supposed to be identical?

This would seem unwise to assume on its face and at least one piece of
documentation directly mentions that it is false:

https://www.postgresql.org/docs/9.6/static/warm-standby.html
#CONTINUOUS-ARCHIVING-IN-STANDBY

"""
When continuous WAL archiving is used in a standby, there are two
different scenarios: the WAL archive can be shared between the primary and
the standby, or the standby can have its own WAL archive. When the standby
has its own WAL archive, set archive_mode to always, and the standby will
call the archive command for every WAL segment it receives, whether it's by
restoring from the archive or by streaming replication. *The shared
archive can be handled similarly, but the archive_command must test if the
file being archived exists already, and if the existing file has identical
contents*. This requires more care in the archive_command, as it must be
careful to not overwrite an existing file with different contents, but
return success if the exactly same file is archived twice. And all that
must be done free of race conditions, if two servers attempt to archive the
same file at the same time.
"""

The contents of both must match with respect to the data files but there
are likely things that go into the master WAL stream solely for the purpose
of communicating with a standby - and possibly some standby concepts that
would be unique to the standby's WAL - that would cause them to differ.
Not familiar enough to quickly list examples of what those might be. But
IIUC the system seems designed around master->slave replication and doesn't
support slave daisy-chains.

David J.

david.g.johnston@gmail.com

about 9 years ago

In reply to: Sasa Vilic (#5)

Re: Shared WAL archive between master and standby: WALs not always identical

On Mon, Feb 27, 2017 at 6:33 PM, Sasa Vilic <sasavilic@gmail.com> wrote:

Hi David,

thanks for the answer. I read this in documentation but here there is a
corner case that I am not sure how to handle:
"""
This requires more care in the archive_command, as it must be careful to
not overwrite an existing file with different contents, *but return
success if the exactly same file is archived twice.*
"""
But what I am supposed to do when content differs? Still return success
and ignore or return error? If I return error, wouldn't that prevent wal
archiver slave from pushing further WALs?

As Adrian said - I'm not seeing the point to even dealing with a shared
archive. My solution would be to avoid the problem completely by pointing
the standby WAL elsewhere.

That said, if I was a guessing man, I would say that, yes, you indicate
failure. The file in question will exist within the archive and will
contain the contents from the master. The standby's view of the file would
be discarded.

"as it must be careful to not overwrite an existing file with different
contents, but return success if the exactly same file is archived twice." -
the unspoken flip side is not returning true if the "not overwrite"
provision took precedence.

David J.

adrian.klaver@aklaver.com

about 9 years ago

In reply to: Sasa Vilic (#4)

Re: Shared WAL archive between master and standby: WALs not always identical

On 02/27/2017 05:29 PM, Sasa Vilic wrote:

Master is streaming directly to standby. Both master and standby are
pushing WALs to archive.

My point is that in case that master crashed completely (and we failover
to standby) and wal archiver on master didn't push everything to wal
archive, we would still have a wal pushed from slave. Therefore there is
no interruption in WAL stream.

Still failing to see how the standby can have more information then what
the master had sent to it at the time of the crash.

Regards,
Sasa

On 28 February 2017 at 01:57, Adrian Klaver <adrian.klaver@aklaver.com
<mailto:adrian.klaver@aklaver.com>> wrote:

On 02/27/2017 04:40 PM, Sasa Vilic wrote:

Hallo,

I am trying to setup shared WAL archive between master and standby.
Standby is synchronously streaming from master and both servers
run with
archive_mode = always. The ideas is that when promoting standby to
master we would not missed WALs.

I seem to be missing the point of duplicating your effort.

You are doing this, correct?:

Master WAL --> WAL archive <--
|
Master stream --> Standby --> |

I can't see how the Standby contributes anything to the archive that
it does not already have from the Master?

My problem is that sometimes WAL uploaded from master and from
slave are
not 100% identical. In most cases they are but occasionally they are
not. I have written small script that ensures that upload is free of
race condition and I log md5 sum of each WAL. Aren't WALs from
master
and standby supposed to be identical? After all, standby is just
consuming WAL that it is receiving from master ...

Or do you have any better suggestion on how to achieve continuous
incremental backup?

Thanks in advance

--
Adrian Klaver
adrian.klaver@aklaver.com <mailto:adrian.klaver@aklaver.com>

--
Adrian Klaver
adrian.klaver@aklaver.com

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

david.g.johnston@gmail.com

about 9 years ago

In reply to: Sasa Vilic (#6)

Re: Shared WAL archive between master and standby: WALs not always identical

It is customary to inline or bottom-posts on these lists. Please follow
the example of those responding to your emails.

On Mon, Feb 27, 2017 at 6:45 PM, Sasa Vilic <sasavilic@gmail.com> wrote:

And also this:

"""
If archive_mode is set to on, the archiver is not enabled during recovery
or standby mode. If the standby server is promoted, it will start archiving
after the promotion, but will not archive any WAL it did not generate
itself. To get a complete series of WAL files in the archive, you must
ensure that all WAL is archived, before it reaches the standby
"""

If I understand it correctly, WAL started on master but finished on
standby (after failover) will still not be (old master is presumably dead
and new master must wait for next WAL segment). Of course, the next WAL
segment will be sent by new master, but we are going to miss exactly this
one WAL segment during which failover occurred and thus introduce
interruption in our WAL stream. Am I right?

Requires knowledge and familiarity I present lack. Sorry. I think I see
where you are going with all of this but it would probably help to
explicitly state the overall concern or plan and not just ask how specific
mechanics work in isolation.

David J.

#10

sasavilic@gmail.com

about 9 years ago

In reply to: Adrian Klaver (#8)

Re: Shared WAL archive between master and standby: WALs not always identical

Because standby is running in syncronous replication, whereby wal archiver
is asynchronous. Therefore there is a small window where slave has received
the data but master has not pushed it yet to wal archive.

Regards,
Sasa

Am 28.02.2017 02:48 schrieb "Adrian Klaver" <adrian.klaver@aklaver.com>:

Show quoted text

On 02/27/2017 05:29 PM, Sasa Vilic wrote:

Master is streaming directly to standby. Both master and standby are
pushing WALs to archive.

My point is that in case that master crashed completely (and we failover
to standby) and wal archiver on master didn't push everything to wal
archive, we would still have a wal pushed from slave. Therefore there is
no interruption in WAL stream.

Still failing to see how the standby can have more information then what
the master had sent to it at the time of the crash.

Regards,
Sasa

On 28 February 2017 at 01:57, Adrian Klaver <adrian.klaver@aklaver.com
<mailto:adrian.klaver@aklaver.com>> wrote:

On 02/27/2017 04:40 PM, Sasa Vilic wrote:

Hallo,

I am trying to setup shared WAL archive between master and
standby.
Standby is synchronously streaming from master and both servers
run with
archive_mode = always. The ideas is that when promoting standby to
master we would not missed WALs.

I seem to be missing the point of duplicating your effort.

You are doing this, correct?:

Master WAL --> WAL archive <--
|
Master stream --> Standby --> |

I can't see how the Standby contributes anything to the archive that
it does not already have from the Master?

My problem is that sometimes WAL uploaded from master and from
slave are
not 100% identical. In most cases they are but occasionally they
are
not. I have written small script that ensures that upload is free
of
race condition and I log md5 sum of each WAL. Aren't WALs from
master
and standby supposed to be identical? After all, standby is just
consuming WAL that it is receiving from master ...

Or do you have any better suggestion on how to achieve continuous
incremental backup?

Thanks in advance

--
Adrian Klaver
adrian.klaver@aklaver.com <mailto:adrian.klaver@aklaver.com>

--
Adrian Klaver
adrian.klaver@aklaver.com

#11

https://www.postgresql.org/docs/9.5/static/warm-standby.html

david.g.johnston@gmail.com

about 9 years ago

In reply to: David G. Johnston (#3)

Re: Shared WAL archive between master and standby: WALs not always identical

On Mon, Feb 27, 2017 at 6:10 PM, David G. Johnston <
david.g.johnston@gmail.com> wrote:

But IIUC the system seems designed around master->slave replication and
doesn't support slave daisy-chains.

I thought that sounded wrong when I wrote it...

David J.

#12

sasavilic@gmail.com

about 9 years ago

In reply to: David G. Johnston (#9)

Re: Shared WAL archive between master and standby: WALs not always identical

Am 28.02.2017 02:50 schrieb "David G. Johnston" <david.g.johnston@gmail.com

:

It is customary to inline or bottom-posts on these lists. Please follow
the example of those responding to your emails.

On Mon, Feb 27, 2017 at 6:45 PM, Sasa Vilic <sasavilic@gmail.com> wrote:

And also this:

"""
If archive_mode is set to on, the archiver is not enabled during recovery
or standby mode. If the standby server is promoted, it will start archiving
after the promotion, but will not archive any WAL it did not generate
itself. To get a complete series of WAL files in the archive, you must
ensure that all WAL is archived, before it reaches the standby
"""

If I understand it correctly, WAL started on master but finished on
standby (after failover) will still not be (old master is presumably dead
and new master must wait for next WAL segment). Of course, the next WAL
segment will be sent by new master, but we are going to miss exactly this
one WAL segment during which failover occurred and thus introduce
interruption in our WAL stream. Am I right?

David J.

Hi David,

sorry about email formating. I didn't realize that such convetions exists
as I am first time here + I am currently using gmail mobile app. I hope it
looks good this time. :)

My general idea is to have synchronous hot standby and asynchronous shared
wal archive. If that were possible I could actually switch back and forth
between master and slave without interrupting wal stream and with very
short downtime. This also makes PostgreSQL upgrade very easy. Alternative
to this is to have separate backup for master and slave, but this has other
disadvantages:

* I need double disk space for backup if I am going to archive WALs from
standby and master at the same time, or
* I could only archive WALs from current master, but that would require
performing base backup immediately after failover. Otherwise archived WALs
are useless.

Both of these solutions are good solution but not perfect. I thought that
shared wal archive is possible, based on PostgreSQL documention. I also
assume that requirement "not to overwrite existing WAL with different
content" was only there to prevent from accidental mis-configuration.

I wasn't aware that standby server would produce different WAL. My current
test setup looks just like that. And it works, except for 2-3 WALs per day
that are not identical. Everything else is same. I can even restore backup
without issues, but those non-identical WALs still throuble me because I
fear that I am missing something big.

I wrote a small python script to perform WAL decoding and I hope get more
information tommorow morning. Since I was playing today with repmgr (which
internaly uses pg_rewind for switchover) I got suspicios that this could be
it.

Anyway, I hopped that someone had similar setup and experience with it. :(

Sorry for such lengthly email.

Regards,
Sasa

#13

david.g.johnston@gmail.com

about 9 years ago

In reply to: Sasa Vilic (#12)

Re: Shared WAL archive between master and standby: WALs not always identical

On Mon, Feb 27, 2017 at 7:32 PM, Sasa Vilic <sasavilic@gmail.com> wrote:

My general idea is to have synchronous hot standby and asynchronous shared
wal archive. If that were possible I could actually switch back and forth
between master and slave without interrupting wal stream and with very
short downtime. This also makes PostgreSQL upgrade very easy. Alternative
to this is to have separate backup for master and slave, but this has other
disadvantages:

* I need double disk space for backup if I am going to archive WALs from
standby and master at the same time, or
* I could only archive WALs from current master, but that would require
performing base backup immediately after failover. Otherwise archived WALs
are useless.

I became a bit skeptical when I read:

"To get a complete series of WAL files in the archive, you must ensure that
all WAL is archived, before it reaches the standby. This is inherently true
with file-based log shipping, as the standby can only restore files that
are found in the archive, but not if streaming replication is enabled."

given the lack of any hint as to how one would accomplish "a complete
series of WAL files in the archive" in streaming replication mode...

Maybe I'm just missing something here - but it does seem like you don't get
to have your cake and eat it...

Dave

#14

https://www.postgresql.org/docs/9.6/static/warm-standby-failover.html

adrian.klaver@aklaver.com

about 9 years ago

In reply to: Sasa Vilic (#10)

Re: Shared WAL archive between master and standby: WALs not always identical

On 02/27/2017 05:52 PM, Sasa Vilic wrote:

Because standby is running in syncronous replication, whereby wal
archiver is asynchronous. Therefore there is a small window where slave
has received the data but master has not pushed it yet to wal archive.

Exactly. The standby already has the latest information, it would gain
nothing from fetching it from the archive and anything it wrote to the
archive would only have the information it got from the master at the
point of failure. That is before you promoted it, after it would be on
its own path independent of the master.

See here for more info:

Regards,
Sasa

Am 28.02.2017 02:48 schrieb "Adrian Klaver" <adrian.klaver@aklaver.com
<mailto:adrian.klaver@aklaver.com>>:

On 02/27/2017 05:29 PM, Sasa Vilic wrote:

Master is streaming directly to standby. Both master and standby are
pushing WALs to archive.

My point is that in case that master crashed completely (and we
failover
to standby) and wal archiver on master didn't push everything to wal
archive, we would still have a wal pushed from slave. Therefore
there is
no interruption in WAL stream.

Still failing to see how the standby can have more information then
what the master had sent to it at the time of the crash.

Regards,
Sasa

On 28 February 2017 at 01:57, Adrian Klaver
<adrian.klaver@aklaver.com <mailto:adrian.klaver@aklaver.com>
<mailto:adrian.klaver@aklaver.com
<mailto:adrian.klaver@aklaver.com>>> wrote:

On 02/27/2017 04:40 PM, Sasa Vilic wrote:

Hallo,

I am trying to setup shared WAL archive between master
and standby.
Standby is synchronously streaming from master and both
servers
run with
archive_mode = always. The ideas is that when promoting
standby to
master we would not missed WALs.

I seem to be missing the point of duplicating your effort.

You are doing this, correct?:

Master WAL --> WAL archive <--
|
Master stream --> Standby --> |

I can't see how the Standby contributes anything to the
archive that
it does not already have from the Master?

My problem is that sometimes WAL uploaded from master
and from
slave are
not 100% identical. In most cases they are but
occasionally they are
not. I have written small script that ensures that
upload is free of
race condition and I log md5 sum of each WAL. Aren't
WALs from
master
and standby supposed to be identical? After all, standby
is just
consuming WAL that it is receiving from master ...

Or do you have any better suggestion on how to achieve
continuous
incremental backup?

Thanks in advance

--
Adrian Klaver
adrian.klaver@aklaver.com <mailto:adrian.klaver@aklaver.com>
<mailto:adrian.klaver@aklaver.com
<mailto:adrian.klaver@aklaver.com>>

--
Adrian Klaver
adrian.klaver@aklaver.com <mailto:adrian.klaver@aklaver.com>

--
Adrian Klaver
adrian.klaver@aklaver.com

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#15

sasavilic@gmail.com

about 9 years ago

In reply to: Adrian Klaver (#14)

Re: Shared WAL archive between master and standby: WALs not always identical

On 2017-02-28 06:14, Adrian Klaver wrote:

On 02/27/2017 05:52 PM, Sasa Vilic wrote:

Because standby is running in syncronous replication, whereby wal
archiver is asynchronous. Therefore there is a small window where slave
has received the data but master has not pushed it yet to wal archive.

Exactly. The standby already has the latest information, it would gain
nothing from fetching it from the archive and anything it wrote to the
archive would only have the information it got from the master at the
point of failure. That is before you promoted it, after it would be on
its own path independent of the master.

Hi Adrian,

I am afraid that you don't understand me.

Standby is not fetching WAL from archive, it fetches it directly from
master and is done synchronously, which means that master will only
confirm transaction to client when WAL is streamed and applied at
standby. On the other hand, master does not have to wait for WAL
archiver. If master crashes before WAL archiver is able to send WAL, we
would still have it on standby.

Let us for the sake of demonstration consider that we have same very low
busy but very critical system:

1. Your client connects to primary server (master) and performs changes
on data
2. It just happen that this is a moment where PostgreSQL opens new WAL
segment. It writes few kilobytes in this new WAL segment but it has
almost 16MB to write before segment is complete. So the wal archiver has
to wait before it can push wal segment in wal archive
3. Secondary server (standby) is doing same, it is writing changes to
newly created WAL segment
4. Your client issues COMMIT
- primary waits until changes are applied at secondary
- primary flushes changes to WAL
- secondary confirms transaction to primary
- primary confirms transaction to client
- WAL is still not processed by wal archiver because it is only
i.e. 1 MB big and we are still left 15MB to go
5. Primary server crashes, i.e. due to catastrophic disk failure
- everything stops and can't be recovered
- wal archiver is dead, but even if it were alive it wouldn't send
WAL to archive anyway because 16MB of wal segment was not filled up
6. We promote our secondary server to master
- In secondary server's WAL we already got changes from primary
- Secondary continues appending new changes to wal segment
7. Eventually WAL segment on secondary fills up and then pushes it to
wal archive.
8. Although primary is dead, we didn't loose anything because lost WAL
data was pushed by secondary.

Regards,
Sasa

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#16

https://www.postgresql.org/docs/9.6/static/runtime-config-wal.html#RUNTIME-CONFIG-WAL-ARCHIVING

adrian.klaver@aklaver.com

about 9 years ago

In reply to: Sasa Vilic (#15)

Re: Shared WAL archive between master and standby: WALs not always identical

On 02/27/2017 11:14 PM, Sasa Vilic wrote:

On 2017-02-28 06:14, Adrian Klaver wrote:

On 02/27/2017 05:52 PM, Sasa Vilic wrote:

Because standby is running in syncronous replication, whereby wal
archiver is asynchronous. Therefore there is a small window where slave
has received the data but master has not pushed it yet to wal archive.

Exactly. The standby already has the latest information, it would gain
nothing from fetching it from the archive and anything it wrote to the
archive would only have the information it got from the master at the
point of failure. That is before you promoted it, after it would be on
its own path independent of the master.

Hi Adrian,

I am afraid that you don't understand me.

Standby is not fetching WAL from archive, it fetches it directly from
master and is done synchronously, which means that master will only
confirm transaction to client when WAL is streamed and applied at
standby. On the other hand, master does not have to wait for WAL
archiver. If master crashes before WAL archiver is able to send WAL, we
would still have it on standby.

Let us for the sake of demonstration consider that we have same very low
busy but very critical system:

1. Your client connects to primary server (master) and performs changes
on data
2. It just happen that this is a moment where PostgreSQL opens new WAL
segment. It writes few kilobytes in this new WAL segment but it has
almost 16MB to write before segment is complete. So the wal archiver has
to wait before it can push wal segment in wal archive
3. Secondary server (standby) is doing same, it is writing changes to
newly created WAL segment
4. Your client issues COMMIT
- primary waits until changes are applied at secondary
- primary flushes changes to WAL
- secondary confirms transaction to primary
- primary confirms transaction to client
- WAL is still not processed by wal archiver because it is only i.e.
1 MB big and we are still left 15MB to go
5. Primary server crashes, i.e. due to catastrophic disk failure
- everything stops and can't be recovered
- wal archiver is dead, but even if it were alive it wouldn't send
WAL to archive anyway because 16MB of wal segment was not filled up
6. We promote our secondary server to master
- In secondary server's WAL we already got changes from primary
- Secondary continues appending new changes to wal segment
7. Eventually WAL segment on secondary fills up and then pushes it to
wal archive.
8. Although primary is dead, we didn't loose anything because lost WAL
data was pushed by secondary.

I understand the above, what I did not understand, from your original post:

"My problem is that sometimes WAL uploaded from master and from slave
are not 100% identical. In most cases they are but occasionally they are
not. I have written small script that ensures that upload is free of
race condition and I log md5 sum of each WAL."

To me that reads as you sending WALs to the archive from both the master
and the standby in parallel, instead of sequentially as you imply in the
outline above. It would seem to be confirmed by the setting of
archive_mode = always:

https://www.postgresql.org/docs/9.6/static/warm-standby.html#CONTINUOUS-ARCHIVING-IN-STANDBY

Seems to mean the simpler thing to do would be to set standby to
archive_mode = on, in which case the standby would not contribute WAL's
until it was promoted which would seem to be what you want.

Regards,
Sasa

--
Adrian Klaver
adrian.klaver@aklaver.com

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#17

Jon Nelson

jnelson+pgsql@jamponi.net

about 9 years ago

In reply to: Adrian Klaver (#16)

Re: Shared WAL archive between master and standby: WALs not always identical

On Tue, Feb 28, 2017 at 9:41 AM, Adrian Klaver <adrian.klaver@aklaver.com>
wrote:

On 02/27/2017 11:14 PM, Sasa Vilic wrote:
...

"My problem is that sometimes WAL uploaded from master and from slave are
not 100% identical. In most cases they are but occasionally they are not. I
have written small script that ensures that upload is free of race
condition and I log md5 sum of each WAL."

The wisdom (or not!) in archiving WAL to the same location from multiple
sources (even if they share a common ancestor) notwithstanding, I must
admit to having my curiosity piqued.

Let's assume a different situation:
- a master and one or more standby units each archiving every WAL file but
to it's own archive
- we check to see if identically named WAL files are content identical

Does it surprise anybody else that, sometimes, an identically named WAL
file from the master and from a standby have different contents? It
surprises me.

I would love to know if the differences are due to some oversight in the
WAL archiving mechanism chosen by the OP or if, in fact, a master and a
standby generate different WAL files!

What does pg_xlogdump say about the differences in the files?

--
Jon

#18

sasavilic@gmail.com

about 9 years ago

In reply to: Adrian Klaver (#16)

Re: Shared WAL archive between master and standby: WALs not always identical

On 2017-02-28 16:41, Adrian Klaver wrote:

Seems to mean the simpler thing to do would be to set standby to
archive_mode = on, in which case the standby would not contribute WAL's
until it was promoted which would seem to be what you want.

Yes, that was my first thought. Except that documentation states following:

"""
If archive_mode is set to on, the archiver is not enabled during
recovery or standby mode. If the standby server is promoted, it will
start archiving after the promotion, *but will not archive any WAL it
did not generate itself*
"""

What happens with WAL that are started on primary but finished on secondary?

Regards,
Sasa

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#19

sasavilic@gmail.com

about 9 years ago

In reply to: Jon Nelson (#17)

Re: Shared WAL archive between master and standby: WALs not always identical

On 2017-02-28 16:57, Jon Nelson wrote:

What does pg_xlogdump say about the differences in the files?

What a nice tool. I didn't realize that it exists for 9.6.

Unfortunately, we gave up on shared WAL archive, so I don't if I will
still have all both WALs. I have one conflicting WAL from one of the
servers and there is the *whole* content:

rmgr: Standby len (rec/tot): 24/ 50, tx: 0, lsn:
43/74000028, prev 43/73000140, desc: RUNNING_XACTS nextXid 11617888
latestCompletedXid 11617887 oldestRunningXid 11617888
rmgr: XLOG len (rec/tot): 80/ 106, tx: 0, lsn:
43/74000060, prev 43/74000028, desc: CHECKPOINT_ONLINE redo 43/74000028;
tli 13; prev tli 13; fpw true; xid 0:11617888; oid 25304; multi 1;
offset 0; oldest xid 1750 in DB 13322; oldest multi 1 in DB 1;
oldest/newest commit timestamp xid: 0/0; oldest running xid 11617888; online
rmgr: Standby len (rec/tot): 24/ 50, tx: 0, lsn:
43/740000D0, prev 43/74000060, desc: RUNNING_XACTS nextXid 11617888
latestCompletedXid 11617887 oldestRunningXid 11617888
rmgr: XLOG len (rec/tot): 8/ 34, tx: 0, lsn:
43/74000108, prev 43/740000D0, desc: BACKUP_END 43/74000028
rmgr: XLOG len (rec/tot): 0/ 24, tx: 0, lsn:
43/74000130, prev 43/74000108, desc: SWITCH

I don't have WAL from other server, but I hope I will be able to find it
tomorrow (if I haven't deleted it).

But I can share with you what I have observed by manually looking into
WALs with hex editor:

* If I remember correctly, first page was same (what we see above)
* For one of the logs (the one I have right now), all first 16 pages
except for the first had XLogPageHeaderData set (magic number=0xD093,
tli=13, pageaddr) but WITHOUT any payload. Everything except header was
zero. Remaining part of WAL WAS ALL ZEROED.
* Same WAL from other server was also similar. First page contained data
and remaining part of WAL was filled with XLogPageHeaderData WITHOUT
payload.
* The only different was that one WAL had XLogPageHeaderData without
payload and other zeros instead.

This was on system during initial setup so for most time we didn't have
any clients at all => not much to be logged in WAL. We were trying out
replication, failover/switchover scenarios with repmgr and creating and
restoring backup with barman.

Regards,
Sasa

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#20