Clean shutdown and warm standby

masao.fujii@gmail.com

about 17 years ago

In reply to: Guillaume Smet (#1)

Re: Clean shutdown and warm standby

Hi,

On Thu, Apr 9, 2009 at 4:11 AM, Guillaume Smet <guillaume.smet@gmail.com> wrote:

Hi,

Following the discussion here
http://archives.postgresql.org/message-id/49D9E986.8010604@pse-consulting.de
, I wrote a small patch which rotates the last XLog file on shutdown
so that the archive command is also executed for this file and we are
sure we have all the useful XLog files when we perform a clean
shutdown of master + switch to the failover server. This rotation is
done only if the archive mode is active and an archive command is set.

It's currently really difficult to switch easily (i.e. without copying
the file manually) to the failover server without any data loss.

Is there any problem I've missed?

RequestXLogSwitch() doesn't wait until the switched WAL file has
actually been archived. So some WAL files may still be missing from
the standby server even after a clean shutdown of the primary.

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

guillaume.smet@gmail.com

about 17 years ago

In reply to: Fujii Masao (#2)

Re: Clean shutdown and warm standby

On Thu, Apr 9, 2009 at 5:00 AM, Fujii Masao <masao.fujii@gmail.com> wrote:

RequestXLogSwitch() doesn't wait until the switched WAL file has
actually been archived. So some WAL files may still be missing from
the standby server even after a clean shutdown of the primary.

Thanks for your comment.

RequestXLogSwitch() doesn't wait for archiving but the shutdown
process takes care of it AFAICS.

As far as I understand the shutdown code, we have the following
sequence (I just explain here the steps involved in the XLog and
archiver shutdown):
- postmaster.c line 2693: PM_WAIT_BACKENDS state: we start the
bgwriter and shut it down. It calls ShutdownXLog which creates the
shutdown checkpoint and, with my patch, switches to a new XLog file.
Then we are in PM_SHUTDOWN state.
- postmaster.c line 2244: the reaper is called for the bgwriter child
that was just shut down and wakes the archiver one last time: the
archive command is executed for our last XLog file.

Did I miss something?

--
Guillaume

masao.fujii@gmail.com

about 17 years ago

In reply to: Guillaume Smet (#3)

Re: Clean shutdown and warm standby

Hi,

On Thu, Apr 9, 2009 at 6:36 PM, Guillaume Smet <guillaume.smet@gmail.com> wrote:

On Thu, Apr 9, 2009 at 5:00 AM, Fujii Masao <masao.fujii@gmail.com> wrote:

RequestXLogSwitch() doesn't wait until the switched WAL file has
actually been archived. So some WAL files may still be missing from
the standby server even after a clean shutdown of the primary.

Thanks for your comment.

RequestXLogSwitch() doesn't wait for archiving but the shutdown
process takes care of it AFAICS.

Oh, you are right. I completely forgot about it :(

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

guillaume.smet@gmail.com

about 17 years ago

In reply to: Guillaume Smet (#1)

Re: Clean shutdown and warm standby

Hi,

On Wed, Apr 8, 2009 at 9:11 PM, I wrote:

Following the discussion here
http://archives.postgresql.org/message-id/49D9E986.8010604@pse-consulting.de
, I wrote a small patch which rotates the last XLog file on shutdown
[snip]

Any comment or advice on how I can fix it with a different method if
this one is considered wrong?

Original message and patch here:
http://archives.postgresql.org/message-id/1d4e0c10904081211p2c0f1cdepe620c11d1271ceb2@mail.gmail.com

Thanks.

--
Guillaume

heikki.linnakangas@enterprisedb.com

about 17 years ago

In reply to: Guillaume Smet (#5)

Re: Clean shutdown and warm standby

Guillaume Smet wrote:

On Wed, Apr 8, 2009 at 9:11 PM, I wrote:

Following the discussion here
http://archives.postgresql.org/message-id/49D9E986.8010604@pse-consulting.de
, I wrote a small patch which rotates the last XLog file on shutdown
[snip]

Any comment or advice on how I can fix it with a different method if
this one is considered wrong?

Original message and patch here:
http://archives.postgresql.org/message-id/1d4e0c10904081211p2c0f1cdepe620c11d1271ceb2@mail.gmail.com

Sorry for the delay.

It's not safe to write WAL after the checkpoint, as RequestXLogSwitch()
does. After restart, the system will start inserting WAL from the
checkpoint redo point, which is just before the XLOG_SWITCH record, and
will overwrite it.

It would be nice to have all WAL archived at shutdown, but this is not
the way to do it :-(.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

masao.fujii@gmail.com

about 17 years ago

In reply to: Heikki Linnakangas (#6)

Re: Clean shutdown and warm standby

Hi,

On Fri, Apr 24, 2009 at 3:20 PM, Heikki Linnakangas
<heikki.linnakangas@enterprisedb.com> wrote:

It's not safe to write WAL after the checkpoint, as RequestXLogSwitch()
does. After restart, the system will start inserting WAL from the checkpoint
redo point, which is just before the XLOG_SWITCH record, and will overwrite
it.

Since, in this case, the WAL file including XLOG_SWITCH exists
in archive, I don't think that it's unsafe, i.e. XLOG_SWITCH would
be treated as the last applied record and not be overwritten. WAL
records would start to be inserted from the subsequent file (with
new timeline).

This is useful for warm-standby, but I'm afraid that this may delay
Shared Disk Failover which doesn't need to wait until all the WAL
files are archived at shutdown. Is there any solution to this problem?

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

heikki.linnakangas@enterprisedb.com

about 17 years ago

In reply to: Fujii Masao (#7)

Re: Clean shutdown and warm standby

Fujii Masao wrote:

On Fri, Apr 24, 2009 at 3:20 PM, Heikki Linnakangas
<heikki.linnakangas@enterprisedb.com> wrote:

It's not safe to write WAL after the checkpoint, as RequestXLogSwitch()
does. After restart, the system will start inserting WAL from the checkpoint
redo point, which is just before the XLOG_SWITCH record, and will overwrite
it.

Since, in this case, the WAL file including XLOG_SWITCH exists
in archive, I don't think that it's unsafe, i.e. XLOG_SWITCH would
be treated as the last applied record and not be overwritten. WAL
records would start to be inserted from the subsequent file (with
new timeline).

It will be overwritten in a normal non-archive-recovery startup.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

masao.fujii@gmail.com

about 17 years ago

In reply to: Heikki Linnakangas (#8)

Re: Clean shutdown and warm standby

Hi,

On Mon, Apr 27, 2009 at 8:43 PM, Heikki Linnakangas
<heikki.linnakangas@enterprisedb.com> wrote:

Fujii Masao wrote:

On Fri, Apr 24, 2009 at 3:20 PM, Heikki Linnakangas
<heikki.linnakangas@enterprisedb.com> wrote:

It's not safe to write WAL after the checkpoint, as RequestXLogSwitch()
does. After restart, the system will start inserting WAL from the
checkpoint
redo point, which is just before the XLOG_SWITCH record, and will
overwrite
it.

Since, in this case, the WAL file including XLOG_SWITCH exists
in archive, I don't think that it's unsafe, i.e. XLOG_SWITCH would
be treated as the last applied record and not be overwritten. WAL
records would start to be inserted from the subsequent file (with
new timeline).

It will be overwritten in a normal non-archive-recovery startup.

Hmm, you mean the case where the system crashes after
inserting XLOG_SWITCH and before archiving the WAL file
containing it? Okay, though it seems unlikely, XLOG_SWITCH
would be overwritten by a subsequent non-archive recovery.

I just had an idea: when the last applied WAL file has an
archive_status entry, the system starts WAL insertion from the next
file after recovery. But there is still a race condition: the
system may crash after XLOG_SWITCH is written (fsynced)
and before the .ready file is created.

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

#10

heikki.linnakangas@enterprisedb.com

about 17 years ago

In reply to: Fujii Masao (#9)

Re: Clean shutdown and warm standby

Fujii Masao wrote:

Hi,

On Mon, Apr 27, 2009 at 8:43 PM, Heikki Linnakangas
<heikki.linnakangas@enterprisedb.com> wrote:

Fujii Masao wrote:

On Fri, Apr 24, 2009 at 3:20 PM, Heikki Linnakangas
<heikki.linnakangas@enterprisedb.com> wrote:

It's not safe to write WAL after the checkpoint, as RequestXLogSwitch()
does. After restart, the system will start inserting WAL from the
checkpoint
redo point, which is just before the XLOG_SWITCH record, and will
overwrite
it.

Since, in this case, the WAL file including XLOG_SWITCH exists
in archive, I don't think that it's unsafe, i.e. XLOG_SWITCH would
be treated as the last applied record and not be overwritten. WAL
records would start to be inserted from the subsequent file (with
new timeline).

It will be overwritten in a normal non-archive-recovery startup.

Hmm, you mean the case where the system crashes after
inserting XLOG_SWITCH and before archiving the WAL file
containing it?

No, no crash is involved. Just a normal server shutdown and start:

1. Server shutdown is initiated
2. A shutdown checkpoint is recorded at XLOG point 1234, redo ptr is
also 1234.
3. A XLOG_SWITCH record is written at 1235, right after the checkpoint
record.
4. The last round of archiving is done. The partial WAL file containing
the checkpoint and XLOG_SWITCH record is archived.
5. Postmaster exits.

6. Postmaster is started again. Since the system was shut down cleanly,
no WAL recovery is done. The WAL insert pointer is initialized to right
after the redo pointer, location 1235, which is also the location of the
XLOG_SWITCH record.
7. The next WAL record written will be written at 1235, overwriting the
XLOG_SWITCH record.
8. When the WAL file fills up, the system will try to archive the same
WAL file again, this time with additional WAL records that come after
the checkpoint record.
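
The steps above can be sketched as a toy model. This is only an illustration, not PostgreSQL code: the WAL segment is modelled as a Python dict keyed by position, the positions 1234 and 1235 are the illustrative values from the list, and the record name HEAP_INSERT and both function names are made up for the example.

```python
# Toy model only. Real WAL positions, record sizes and segment handling
# are far more involved than a dict of position -> record.

def shutdown_switch_after_checkpoint():
    """Steps 1-5: checkpoint first, then XLOG_SWITCH, then archive."""
    wal = {}
    wal[1234] = "CHECKPOINT_SHUTDOWN"  # step 2: redo ptr is also 1234
    redo_ptr = 1234
    wal[1235] = "XLOG_SWITCH"          # step 3: written after the checkpoint
    archived = dict(wal)               # step 4: copy of the segment is archived
    return wal, redo_ptr, archived


def clean_restart(wal, redo_ptr):
    """Steps 6-7: no recovery, inserts resume right after the redo pointer."""
    insert_ptr = redo_ptr + 1          # step 6: location 1235
    wal[insert_ptr] = "HEAP_INSERT"    # step 7: overwrites XLOG_SWITCH
    return wal


wal, redo_ptr, archived = shutdown_switch_after_checkpoint()
wal = clean_restart(wal, redo_ptr)

print(archived[1235])   # record at 1235 in the archived copy
print(wal[1235])        # record at 1235 in the live segment
print(archived == wal)  # step 8: the same segment now has different contents
```

In this model the archived copy and the live segment disagree at position 1235, which is the situation step 8 describes: the same file name would be archived a second time with different contents.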

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

#11

masao.fujii@gmail.com

about 17 years ago

In reply to: Heikki Linnakangas (#10)

Re: Clean shutdown and warm standby

Hi,

On Mon, Apr 27, 2009 at 10:02 PM, Heikki Linnakangas
<heikki.linnakangas@enterprisedb.com> wrote:

Fujii Masao wrote:

Hi,

On Mon, Apr 27, 2009 at 8:43 PM, Heikki Linnakangas
<heikki.linnakangas@enterprisedb.com> wrote:

Fujii Masao wrote:

On Fri, Apr 24, 2009 at 3:20 PM, Heikki Linnakangas
<heikki.linnakangas@enterprisedb.com> wrote:

It's not safe to write WAL after the checkpoint, as RequestXLogSwitch()
does. After restart, the system will start inserting WAL from the
checkpoint
redo point, which is just before the XLOG_SWITCH record, and will
overwrite
it.

Since, in this case, the WAL file including XLOG_SWITCH exists
in archive, I don't think that it's unsafe, i.e. XLOG_SWITCH would
be treated as the last applied record and not be overwritten. WAL
records would start to be inserted from the subsequent file (with
new timeline).

It will be overwritten in a normal non-archive-recovery startup.

Hmm, you mean the case where the system crashes after
inserting XLOG_SWITCH and before archiving the WAL file
containing it?

No, no crash is involved. Just a normal server shutdown and start:

1. Server shutdown is initiated
2. A shutdown checkpoint is recorded at XLOG point 1234, redo ptr is also
1234.
3. A XLOG_SWITCH record is written at 1235, right after the checkpoint
record.
4. The last round of archiving is done. The partial WAL file containing the
checkpoint and XLOG_SWITCH record is archived.
5. Postmaster exits.

6. Postmaster is started again. Since the system was shut down cleanly, no
WAL recovery is done. The WAL insert pointer is initialized to right after
the redo pointer, location 1235, which is also the location of the
XLOG_SWITCH record.
7. The next WAL record written will be written at 1235, overwriting the
XLOG_SWITCH record.
8. When the WAL file fills up, the system will try to archive the same WAL
file again, this time with additional WAL records that come after the
checkpoint record.

Oh, you are right. I've missed that case :(

Thanks for the detailed description!

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

#12

Andreas Pflug

pgadmin@pse-consulting.de

about 17 years ago

In reply to: Heikki Linnakangas (#10)

Re: Clean shutdown and warm standby

Heikki Linnakangas wrote:

No, no crash is involved. Just a normal server shutdown and start:

1. Server shutdown is initiated
2. A shutdown checkpoint is recorded at XLOG point 1234, redo ptr is
also 1234.
3. A XLOG_SWITCH record is written at 1235, right after the checkpoint
record.
4. The last round of archiving is done. The partial WAL file
containing the checkpoint and XLOG_SWITCH record is archived.
5. Postmaster exits.

6. Postmaster is started again. Since the system was shut down
cleanly, no WAL recovery is done. The WAL insert pointer is
initialized to right after the redo pointer, location 1235, which is
also the location of the XLOG_SWITCH record.
7. The next WAL record written will be written at 1235, overwriting
the XLOG_SWITCH record.
8. When the WAL file fills up, the system will try to archive the same
WAL file again, this time with additional WAL records that come after
the checkpoint record.

So to get this down to a solution, it appears to be correct to execute
the RequestXLogSwitch right before CreateCheckPoint?

Regards,
Andreas

#13

heikki.linnakangas@enterprisedb.com

about 17 years ago

In reply to: Andreas Pflug (#12)

Re: Clean shutdown and warm standby

Andreas Pflug wrote:

Heikki Linnakangas wrote:

No, no crash is involved. Just a normal server shutdown and start:

1. Server shutdown is initiated
2. A shutdown checkpoint is recorded at XLOG point 1234, redo ptr is
also 1234.
3. A XLOG_SWITCH record is written at 1235, right after the checkpoint
record.
4. The last round of archiving is done. The partial WAL file
containing the checkpoint and XLOG_SWITCH record is archived.
5. Postmaster exits.

6. Postmaster is started again. Since the system was shut down
cleanly, no WAL recovery is done. The WAL insert pointer is
initialized to right after the redo pointer, location 1235, which is
also the location of the XLOG_SWITCH record.
7. The next WAL record written will be written at 1235, overwriting
the XLOG_SWITCH record.
8. When the WAL file fills up, the system will try to archive the same
WAL file again, this time with additional WAL records that come after
the checkpoint record.

So to get this down to a solution, it appears to be correct to execute
the RequestXLogSwitch right before CreateCheckPoint?

Hmm, then the checkpoint record isn't archived. That might be
acceptable, though, since all data would be safe in the preceding WAL.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

#14

Tom Lane

tgl@sss.pgh.pa.us

about 17 years ago

In reply to: Heikki Linnakangas (#13)

Re: Clean shutdown and warm standby

Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes:

Andreas Pflug wrote:

So to get this down to a solution, it appears to be correct to execute
the RequestXLogSwitch right before CreateCheckPoint?

Hmm, then the checkpoint record isn't archived. That might be
acceptable, though, since all data would be safe in the preceding WAL.

Not at all, because the database would be very unhappy at restart
if it can't find the checkpoint record pg_control is pointing to.

regards, tom lane

#15

Andreas Pflug

pgadmin@pse-consulting.de

about 17 years ago

In reply to: Tom Lane (#14)

Re: Clean shutdown and warm standby

Tom Lane wrote:

Not at all, because the database would be very unhappy at restart
if it can't find the checkpoint record pg_control is pointing to.

So for several weeks now all postings just say how it will _not_ work.
Does this boil down to "There's no way to make sure that a graceful
failover won't lose data"?

Regards,
Andreas

#16

heikki.linnakangas@enterprisedb.com

about 17 years ago

In reply to: Tom Lane (#14)

Re: Clean shutdown and warm standby

Tom Lane wrote:

Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes:

Andreas Pflug wrote:

So to get this down to a solution, it appears to be correct to execute
the RequestXLogSwitch right before CreateCheckPoint?

Hmm, then the checkpoint record isn't archived. That might be
acceptable, though, since all data would be safe in the preceding WAL.

Not at all, because the database would be very unhappy at restart
if it can't find the checkpoint record pg_control is pointing to.

At a normal startup, the checkpoint record would be there as usual. And
an archive recovery starts at the location indicated by the backup label.

AFAICS calling RequestXLogSwitch() before CreateCheckPoint would be
equivalent to calling "pg_switch_xlog()" just before shutting down.
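
A toy sketch of the reordered sequence, for comparison with the earlier scenario. It is not PostgreSQL code: segments are modelled as Python lists, the record names HEAP_UPDATE, COMMIT and HEAP_INSERT and both function names are invented, and the model assumes the shutdown checkpoint record lands in the segment following the switch, which is why it is not part of the archived file.

```python
# Toy model only: each WAL segment is a list of record names.

def shutdown_switch_before_checkpoint(segments):
    """Switch first, archive the completed segment, then checkpoint."""
    archive = []
    segments[-1].append("XLOG_SWITCH")        # close the current segment
    archive.append(list(segments[-1]))        # completed segment is archived
    segments.append(["CHECKPOINT_SHUTDOWN"])  # assumed: lands in next segment
    return archive


def clean_restart(segments):
    """Inserts resume after the checkpoint record, in the newest segment."""
    segments[-1].append("HEAP_INSERT")


segments = [["HEAP_UPDATE", "COMMIT"]]
archive = shutdown_switch_before_checkpoint(segments)
clean_restart(segments)

print(archive[0] == segments[0])  # archived segment is never rewritten
print(segments[1])                # checkpoint plus post-restart records
```

In this model the archived segment stays identical to its local copy after restart, and the only record missing from the archive is the shutdown checkpoint itself.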

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

#17

guillaume.smet@gmail.com

about 17 years ago

In reply to: Heikki Linnakangas (#16)

Re: Clean shutdown and warm standby

On Tue, Apr 28, 2009 at 5:22 PM, Heikki Linnakangas
<heikki.linnakangas@enterprisedb.com> wrote:

At a normal startup, the checkpoint record would be there as usual. And an
archive recovery starts at the location indicated by the backup label.

AFAICS calling RequestXLogSwitch() before CreateCheckPoint would be
equivalent to calling "pg_switch_xlog()" just before shutting down.

That's what I had in mind when writing the patch but I didn't know the
implications of this particular checkpoint.

So moving the call before CreateCheckPoint is what I really intended
now that I have in mind these implications and I don't know why it would be
a problem to miss this checkpoint in the logs archived.

--
Guillaume

#18

guillaume.smet@gmail.com

about 17 years ago

In reply to: Guillaume Smet (#17)

Re: Clean shutdown and warm standby

On Tue, Apr 28, 2009 at 5:35 PM, Guillaume Smet
<guillaume.smet@gmail.com> wrote:

On Tue, Apr 28, 2009 at 5:22 PM, Heikki Linnakangas
<heikki.linnakangas@enterprisedb.com> wrote:

At a normal startup, the checkpoint record would be there as usual. And an
archive recovery starts at the location indicated by the backup label.

AFAICS calling RequestXLogSwitch() before CreateCheckPoint would be
equivalent to calling "pg_switch_xlog()" just before shutting down.

That's what I had in mind when writing the patch but I didn't know the
implications of this particular checkpoint.

So moving the call before CreateCheckPoint is what I really intended
now that I have in mind these implications and I don't know why it would be
a problem to miss this checkpoint in the logs archived.

What do we decide about this problem?

Should we just call RequestXLogSwitch() before the creation of the
shutdown checkpoint or do we need a more complex patch? If so can
anybody explain the potential problem of this approach so we can
figure out how to fix it?

Thanks.

--
Guillaume

#19

heikki.linnakangas@enterprisedb.com

about 17 years ago

In reply to: Guillaume Smet (#18)

Re: Clean shutdown and warm standby

Guillaume Smet wrote:

On Tue, Apr 28, 2009 at 5:35 PM, Guillaume Smet
<guillaume.smet@gmail.com> wrote:

On Tue, Apr 28, 2009 at 5:22 PM, Heikki Linnakangas
<heikki.linnakangas@enterprisedb.com> wrote:

At a normal startup, the checkpoint record would be there as usual. And an
archive recovery starts at the location indicated by the backup label.

AFAICS calling RequestXLogSwitch() before CreateCheckPoint would be
equivalent to calling "pg_switch_xlog()" just before shutting down.

That's what I had in mind when writing the patch but I didn't know the
implications of this particular checkpoint.

So moving the call before CreateCheckPoint is what I really intended
now that I have in mind these implications and I don't know why it would be
a problem to miss this checkpoint in the logs archived.

What do we decide about this problem?

Should we just call RequestXLogSwitch() before the creation of the
shutdown checkpoint or do we need a more complex patch? If so can
anybody explain the potential problem of this approach so we can
figure out how to fix it?

I've committed a patch to do the RequestXLogSwitch() before shutdown
checkpoint as discussed. It seems safe to me. (sorry for the delay, and
thanks for the reminder)

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

#20

Simon Riggs

simon@2ndQuadrant.com

about 17 years ago

In reply to: Heikki Linnakangas (#19)

Re: Clean shutdown and warm standby

On Thu, 2009-05-28 at 14:04 +0300, Heikki Linnakangas wrote:

I've committed a patch to do the RequestXLogSwitch() before shutdown
checkpoint as discussed. It seems safe to me. (sorry for the delay, and
thanks for the reminder)

Not sure if that is a fix that will work in all cases.

There is a potential timing problem with when the archiver is shut down:
that may now be fixed in 8.4; see what you think.

Also if archiving is currently stalled, then files will not be
transferred, even if you switch xlogs. So this is at best a partial fix
to the problem and the need for a manual check of file contents
remains.
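
One way such a manual check could be scripted is sketched below. It assumes the on-disk layout of this era, where the archiver tracks pending segments as marker files named <segment>.ready under pg_xlog/archive_status in the data directory. The function name and the segment names in the demo are hypothetical, and the demo runs against a throwaway temporary directory, not a real cluster.

```python
import tempfile
from pathlib import Path


def pending_wal_segments(data_dir):
    """Return WAL file names that still have a .ready marker (not yet archived).

    Assumes the pg_xlog/archive_status layout; adjust the path if the
    installation uses a different WAL directory name.
    """
    status_dir = Path(data_dir) / "pg_xlog" / "archive_status"
    return sorted(p.stem for p in status_dir.glob("*.ready"))


# Demo against a fake data directory (segment names are made up).
with tempfile.TemporaryDirectory() as fake_data_dir:
    status = Path(fake_data_dir) / "pg_xlog" / "archive_status"
    status.mkdir(parents=True)
    (status / "000000010000000000000041.done").touch()
    (status / "000000010000000000000042.ready").touch()
    print(pending_wal_segments(fake_data_dir))
```

An empty list after shutdown would suggest the archiver had caught up. A non-empty list means those segments still need to be copied by hand before failing over.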

--
Simon Riggs www.2ndQuadrant.com
PostgreSQL Training, Services and Support

#21