Postgresql 9.2 has standby server lost data?

Started by Paula Price · almost 11 years ago · 7 messages · general
#1Paula Price
pj8abug@gmail.com

I have Postgresql 9.2.10 streaming replication set up with log shipping in
case the replication falls behind. I discovered that the log-shipping had
been disabled at some point in time. I enabled the log shipping again.

If at some point in time the streaming replication fell behind and the
standby server was not able to retrieve the necessary WAL file(s) from the
primary, would the standby server continue to function normally? Do I need
to rebuild the standby server? I have restarted the standby server and it
is up and running with no issues. I need to know if the
data integrity has been compromised.

I have run this query to determine the lag time for the standby (in case
this tells me anything):

SELECT now(), now() - pg_last_xact_replay_timestamp() AS time_lag;

RESULT:
"2015-06-19 00:40:48.83701+00";"00:00:01.078616"

Thank you,
Paula P

#2Adrian Klaver
adrian.klaver@aklaver.com
In reply to: Paula Price (#1)
Re: Postgresql 9.2 has standby server lost data?

On 06/18/2015 05:45 PM, Paula Price wrote:

I have Postgresql 9.2.10 streaming replication set up with log shipping in
case the replication falls behind. I discovered that the log-shipping had
been disabled at some point in time. I enabled the log shipping again.

If at some point in time the streaming replication fell behind and the
standby server was not able to retrieve the necessary WAL file(s) from the
primary, would the standby server continue to function normally? Do I need
to rebuild the standby server? I have restarted the standby server and it
is up and running with no issues.

Well that seems at odds with it being unable to retrieve the WAL files.
This leads to these questions:

1) What makes you think it did not retrieve the WAL files via streaming?

2) What does the postgres log show at the time you restarted the standby?

I need to know if the data integrity has been compromised.

I have run this query to determine the lag time for the standby (in case
this tells me anything):

SELECT now(), now() - pg_last_xact_replay_timestamp() AS time_lag;

RESULT:
"2015-06-19 00:40:48.83701+00";"00:00:01.078616"

Thank you,
Paula P

--
Adrian Klaver
adrian.klaver@aklaver.com

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#3Paula Price
pj8abug@gmail.com
In reply to: Adrian Klaver (#2)
Re: Postgresql 9.2 has standby server lost data?

On Fri, Jun 19, 2015 at 12:01 PM, Adrian Klaver <adrian.klaver@aklaver.com>
wrote:

On 06/18/2015 05:45 PM, Paula Price wrote:

I have Postgresql 9.2.10 streaming replication set up with log shipping in
case the replication falls behind. I discovered that the log-shipping had
been disabled at some point in time. I enabled the log shipping again.

If at some point in time the streaming replication fell behind and the
standby server was not able to retrieve the necessary WAL file(s) from the
primary, would the standby server continue to function normally? Do I
need
to rebuild the standby server? I have restarted the standby server and it
is up and running with no issues.

Well that seems at odds with it being unable to retrieve the WAL files.
This leads to these questions:

1) What makes you think it did not retrieve the WAL files via streaming?


It *may not* have fallen behind via replication. We do have standby
servers that fall behind, but since we have log-shipping it is not a
concern. On this server, I have no idea how long we were running without
log-shipping, or how many log files I would have to go through to find
out when log-shipping stopped.

My basic question is:
If a standby server falls behind with streaming replication AND the standby
server cannot obtain the WAL file needed from the primary, will you get an
error from the standby server? Or does it just hiccup and try to carry on?

2) What does the postgres log show at the time you restarted the standby?

2015-06-18 01:12:41.871 UTC::@:[28168]:2015-06-18 01:12:41 UTC: LOG: 00000: database system was shut down in recovery at 2015-06-18 01:12:14 UTC
2015-06-18 01:12:41.871 UTC::@:[28168]:2015-06-18 01:12:41 UTC: LOCATION: StartupXLOG, xlog.c:6298
2015-06-18 01:12:41.904 UTC::@:[28168]:2015-06-18 01:12:41 UTC: LOG: 00000: entering standby mode
2015-06-18 01:12:41.904 UTC::@:[28168]:2015-06-18 01:12:41 UTC: LOCATION: StartupXLOG, xlog.c:6384
2015-06-18 01:12:41.987 UTC::@:[28168]:2015-06-18 01:12:41 UTC: LOG: 00000: redo starts at 867/FDF32E18
2015-06-18 01:12:41.987 UTC::@:[28168]:2015-06-18 01:12:41 UTC: LOCATION: StartupXLOG, xlog.c:6855
2015-06-18 01:12:42.486 UTC::[unknown]@[unknown]:[28213]:2015-06-18 01:12:42 UTC: LOG: 00000: connection received: host=[local]
2015-06-18 01:12:42.486 UTC::[unknown]@[unknown]:[28213]:2015-06-18 01:12:42 UTC: LOCATION: BackendInitialize, postmaster.c:3501
2015-06-18 01:12:42.486 UTC:[local]:postgres@postgres:[28213]:2015-06-18 01:12:42 UTC: FATAL: 57P03: the database system is starting up
2015-06-18 01:12:42.486 UTC:[local]:postgres@postgres:[28213]:2015-06-18 01:12:42 UTC: LOCATION: ProcessStartupPacket, postmaster.c:1792
2015-06-18 01:12:43.488 UTC::[unknown]@[unknown]:[28270]:2015-06-18 01:12:43 UTC: LOG: 00000: connection received: host=[local]
2015-06-18 01:12:43.488 UTC::[unknown]@[unknown]:[28270]:2015-06-18 01:12:43 UTC: LOCATION: BackendInitialize, postmaster.c:3501
2015-06-18 01:12:43.488 UTC:[local]:postgres@postgres:[28270]:2015-06-18 01:12:43 UTC: FATAL: 57P03: the database system is starting up
2015-06-18 01:12:43.488 UTC:[local]:postgres@postgres:[28270]:2015-06-18 01:12:43 UTC: LOCATION: ProcessStartupPacket, postmaster.c:1792
2015-06-18 01:12:44.489 UTC::[unknown]@[unknown]:[28327]:2015-06-18 01:12:44 UTC: LOG: 00000: connection received: host=[local]
2015-06-18 01:12:44.489 UTC::[unknown]@[unknown]:[28327]:2015-06-18 01:12:44 UTC: LOCATION: BackendInitialize, postmaster.c:3501
2015-06-18 01:12:44.489 UTC:[local]:postgres@postgres:[28327]:2015-06-18 01:12:44 UTC: FATAL: 57P03: the database system is starting up
2015-06-18 01:12:44.489 UTC:[local]:postgres@postgres:[28327]:2015-06-18 01:12:44 UTC: LOCATION: ProcessStartupPacket, postmaster.c:1792
2015-06-18 01:12:44.490 UTC::@:[28168]:2015-06-18 01:12:41 UTC: LOG: 00000: consistent recovery state reached at 868/112AF7F8
2015-06-18 01:12:44.490 UTC::@:[28168]:2015-06-18 01:12:41 UTC: LOCATION: CheckRecoveryConsistency, xlog.c:7405
2015-06-18 01:12:44.490 UTC::@:[28168]:2015-06-18 01:12:41 UTC: LOG: 00000: invalid record length at 868/112AFB00
2015-06-18 01:12:44.490 UTC::@:[28168]:2015-06-18 01:12:41 UTC: LOCATION: ReadRecord, xlog.c:4078
2015-06-18 01:12:44.490 UTC::@:[28166]:2015-06-18 01:12:41 UTC: LOG: 00000: database system is ready to accept read only connections
2015-06-18 01:12:44.490 UTC::@:[28166]:2015-06-18 01:12:41 UTC: LOCATION: sigusr1_handler, postmaster.c:4314
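As an aside, the two LSNs in the log above ("redo starts at 867/FDF32E18" and "consistent recovery state reached at 868/112AF7F8") let you estimate how much WAL the standby replayed at startup. A rough sketch in Python; the arithmetic assumes the pre-9.3 convention that each xlogid spans 0xFF000000 bytes (the final 16 MB segment per logical xlog file is skipped), so treat the result as approximate, not as an official tool:

```python
# Estimate bytes of WAL between two LSNs as printed in 9.2 logs ("hi/lo").
# Assumption: on PostgreSQL <= 9.2 each xlogid spans 0xFF000000 bytes,
# because the last 16 MB segment of each logical xlog file is unused.
XLOGID_SPAN = 0xFF000000  # bytes per xlogid on pre-9.3 servers

def lsn_to_bytes(lsn: str) -> int:
    """Convert an LSN string like '867/FDF32E18' to an absolute byte position."""
    hi, lo = lsn.split("/")
    return int(hi, 16) * XLOGID_SPAN + int(lo, 16)

def wal_distance(start: str, end: str) -> int:
    """Bytes of WAL between two LSNs (end must be >= start)."""
    return lsn_to_bytes(end) - lsn_to_bytes(start)

# LSNs taken from the startup log above.
replayed = wal_distance("867/FDF32E18", "868/112AF7F8")
print(f"~{replayed / 1024 / 1024:.1f} MB replayed to reach consistency")
```

This only tells you how far behind the last shutdown checkpoint was, not whether any WAL was missing; a gap would have stopped replay with errors, as discussed below in the thread.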


#4Adrian Klaver
adrian.klaver@aklaver.com
In reply to: Paula Price (#3)
Re: Postgresql 9.2 has standby server lost data?

On 06/19/2015 01:05 PM, Paula Price wrote:

My basic question is:
If a standby server falls behind with streaming replication AND the
standby server cannot obtain the WAL file needed from the primary, will
you get an error from the standby server? Or does it just hiccup and
try to carry on?

No, it will fall over:

http://www.postgresql.org/docs/9.2/interactive/runtime-config-replication.html#RUNTIME-CONFIG-REPLICATION-SENDER

wal_keep_segments (integer)

Specifies the minimum number of past log file segments kept in the
pg_xlog directory, in case a standby server needs to fetch them for
streaming replication. Each segment is normally 16 megabytes. If a
standby server connected to the sending server falls behind by more than
wal_keep_segments segments, the sending server might remove a WAL
segment still needed by the standby, in which case the replication
connection will be terminated. Downstream connections will also
eventually fail as a result. (However, the standby server can recover by
fetching the segment from archive, if WAL archiving is in use.)

This sets only the minimum number of segments retained in pg_xlog;
the system might need to retain more segments for WAL archival or to
recover from a checkpoint. If wal_keep_segments is zero (the default),
the system doesn't keep any extra segments for standby purposes, so the
number of old WAL segments available to standby servers is a function of
the location of the previous checkpoint and status of WAL archiving.
This parameter can only be set in the postgresql.conf file or on the
server command line.
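As a back-of-the-envelope illustration of what that retention amounts to: with the default 16 MB segments, wal_keep_segments bounds how far a standby can fall behind before streaming alone can no longer catch up. The value 128 below is a made-up example, not a setting from this thread:

```python
# Back-of-the-envelope: minimum WAL retained by wal_keep_segments on 9.2.
# SEGMENT_BYTES is the 9.2 default segment size; wal_keep_segments=128
# is a hypothetical setting chosen for illustration.
SEGMENT_BYTES = 16 * 1024 * 1024

def retained_wal_bytes(wal_keep_segments: int) -> int:
    """Minimum WAL guaranteed to stay in pg_xlog for standbys to fetch."""
    return wal_keep_segments * SEGMENT_BYTES

print(retained_wal_bytes(128) // (1024 * 1024), "MB")  # 128 segments -> 2048 MB
```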

When you started up, if the necessary WAL files were not on the server,
you would have seen Postgres throwing errors in the log.

I would check out the below to verify:

http://www.postgresql.org/docs/9.2/interactive/warm-standby.html#STREAMING-REPLICATION

25.2.5.2. Monitoring
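Building on the lag query quoted earlier in the thread, a minimal monitoring sketch could parse the interval it returns and alert past a threshold. This assumes you run "SELECT now() - pg_last_xact_replay_timestamp() AS time_lag;" on the standby and get back an interval string like '00:00:01.078616'; the parsing and the 300-second threshold are illustrative choices, not anything prescribed by PostgreSQL:

```python
# Minimal lag-check sketch for a hot standby (illustrative, not official).
# Assumes the interval comes back as 'HH:MM:SS[.ffffff]'; intervals spanning
# days would need extra handling.
from datetime import timedelta

def parse_lag(interval: str) -> timedelta:
    """Parse an 'HH:MM:SS.ffffff' interval string into a timedelta."""
    hours, minutes, seconds = interval.split(":")
    return timedelta(hours=int(hours), minutes=int(minutes),
                     seconds=float(seconds))

def lag_ok(interval: str, max_seconds: float = 300.0) -> bool:
    """True if replay lag is within the (hypothetical) threshold."""
    return parse_lag(interval).total_seconds() <= max_seconds

print(lag_ok("00:00:01.078616"))  # the lag from this thread: well under 5 min
```

Note the caveat from later in the thread: this query only measures replay lag while the standby is actually replaying; it cannot by itself prove no WAL was ever skipped.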


--
Adrian Klaver
adrian.klaver@aklaver.com


#5Jerry Sievers
gsievers19@comcast.net
In reply to: Adrian Klaver (#4)
Re: Postgresql 9.2 has standby server lost data?

Adrian Klaver <adrian.klaver@aklaver.com> writes:

On 06/19/2015 01:05 PM, Paula Price wrote:

My basic question is:
If a standby server falls behind with streaming replication AND the
standby server cannot obtain the WAL file needed from the primary, will
you get an error from the standby server? Or does it just hiccup and
try to carry on?

No, it will fall over:

I wouldn't describe it that way...

To a user, the standby will function and appear normal, unless they
notice that the data is not current.

In the server logs, there will be indications that replication is stuck
waiting for WAL.

HTH


--
Jerry Sievers
Postgres DBA/Development Consulting
e: postgres.consulting@comcast.net
p: 312.241.7800


#6Laurenz Albe
laurenz.albe@cybertec.at
In reply to: Paula Price (#1)
Re: Postgresql 9.2 has standby server lost data?

Paula Price wrote:

I have Postgresql 9.2.10 streaming replication set up with log shipping in
case the replication falls behind. I discovered that the log-shipping had
been disabled at some point in time. I enabled the log shipping again.

If at some point in time the streaming replication fell behind and the
standby server was not able to retrieve the necessary WAL file(s) from the
primary, would the standby server continue to function normally? Do I need
to rebuild the standby server? I have restarted the standby server and it
is up and running with no issues. I need to know if the
data integrity has been compromised.

I have run this query to determine the lag time for the standby (in case
this tells me anything):

SELECT now(), now() - pg_last_xact_replay_timestamp() AS time_lag;

RESULT:
"2015-06-19 00:40:48.83701+00";"00:00:01.078616"

You were lucky and replication did not fall behind.

If it had, and replication had tried to resort to WAL archives,
replication would have got stuck there.

There is no way that recovery can omit a portion of WAL during replay.
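That sequential property can be modeled in a toy sketch: records are applied strictly in order, and a gap stalls replay rather than being skipped. This is purely illustrative pseudologic, not PostgreSQL code:

```python
# Toy model of sequential WAL replay (illustrative only, not PostgreSQL code).
# Each record is (lsn, payload); the next expected LSN is the previous
# record's LSN plus its payload length. A gap stops replay; nothing is skipped.
def replay(records):
    """Apply records in order; stop at the first missing LSN."""
    applied = []
    expected = None
    for lsn, payload in records:
        if expected is not None and lsn != expected:
            # Missing WAL: replay stalls here and never jumps ahead.
            break
        applied.append(payload)
        expected = lsn + len(payload)
    return applied

# A gap between LSN 4 and LSN 8 halts replay after the first two records.
print(replay([(0, "aa"), (2, "bb"), (8, "cc")]))  # ['aa', 'bb']
```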

Yours,
Laurenz Albe


#7Paula Price
pj8abug@gmail.com
In reply to: Jerry Sievers (#5)
Re: Postgresql 9.2 has standby server lost data?

The script I used to check the lag time between the primary and the
standby would have shown if the standby server had fallen far behind, right?

Paula
