Checkpoint Err on Startup of Rsynced System

Started by Jim Longwill | almost 10 years ago | 7 messages | general
#1 Jim Longwill
longwill@psmfc.org

I am trying to set up a 2nd, identical, db server (M2) for development
and I've run into a problem with starting up the 2nd Postgres installation.

Here's what I've done:
1) did a 'clone' of 1st (production) machine M1 (so both machines on
CentOS 7.2)
2) set up an rsync operation, did a complete 'rsync' from M1 to M2
3) did a final 'CHECKPOINT' command on M1 postgres
4) shut down postgres on M1 with 'pg_ctl stop'
5) did final 'rsync' operation (then restarted postgres on M1 with
'pg_ctl start')
6) tried to start up postgres on M2

It won't start, & in the log file gives the error message:
...
< 2016-05-31 09:02:52.337 PDT >LOG: invalid primary checkpoint record
< 2016-05-31 09:02:52.337 PDT >LOG: invalid secondary checkpoint record
< 2016-05-31 09:02:52.337 PDT >PANIC: could not locate a valid checkpoint record
< 2016-05-31 09:02:53.184 PDT >LOG: startup process (PID 26680) was terminated by signal 6: Aborted
< 2016-05-31 09:02:53.184 PDT >LOG: aborting startup due to startup process failure

I've tried several times to do this but always get this result. So, do
I need to do a new 'initdb..' operation on machine M2 + restore from M1
backups? Or is there another way to fix this?

--o--o--o--o--o--o--o--o--o--o--o--o--
Jim Longwill
PSMFC Regional Mark Processing Center
JLongwill@psmfc.org
--o--o--o--o--o--o--o--o--o--o--o--o--

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#2 Scott Mead
scottm@openscg.com
In reply to: Jim Longwill (#1)
Re: Checkpoint Err on Startup of Rsynced System

On Tue, May 31, 2016 at 1:13 PM, Jim Longwill <longwill@psmfc.org> wrote:

I've tried several times to do this but always get this result. So, do I
need to do a new 'initdb..' operation on machine M2 + restore from M1
backups? Or is there another way to fix this?

You should have stopped M1 prior to taking the backup. If you can't do
that, it can be done online via:

1. Set up archiving
2. select pg_start_backup('some label');
3. <run rsync>
4. select pg_stop_backup();

Without archiving and the pg_[start|stop]_backup, you're not guaranteed
anything. You could use an atomic snapshot (LVM, storage, etc...), but
it's got to be a true snapshot. Without that, you need archiving + start /
stop backup.
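
As a rough illustration of the four steps above (not part of the original message), the online copy might be scripted like this on M1. It assumes the exclusive backup mode of the 9.x releases current at the time of this thread, where pg_start_backup() and pg_stop_backup() can be called from separate sessions. The function name, the backup label, and the example paths and host are hypothetical, and psql is assumed to connect as a superuser without prompting. On PostgreSQL 15 and later the functions are named pg_backup_start() and pg_backup_stop() and must run in one session, so this sketch would not apply as written.

```shell
# Sketch only: online base backup via rsync, exclusive backup mode (9.x era).
# WAL archiving (archive_mode / archive_command) must already be configured.
online_rsync_backup() {
    src="$1"    # data directory on M1, e.g. /var/lib/pgsql/data (hypothetical)
    dest="$2"   # rsync destination, e.g. m2:/var/lib/pgsql/data (hypothetical)

    # Tell PostgreSQL a base backup is starting; this forces a checkpoint.
    psql -X -At -c "SELECT pg_start_backup('rsync_to_m2');" || return 1

    # Copy the files while the server keeps running.
    rc=0
    rsync -a --delete --exclude=postmaster.pid "$src/" "$dest/" || rc=$?

    # End backup mode even if rsync failed.
    psql -X -At -c "SELECT pg_stop_backup();" || return 1

    # rsync exit code 24 means some source files vanished during the copy,
    # which is expected on a live server; anything else is a real failure.
    [ "$rc" -eq 0 ] || [ "$rc" -eq 24 ]
}
```

It could be called as `online_rsync_backup /var/lib/pgsql/data m2:/var/lib/pgsql/data`. M2 would still need the WAL archived between the two calls before it can reach a consistent state.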

Last section of:
https://wiki.postgresql.org/wiki/Simple_Configuration_Recommendation#Physical_Database_Backups
will take you to:
https://www.postgresql.org/docs/current/static/continuous-archiving.html

--Scott

--
Scott Mead
Sr. Architect
OpenSCG <http://openscg.com>

#3 Alan Hodgson
ahodgson@lists.simkin.ca
In reply to: Jim Longwill (#1)
Re: Checkpoint Err on Startup of Rsynced System

On Tuesday, May 31, 2016 10:13:14 AM Jim Longwill wrote:

I've tried several times to do this but always get this result. So, do
I need to do a new 'initdb..' operation on machine M2 + restore from M1
backups? Or is there another way to fix this?

What you describe should work fine. In order of likelihood of why it doesn't, I
could guess:

1 - you're not waiting for the database to fully shut down before running the
last rsync
2 - you're not in fact rsync'ing the entire data directory
3 - the target server is running a different version of PostgreSQL or has a
different machine architecture
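
The first and third possibilities above can be checked mechanically. The following is a minimal sketch, not something posted in the thread: the function names are made up for illustration, and it assumes pg_controldata is on the PATH and that both data directories are readable from wherever the checks run.

```shell
# Check 1: has the server fully shut down? pg_controldata reports the cluster
# state recorded in the control file; a clean stop shows exactly "shut down".
cluster_is_shut_down() {
    LC_ALL=C pg_controldata "$1" | grep -q '^Database cluster state: *shut down$'
}

# Check 3: were both data directories created by the same major version?
# Every data directory carries a PG_VERSION file holding that version.
same_major_version() {
    [ "$(cat "$1/PG_VERSION")" = "$(cat "$2/PG_VERSION")" ]
}
```

For the second possibility, re-running the same rsync with `--dry-run --itemize-changes` lists anything that still differs between source and destination.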


#4 Venkata B Nagothi
nag1010@gmail.com
In reply to: Jim Longwill (#1)
Re: Checkpoint Err on Startup of Rsynced System

On Wed, Jun 1, 2016 at 3:13 AM, Jim Longwill <longwill@psmfc.org> wrote:

I am trying to setup a 2nd, identical, db server (M2) for development and
I've run into a problem with starting up the 2nd Postgres installation.

Here's what I've done:
1) did a 'clone' of 1st (production) machine M1 (so both machines on
Cent OS 7.2)
2) setup an rsync operation, did a complete 'rsync' from M1 to M2
3) did a final 'CHECKPOINT' command on M1 postgres
4) shutdown postgres on M1 with 'pg_ctl stop'
5) did final 'rsync' operation (then restarted postgres on M1 with
'pg_ctl start')
6) tried to startup postgres on M2

If you rsync the data directory of a live, running postgres instance, that
is not going to work. As Scott said earlier, you need to do "select
pg_start_backup('labelname');" before you initiate rsync and "select
pg_stop_backup()" after you complete rsync. That way, postgresql knows
that you are rsyncing and also identifies the required WALs to be copied
over.

Or, if you can shut down M1 for some time, simply shut down M1, copy over
(or rsync) the data directory to M2, and then start the M2 instance. That
should work.
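
The offline variant could be sketched as below, run on M1. This is an illustration under assumptions rather than a tested procedure: the function name and the example paths and host are hypothetical, and pg_ctl is assumed to be run as the postgres OS user.

```shell
# Sketch only: cold copy of a stopped cluster from M1 to M2.
cold_copy() {
    src="$1"    # data directory on M1, e.g. /var/lib/pgsql/data (hypothetical)
    dest="$2"   # rsync destination, e.g. m2:/var/lib/pgsql/data (hypothetical)

    # -w makes pg_ctl wait until shutdown has completed, so no file changes
    # while the copy is running.
    pg_ctl stop -D "$src" -m fast -w || return 1

    rc=0
    rsync -a --delete "$src/" "$dest/" || rc=$?

    # Bring production back up regardless of how the copy went.
    pg_ctl start -D "$src" -w || return 1

    return "$rc"
}
```

Note that `rsync -a` copies a symlink as a symlink rather than copying what it points to, so anything inside the data directory that is symlinked elsewhere has to be copied separately.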

Regards,
Venkata B N

Fujitsu Australia

#5 Jim Longwill
longwill@psmfc.org
In reply to: Scott Mead (#2)
Re: Checkpoint Err on Startup of Rsynced System

Scott,
Thanks. If I understand you correctly... Actually, we did have M1
shut down when the initial clone was done (some weeks ago). That was done
using the VMware system, not rsync. My main problem is that I don't
have WAL archiving set up yet (I've not changed the Postgres defaults on
this so far). That's part of what the new machine M2 is for: to
practice doing this before adjusting our production machine (M1). As
regards doing a snapshot, I thought that the manual 'CHECKPOINT' would
take care of it.

So, this time around I may try a manual rebuild (initdb, then pg_restore
from backup files) on the new machine in order to get a roughly
equivalent installation going. One early goal I have is to get
archiving set up and working beyond the minimal level.

Jim Longwill


On 05/31/2016 11:50 AM, Scott Mead wrote:

You should have stopped M1 prior to taking the backup. If you can't do
that, it can be done online via:

1. Setup archiving
2. select pg_start_backup('some label');
3. <run rsync>
4. select pg_stop_backup();

Without archiving and the pg_[start|stop]_backup, you're not
guaranteed anything. You could use an atomic snapshot (LVM, storage,
etc...), but it's got to be a true snapshot. Without that, you need
archiving + start / stop backup.


#6 Jeff Janes
jeff.janes@gmail.com
In reply to: Jim Longwill (#1)
Re: Checkpoint Err on Startup of Rsynced System

On Tue, May 31, 2016 at 10:13 AM, Jim Longwill <longwill@psmfc.org> wrote:

I've tried several times to do this but always get this result. So, do I
need to do a new 'initdb..' operation on machine M2 + restore from M1
backups? Or is there another way to fix this?

It sounds like you did not include pg_xlog in your rsync. What you
have done is basically a cold backup. Cold backups must include
pg_xlog, at least if you want them to work without WAL archival. If
you look farther up in the log, it should tell you what xlog file it
needs, and you can copy that from M1 if it is still there.
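
One quick way to test this theory on M2 is to count the WAL segment files that actually arrived in the copy. The sketch below is an illustration only: the function name is hypothetical, and it relies on WAL segments being named with 24 uppercase hexadecimal digits.

```shell
# Count WAL segment files under <datadir>/pg_xlog. The trailing slash makes
# ls follow pg_xlog if it is a symlink; a missing directory or dangling
# symlink yields 0. Subdirectories such as archive_status are not counted.
wal_segment_count() {
    ls "$1/pg_xlog/" 2>/dev/null | grep -Ec '^[0-9A-F]{24}$' || true
}
```

A result of 0 for the copied data directory would mean the segment holding the checkpoint record cannot be there, which fits the PANIC in the log.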

Cheers,

Jeff


#7 Jim Longwill
JLongwill@psmfc.org
In reply to: Jeff Janes (#6)
Re: Checkpoint Err on Startup of Rsynced System

Jeff Janes,

Ok. I checked this further and just found that the pg_xlog area is
symlinked to another area.. and indeed that other area was not being
rsynced (!) and I thought it was. So, I just fixed this, re-ran it and
now it is working. Now I believe I have a stable postgres running on M2.
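
For anyone hitting the same problem, a small check like the following could flag the situation before the copy. It is a sketch under assumptions: the function name is hypothetical, and `readlink -f` is the GNU coreutils form available on CentOS.

```shell
# Print the real location of pg_xlog when it is a symlink inside the data
# directory; print nothing when it is an ordinary directory.
xlog_symlink_target() {
    if [ -L "$1/pg_xlog" ]; then
        readlink -f "$1/pg_xlog"
    fi
}

# Possible use (paths and host are hypothetical):
#   target=$(xlog_symlink_target /var/lib/pgsql/data)
#   [ -n "$target" ] && rsync -a "$target/" "m2:$target/"
```

Copying the target to the same path on M2 keeps the symlink in the copied data directory valid.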

So, thanks Jeff for mentioning pg_xlog. Thanks to others as well for
your input.

--Jim Longwill

On 06/01/2016 08:44 AM, Jeff Janes wrote:

It sounds like you did not include pg_xlog in your rsync. What you
have done is basically a cold backup. Cold backups must include
pg_xlog, at least if you want them to work without WAL archival. If
you look farther up in the log, it should tell you what xlog file it
needs, and you can copy that from M1 if it is still there.

Cheers,

Jeff
