New Slave - timeline ERROR

Started by drum.lucas@gmail.com · about 10 years ago · 18 messages · general
#1 drum.lucas@gmail.com
drum.lucas@gmail.com

I've started setting up a new SLAVE PostgreSQL server.

*NOTE: I ran pg_basebackup from another STANDBY SERVER, not from the
MASTER.*

1 - screen -t basebackup

2 - su - postgres

3 - cd ~/9.2/data/

4 - ssh postgres@slave01 'pg_basebackup --pgdata=- --format=tar
--label=bb_master --progress --host=localhost --port=5432
--username=replicator --xlog | pv --quiet --rate-limit 100M' | tar -x
--no-same-owner

5 - I've commented the "primary_conninfo =" and "standby_mode=" so the
slave can get the files from WAL_ARCHIVE
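
In other words, during this catch-up phase the recovery.conf was reduced to archive replay only; a rough sketch of that state (reusing the restore_command shown later in this message) would be:

```
restore_command = 'exec nice -n 19 ionice -c 2 -n 7 ../../bin/restore_wal_segment.bash "../wal_archive/%f" "%p"'
#standby_mode = on
#primary_conninfo = '...'
```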

6 - After that, I got the logs:

postgres(iostreams)[10037]: 2016-01-09 00:07:26.604
UTC|10085|LOG: database system is ready to accept read only
connections

7 - After the server finished replaying the WAL_ARCHIVE, I turned on
replication from the MASTER in recovery.conf:

*recovery.conf on the New Slave:*

restore_command = 'exec nice -n 19 ionice -c 2 -n 7
../../bin/restore_wal_segment.bash "../wal_archive/%f" "%p"'
archive_cleanup_command = 'exec nice -n 19 ionice -c 2 -n 7
../../bin/pg_archivecleaup_mv.bash -d "../wal_archive" "%r"'
recovery_target_timeline = 'latest'
standby_mode = on
primary_conninfo = 'host=192.168.100.XX port=5432 user=replicator
application_name=replication_slave02'

But once I restarted PostgreSQL, I got this error:

WAL segment `../wal_archive/00000005.history` not found
2016-01-09 01:13:39.183 UTC|774|FATAL: timeline 2 of the primary does not
match recovery target timeline 4

What can I do to solve the problem?

It's really important as it's a production New Slave. Thank you!

#2 bricklen
bricklen@gmail.com
In reply to: drum.lucas@gmail.com (#1)
Re: New Slave - timeline ERROR

On Fri, Jan 8, 2016 at 8:44 PM, drum.lucas@gmail.com <drum.lucas@gmail.com>
wrote:

Hi, I'm a bit too lazy to try to suss out the exact reasons for your
failure, but here is a reasonably thorough guide to setting up replication:
http://dba.stackexchange.com/a/53546/24393

A few tips:
- Having the master ship WALs to the slaves is handy if you can pull it
off. If you are doing it over the wire and using rsync, "-z" for
compression is recommended. If you are using the tar format of
pg_basebackup, you *must* have the master ship the WALs to the slave,
otherwise it won't be able to synchronize (the "stream" method ships WALs
over the wire, so the end result is a synchronized system).

- I always run pg_basebackup from the slave I am building, for simplicity.
- I create new slaves almost every day (we have thousands of databases)
using a bash script and it almost never fails. In essence it is a big
wrapper around the pg_basebackup command (though we are using pg93 mostly).
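
For example, WAL shipping from the master can be configured with something like the following in the master's postgresql.conf (a sketch only; the destination host and archive path are assumptions, not values from this thread):

```
wal_level = hot_standby      # 9.2-era setting needed for standbys
archive_mode = on
archive_command = 'rsync -a %p postgres@standby-host:/path/to/wal_archive/%f'
```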

The base backup command that I run from the slave I am building:
pg_basebackup --pgdata=$PGDATA --host=$MASTER_IP --port=$PGPORT
--username=replication --no-password --xlog-method=stream --format=plain
--progress --verbose

The recovery.conf:
standby_mode = 'on'
primary_conninfo = 'user=replication host=$IP_OF_UPSTREAM_SLAVE_OR_MASTER
port=5432 sslmode=prefer sslcompression=1 krbsrvname=postgres'
recovery_target_timeline = 'latest'
archive_cleanup_command = '/usr/pgsql-9.3/bin/pg_archivecleanup
/path/to/WALs %r'
restore_command = 'cp /path/to/WALs/%f "%p" 2>>
/your/PGDATA/path/pg_log/standby.log'

#3 drum.lucas@gmail.com
drum.lucas@gmail.com
In reply to: bricklen (#2)
Re: New Slave - timeline ERROR

Hi @bricklen. Thanks for your reply. I've been working on it for 20h =(
So....

The master is currently shipping the WALs to the slave.
When pg_basebackup finished, I got a successful log:

postgres(iostreams)[10037]: 2016-01-09 00:07:26.604
UTC|10085|LOG: database system is ready to accept read only
connections

The problem happens when, after pg_basebackup, I change the recovery.conf
to replicate from live, changing these 2 lines:

standby_mode = on

primary_conninfo = 'host=IP_TO_THE_OTHER_SLAVE port=5432
user=replicator application_name=replication_slave02'

I read the link you sent: http://dba.stackexchange.com/a/53546/24393
I'm doing everything exactly the same; only one parameter is different:

time pg_basebackup --pgdata=$PGDATA --host=IP_OF_MASTER --port=5432
--username=replication --password --xlog-method=stream --format=plain
--progress --verbose

I'm not using --xlog-method=stream
I'm using:
ssh postgres@slave1 'pg_basebackup --pgdata=- --format=tar
--label=bb_master --progress --host=localhost --port=5432
--username=replicator *--xlog* | pv --quiet --rate-limit 100M' | tar -x
--no-same-owner

On http://www.postgresql.org/docs/9.2/static/app-pgbasebackup.html it says:

--xlog

Using this option is equivalent of using -X with method fetch.
So... I have 2 TB of data. I wouldn't like to run pg_basebackup again with
the --xlog-method=stream option without being sure that it is going to
work....

Do you have any other idea? Do you know if --xlog is the problem and I
should re-run pg_basebackup with the *--xlog-method=stream* option?

Thank you

Lucas Possamai

kinghost.co.nz


#4 bricklen
bricklen@gmail.com
In reply to: drum.lucas@gmail.com (#3)
Re: New Slave - timeline ERROR

On Sat, Jan 9, 2016 at 12:36 PM, drum.lucas@gmail.com <drum.lucas@gmail.com>
wrote:

Do you have any other idea? Do you know if --xlog is the problem and I
should re-run pg_basebackup with the *--xlog-method=stream*
option?

If the master is successfully shipping WALs to the slave you are setting
up, you do not need the "stream" option.

#5 drum.lucas@gmail.com
drum.lucas@gmail.com
In reply to: bricklen (#4)
Re: New Slave - timeline ERROR

Hi,

If the master is successfully shipping WALs to the slave you are setting up

you do not need the "stream" option.

yes.. the master is successfully shipping the WALs....

Is there anything else? Help, please hehehehe

Lucas Possamai

kinghost.co.nz


#6 bricklen
bricklen@gmail.com
In reply to: drum.lucas@gmail.com (#5)
Re: New Slave - timeline ERROR

On Sat, Jan 9, 2016 at 1:49 PM, drum.lucas@gmail.com <drum.lucas@gmail.com>
wrote:

Hi,

If the master is successfully shipping WALs to the slave you are setting up

you do not need the "stream" option.

yes.. the master is successfully shipping the WALs....

Is there anything else? Help, please hehehehe

If you are able to stop Postgres on the slave you are taking the base
backup from, you could do this:

1). Stop postgres on slave1
2). Rsync slave1 to slave2 to copy only the deltas.
3). When you start up slave2 the WALs that the master has shipped to slave2
should apply and bring your system up to consistency.
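
A local toy demo of step 2 (the paths below are stand-ins for the real slave1 data directory and the new slave's destination): rsync re-transfers only files that changed since the base backup, not the whole 2 TB.

```shell
# Stand-in paths for slave1's data dir and slave2's destination
SRC=/tmp/demo_slave1_data
DST=/tmp/demo_slave2_data
rm -rf "$SRC" "$DST"
mkdir -p "$SRC/base" "$DST"
echo v1 > "$SRC/base/16384"
touch "$SRC/postmaster.pid"
cp -r "$SRC/base" "$DST/"      # simulate the earlier base backup
echo v2 > "$SRC/base/16384"    # this file changed on slave1 since then
# only the changed file is transferred; postmaster.pid is never copied
rsync -a --exclude postmaster.pid "$SRC/" "$DST/"
cat "$DST/base/16384"          # prints "v2"
```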

#7 drum.lucas@gmail.com
drum.lucas@gmail.com
In reply to: bricklen (#6)
Re: New Slave - timeline ERROR

Hi,

If you are able to stop Postgres on the slave you are taking the base

backup from, you could do this:

I'm not... the database is 2 TB.
So an RSYNC would take DAYS..... And I'm not able to stop the SLAVE for
that long.

Lucas

Lucas Possamai

kinghost.co.nz


#8 bricklen
bricklen@gmail.com
In reply to: drum.lucas@gmail.com (#7)
Re: New Slave - timeline ERROR

On Sat, Jan 9, 2016 at 1:54 PM, drum.lucas@gmail.com <drum.lucas@gmail.com>
wrote:

Hi,

If you are able to stop Postgres on the slave you are taking the base

backup from, you could do this:

I'm not... the data base is 2 TB.
So, a RSYNC would take DAYS..... And I'm not able to stop the SLAVE for
that long time

Depending on when you took the base backup and how many changes have
occurred at your source (slave1) database cluster, the rsync execution time
may or may not take as long as a new base backup, since it is only shipping
deltas (changed files).

#9 drum.lucas@gmail.com
drum.lucas@gmail.com
In reply to: bricklen (#8)
Re: New Slave - timeline ERROR

Hmm... I see...

Depending on when you took the base backup and how many changes have
occurred at your source (slave1) database cluster, the rsync execution time
may or may not take as long as a new base backup, since it is only shipping
deltas (changed files).

I could stop the slave then.... But I'm afraid of getting it back online
and hitting some other errors hehehehe =\

Lucas Possamai

kinghost.co.nz


#10 bricklen
bricklen@gmail.com
In reply to: drum.lucas@gmail.com (#9)
Re: New Slave - timeline ERROR

On Sat, Jan 9, 2016 at 2:10 PM, drum.lucas@gmail.com <drum.lucas@gmail.com>
wrote:

I could stop the slave then.... But I'm afraid getting it back online and
get some other errors

At this point I think your options are slim. If you are feeling
adventurous, you can try doing the rsync with the slave running, then do a
second rsync with the slave stopped or do it from the master after putting
the master in backup mode (eg. executing "pg_start_backup('slave_backup')")

#11 drum.lucas@gmail.com
drum.lucas@gmail.com
In reply to: bricklen (#10)
Re: New Slave - timeline ERROR

*rsync would be something like:*

from slave1:
rsync -av data/pg_xlog slave2:/var/lib/postgresql/data/

Is that correct?

At this point I think your options are slim. If you are feeling

adventurous, you can try doing the rsync with the slave running, then do a
second rsync with the slave stopped or do it from the master after putting
the master in backup mode (eg. executing
"pg_start_backup('slave_backup')")

I didn't understand why I'd do the RSYNC twice... sorry

Lucas

Lucas Possamai

kinghost.co.nz


#12 bricklen
bricklen@gmail.com
In reply to: drum.lucas@gmail.com (#11)
Re: New Slave - timeline ERROR

On Sat, Jan 9, 2016 at 2:22 PM, drum.lucas@gmail.com <drum.lucas@gmail.com>
wrote:

*rsync would be something like:*

from slave1:
rsync -av data/pg_xlog slave2:/var/lib/postgresql/data/

Normally I run something like the following from the slave I am setting up.

rsync -azr --progress --partial postgres@$MASTER_IP:/var/lib/postgresql/data
/var/lib/postgresql/data/ --exclude postmaster.pid

I didn't understand why I'd do the RSYNC twice... sorry

Unless the source db cluster you are rsync'ing from is stopped, there will
be changes to data files replicated from the master. The second rsync might
not be necessary given the WALs are shipping from the master to slave2.

#13 drum.lucas@gmail.com
drum.lucas@gmail.com
In reply to: bricklen (#12)
Re: New Slave - timeline ERROR

rsync -azr --progress --partial postgres@$MASTER_IP:/var/lib/postgresql/data
/var/lib/postgresql/data/ --exclude postmaster.pid

Ah ok! So this will do an incremental copy, right? It's not supposed to
copy ALL of base/ again?

Lucas Possamai

kinghost.co.nz


#14 bricklen
bricklen@gmail.com
In reply to: drum.lucas@gmail.com (#13)
Re: New Slave - timeline ERROR

On Sat, Jan 9, 2016 at 2:35 PM, drum.lucas@gmail.com <drum.lucas@gmail.com>
wrote:

rsync -azr --progress --partial postgres@$MASTER_IP:/var/lib/postgresql/data

/var/lib/postgresql/data/ --exclude postmaster.pid

Ah ok! So this will do an incremental copy, right? It's not supposed to
copy ALL of base/ again?

Yes, this is for incremental copying from the upstream source.
Actually, you don't need the -r with -a (it is implied), and you can run it
first with --dry-run to see what it _would_ do.
If you are not shipping over the WAN, then omit the -z flag as you do not
need compression.

#15 drum.lucas@gmail.com
drum.lucas@gmail.com
In reply to: bricklen (#14)
Re: New Slave - timeline ERROR

Should I point of replication new slave to same DB?

Lucas


--

Lucas Possamai

kinghost.co.nz

#16 John R Pierce
pierce@hogranch.com
In reply to: drum.lucas@gmail.com (#15)
Re: New Slave - timeline ERROR

On 1/9/2016 4:33 PM, drum.lucas@gmail.com wrote:

Should I point of replication new slave to same DB?

I can't even guess what you're asking here.

--
john r pierce, recycling bits in santa cruz

#17 drum.lucas@gmail.com
drum.lucas@gmail.com
In reply to: drum.lucas@gmail.com (#1)
Re: New Slave - timeline ERROR

What is the --pgdata=- in your original command? Are you perhaps in the

wrong directory and not getting all the required files?

I run the pg_basebackup from the Slave on /var/lib/pgsql/9.2/data.
So I'm not in the wrong directory...

I'm out of fresh ideas. The rsync command is what I would go with, given

that I can't think of any other commands to try.

I chose the pg_basebackup command precisely so I wouldn't have to stop any
database. Stopping even the slave is out of the question... sorry...

I really don't know what else to do. Have tried everything!

Lucas

On 10 January 2016 at 13:31, bricklen <bricklen@gmail.com> wrote:


Bottom-posting is the convention in the postgresql lists, and makes it
easier to follow a long thread.

On Sat, Jan 9, 2016 at 3:16 PM, drum.lucas@gmail.com
<drum.lucas@gmail.com> wrote:

My servers are not in the same network. A new pg_basebackup would take 30
hours to complete as I use --rate-limit 100MB.

If you had enough bandwidth, you could do some shell magic to parallelize
the rsync commands, or use something like
http://moo.nac.uci.edu/~hjm/parsync/ to do that. If you are limited by
bandwidth, then a single rsync run is probably what you're stuck with.

I really need to put this server up! =\

If you were running zfs you could also take a snapshot of the fs and use
that for your base backup, but I assume you would have mentioned that if it
was an option.

I don't think that running pg_basebackup one more time will solve the
problem, because I've already done that!
I could actually run it, but the problem is that it takes 30h! hahahahah

What is the --pgdata=- in your original command? Are you perhaps in the
wrong directory and not getting all the required files?

I'm out of fresh ideas. The rsync command is what I would go with, given
that I can't think of any other commands to try.

*Have a look:*
http://www.postgresql.org/docs/9.2/static/app-pgbasebackup.html

Note that there are some limitations in an online backup from the standby:

The backup history file is not created in the database cluster backed up.

There is no guarantee that all WAL files required for the backup are
archived at the end of backup. If you are planning to use the backup for an
archive recovery and want to ensure that all required files are available
at that moment, you need to include them into the backup by using -x
option.

You had that in your original command I believe.

#18 drum.lucas@gmail.com
drum.lucas@gmail.com
In reply to: drum.lucas@gmail.com (#17)
Re: New Slave - timeline ERROR

Hi guys..
I started a new PG_BASEBACKUP and it's working now..

The problem was the line standby_mode = on in the recovery.conf on the
STANDBY server.

After the previous basebackup, I had commented out this line before
starting PostgreSQL. BUT you shouldn't do that.

This last time I didn't comment it out, and it worked.

Thank you!
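
For reference, the working end state implied here is a recovery.conf with standby_mode left active after the base backup; a sketch assembled from the values earlier in this thread:

```
standby_mode = on
primary_conninfo = 'host=192.168.100.XX port=5432 user=replicator application_name=replication_slave02'
recovery_target_timeline = 'latest'
restore_command = 'exec nice -n 19 ionice -c 2 -n 7 ../../bin/restore_wal_segment.bash "../wal_archive/%f" "%p"'
```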

Lucas Possamai

kinghost.co.nz
