Data corruption after restarting replica

Started by Novák, Petrabout 11 years ago7 messagesbugsgeneral
Jump to latest
#1Novák, Petr
novakp@avast.com
bugsgeneral

Hi all,

we're experiencing data corruption after switching streamed replica to
primary.
This is not the first time I've encountered this issue, so I'l try to
describe it in more detail.

For this particular cluster we have 6 servers in two datacenters (3 in
each). There are two instances running on each server, each with its own
port and datadir. On the first two servers in each datacenter one instance
is primary and the other is replica for the primary from the other server.
Third server holds two offsite replicas from the other datacenter (for DR
purposes)

Each replica was set up by taking pg_basebackup from primary (pg_basebackup
-h <hostname> -p 5430 -D /data2/basebackup -P -v -U <user> -x -c fast).
Then directories from initdb were replaced with the ones from basebackup
(only the configuration files remained) and the replica started and was
successfully connected to primary. It was running with no problem keeping
up with the primary. We were experiencing some connection problem between
the two datacenters, but replication didn't break.

Then we needed to take one datacenter offline due to hardware maintenance.
So I've switched the applications down, verified that no more clients were
connected to primary, then shut the primary down and restarted replica
without recovery.conf and the application were started using the new db
with no problem. Other replica even successfully reconnected to this new
primary.

Few hours from the switch lines appeared in the server log (which didn't
appear before), indicating a corruption:

ERROR: index "account_username_key" contains unexpected zero page at block
1112135
ERROR: right sibling's left-link doesn't match: block 476354 links to
1062443 instead of expected 250322 in index "account_pkey"

..and many more reporting corruption in several other indexes.

The issue was resolved by creating new indexes and dropping the affected
ones, although there were already some duplicities in the data, that has to
be resolved, as some of the indexes were unique.

This particular case uses Postgres 9.1.14 on both primary and replica. But
I've experienced similar behavior on 9.2.9. OS Centos 6.6 in all cases.
This may mean, that there can be something wrong with our configuration or
the replication setup steps, but I've set up another instance using the
same steps with no problem.

Fsync related setting are at their defaults. Data directories are on RAID10
arrays, with BBUs. Filesystem is ext4 mounted with nobarrier option.

Database is fairly large ~120GB with several 50mil+ tables, lots of indexes
and FK constraints. It is mostly queried, updates/inserts/deletes are only
several rows/s.

Any help will be appreciated.

Petr Novak

System Engineer
Avast s.r.o.

#2Novák, Petr
novakp@avast.com
In reply to: Novák, Petr (#1)
bugsgeneral
Fwd: Data corruption after restarting replica

Hello,

sorry for posting to second list, but as I've received no reply
there, I'm trying my luck here.

Thanks
Petr

---------- Forwarded message ----------
From: Novák, Petr <novakp@avast.com>
Date: Tue, Feb 10, 2015 at 12:49 PM
Subject: Data corruption after restarting replica
To: pgsql-bugs@postgresql.org

Hi all,

we're experiencing data corruption after switching streamed replica to primary.
This is not the first time I've encountered this issue, so I'l try to
describe it in more detail.

For this particular cluster we have 6 servers in two datacenters (3 in
each). There are two instances running on each server, each with its
own port and datadir. On the first two servers in each datacenter one
instance is primary and the other is replica for the primary from the
other server. Third server holds two offsite replicas from the other
datacenter (for DR purposes)

Each replica was set up by taking pg_basebackup from primary
(pg_basebackup -h <hostname> -p 5430 -D /data2/basebackup -P -v -U
<user> -x -c fast). Then directories from initdb were replaced with
the ones from basebackup (only the configuration files remained) and
the replica started and was successfully connected to primary. It was
running with no problem keeping up with the primary. We were
experiencing some connection problem between the two datacenters, but
replication didn't break.

Then we needed to take one datacenter offline due to hardware
maintenance. So I've switched the applications down, verified that no
more clients were connected to primary, then shut the primary down and
restarted replica without recovery.conf and the application were
started using the new db with no problem. Other replica even
successfully reconnected to this new primary.

Few hours from the switch lines appeared in the server log (which
didn't appear before), indicating a corruption:

ERROR: index "account_username_key" contains unexpected zero page at
block 1112135
ERROR: right sibling's left-link doesn't match: block 476354 links to
1062443 instead of expected 250322 in index "account_pkey"

..and many more reporting corruption in several other indexes.

The issue was resolved by creating new indexes and dropping the
affected ones, although there were already some duplicities in the
data, that has to be resolved, as some of the indexes were unique.

This particular case uses Postgres 9.1.14 on both primary and replica.
But I've experienced similar behavior on 9.2.9. OS Centos 6.6 in all
cases. This may mean, that there can be something wrong with our
configuration or the replication setup steps, but I've set up another
instance using the same steps with no problem.

Fsync related setting are at their defaults. Data directories are on
RAID10 arrays, with BBUs. Filesystem is ext4 mounted with nobarrier
option.

Database is fairly large ~120GB with several 50mil+ tables, lots of
indexes and FK constraints. It is mostly queried,
updates/inserts/deletes are only several rows/s.

Any help will be appreciated.

Petr Novak

System Engineer
Avast s.r.o.

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#3Adrian Klaver
adrian.klaver@aklaver.com
In reply to: Novák, Petr (#2)
bugsgeneral
Re: Fwd: Data corruption after restarting replica

On 02/16/2015 02:44 AM, Novák, Petr wrote:

Hello,

sorry for posting to second list, but as I've received no reply
there, I'm trying my luck here.

Thanks
Petr

---------- Forwarded message ----------
From: Novák, Petr <novakp@avast.com>
Date: Tue, Feb 10, 2015 at 12:49 PM
Subject: Data corruption after restarting replica
To: pgsql-bugs@postgresql.org

Hi all,

we're experiencing data corruption after switching streamed replica to primary.
This is not the first time I've encountered this issue, so I'l try to
describe it in more detail.

For this particular cluster we have 6 servers in two datacenters (3 in
each). There are two instances running on each server, each with its
own port and datadir. On the first two servers in each datacenter one
instance is primary and the other is replica for the primary from the
other server. Third server holds two offsite replicas from the other
datacenter (for DR purposes)

Each replica was set up by taking pg_basebackup from primary
(pg_basebackup -h <hostname> -p 5430 -D /data2/basebackup -P -v -U
<user> -x -c fast). Then directories from initdb were replaced with
the ones from basebackup (only the configuration files remained) and
the replica started and was successfully connected to primary. It was
running with no problem keeping up with the primary. We were
experiencing some connection problem between the two datacenters, but
replication didn't break.

Then we needed to take one datacenter offline due to hardware
maintenance. So I've switched the applications down, verified that no
more clients were connected to primary, then shut the primary down and
restarted replica without recovery.conf and the application were
started using the new db with no problem. Other replica even
successfully reconnected to this new primary.

What other replica?

Few hours from the switch lines appeared in the server log (which
didn't appear before), indicating a corruption:

ERROR: index "account_username_key" contains unexpected zero page at
block 1112135
ERROR: right sibling's left-link doesn't match: block 476354 links to
1062443 instead of expected 250322 in index "account_pkey"

..and many more reporting corruption in several other indexes.

What happened to the primary you shut down?

The issue was resolved by creating new indexes and dropping the
affected ones, although there were already some duplicities in the
data, that has to be resolved, as some of the indexes were unique.

This particular case uses Postgres 9.1.14 on both primary and replica.
But I've experienced similar behavior on 9.2.9. OS Centos 6.6 in all
cases. This may mean, that there can be something wrong with our
configuration or the replication setup steps, but I've set up another
instance using the same steps with no problem.

Fsync related setting are at their defaults. Data directories are on
RAID10 arrays, with BBUs. Filesystem is ext4 mounted with nobarrier
option.

Database is fairly large ~120GB with several 50mil+ tables, lots of
indexes and FK constraints. It is mostly queried,
updates/inserts/deletes are only several rows/s.

Any help will be appreciated.

Petr Novak

System Engineer
Avast s.r.o.

--
Adrian Klaver
adrian.klaver@aklaver.com

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#4dinesh kumar
dineshkumar02@gmail.com
In reply to: Novák, Petr (#2)
bugsgeneral
Re: Fwd: Data corruption after restarting replica

Hi,

On Mon, Feb 16, 2015 at 2:44 AM, Novák, Petr <novakp@avast.com> wrote:

Hello,

sorry for posting to second list, but as I've received no reply
there, I'm trying my luck here.

Thanks
Petr

---------- Forwarded message ----------
From: Novák, Petr <novakp@avast.com>
Date: Tue, Feb 10, 2015 at 12:49 PM
Subject: Data corruption after restarting replica
To: pgsql-bugs@postgresql.org

Hi all,

we're experiencing data corruption after switching streamed replica to
primary.
This is not the first time I've encountered this issue, so I'l try to
describe it in more detail.

For this particular cluster we have 6 servers in two datacenters (3 in
each). There are two instances running on each server, each with its
own port and datadir. On the first two servers in each datacenter one
instance is primary and the other is replica for the primary from the
other server. Third server holds two offsite replicas from the other
datacenter (for DR purposes)

Each replica was set up by taking pg_basebackup from primary
(pg_basebackup -h <hostname> -p 5430 -D /data2/basebackup -P -v -U
<user> -x -c fast). Then directories from initdb were replaced with
the ones from basebackup (only the configuration files remained) and
the replica started and was successfully connected to primary. It was
running with no problem keeping up with the primary. We were
experiencing some connection problem between the two datacenters, but
replication didn't break.

Then we needed to take one datacenter offline due to hardware
maintenance. So I've switched the applications down, verified that no
more clients were connected to primary, then shut the primary down and
restarted replica without recovery.conf and the application were
started using the new db with no problem. Other replica even
successfully reconnected to this new primary.

Before restarting replica, did you make sure that, all master transactions
applied to replication node.
May we know, why did you restarted replica without recovery.conf. Do you
want to maintain the same timeline for the xlogs. Or any specific other
reasons. ??

Regards,
Dinesh
manojadinesh.blogspot.com

Show quoted text

Few hours from the switch lines appeared in the server log (which
didn't appear before), indicating a corruption:

ERROR: index "account_username_key" contains unexpected zero page at
block 1112135
ERROR: right sibling's left-link doesn't match: block 476354 links to
1062443 instead of expected 250322 in index "account_pkey"

..and many more reporting corruption in several other indexes.

The issue was resolved by creating new indexes and dropping the
affected ones, although there were already some duplicities in the
data, that has to be resolved, as some of the indexes were unique.

This particular case uses Postgres 9.1.14 on both primary and replica.
But I've experienced similar behavior on 9.2.9. OS Centos 6.6 in all
cases. This may mean, that there can be something wrong with our
configuration or the replication setup steps, but I've set up another
instance using the same steps with no problem.

Fsync related setting are at their defaults. Data directories are on
RAID10 arrays, with BBUs. Filesystem is ext4 mounted with nobarrier
option.

Database is fairly large ~120GB with several 50mil+ tables, lots of
indexes and FK constraints. It is mostly queried,
updates/inserts/deletes are only several rows/s.

Any help will be appreciated.

Petr Novak

System Engineer
Avast s.r.o.

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#5Tomas Vondra
tomas.vondra@2ndquadrant.com
In reply to: Novák, Petr (#2)
bugsgeneral
Re: Fwd: Data corruption after restarting replica

On 16.2.2015 11:44, Novák, Petr wrote:

Hello,

sorry for posting to second list, but as I've received no reply
there, I'm trying my luck here.

Thanks
Petr

---------- Forwarded message ----------
From: Novák, Petr <novakp@avast.com>
Date: Tue, Feb 10, 2015 at 12:49 PM
Subject: Data corruption after restarting replica
To: pgsql-bugs@postgresql.org

Hi all,

we're experiencing data corruption after switching streamed replica
to primary.This is not the first time I've encountered this issue,
so I'l try to describe it in more detail.

For this particular cluster we have 6 servers in two datacenters (3
in each). There are two instances running on each server, each with
its own port and datadir. On the first two servers in each datacenter
one instance is primary and the other is replica for the primary from
the other server. Third server holds two offsite replicas from the
other datacenter (for DR purposes)

So essentially you have three servers in each datacenter, configured
like this:

server A
--------
instance A1 (primary)
instance A2 (standby for B1)

server B
--------
instance B1 (primary)
instance B2 (standby for B2)

server C (in the other datacenter)
----------------------------------
instance C1 (standby for A1)
instance C2 (standby for B1)

So if A or B fails, you can swich to the other machine to the replica in
th same datacenter, and if the whole datacenter fails then you have C in
the other one.

Correct?

Each replica was set up by taking pg_basebackup from primary
(pg_basebackup -h <hostname> -p 5430 -D /data2/basebackup -P -v -U
<user> -x -c fast). Then directories from initdb were replaced with
the ones from basebackup (only the configuration files remained) and
the replica started and was successfully connected to primary. It
was running with no problem keeping up with the primary. We were
experiencing some connection problem between the two datacenters,
but replication didn't break.

OK, that sounds about right. How long was the replication running before
the failover? Have you tried running some queries on the standby?

BTW are you running async or sync replication (I guess async)?

Then we needed to take one datacenter offline due to hardware
maintenance. So I've switched the applications down, verified that
no more clients were connected to primary, then shut the primary down
and restarted replica without recovery.conf and the application were
started using the new db with no problem. Other replica even
successfully reconnected to this new primary.

Other replica? If I understand your deployment right, and you've
switched to the other datacenter, all the related replicas (original
master + first standby) were in the other datacetenter (now offline). So
what replica reconnected to the newly promoted primary?

Restart after removing recovery.conf is a bit cumbersome because of the
restart (pg_ctl promote does not need that IIRC), but it's a legal way
to do the promote.

Few hours from the switch lines appeared in the server log (which
didn't appear before), indicating a corruption:

ERROR: index "account_username_key" contains unexpected zero page at
block 1112135 ERROR: right sibling's left-link doesn't match: block
476354 links to 1062443 instead of expected 250322 in index
"account_pkey"

..and many more reporting corruption in several other indexes.

The issue was resolved by creating new indexes and dropping the
affected ones, although there were already some duplicities in the
data, that has to be resolved, as some of the indexes were unique.

So the machine was running for a few hours just fine, and then something
happened and these errors started occuring?

Can you check whether the table was vacuumed recently (before the errors
started to appear) or what else interesting happened?

This particular case uses Postgres 9.1.14 on both primary and
replica. But I've experienced similar behavior on 9.2.9. OS Centos
6.6 in all cases. This may mean, that there can be something wrong
with our configuration or the replication setup steps, but I've set
up another instance using the same steps with no problem.

How different are the systems? I see you're running 6.6 on both ends,
but maybe there's some tiny difference? One common issue we see from
time to time are minor locale differences, affecting the indexes.

Fsync related setting are at their defaults. Data directories are on
RAID10 arrays, with BBUs. Filesystem is ext4 mounted with nobarrier
option.

Good ;-)

Database is fairly large ~120GB with several 50mil+ tables, lots of
indexes and FK constraints. It is mostly queried,
updates/inserts/deletes are only several rows/s.

So it was queried for a few hours? Any idea if it was using the indexes,
or if it was choosing different execution plans?

--
Tomas Vondra http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#6Novák, Petr
novakp@avast.com
In reply to: Adrian Klaver (#3)
bugsgeneral
Re: Fwd: Data corruption after restarting replica

Hi Adrian,

On Wed, Feb 18, 2015 at 10:25 PM, Adrian Klaver
<adrian.klaver@aklaver.com> wrote:

On 02/16/2015 02:44 AM, Novák, Petr wrote:

Hello,

sorry for posting to second list, but as I've received no reply
there, I'm trying my luck here.

Thanks
Petr

---------- Forwarded message ----------
From: Novák, Petr <novakp@avast.com>
Date: Tue, Feb 10, 2015 at 12:49 PM
Subject: Data corruption after restarting replica
To: pgsql-bugs@postgresql.org

Hi all,

we're experiencing data corruption after switching streamed replica to
primary.
This is not the first time I've encountered this issue, so I'l try to
describe it in more detail.

For this particular cluster we have 6 servers in two datacenters (3 in
each). There are two instances running on each server, each with its
own port and datadir. On the first two servers in each datacenter one
instance is primary and the other is replica for the primary from the
other server. Third server holds two offsite replicas from the other
datacenter (for DR purposes)

Each replica was set up by taking pg_basebackup from primary
(pg_basebackup -h <hostname> -p 5430 -D /data2/basebackup -P -v -U
<user> -x -c fast). Then directories from initdb were replaced with
the ones from basebackup (only the configuration files remained) and
the replica started and was successfully connected to primary. It was
running with no problem keeping up with the primary. We were
experiencing some connection problem between the two datacenters, but
replication didn't break.

Then we needed to take one datacenter offline due to hardware
maintenance. So I've switched the applications down, verified that no
more clients were connected to primary, then shut the primary down and
restarted replica without recovery.conf and the application were
started using the new db with no problem. Other replica even
successfully reconnected to this new primary.

What other replica?

Each primary has two replicas. One is in the same datacenter as
primary (as a hot standby, should the primary server fail) and other
is in another datacenter (for DR purposes).
I've restarted the "DR" replica as the new primary and the "hot
standby" replica reconected to it.

Few hours from the switch lines appeared in the server log (which
didn't appear before), indicating a corruption:

ERROR: index "account_username_key" contains unexpected zero page at
block 1112135
ERROR: right sibling's left-link doesn't match: block 476354 links to
1062443 instead of expected 250322 in index "account_pkey"

..and many more reporting corruption in several other indexes.

What happened to the primary you shut down?

It has been reinstalled, as its role has been moved to another server.
But its logs didn't contain the index errors.

The issue was resolved by creating new indexes and dropping the
affected ones, although there were already some duplicities in the
data, that has to be resolved, as some of the indexes were unique.

This particular case uses Postgres 9.1.14 on both primary and replica.
But I've experienced similar behavior on 9.2.9. OS Centos 6.6 in all
cases. This may mean, that there can be something wrong with our
configuration or the replication setup steps, but I've set up another
instance using the same steps with no problem.

Fsync related setting are at their defaults. Data directories are on
RAID10 arrays, with BBUs. Filesystem is ext4 mounted with nobarrier
option.

Database is fairly large ~120GB with several 50mil+ tables, lots of
indexes and FK constraints. It is mostly queried,
updates/inserts/deletes are only several rows/s.

Any help will be appreciated.

Petr Novak

System Engineer
Avast s.r.o.

--
Adrian Klaver
adrian.klaver@aklaver.com

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#7Novák, Petr
novakp@avast.com
In reply to: dinesh kumar (#4)
bugsgeneral
Re: Fwd: Data corruption after restarting replica

Hi Dinesh

On Wed, Feb 18, 2015 at 11:01 PM, dinesh kumar <dineshkumar02@gmail.com> wrote:

Hi,

On Mon, Feb 16, 2015 at 2:44 AM, Novák, Petr <novakp@avast.com> wrote:

Hello,

sorry for posting to second list, but as I've received no reply
there, I'm trying my luck here.

Thanks
Petr

---------- Forwarded message ----------
From: Novák, Petr <novakp@avast.com>
Date: Tue, Feb 10, 2015 at 12:49 PM
Subject: Data corruption after restarting replica
To: pgsql-bugs@postgresql.org

Hi all,

we're experiencing data corruption after switching streamed replica to
primary.
This is not the first time I've encountered this issue, so I'l try to
describe it in more detail.

For this particular cluster we have 6 servers in two datacenters (3 in
each). There are two instances running on each server, each with its
own port and datadir. On the first two servers in each datacenter one
instance is primary and the other is replica for the primary from the
other server. Third server holds two offsite replicas from the other
datacenter (for DR purposes)

Each replica was set up by taking pg_basebackup from primary
(pg_basebackup -h <hostname> -p 5430 -D /data2/basebackup -P -v -U
<user> -x -c fast). Then directories from initdb were replaced with
the ones from basebackup (only the configuration files remained) and
the replica started and was successfully connected to primary. It was
running with no problem keeping up with the primary. We were
experiencing some connection problem between the two datacenters, but
replication didn't break.

Then we needed to take one datacenter offline due to hardware
maintenance. So I've switched the applications down, verified that no
more clients were connected to primary, then shut the primary down and
restarted replica without recovery.conf and the application were
started using the new db with no problem. Other replica even
successfully reconnected to this new primary.

Before restarting replica, did you make sure that, all master transactions
applied to replication node.

Yes.

May we know, why did you restarted replica without recovery.conf. Do you
want to maintain the same timeline for the xlogs. Or any specific other
reasons. ??

Exactly, to preserve the xlog timeline.

Regards,
Dinesh
manojadinesh.blogspot.com

Few hours from the switch lines appeared in the server log (which
didn't appear before), indicating a corruption:

ERROR: index "account_username_key" contains unexpected zero page at
block 1112135
ERROR: right sibling's left-link doesn't match: block 476354 links to
1062443 instead of expected 250322 in index "account_pkey"

..and many more reporting corruption in several other indexes.

The issue was resolved by creating new indexes and dropping the
affected ones, although there were already some duplicities in the
data, that has to be resolved, as some of the indexes were unique.

This particular case uses Postgres 9.1.14 on both primary and replica.
But I've experienced similar behavior on 9.2.9. OS Centos 6.6 in all
cases. This may mean, that there can be something wrong with our
configuration or the replication setup steps, but I've set up another
instance using the same steps with no problem.

Fsync related setting are at their defaults. Data directories are on
RAID10 arrays, with BBUs. Filesystem is ext4 mounted with nobarrier
option.

Database is fairly large ~120GB with several 50mil+ tables, lots of
indexes and FK constraints. It is mostly queried,
updates/inserts/deletes are only several rows/s.

Any help will be appreciated.

Petr Novak

System Engineer
Avast s.r.o.

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general