pg_basebackup from 9.4-bdr to 9.4 results in corrupt index
Hello,
I'm working with a 9.4-bdr cluster and want to move away from BDR tech all
together. So my idea was to follow instructions on
http://bdr-project.org/docs/stable/ to first strip the node from BDR making
it into "regular" node and then moving the data from this node to official
9.4 instance.
Environment:
BDR_Node_1
BDR_Node_2
Regular_Node_1
Steps done on BDR_Node_2
1. Part node from BDR cluster
http://bdr-project.org/docs/stable/node-management-removing.html
2. Remove BDR from node
http://bdr-project.org/docs/stable/node-management-disabling.html
3. Create SR slot select
pg_create_physical_replication_slot('new_cluster_node');
After this. Steps done on Regular_Node_1, which is 9.4.22
1. Init the new node with pg_basebackup with source being BDR_Node_1
2. Update recovery.conf and postgres.conf
3. Start it with pg_ctl
All seems to be working well, but for this:
select * from information_schema.sequences;
produces "[XX000] ERROR: cache lookup failed for index 899807" for all of
the DBs that were configured with BDR with various OID.
I can not reproduce this error, when running query on any of the DBs on
9.4-bdr node.
Does anyone have any idea what may be causing this ?
Due to this I'm also unable to run pg_dump on new_cluster_node - it also
results in the same error.
That said. For more context - If it worked, next steps would have been to
1. Deactivate virtual IP on BDR_Node_2
2. Check for no replication lag on BDR_Node_2 slot "new_cluster_node"
3. Promote Regular_Node_1
4. Activate virtual IP on BDR_Node_2
Build the new 9.4 cluster from there on.
Any ideas / comments will be much appreciated!
Thank you in advance.
Best regards, Janis Puris
On 5/25/19 11:49 AM, Jānis Pūris wrote:
Hello,
I'm working with a 9.4-bdr cluster and want to move away from BDR tech all
together. So my idea was to follow instructions on
http://bdr-project.org/docs/stable/ to first strip the node from BDR
making it into "regular" node and then moving the data from this node to
official 9.4 instance.Environment:
BDR_Node_1
BDR_Node_2Regular_Node_1
Steps done on BDR_Node_2
1. Part node from BDR cluster
http://bdr-project.org/docs/stable/node-management-removing.html
2. Remove BDR from node
http://bdr-project.org/docs/stable/node-management-disabling.html
3. Create SR slot select
pg_create_physical_replication_slot('new_cluster_node');After this. Steps done on Regular_Node_1, which is 9.4.22
1. Init the new node with pg_basebackup with source being BDR_Node_1
2. Update recovery.conf and postgres.conf
3. Start it with pg_ctlAll seems to be working well, but for this:
select * from information_schema.sequences;
produces "[XX000] ERROR: cache lookup failed for index 899807" for all of
the DBs that were configured with BDR with various OID.I can not reproduce this error, when running query on any of the DBs on
9.4-bdr node.Does anyone have any idea what may be causing this ?
Due to this I'm also unable to run pg_dump on new_cluster_node - it also
results in the same error.That said. For more context - If it worked, next steps would have been to
1. Deactivate virtual IP on BDR_Node_2
2. Check for no replication lag on BDR_Node_2 slot "new_cluster_node"
3. Promote Regular_Node_1
4. Activate virtual IP on BDR_Node_2Build the new 9.4 cluster from there on.
Any ideas / comments will be much appreciated!
1. Are you sure that you removed all BDR from the node?
2. Is the corruption there in BDR_Node_1?
3. Can you rebuild the indexes?
--
Angular momentum makes the world go 'round.
On 5/25/19 9:49 AM, Jānis Pūris wrote:
Hello,
I'm working with a 9.4-bdr cluster and want to move away from BDR tech
all together. So my idea was to follow instructions on
http://bdr-project.org/docs/stable/ to first strip the node from BDR
making it into "regular" node and then moving the data from this node to
official 9.4 instance.Environment:
BDR_Node_1
BDR_Node_2Regular_Node_1
Steps done on BDR_Node_2
1. Part node from BDR cluster
http://bdr-project.org/docs/stable/node-management-removing.html
2. Remove BDR from node
http://bdr-project.org/docs/stable/node-management-disabling.html
3. Create SR slot select
pg_create_physical_replication_slot('new_cluster_node');After this. Steps done on Regular_Node_1, which is 9.4.22
1. Init the new node with pg_basebackup with source being BDR_Node_1
I am not clear about above:
1) You removed BDR_Node_2 from cluster
2) You took pg_basebackup from BDR_Node_1 to create a new regular cluster.
3) 1) & 2) seem to be at odds with each other.
2. Update recovery.conf and postgres.conf
3. Start it with pg_ctlAll seems to be working well, but for this:
select * from information_schema.sequences;
produces "[XX000] ERROR: cache lookup failed for index 899807" for all
of the DBs that were configured with BDR with various OID.I can not reproduce this error, when running query on any of the DBs on
9.4-bdr node.Does anyone have any idea what may be causing this ?
Due to this I'm also unable to run pg_dump on new_cluster_node - it also
results in the same error.That said. For more context - If it worked, next steps would have been to
1. Deactivate virtual IP on BDR_Node_2
2. Check for no replication lag on BDR_Node_2 slot "new_cluster_node"
3. Promote Regular_Node_1
4. Activate virtual IP on BDR_Node_2Build the new 9.4 cluster from there on.
Any ideas / comments will be much appreciated!
Thank you in advance.
Best regards, Janis Puris
--
Adrian Klaver
adrian.klaver@aklaver.com
Hi, Adrian
Apologies, it is a typo.
pg_basebackup was taken from BDR_Node_2, not BDR_Node_1
Thank you in advance.
Best regards, Janis Puris
On 25 May 2019 at 19:27:42, Adrian Klaver (adrian.klaver@aklaver.com) wrote:
I am not clear about above:
1) You removed BDR_Node_2 from cluster
2) You took pg_basebackup from BDR_Node_1 to create a new regular cluster.
3) 1) & 2) seem to be at odds with each other.
Hi Ron,
I can not reproduce this error on BDR_Node_2 (it is not BDR_Node_1 as
stated before. Typo)
I've been successful in transferring the data with pg_dump on BDR_Node_2
and then restoring it on Regular_Node_1. Then running "select * from
information_schema.sequences;" all is OK.
The problem with this approach is that I'm required to have minimal
downtime in this transition and we have a lot of data to transfer, which
would be lengthy process.
Thank you in advance.
Best regards, Janis Puris
On 25 May 2019 at 19:16:11, Ron (ronljohnsonjr@gmail.com) wrote:
1. Are you sure that you removed all BDR from the node?
2. Is the corruption there in BDR_Node_1?
3. Can you rebuild the indexes?
On 5/25/19 11:46 AM, Jānis Pūris wrote:
Hi Ron,
I can not reproduce this error on BDR_Node_2 (it is not BDR_Node_1 as
stated before. Typo)I've been successful in transferring the data with pg_dump on BDR_Node_2
and then restoring it on Regular_Node_1. Then running "select * from
information_schema.sequences;" all is OK.
So the issue is that a binary backup/restore via pg_basebackup fails but
a logical backup/restore via pg_dump/pg_restore works, correct?
You might want to take a look here:
https://github.com/2ndQuadrant/bdr/issues/140
It might make sense to you, it does not to me. Looks to me something is
being done on the binary level that makes this difficult. I guessing you
are going to have to talk to the BDR folks.
The problem with this approach is that I'm required to have minimal
downtime in this transition and we have a lot of data to transfer, which
would be lengthy process.Thank you in advance.
Best regards, Janis PurisOn 25 May 2019 at 19:16:11, Ron (ronljohnsonjr@gmail.com
<mailto:ronljohnsonjr@gmail.com>) wrote:1. Are you sure that you removed all BDR from the node?
2. Is the corruption there in BDR_Node_1?
3. Can you rebuild the indexes?
--
Adrian Klaver
adrian.klaver@aklaver.com
Managed to find following in the announcement of BDR on 9.6:
"BDR has always been an extension, but *on 9.4 it required a heavily
patched PostgreSQL, one that isn’t fully on-disk-format compatible with
stock community PostgreSQL 9.4.* The goal all along has been to allow it to
run as an extension on an unmodified PostgreSQL … and now we’re there."
Source: https://www.2ndquadrant.com/en/blog/bdr-is-coming-to-postgresql-9-6/
I suppose this answers why logical dump / restore works, but pg_basebackup
fails - the stock 9.4 and BDR 9.4 are not compatible on disk level.
On 25 May 2019 at 23:26:26, Adrian Klaver (adrian.klaver@aklaver.com) wrote:
On 5/25/19 11:46 AM, Jānis Pūris wrote:
Hi Ron,
I can not reproduce this error on BDR_Node_2 (it is not BDR_Node_1 as
stated before. Typo)I've been successful in transferring the data with pg_dump on BDR_Node_2
and then restoring it on Regular_Node_1. Then running "select * from
information_schema.sequences;" all is OK.
So the issue is that a binary backup/restore via pg_basebackup fails but
a logical backup/restore via pg_dump/pg_restore works, correct?
You might want to take a look here:
https://github.com/2ndQuadrant/bdr/issues/140
It might make sense to you, it does not to me. Looks to me something is
being done on the binary level that makes this difficult. I guessing you
are going to have to talk to the BDR folks.
The problem with this approach is that I'm required to have minimal
downtime in this transition and we have a lot of data to transfer, which
would be lengthy process.Thank you in advance.
Best regards, Janis PurisOn 25 May 2019 at 19:16:11, Ron (ronljohnsonjr@gmail.com
<mailto:ronljohnsonjr@gmail.com>) wrote:1. Are you sure that you removed all BDR from the node?
2. Is the corruption there in BDR_Node_1?
3. Can you rebuild the indexes?
--
Adrian Klaver
adrian.klaver@aklaver.com