Primary key Index Error
Postgres recovered from data folder , after that some queries shows error
select * from table2 order by app_id ; - its work ( 50000000 data)
select * from table2 order by app_id desc ; - its work
Here app_id contains binary index
select * from table2 order by id ; - its work ( 50000000 data)
select * from table2 order by id desc ; - shows following error
manoj[local] postgres ERROR: index "tbl2_id_pkey" contains unexpected zero
page at block 311121
manoj [local] postgres HINT: Please REINDEX it.
index "tbl2_id_pkey" is primary key
Reindex not pratical this table (250 GB data)
How i can solve this ?
Regards
Manoj K P
--
View this message in context: http://postgresql.1045698.n5.nabble.com/Primary-key-Index-Error-tp4931714p4931714.html
Sent from the PostgreSQL - general mailing list archive at Nabble.com.
On Mon, Oct 24, 2011 at 3:22 AM, Manoj K P <manoj@comodo.com> wrote:
Postgres recovered from data folder , after that some queries shows error
select * from table2 order by app_id ; - its work ( 50000000 data)
select * from table2 order by app_id desc ; - its workHere app_id contains binary index
select * from table2 order by id ; - its work ( 50000000 data)
select * from table2 order by id desc ; - shows following errormanoj[local] postgres ERROR: index "tbl2_id_pkey" contains unexpected zero
page at block 311121
manoj [local] postgres HINT: Please REINDEX it.index "tbl2_id_pkey" is primary key
Reindex not pratical this table (250 GB data)
How i can solve this ?
It looks like you have data corruption. Your first priority should be
to do the minimum necessary to get a usable database dump and then to
get back online. I would consider your database off line until this
is done. Essentially, the only practical ways to repair the error you
see would be to reindex or restore from backup. Did you have any
unexpected power events? Server crashes?
merlin
On 10/24/2011 06:38 PM, Merlin Moncure wrote:
On Mon, Oct 24, 2011 at 3:22 AM, Manoj K P<manoj@comodo.com> wrote:
Postgres recovered from data folder , after that some queries shows error
select * from table2 order by app_id ; - its work ( 50000000 data)
select * from table2 order by app_id desc ; - its workHere app_id contains binary index
select * from table2 order by id ; - its work ( 50000000 data)
select * from table2 order by id desc ; - shows following errormanoj[local] postgres ERROR: index "tbl2_id_pkey" contains unexpected zero
page at block 311121
manoj [local] postgres HINT: Please REINDEX it.index "tbl2_id_pkey" is primary key
Reindex not pratical this table (250 GB data)
How i can solve this ?
It looks like you have data corruption. Your first priority should be
to do the minimum necessary to get a usable database dump and then to
get back online. I would consider your database off line until this
is done. Essentially, the only practical ways to repair the error you
see would be to reindex or restore from backup. Did you have any
unexpected power events? Server crashes?merlin
Actually this is backup server ,no power failure& Server crashes happens in between , The data folder is base backup(|pg_start_backup)| on the server. pg_dump not practical in this case because of data size.
After recovering from base backup run WAL recovery file successfully and its updates current date data.
When i am running query on big table (more than 5 Million data) shows error other wise its work fine.
select id , name from table2 order by id ; It uses the index
select id , name from table2 order by id desc ; - shows the error
Error happens only using (DESC clause against primary key) , all other case its work fine
REINDEX will take ages
--
*Manoj K P*
*Postgres DBA*
*Comodo India*
Attachments:
On Mon, Oct 24, 2011 at 8:32 AM, manoj <manoj@comodo.com> wrote:
On 10/24/2011 06:38 PM, Merlin Moncure wrote:
On Mon, Oct 24, 2011 at 3:22 AM, Manoj K P <manoj@comodo.com> wrote:
Postgres recovered from data folder , after that some queries shows error
select * from table2 order by app_id ; - its work ( 50000000 data)
select * from table2 order by app_id desc ; - its workHere app_id contains binary index
select * from table2 order by id ; - its work ( 50000000 data)
select * from table2 order by id desc ; - shows following errormanoj[local] postgres ERROR: index "tbl2_id_pkey" contains unexpected zero
page at block 311121
manoj [local] postgres HINT: Please REINDEX it.index "tbl2_id_pkey" is primary key
Reindex not pratical this table (250 GB data)
How i can solve this ?
It looks like you have data corruption. Your first priority should be
to do the minimum necessary to get a usable database dump and then to
get back online. I would consider your database off line until this
is done. Essentially, the only practical ways to repair the error you
see would be to reindex or restore from backup. Did you have any
unexpected power events? Server crashes?merlin
Actually this is backup server ,no power failure & Server crashes happens
in between , The data folder is base backup(pg_start_backup) on the server.
pg_dump not practical in this case because of data size.
After recovering from base backup run WAL recovery file successfully and
its updates current date data.
When i am running query on big table (more than 5 Million data) shows
error other wise its work fine.select id , name from table2 order by id ; It uses the index
select id , name from table2 order by id desc ; - shows the error
Error happens only using (DESC clause against primary key) , all other case
its work fineREINDEX will take ages
do you happen to still have the database logs on the backup from
startup to end of recovery? anything interesting in there?
merlin
*Server log*
Oct 1 00:06:59 server_host_name postgres[1453]: [5-1] 2011-10-01
00:06:59.831 EDT 1453 4e869041.5ad postgres [local] postgres LOG: duration:
418583.238 ms statement: select pg_start_backup('fortnightly');
Oct 2 03:03:18 server_host_name postgres[1453]: [6-1] 2011-10-02
03:03:18.243 EDT 1453 4e869041.5ad postgres [local] postgres LOG: duration:
8034.385 ms statement: select pg_stop_backup();
In between stop and start process server_host_name is receiving all type of
DML & DDL and generating new WAL file
Taking base backup in between start and stop process
/*Client Log Details*/
Oct 25 05:16:18 client_server_name postgres[28858]: [2-1] 2011-10-25
05:16:18.202 BST 28858 LOG: could not open file
"pg_xlog/00002710000047B10000008C" (log file 18353, segment 140): No such
file or directory
Oct 25 05:16:18 client_server_name postgres[28858]: [3-1] 2011-10-25
05:16:18.203 BST 28858 LOG: invalid checkpoint record
Oct 25 05:16:18 client_server_name postgres[28858]: [4-1] 2011-10-25
05:16:18.203 BST 28858 FATAL: could not locate required checkpoint record
Oct 25 05:16:18 client_server_name postgres[28858]: [4-2] 2011-10-25
05:16:18.203 BST 28858 HINT: If you are not restoring from a backup, try
removing the file "/mnt/new_cluster/backup_label".
Oct 25 05:16:18 client_server_name postgres[28857]: [1-1] 2011-10-25
05:16:18.205 BST 28857 LOG: startup process (PID 28858) exited with exit
code 1
Oct 25 05:16:18 client_server_name postgres[28857]: [2-1] 2011-10-25
05:16:18.205 BST 28857 LOG: aborting startup due to startup process failure
Oct 25 05:20:53 client_server_name postgres[29030]: [2-1] 2011-10-25
05:20:53.630 BST 29030 LOG: could not open file
"pg_xlog/00002710000047B100000068" (log file 18353, segment 104): No such
file or directory
Oct 25 05:20:53 client_server_name postgres[29030]: [3-1] 2011-10-25
05:20:53.630 BST 29030 FATAL: could not find redo location referenced by
checkpoint record
Oct 25 05:20:53 client_server_name postgres[29030]: [3-2] 2011-10-25
05:20:53.630 BST 29030 HINT: If you are not restoring from a backup, try
removing the file "/mnt/new_cluster/backup_label".
Oct 25 05:20:53 client_server_name postgres[29029]: [1-1] 2011-10-25
05:20:53.633 BST 29029 LOG: startup process (PID 29030) exited with exit
code 1
Oct 25 05:20:53 client_server_name postgres[29029]: [2-1] 2011-10-25
05:20:53.633 BST 29029 LOG: aborting startup due to startup process failure
manually copy following file to pg_xlog folder
00002710000047B10000008C
00002710000047B100000068
After words i can start postgres and accessing the database , but same error
--
View this message in context: http://postgresql.1045698.n5.nabble.com/Primary-key-Index-Error-tp4931714p4935172.html
Sent from the PostgreSQL - general mailing list archive at Nabble.com.
On Tue, Oct 25, 2011 at 11:40 AM, Manoj K P <manoj@comodo.com> wrote:
*Server log* Oct 1 00:06:59 server_host_name postgres[1453]: [5-1]
2011-10-01 00:06:59.831 EDT 1453 4e869041.5ad postgres [local] postgres LOG:
duration: 418583.238 ms statement: select pg_start_backup('fortnightly');
Oct 2 03:03:18 server_host_name postgres[1453]: [6-1] 2011-10-02
03:03:18.243 EDT 1453 4e869041.5ad postgres [local] postgres LOG: duration:
8034.385 ms statement: select pg_stop_backup(); In between stop and start
process server_host_name is receiving all type of DML & DDL and generating
new WAL file Taking base backup in between start and stop process *Client
Log Details* Oct 25 05:16:18 client_server_name postgres[28858]: [2-1]
2011-10-25 05:16:18.202 BST 28858 LOG: could not open file
"pg_xlog/00002710000047B10000008C" (log file 18353, segment 140): No such
file or directory Oct 25 05:16:18 client_server_name postgres[28858]: [3-1]
2011-10-25 05:16:18.203 BST 28858 LOG: invalid checkpoint record Oct 25
05:16:18 client_server_name postgres[28858]: [4-1] 2011-10-25 05:16:18.203
BST 28858 FATAL: could not locate required checkpoint record Oct 25 05:16:18
client_server_name postgres[28858]: [4-2] 2011-10-25 05:16:18.203 BST 28858
HINT: If you are not restoring from a backup, try removing the file
"/mnt/new_cluster/backup_label". Oct 25 05:16:18 client_server_name
postgres[28857]: [1-1] 2011-10-25 05:16:18.205 BST 28857 LOG: startup
process (PID 28858) exited with exit code 1 Oct 25 05:16:18
client_server_name postgres[28857]: [2-1] 2011-10-25 05:16:18.205 BST 28857
LOG: aborting startup due to startup process failure Oct 25 05:20:53
client_server_name postgres[29030]: [2-1] 2011-10-25 05:20:53.630 BST 29030
LOG: could not open file "pg_xlog/00002710000047B100000068" (log file 18353,
segment 104): No such file or directory Oct 25 05:20:53 client_server_name
postgres[29030]: [3-1] 2011-10-25 05:20:53.630 BST 29030 FATAL: could not
find redo location referenced by checkpoint record Oct 25 05:20:53
client_server_name postgres[29030]: [3-2] 2011-10-25 05:20:53.630 BST 29030
HINT: If you are not restoring from a backup, try removing the file
"/mnt/new_cluster/backup_label". Oct 25 05:20:53 client_server_name
postgres[29029]: [1-1] 2011-10-25 05:20:53.633 BST 29029 LOG: startup
process (PID 29030) exited with exit code 1 Oct 25 05:20:53
client_server_name postgres[29029]: [2-1] 2011-10-25 05:20:53.633 BST 29029
LOG: aborting startup due to startup process failure manually copy following
file to pg_xlog folder 00002710000047B10000008C 00002710000047B100000068
After words i can start postgres and accessing the database , but same
error
As per the logs, do you see missing XLOG files in Archive Destination ? becz
these kind of situations mostly missing files will be in WAL-Archive
location. You need to copy to pg_xlog directory and start the instance.
As Merlin Said, you need to dig more to know why its crashing by increasing
the LOG-DEBUG level's. Increasing DEBUG level may occupy good amount of
space in log-location, so make sure you have good space for logs to get what
exactly happening at the time of backup in particular. Am not sure whether
its safe to attach ***backtrace*** to instance for information.
---
Regards,
Raghavendra
EnterpriseDB Corporation
Blog: http://raghavt.blogspot.com/
On Tue, Oct 25, 2011 at 2:41 AM, Raghavendra
<raghavendra.rao@enterprisedb.com> wrote:
On Tue, Oct 25, 2011 at 11:40 AM, Manoj K P <manoj@comodo.com> wrote:
Server log Oct 1 00:06:59 server_host_name postgres[1453]: [5-1]
2011-10-01 00:06:59.831 EDT 1453 4e869041.5ad postgres [local] postgres LOG:
duration: 418583.238 ms statement: select pg_start_backup('fortnightly');
Oct 2 03:03:18 server_host_name postgres[1453]: [6-1] 2011-10-02
03:03:18.243 EDT 1453 4e869041.5ad postgres [local] postgres LOG: duration:
8034.385 ms statement: select pg_stop_backup(); In between stop and start
process server_host_name is receiving all type of DML & DDL and generating
new WAL file Taking base backup in between start and stop process Client Log
Details Oct 25 05:16:18 client_server_name postgres[28858]: [2-1] 2011-10-25
05:16:18.202 BST 28858 LOG: could not open file
"pg_xlog/00002710000047B10000008C" (log file 18353, segment 140): No such
file or directory Oct 25 05:16:18 client_server_name postgres[28858]: [3-1]
2011-10-25 05:16:18.203 BST 28858 LOG: invalid checkpoint record Oct 25
05:16:18 client_server_name postgres[28858]: [4-1] 2011-10-25 05:16:18.203
BST 28858 FATAL: could not locate required checkpoint record Oct 25 05:16:18
client_server_name postgres[28858]: [4-2] 2011-10-25 05:16:18.203 BST 28858
HINT: If you are not restoring from a backup, try removing the file
"/mnt/new_cluster/backup_label". Oct 25 05:16:18 client_server_name
postgres[28857]: [1-1] 2011-10-25 05:16:18.205 BST 28857 LOG: startup
process (PID 28858) exited with exit code 1 Oct 25 05:16:18
client_server_name postgres[28857]: [2-1] 2011-10-25 05:16:18.205 BST 28857
LOG: aborting startup due to startup process failure Oct 25 05:20:53
client_server_name postgres[29030]: [2-1] 2011-10-25 05:20:53.630 BST 29030
LOG: could not open file "pg_xlog/00002710000047B100000068" (log file 18353,
segment 104): No such file or directory Oct 25 05:20:53 client_server_name
postgres[29030]: [3-1] 2011-10-25 05:20:53.630 BST 29030 FATAL: could not
find redo location referenced by checkpoint record Oct 25 05:20:53
client_server_name postgres[29030]: [3-2] 2011-10-25 05:20:53.630 BST 29030
HINT: If you are not restoring from a backup, try removing the file
"/mnt/new_cluster/backup_label". Oct 25 05:20:53 client_server_name
postgres[29029]: [1-1] 2011-10-25 05:20:53.633 BST 29029 LOG: startup
process (PID 29030) exited with exit code 1 Oct 25 05:20:53
client_server_name postgres[29029]: [2-1] 2011-10-25 05:20:53.633 BST 29029
LOG: aborting startup due to startup process failure manually copy following
file to pg_xlog folder 00002710000047B10000008C 00002710000047B100000068
After words i can start postgres and accessing the database , but same
errorAs per the logs, do you see missing XLOG files in Archive Destination ? becz
these kind of situations mostly missing files will be in WAL-Archive
location. You need to copy to pg_xlog directory and start the instance.As Merlin Said, you need to dig more to know why its crashing by increasing
the LOG-DEBUG level's. Increasing DEBUG level may occupy good amount of
space in log-location, so make sure you have good space for logs to get what
exactly happening at the time of backup in particular. Am not sure whether
its safe to attach ***backtrace*** to instance for information.
yeah. also, what's the setting of archive_command (or is it even set)?
taking a 'hot' filesystem backup without having an archive_command
and not doing any other intervention to guarantee the necessary WAL
segments are present will not give you a complete backup. my money is
on you having an invalid backup procedure. the only way to take a
filesystem snapshot without dealing with WAL files is to bring the
database down.
merlin