Primary key Index Error

Started by Manoj K Pover 14 years ago7 messagesgeneral
Jump to latest
#1Manoj K P
manoj@comodo.com

Postgres recovered from data folder , after that some queries shows error

select * from table2 order by app_id ; - its work ( 50000000 data)
select * from table2 order by app_id desc ; - its work

Here app_id contains binary index

select * from table2 order by id ; - its work ( 50000000 data)
select * from table2 order by id desc ; - shows following error

manoj[local] postgres ERROR: index "tbl2_id_pkey" contains unexpected zero
page at block 311121
manoj [local] postgres HINT: Please REINDEX it.

index "tbl2_id_pkey" is primary key

Reindex not pratical this table (250 GB data)

How i can solve this ?

Regards
Manoj K P

--
View this message in context: http://postgresql.1045698.n5.nabble.com/Primary-key-Index-Error-tp4931714p4931714.html
Sent from the PostgreSQL - general mailing list archive at Nabble.com.

#2Merlin Moncure
mmoncure@gmail.com
In reply to: Manoj K P (#1)
Re: Primary key Index Error

On Mon, Oct 24, 2011 at 3:22 AM, Manoj K P <manoj@comodo.com> wrote:

Postgres recovered from data folder , after that some queries shows error

select * from table2 order by app_id  ; - its work ( 50000000 data)
select * from table2 order by app_id   desc ; - its work

Here app_id contains  binary index

select * from table2 order by id  ; - its work ( 50000000 data)
select * from table2 order by id   desc ; - shows following error

manoj[local] postgres ERROR:  index "tbl2_id_pkey" contains unexpected zero
page at block 311121
manoj [local] postgres HINT:  Please REINDEX it.

index "tbl2_id_pkey" is primary key

Reindex not pratical this table (250 GB data)

How i can solve this ?

It looks like you have data corruption. Your first priority should be
to do the minimum necessary to get a usable database dump and then to
get back online. I would consider your database off line until this
is done. Essentially, the only practical ways to repair the error you
see would be to reindex or restore from backup. Did you have any
unexpected power events? Server crashes?

merlin

#3Manoj K P
manoj@comodo.com
In reply to: Merlin Moncure (#2)
Re: Primary key Index Error

On 10/24/2011 06:38 PM, Merlin Moncure wrote:

On Mon, Oct 24, 2011 at 3:22 AM, Manoj K P<manoj@comodo.com> wrote:

Postgres recovered from data folder , after that some queries shows error

select * from table2 order by app_id ; - its work ( 50000000 data)
select * from table2 order by app_id desc ; - its work

Here app_id contains binary index

select * from table2 order by id ; - its work ( 50000000 data)
select * from table2 order by id desc ; - shows following error

manoj[local] postgres ERROR: index "tbl2_id_pkey" contains unexpected zero
page at block 311121
manoj [local] postgres HINT: Please REINDEX it.

index "tbl2_id_pkey" is primary key

Reindex not pratical this table (250 GB data)

How i can solve this ?

It looks like you have data corruption. Your first priority should be
to do the minimum necessary to get a usable database dump and then to
get back online. I would consider your database off line until this
is done. Essentially, the only practical ways to repair the error you
see would be to reindex or restore from backup. Did you have any
unexpected power events? Server crashes?

merlin

Actually this is backup server ,no power failure& Server crashes happens in between , The data folder is base backup(|pg_start_backup)| on the server. pg_dump not practical in this case because of data size.
After recovering from base backup run WAL recovery file successfully and its updates current date data.
When i am running query on big table (more than 5 Million data) shows error other wise its work fine.

select id , name from table2 order by id ; It uses the index

select id , name from table2 order by id desc ; - shows the error

Error happens only using (DESC clause against primary key) , all other case its work fine

REINDEX will take ages

--
*Manoj K P*
*Postgres DBA*
*Comodo India*

Attachments:

smime.p7sapplication/pkcs7-signature; name=smime.p7sDownload
#4Merlin Moncure
mmoncure@gmail.com
In reply to: Manoj K P (#3)
Re: Primary key Index Error

On Mon, Oct 24, 2011 at 8:32 AM, manoj <manoj@comodo.com> wrote:

On 10/24/2011 06:38 PM, Merlin Moncure wrote:

On Mon, Oct 24, 2011 at 3:22 AM, Manoj K P <manoj@comodo.com> wrote:

Postgres recovered from data folder , after that some queries shows error

select * from table2 order by app_id  ; - its work ( 50000000 data)
select * from table2 order by app_id   desc ; - its work

Here app_id contains  binary index

select * from table2 order by id  ; - its work ( 50000000 data)
select * from table2 order by id   desc ; - shows following error

manoj[local] postgres ERROR:  index "tbl2_id_pkey" contains unexpected zero
page at block 311121
manoj [local] postgres HINT:  Please REINDEX it.

index "tbl2_id_pkey" is primary key

Reindex not pratical this table (250 GB data)

How i can solve this ?

It looks like you have data corruption. Your first priority should be
to do the minimum necessary to get a usable database dump and then to
get back online. I would consider your database off line until this
is done. Essentially, the only practical ways to repair the error you
see would be to reindex or restore from backup. Did you have any
unexpected power events? Server crashes?

merlin

  Actually this is backup server ,no power failure & Server crashes happens
in between , The data folder is base backup(pg_start_backup) on the server.
pg_dump not practical in this case because of data size.
 After recovering from base backup run  WAL recovery file successfully and
its updates current date data.
When i am running query on big table (more than 5 Million data) shows
error other wise its work fine.

select id , name from table2 order by id  ; It uses the index

select id , name from table2 order by id desc ; - shows the error

Error happens only using (DESC clause against primary key) , all other case
its work fine

 REINDEX will take ages

do you happen to still have the database logs on the backup from
startup to end of recovery? anything interesting in there?

merlin

#5Manoj K P
manoj@comodo.com
In reply to: Merlin Moncure (#4)
Re: Primary key Index Error

*Server log*

Oct 1 00:06:59 server_host_name postgres[1453]: [5-1] 2011-10-01
00:06:59.831 EDT 1453 4e869041.5ad postgres [local] postgres LOG: duration:
418583.238 ms statement: select pg_start_backup('fortnightly');

Oct 2 03:03:18 server_host_name postgres[1453]: [6-1] 2011-10-02
03:03:18.243 EDT 1453 4e869041.5ad postgres [local] postgres LOG: duration:
8034.385 ms statement: select pg_stop_backup();

In between stop and start process server_host_name is receiving all type of
DML & DDL and generating new WAL file

Taking base backup in between start and stop process

/*Client Log Details*/

Oct 25 05:16:18 client_server_name postgres[28858]: [2-1] 2011-10-25
05:16:18.202 BST 28858 LOG: could not open file
"pg_xlog/00002710000047B10000008C" (log file 18353, segment 140): No such
file or directory
Oct 25 05:16:18 client_server_name postgres[28858]: [3-1] 2011-10-25
05:16:18.203 BST 28858 LOG: invalid checkpoint record
Oct 25 05:16:18 client_server_name postgres[28858]: [4-1] 2011-10-25
05:16:18.203 BST 28858 FATAL: could not locate required checkpoint record
Oct 25 05:16:18 client_server_name postgres[28858]: [4-2] 2011-10-25
05:16:18.203 BST 28858 HINT: If you are not restoring from a backup, try
removing the file "/mnt/new_cluster/backup_label".
Oct 25 05:16:18 client_server_name postgres[28857]: [1-1] 2011-10-25
05:16:18.205 BST 28857 LOG: startup process (PID 28858) exited with exit
code 1
Oct 25 05:16:18 client_server_name postgres[28857]: [2-1] 2011-10-25
05:16:18.205 BST 28857 LOG: aborting startup due to startup process failure

Oct 25 05:20:53 client_server_name postgres[29030]: [2-1] 2011-10-25
05:20:53.630 BST 29030 LOG: could not open file
"pg_xlog/00002710000047B100000068" (log file 18353, segment 104): No such
file or directory
Oct 25 05:20:53 client_server_name postgres[29030]: [3-1] 2011-10-25
05:20:53.630 BST 29030 FATAL: could not find redo location referenced by
checkpoint record
Oct 25 05:20:53 client_server_name postgres[29030]: [3-2] 2011-10-25
05:20:53.630 BST 29030 HINT: If you are not restoring from a backup, try
removing the file "/mnt/new_cluster/backup_label".
Oct 25 05:20:53 client_server_name postgres[29029]: [1-1] 2011-10-25
05:20:53.633 BST 29029 LOG: startup process (PID 29030) exited with exit
code 1
Oct 25 05:20:53 client_server_name postgres[29029]: [2-1] 2011-10-25
05:20:53.633 BST 29029 LOG: aborting startup due to startup process failure

manually copy following file to pg_xlog folder

00002710000047B10000008C
00002710000047B100000068

After words i can start postgres and accessing the database , but same error

--
View this message in context: http://postgresql.1045698.n5.nabble.com/Primary-key-Index-Error-tp4931714p4935172.html
Sent from the PostgreSQL - general mailing list archive at Nabble.com.

#6Raghavendra
raghavendra.rao@enterprisedb.com
In reply to: Manoj K P (#5)
Re: Primary key Index Error

On Tue, Oct 25, 2011 at 11:40 AM, Manoj K P <manoj@comodo.com> wrote:

*Server log* Oct 1 00:06:59 server_host_name postgres[1453]: [5-1]
2011-10-01 00:06:59.831 EDT 1453 4e869041.5ad postgres [local] postgres LOG:
duration: 418583.238 ms statement: select pg_start_backup('fortnightly');
Oct 2 03:03:18 server_host_name postgres[1453]: [6-1] 2011-10-02
03:03:18.243 EDT 1453 4e869041.5ad postgres [local] postgres LOG: duration:
8034.385 ms statement: select pg_stop_backup(); In between stop and start
process server_host_name is receiving all type of DML & DDL and generating
new WAL file Taking base backup in between start and stop process *Client
Log Details* Oct 25 05:16:18 client_server_name postgres[28858]: [2-1]
2011-10-25 05:16:18.202 BST 28858 LOG: could not open file
"pg_xlog/00002710000047B10000008C" (log file 18353, segment 140): No such
file or directory Oct 25 05:16:18 client_server_name postgres[28858]: [3-1]
2011-10-25 05:16:18.203 BST 28858 LOG: invalid checkpoint record Oct 25
05:16:18 client_server_name postgres[28858]: [4-1] 2011-10-25 05:16:18.203
BST 28858 FATAL: could not locate required checkpoint record Oct 25 05:16:18
client_server_name postgres[28858]: [4-2] 2011-10-25 05:16:18.203 BST 28858
HINT: If you are not restoring from a backup, try removing the file
"/mnt/new_cluster/backup_label". Oct 25 05:16:18 client_server_name
postgres[28857]: [1-1] 2011-10-25 05:16:18.205 BST 28857 LOG: startup
process (PID 28858) exited with exit code 1 Oct 25 05:16:18
client_server_name postgres[28857]: [2-1] 2011-10-25 05:16:18.205 BST 28857
LOG: aborting startup due to startup process failure Oct 25 05:20:53
client_server_name postgres[29030]: [2-1] 2011-10-25 05:20:53.630 BST 29030
LOG: could not open file "pg_xlog/00002710000047B100000068" (log file 18353,
segment 104): No such file or directory Oct 25 05:20:53 client_server_name
postgres[29030]: [3-1] 2011-10-25 05:20:53.630 BST 29030 FATAL: could not
find redo location referenced by checkpoint record Oct 25 05:20:53
client_server_name postgres[29030]: [3-2] 2011-10-25 05:20:53.630 BST 29030
HINT: If you are not restoring from a backup, try removing the file
"/mnt/new_cluster/backup_label". Oct 25 05:20:53 client_server_name
postgres[29029]: [1-1] 2011-10-25 05:20:53.633 BST 29029 LOG: startup
process (PID 29030) exited with exit code 1 Oct 25 05:20:53
client_server_name postgres[29029]: [2-1] 2011-10-25 05:20:53.633 BST 29029
LOG: aborting startup due to startup process failure manually copy following
file to pg_xlog folder 00002710000047B10000008C 00002710000047B100000068
After words i can start postgres and accessing the database , but same
error

As per the logs, do you see missing XLOG files in Archive Destination ? becz
these kind of situations mostly missing files will be in WAL-Archive
location. You need to copy to pg_xlog directory and start the instance.

As Merlin Said, you need to dig more to know why its crashing by increasing
the LOG-DEBUG level's. Increasing DEBUG level may occupy good amount of
space in log-location, so make sure you have good space for logs to get what
exactly happening at the time of backup in particular. Am not sure whether
its safe to attach ***backtrace*** to instance for information.

---
Regards,
Raghavendra
EnterpriseDB Corporation
Blog: http://raghavt.blogspot.com/

#7Merlin Moncure
mmoncure@gmail.com
In reply to: Raghavendra (#6)
Re: Primary key Index Error

On Tue, Oct 25, 2011 at 2:41 AM, Raghavendra
<raghavendra.rao@enterprisedb.com> wrote:

On Tue, Oct 25, 2011 at 11:40 AM, Manoj K P <manoj@comodo.com> wrote:

Server log Oct 1 00:06:59 server_host_name postgres[1453]: [5-1]
2011-10-01 00:06:59.831 EDT 1453 4e869041.5ad postgres [local] postgres LOG:
duration: 418583.238 ms statement: select pg_start_backup('fortnightly');
Oct 2 03:03:18 server_host_name postgres[1453]: [6-1] 2011-10-02
03:03:18.243 EDT 1453 4e869041.5ad postgres [local] postgres LOG: duration:
8034.385 ms statement: select pg_stop_backup(); In between stop and start
process server_host_name is receiving all type of DML & DDL and generating
new WAL file Taking base backup in between start and stop process Client Log
Details Oct 25 05:16:18 client_server_name postgres[28858]: [2-1] 2011-10-25
05:16:18.202 BST 28858 LOG: could not open file
"pg_xlog/00002710000047B10000008C" (log file 18353, segment 140): No such
file or directory Oct 25 05:16:18 client_server_name postgres[28858]: [3-1]
2011-10-25 05:16:18.203 BST 28858 LOG: invalid checkpoint record Oct 25
05:16:18 client_server_name postgres[28858]: [4-1] 2011-10-25 05:16:18.203
BST 28858 FATAL: could not locate required checkpoint record Oct 25 05:16:18
client_server_name postgres[28858]: [4-2] 2011-10-25 05:16:18.203 BST 28858
HINT: If you are not restoring from a backup, try removing the file
"/mnt/new_cluster/backup_label". Oct 25 05:16:18 client_server_name
postgres[28857]: [1-1] 2011-10-25 05:16:18.205 BST 28857 LOG: startup
process (PID 28858) exited with exit code 1 Oct 25 05:16:18
client_server_name postgres[28857]: [2-1] 2011-10-25 05:16:18.205 BST 28857
LOG: aborting startup due to startup process failure Oct 25 05:20:53
client_server_name postgres[29030]: [2-1] 2011-10-25 05:20:53.630 BST 29030
LOG: could not open file "pg_xlog/00002710000047B100000068" (log file 18353,
segment 104): No such file or directory Oct 25 05:20:53 client_server_name
postgres[29030]: [3-1] 2011-10-25 05:20:53.630 BST 29030 FATAL: could not
find redo location referenced by checkpoint record Oct 25 05:20:53
client_server_name postgres[29030]: [3-2] 2011-10-25 05:20:53.630 BST 29030
HINT: If you are not restoring from a backup, try removing the file
"/mnt/new_cluster/backup_label". Oct 25 05:20:53 client_server_name
postgres[29029]: [1-1] 2011-10-25 05:20:53.633 BST 29029 LOG: startup
process (PID 29030) exited with exit code 1 Oct 25 05:20:53
client_server_name postgres[29029]: [2-1] 2011-10-25 05:20:53.633 BST 29029
LOG: aborting startup due to startup process failure manually copy following
file to pg_xlog folder 00002710000047B10000008C 00002710000047B100000068
After words i can start postgres and accessing the database , but same
error

As per the logs, do you see missing XLOG files in Archive Destination ? becz
these kind of situations mostly missing files will be in WAL-Archive
location. You need to copy to pg_xlog directory and start the instance.

As Merlin Said, you need to dig more to know why its crashing by increasing
the LOG-DEBUG level's. Increasing DEBUG level may occupy good amount of
space in log-location, so make sure you have good space for logs to get what
exactly happening at the time of backup in particular. Am not sure whether
its safe to attach ***backtrace*** to instance for information.

yeah. also, what's the setting of archive_command (or is it even set)?
taking a 'hot' filesystem backup without having an archive_command
and not doing any other intervention to guarantee the necessary WAL
segments are present will not give you a complete backup. my money is
on you having an invalid backup procedure. the only way to take a
filesystem snapshot without dealing with WAL files is to bring the
database down.

merlin