Postgres crashes,help to recover

Started by Andrusover 20 years ago9 messagesgeneral
Jump to latest
#1Andrus
eetasoft@online.ee

I have Postgres 8 running on Windows XP
The size of data subdirectory is 326 MB

Today morning suddenly one table in one database, firma1.klient is
corrupted. When trying to backup it using pgAdmin III I get the log below
and backup is not created.

I have:

1. Compressed backup of the whole database as of 15.7
2. Compressed backup of the firma1 schema of this database as of 19.7 where
corrupted table klient resides.

How to get the database back working by repairing firma1.klient table or by
restoring this from schema backup.
Why Postgres crashes ? I use default postgres.conf file which has probably
fsync on

Log when trying to backup table:

.....
pg_dump: restoring data for table "klient"
pg_dump: dumping contents of table klient
pg_dump: ERROR: out of memory
DETAIL: Failed on request of size 544565107.
pg_dump: SQL command to dump the contents of table "klient" failed:
PQendcopy() failed.
pg_dump: Error message from server: ERROR: out of memory
DETAIL: Failed on request of size 544565107.
pg_dump: The command was: COPY firma1.klient (kood, nimi, a_a, p_kood,
regnr, vatpayno, piirkond, postiindek, tanav, kontaktisi, telefon, faks,
email, wwwpage, liik, viitenr, riik, riik2, riigikood, hinnak, erihinnak,
myygikood, objekt2, objekt5, objekt7, maksetin, omakseti, krediit,
ostukredii, masin, info, maksja, "timestamp", atimestamp, elanikud, pindala,
grmaja, apindala, kpindala, idmakett, tulemus, omandisuhe, username,
changedby, parool, hinnaale, mitteakt, kontakteer, klikaart, mhprotsent,
aadress, grupp, verskp, firma_enne, tegevusala, instkuupae, firmarv,
tookohti, versioon, teenlepkp, jur, kasutab, est, rus, miniest, minirus,
plakat, keel) TO stdout;
pg_dump: *** aborted because of error

Process returned exit code 1.

#2Richard Huxton
dev@archonet.com
In reply to: Andrus (#1)
Re: Postgres crashes,help to recover

Andrus wrote:

I have Postgres 8 running on Windows XP
The size of data subdirectory is 326 MB

Today morning suddenly one table in one database, firma1.klient is
corrupted. When trying to backup it using pgAdmin III I get the log below
and backup is not created.

What caused this? Presumably you had a power/system-failure or similar?

I have:

1. Compressed backup of the whole database as of 15.7
2. Compressed backup of the firma1 schema of this database as of 19.7 where
corrupted table klient resides.

Do you mean file-level backups, or backups taken using pg_dump/pgadmin?
You can't take file-backups of anything less than the entire data directory.

How to get the database back working by repairing firma1.klient table or by
restoring this from schema backup.

If your backup is recent enough, that's probably the quickest route.

Why Postgres crashes ? I use default postgres.conf file which has probably
fsync on

*WHEN* did Postgresql crash, originally that is? This error was caused
by something - when did something go horribly wrong?

And do your disks honour the fsync? Was a power failure the cause of this?

Log when trying to backup table:

.....
pg_dump: restoring data for table "klient"
pg_dump: dumping contents of table klient
pg_dump: ERROR: out of memory
DETAIL: Failed on request of size 544565107.

Well - unless you have a piece of data that's 544MB that certainly looks
like corruption.
It's entirely possible you can identify the row that's causing this
problem and dump all the data either side of it. However, if your backup
is good, then I'd just restore that.

--
Richard Huxton
Archonet Ltd

#3Andrus
eetasoft@online.ee
In reply to: Andrus (#1)
Re: Postgres crashes,help to recover

Today morning suddenly one table in one database, firma1.klient is
corrupted. When trying to backup it using pgAdmin III I get the log
below and backup is not created.

What caused this? Presumably you had a power/system-failure or similar?

Windows XP does not respond and I pressed reset key yesterday evening.
However, after that the database continues working yesterday.

I use default postgres.conf file created by installer. In my knowledge this
crash does not occur.
I'm very intresting about reasons of this crash.
I created copy of the whole data directory.

running

select * from firma1.klient;

from pgAdmin yields:

server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.

in windows eventlog this writes:

LOG: checkpoint record is at 0/4FD3ECB8

LOG: all server processes terminated; reinitializing

WARNING: terminating connection because of crash of another server process
DETAIL: The postmaster has commanded this server process to roll back the
current transaction and exit, because another server process exited
abnormally and possibly corrupted shared memory.
HINT: In a moment you should be able to reconnect to the database and
repeat your command.

LOG: terminating any other active server processes
LOG: server process (PID 2756) was terminated by signal 5

I have:

1. Compressed backup of the whole database as of 15.7
2. Compressed backup of the firma1 schema of this database as of 19.7
where corrupted table klient resides.

Do you mean file-level backups, or backups taken using pg_dump/pgadmin?

I have compressed backups created using pgadmin.
Whole database backup (database contains two, andmed and firma1 schemas) is
from 15.7 and fresh, firma1 schema backup is form 19.7

How to get the database back working by repairing firma1.klient table or
by restoring this from schema backup.

If your backup is recent enough, that's probably the quickest route.

My problem is that I have whole backup only as 15.7 evening

I have the current backup (as 19.7 evening) only firma1 schema.

Corrupted table klient resides in firma1 schema.

I need to restore firma1 backup as of 19.7 to the new database created from
15.7 backup.

Unfortunately, firma1 schema is cross-referenced with other schema (andmed)
So I have no idea how to restore.

Is it possible to convert compressed backup file to plain text or to get
data from it ?

Why Postgres crashes ? I use default postgres.conf file which has
probably fsync on

*WHEN* did Postgresql crash, originally that is? This error was caused by
something - when did something go horribly wrong?

Yesterday evening I pressed the reset button because windows task manager
stops responding ( By experimenting with setforegroundwindow Windows API
call I ran 20 copies of charmap.exe and tried to kill them all from task
manager). However, after re-booting computer database continues working
yesterday.

I also restored new database yesterday with 500 tables to this cluster.

And do your disks honour the fsync? Was a power failure the cause of this?

I have usual office PC using Quantum FireballP LM20.5 20 GB IDE HDD with
XP drivers.
How to determine is fsync working or not ?
I use default postgres.conf file ( added only listen_addresses = '*' )

Log when trying to backup table:

.....

Well - unless you have a piece of data that's 544MB that certainly looks
like corruption.

Tables are small. Whole data directory (including wal segments and 2 other
nonimportant databases) sizes is about 350 MB.

It's entirely possible you can identify the row that's causing this
problem and dump all the data either side of it. However, if your backup
is good, then I'd just restore that.

I have up-to date backup of firma1 schema only. Whole backup is a bit old.

Is it possible to dump the corrupted table, truncate it and re-load it? I
think thank referential integrity is not checked in truncate and refrential
integrity does not prevent loading this table.

Will truncate command fix the corrupted table?

Andrus.

#4Andrus
eetasoft@online.ee
In reply to: Andrus (#1)
Re: Postgres crashes,help to recover

It seems that corrupted table klient contains data from some other table.

select * from firma1.klient limit 3686;

Seems to return all data from table.
Starting at row 3687 table contains data from other table.

VACUUM command returns:

INFO: vacuuming "firma1.klient"

ERROR: invalid page header in block 1639 of relation "klient"

#5Andrus
eetasoft@online.ee
In reply to: Andrus (#1)
Re: Postgres crashes,help to recover

klient table has oid 66079

file 66079 size is 13 MB

in correct copy klient table file size is 5MB

it seems that other table, nomenkla oid=65783 is added to the end of
klient table.

How to repair file 66079 so that it contains only 3686 rows from beginning ?

#6Magnus Hagander
magnus@hagander.net
In reply to: Andrus (#5)
Re: Postgres crashes,help to recover

And do your disks honour the fsync? Was a power failure the

cause of this?

I have usual office PC using Quantum FireballP LM20.5 20 GB
IDE HDD with
XP drivers.
How to determine is fsync working or not ?
I use default postgres.conf file ( added only listen_addresses = '*' )

What version exactly was this? There was some changes in 8.0.2 in this
area on win32. The default sync method was also changed at this point.

What's the output of "show wal_sync_method"?

Finally, go into device manager, find your disk, get properties, look
under Policies, is the box for "Enable write caching on the disk"
checked?

//Magnus

#7Andrus
eetasoft@online.ee
In reply to: Magnus Hagander (#6)
Re: Postgres crashes,help to recover

What version exactly was this? There was some changes in 8.0.2 in this
area on win32. The default sync method was also changed at this point.

"PostgreSQL 8.0.2 on i686-pc-mingw32, compiled by GCC gcc.exe (GCC) 3.4.2
(mingw-special)"

What's the output of "show wal_sync_method"?

"open_datasync"

Finally, go into device manager, find your disk, get properties, look
under Policies, is the box for "Enable write caching on the disk"
checked?

It is checked.

Does Postgres require this to be unchecked ? It is difficult to force
customers to change it.

Andrus.

#8Magnus Hagander
magnus@hagander.net
In reply to: Andrus (#7)
Re: Postgres crashes,help to recover

What version exactly was this? There was some changes in

8.0.2 in this

area on win32. The default sync method was also changed at

this point.

"PostgreSQL 8.0.2 on i686-pc-mingw32, compiled by GCC gcc.exe
(GCC) 3.4.2 (mingw-special)"

There we go. That explains it - you have the new code.

What's the output of "show wal_sync_method"?

"open_datasync"

This setting is only safe if you disable write cache.

Finally, go into device manager, find your disk, get

properties, look

under Policies, is the box for "Enable write caching on the disk"
checked?

It is checked.

Does Postgres require this to be unchecked ? It is difficult
to force customers to change it.

No, doesn't erquire it. There are a couple of different scenarios:

1) Box is checked. wal_sync_method=open_datasync. This may cause data
loss!
2) Box is checked. wal_sync_method=fsync_writethrough. This is safe.
3) Box is unchecked. wal_sync_method=open_datasync. This is safe.
4) Box is unchecked. wal_sync_method=fsync_writethrough. This is safe.

In general I would say that 1 is of course the fastest, but it's not
safe. 3 should normally be the fastest if the data is on a disk that's
only used by postgresql. 2 is probably faster if you have other
applications that also write data to the same disk. 4 is probably
*never* fastest :-)

This all assumes you don't have a battery backed cache. If you have a
controller with battery backed cache, 1 should still be the fastest, but
now it's suddenly safe.

(The basics of these changes are documented in the release notes at
http://www.postgresql.org/docs/8.0/static/release-8-0-2.html)

//Magnus

#9Andrus
eetasoft@online.ee
In reply to: Magnus Hagander (#8)
Re: Postgres crashes,help to recover

"open_datasync"

This setting is only safe if you disable write cache.

Thanks you for explanation.

I expected that default installation does not cause data loss in any
maschine configuration.

I don't remember was write cache enabling Windows default setting or was it
set by me.

Andrus.