Database corruption in RH 6.2/prepackaged PG
Saku Airila (saku@bitblit.fi) reports a bug with a severity of 1.
(The lower the number, the more severe it is.)
Short Description
Database corruption in RH 6.2/prepackaged PG
Long Description
Random database corruption. The system I'm running has ~10 databases
online on a single server with no other load. Sometimes one of the
databases corrupts itself beyond repair. I have vacuumdb and pg_dump
running nightly to clean the databases and make a backup.
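For reference, the nightly job amounts to something like the following
(a simplified sketch only -- the database names and the backup path shown
here are placeholders, not the real ones):

    #!/bin/sh
    # Nightly maintenance, run from the postgres user's crontab.
    # "db1 db2" and /backup stand in for the real database names and path.
    for db in db1 db2
    do
        vacuumdb "$db"                        # clean up the database
        pg_dump "$db" > "/backup/$db.dump"    # nightly backup
    done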
Postgres version:
PostgreSQL 6.5.3 on i686-pc-linux-gnu, compiled by gcc egcs-2.91.66
System:
Red Hat Linux release 6.2 (Zoot)
Kernel 2.2.14-5.0 on an i686
650 MHz AMD Duron, MSI K7T Pro mainboard, 64 MB RAM + 100 MB swap,
ASUS 53C896 U2 SCSI, IBM DDYS-T09170N 9 GB disk,
D-Link DFE500TX (DEC tulip) ethernet.
Problem description:
The nightly cron jobs return the following message:
----
dumpClasses(): command failed. Explanation from backend: 'pqReadData() -- backend closed the channel unexpectedly.
This probably means the backend terminated abnormally
before or while processing the request.
'.
----
I get the following message in the syslog when trying to dump or
vacuum the db manually:
----
Jan 13 16:32:38 db kernel: Unable to handle kernel paging request at virtual address 0003000b
Jan 13 16:32:38 db kernel: current->tss.cr3 = 00e99000, %cr3 = 00e99000
Jan 13 16:32:38 db kernel: *pde = 00000000
Jan 13 16:32:38 db kernel: Oops: 0000
Jan 13 16:32:38 db kernel: CPU: 0
Jan 13 16:32:38 db kernel: EIP: 0010:[update_vm_cache_conditional+111/284]
Jan 13 16:32:38 db kernel: EFLAGS: 00010206
Jan 13 16:32:38 db kernel: eax: 00000000 ebx: 00030003 ecx: c2250220 edx: 00001050
Jan 13 16:32:38 db kernel: esi: 00000000 edi: 00001000 ebp: 0002e000 esp: c0e9be9c
Jan 13 16:32:38 db kernel: ds: 0018 es: 0018 ss: 0018
Jan 13 16:32:38 db kernel: Process postmaster (pid: 28997, process nr: 33, stackpage=c0e9b000)
Jan 13 16:32:38 db kernel: Stack: 0002e000 c099a000 c3f30000 0c225022 c013bf19 c2250220 0002e000 c099a000
Jan 13 16:32:38 db kernel:        00001000 40251c40 c33570e0 ffffffea c225026c 00002000 c1a8d7e0 c1a8d7e0
Jan 13 16:32:38 db kernel:        c1a8d7e0 0002e000 00000000 c0e9bf08 00000000 00000000 c33efa00 00000000
Jan 13 16:32:38 db kernel: Call Trace: [ext2_file_write+1042/1559] [refile_buffer+82/178] [__brelse+19/82] [ext2_update_inode+825/840] [sys_recv+30/35] [sys_write+214/248] [ext2_file_write+0/1559]
Jan 13 16:32:38 db kernel: [system_call+52/56]
Jan 13 16:32:38 db kernel: Code: 39 4b 08 75 f0 39 6b 0c 75 eb ff 43 14 b8 02 00 00 00 0f ab
----
I don't know if this is really a PostgreSQL problem or a Linux problem, but I'm quite sure the hardware itself is OK.
If more information is needed, I'm happy to send the database dump,
although since this is a confidential production system I need to
make sure the dump will not be disclosed to any third parties.
Thanks,
Saku Airila, saku@bitblit.fi
Systems Engineer, Bitblit Oy, Helsinki, Finland
Sample Code
No file was uploaded with this report
pgsql-bugs@postgresql.org writes:
Problem description:
The nightly cron jobs return me the following message:
----
dumpClasses(): command failed. Explanation from backend: 'pqReadData() -- backend closed the channel unexpectedly.
This probably means the backend terminated abnormally
before or while processing the request.
Can't tell much from this. What would be useful is to look at the
postmaster log and a stack backtrace from the crashed backend.
The default startup script for your RH probably sends the postmaster
log file to the bit-bucket, so you'll have to change it. Make sure
the postmaster is invoked without the -S switch, and redirect its stdout
and stderr to some handy log file, e.g.:
postmaster -i -D wherever >/full/path/to/logfile 2>&1 &
(The extra & at the end is needed if you don't use -S.)
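For instance, the start line inside the RH init script could end up looking
roughly like this (just a sketch -- I'm assuming the stock RPM layout with
/var/lib/pgsql as the data directory; adjust to whatever yours really uses):

    # run the postmaster as the postgres user, without -S, with stdout and
    # stderr captured in a log file (paths here are assumptions, not gospel)
    su -l postgres -c 'postmaster -i -D /var/lib/pgsql >/var/lib/pgsql/postmaster.log 2>&1 &'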
If you don't see a core file in $PGDATA/base/yourdb/core, then you
probably also need to add "ulimit -c unlimited" to the postmaster
start script, to allow dumping core from the postmaster and its
child processes.
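Once a core file does show up, getting the backtrace out of it is just a
matter of pointing gdb at the backend executable and the core, along these
lines (the binary path is a guess at where the RPM installs it -- substitute
the real location):

    # load the crashed backend's core file into gdb
    gdb /usr/bin/postgres $PGDATA/base/yourdb/core
    # then, at the (gdb) prompt:
    #    bt       -- print the stack backtrace
    #    quit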
Let us know when you have more detail ...
regards, tom lane
PS: BTW, it would probably save time all around if you first update
to Postgres 7.0.3 and then see if the bug is still there.
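The upgrade itself is the usual dump-and-reload exercise, roughly like so
(paths are only examples, and the install step depends on whether you go
with the RPMs or build from source):

    pg_dumpall > /tmp/pg_all.out    # dump everything with the OLD 6.5.3 binaries
    # stop the old postmaster, install 7.0.3, initdb a fresh data directory,
    # start the new postmaster, then reload:
    psql -d template1 -f /tmp/pg_all.out

Keep an eye on the psql output while reloading; a 6.5 dump occasionally
needs minor hand-editing before 7.0 will swallow it.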