simple query terminated by signal 11

Started by Thomas Chille almost 20 years ago | 5 messages | general
#1 Thomas Chille
thomas.chille@gmail.com

Hi List,

I ran into an error while dumping a database.

After investigating, I found a table that may be corrupted, but I am not sure,
and I don't know how to repair it. Could it be a hard drive error?

Here are the logs:

# all fine: SELECT * FROM hst_sales_report WHERE id = 5078866

[6208 / 2006-06-19 18:46:17 CEST]LOG: 00000: connection received:
host=[local] port=
[6208 / 2006-06-19 18:46:17 CEST]LOCATION: BackendRun, postmaster.c:2679
[6208 / 2006-06-19 18:46:17 CEST]LOG: 00000: connection authorized:
user=postgres database=backoffice_db
[6208 / 2006-06-19 18:46:17 CEST]LOCATION: BackendRun, postmaster.c:2751
[6208 / 2006-06-19 18:46:17 CEST]LOG: 00000: statement: SELECT * FROM
hst_sales_report WHERE id = 5078866
[6208 / 2006-06-19 18:46:17 CEST]LOCATION: pg_parse_query, postgres.c:526
[6208 / 2006-06-19 18:46:18 CEST]LOG: 00000: duration: 117.638 ms
[6208 / 2006-06-19 18:46:18 CEST]LOCATION: exec_simple_query, postgres.c:1076
[6208 / 2006-06-19 18:46:18 CEST]LOG: 00000: disconnection: session
time: 0:00:00.12 user=postgres database=backoffice_db host=[local]
port=
[6208 / 2006-06-19 18:46:18 CEST]LOCATION: log_disconnections, postgres.c:3447

# now the error: SELECT * FROM hst_sales_report WHERE id = 5078867

[6216 / 2006-06-19 18:46:23 CEST]LOG: 00000: connection received:
host=[local] port=
[6216 / 2006-06-19 18:46:23 CEST]LOCATION: BackendRun, postmaster.c:2679
[6216 / 2006-06-19 18:46:23 CEST]LOG: 00000: connection authorized:
user=postgres database=backoffice_db
[6216 / 2006-06-19 18:46:23 CEST]LOCATION: BackendRun, postmaster.c:2751
[6216 / 2006-06-19 18:46:23 CEST]LOG: 00000: statement: SELECT * FROM
hst_sales_report WHERE id = 5078867
[6216 / 2006-06-19 18:46:23 CEST]LOCATION: pg_parse_query, postgres.c:526
[3762 / 2006-06-19 18:46:23 CEST]LOG: 00000: server process (PID
6216) was terminated by signal 11
[3762 / 2006-06-19 18:46:23 CEST]LOCATION: LogChildExit, postmaster.c:2358
[3762 / 2006-06-19 18:46:23 CEST]LOG: 00000: terminating any other
active server processes
[3762 / 2006-06-19 18:46:23 CEST]LOCATION: HandleChildCrash, postmaster.c:2251
[3985 / 2006-06-19 18:46:23 CEST]WARNING: 57P02: terminating
connection because of crash of another server process
[3985 / 2006-06-19 18:46:23 CEST]DETAIL: The postmaster has commanded
this server process to roll back the current transaction and exit,
because another server process exited abnormally and possibly
corrupted shared memory.
[3985 / 2006-06-19 18:46:23 CEST]HINT: In a moment you should be able
to reconnect to the database and repeat your command.
[3985 / 2006-06-19 18:46:23 CEST]LOCATION: quickdie, postgres.c:1945
[3762 / 2006-06-19 18:46:23 CEST]LOG: 00000: all server processes
terminated; reinitializing
[3762 / 2006-06-19 18:46:23 CEST]LOCATION: reaper, postmaster.c:2150
[6217 / 2006-06-19 18:46:23 CEST]LOG: 00000: database system was
interrupted at 2006-06-19 18:42:49 CEST
[6217 / 2006-06-19 18:46:23 CEST]LOCATION: StartupXLOG, xlog.c:4094
[6217 / 2006-06-19 18:46:23 CEST]LOG: 00000: checkpoint record is at
11/3E77AB1C
[6217 / 2006-06-19 18:46:23 CEST]LOCATION: StartupXLOG, xlog.c:4163
[6217 / 2006-06-19 18:46:23 CEST]LOG: 00000: redo record is at
11/3E774940; undo record is at 0/0; shutdown FALSE
[6217 / 2006-06-19 18:46:23 CEST]LOCATION: StartupXLOG, xlog.c:4191
[6217 / 2006-06-19 18:46:23 CEST]LOG: 00000: next transaction ID:
3899415; next OID: 46429694
[6217 / 2006-06-19 18:46:23 CEST]LOCATION: StartupXLOG, xlog.c:4194
[6217 / 2006-06-19 18:46:23 CEST]LOG: 00000: database system was not
properly shut down; automatic recovery in progress
[6217 / 2006-06-19 18:46:23 CEST]LOCATION: StartupXLOG, xlog.c:4250
[6217 / 2006-06-19 18:46:23 CEST]LOG: 00000: redo starts at 11/3E774940
[6217 / 2006-06-19 18:46:23 CEST]LOCATION: StartupXLOG, xlog.c:4287
[6217 / 2006-06-19 18:46:23 CEST]LOG: 00000: record with zero length
at 11/3E77AD20
[6217 / 2006-06-19 18:46:23 CEST]LOCATION: ReadRecord, xlog.c:2496
[6217 / 2006-06-19 18:46:23 CEST]LOG: 00000: redo done at 11/3E77ACF8
[6217 / 2006-06-19 18:46:23 CEST]LOCATION: StartupXLOG, xlog.c:4345
[6217 / 2006-06-19 18:46:23 CEST]LOG: 00000: database system is ready
[6217 / 2006-06-19 18:46:23 CEST]LOCATION: StartupXLOG, xlog.c:4557
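
The crashing statement can also be picked out of a log like this mechanically. A minimal sketch, assuming the server log itself keeps each entry on one line (the wrapping above looks like it comes from the mail client) and uses the same "[PID / timestamp]" prefix; the function name crashed_statement is made up for illustration:

```shell
# Print the last statement logged by every backend that the postmaster
# reports as killed by a signal. Reads the named log files, or stdin.
crashed_statement() {
    awk '
        # Remember the most recent statement logged by each backend PID.
        /statement: / {
            pid = $1
            sub(/^\[/, "", pid)
            stmt = $0
            sub(/^.*statement: /, "", stmt)
            last[pid] = stmt
        }
        # The postmaster reports the crashed child by PID.
        /was terminated by signal/ {
            pid = $0
            sub(/^.*\(PID /, "", pid)
            sub(/\).*$/, "", pid)
            sig = $0
            sub(/^.*terminated by signal /, "", sig)
            printf "PID %s (signal %s): %s\n", pid, sig, last[pid]
        }
    ' "$@"
}
```

It would be called as "crashed_statement postgresql.log", where the file name stands for wherever the server log is written.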

Can anyone help me, please?

regards,
thomas!

#2 Qingqing Zhou
zhouqq@cs.toronto.edu
In reply to: Thomas Chille (#1)
Re: simple query terminated by signal 11

"Thomas Chille" <thomas.chille@gmail.com> wrote:

Hi List,

I ran into an error while dumping a database.

After investigating, I found a table that may be corrupted, but I am not sure,
and I don't know how to repair it. Could it be a hard drive error?

# now the error: SELECT * FROM hst_sales_report WHERE id = 5078867

[6216 / 2006-06-19 18:46:23 CEST]LOG: 00000: connection received:
host=[local] port=
[6216 / 2006-06-19 18:46:23 CEST]LOCATION: BackendRun, postmaster.c:2679
[6216 / 2006-06-19 18:46:23 CEST]LOG: 00000: connection authorized:
user=postgres database=backoffice_db
[6216 / 2006-06-19 18:46:23 CEST]LOCATION: BackendRun, postmaster.c:2751
[6216 / 2006-06-19 18:46:23 CEST]LOG: 00000: statement: SELECT * FROM
hst_sales_report WHERE id = 5078867
[6216 / 2006-06-19 18:46:23 CEST]LOCATION: pg_parse_query, postgres.c:526
[3762 / 2006-06-19 18:46:23 CEST]LOG: 00000: server process (PID
6216) was terminated by signal 11
[3762 / 2006-06-19 18:46:23 CEST]LOCATION: LogChildExit, postmaster.c:2358
[3762 / 2006-06-19 18:46:23 CEST]LOG: 00000: terminating any other
active server processes
[3762 / 2006-06-19 18:46:23 CEST]LOCATION: HandleChildCrash, postmaster.c:2251
[3985 / 2006-06-19 18:46:23 CEST]WARNING: 57P02: terminating
connection because of crash of another server process
[3985 / 2006-06-19 18:46:23 CEST]DETAIL: The postmaster has commanded
this server process to roll back the current transaction and exit,
because another server process exited abnormally and possibly
corrupted shared memory.

Which version are you using? In any case, barring a random hardware error, we
expect Postgres to detect and report the problem rather than silently dump
core. Can you gather the core dump and post it here?

Regards,
Qingqing

#3 Thomas Chille
thomas@chille.de
In reply to: Qingqing Zhou (#2)
Re: simple query terminated by signal 11

Hi Qingqing,

thanks for your reply!

The PostgreSQL version is 8.0.4, running on a Debian-based Linux
server with kernel 2.6.11.2.

I have never dealt with a core dump before, but after setting "ulimit -c
1024" I got one.
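
A limit of 1024 caps the core file size, measured in blocks whose size depends on the shell, so a larger backend image could be cut short. A sketch, assuming a POSIX-style shell, that raises the soft limit to whatever the hard limit allows; the postmaster has to be started from the same shell to inherit it. The function name raise_core_limit is hypothetical:

```shell
# Raise the soft core-file limit to the hard limit and report the result.
raise_core_limit() {
    hard=$(ulimit -H -c)     # "unlimited" or a block count
    ulimit -S -c "$hard"     # the soft limit may be raised up to the hard limit
    ulimit -S -c             # print the limit now in effect
}

raise_core_limit
```

If the hard limit itself is 0, it has to be raised by root (or in the system's limits configuration) before any core file is written.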

I don't know how to post it, because it is 1.5 MB in size. I will try to
attach it gzipped.

I also could not install dbg on the faulty system, so I tried to
examine the core dump on another machine (Gentoo) with Postgres 8.0.4
and got the following output:

spoonpc01 ~ # gdb /usr/bin/postgres core
GNU gdb 6.4
Copyright 2005 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "i686-pc-linux-gnu"...(no debugging symbols found)
Using host libthread_db library "/lib/tls/libthread_db.so.1".

warning: core file may not match specified executable file.
(no debugging symbols found)
Core was generated by `postgres: postgres backoffice_db [local] SELECT '
.
Program terminated with signal 11, Segmentation fault.
#0 0x080753c2 in DataFill ()
(gdb) where
#0 0x080753c2 in DataFill ()
#1 0xb74253d4 in ?? ()
#2 0x0000001d in ?? ()
#3 0x08356fa8 in ?? ()
#4 0x08379420 in ?? ()
#5 0x00000000 in ?? ()
(gdb)

I can also say that I can reproduce the error every time with
the same query.

Thanks in advance


Attachments:

core.gz (application/x-gzip)

#4 Qingqing Zhou
zhouqq@cs.toronto.edu
In reply to: Thomas Chille (#1)
Re: simple query terminated by signal 11

"Thomas Chille" <thomas@chille.de> wrote:

I don't know how to post it, because it is 1.5 MB in size. I will try to
attach it gzipped.

No ... I mean the "bt" result of the core dump.

$ gdb <postgres_exe_path> -c <core_file_name>
(gdb) bt

.
Program terminated with signal 11, Segmentation fault.
#0 0x080753c2 in DataFill ()
(gdb) where
#0 0x080753c2 in DataFill ()
#1 0xb74253d4 in ?? ()
#2 0x0000001d in ?? ()
#3 0x08356fa8 in ?? ()
#4 0x08379420 in ?? ()
#5 0x00000000 in ?? ()
(gdb)

Since it is repeatable on your machine, you can compile a new Postgres
build configured with "--enable-cassert" (enable assertions in the code) and
"--enable-debug" (build with debugging symbols). Then run it on
your data and "bt" the core dump.
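
The steps above can be sketched as a small script. This is only an outline under assumptions: the source tree path and install prefix are hypothetical, the source should be the same 8.0 series as the data directory, and by default the script only prints the commands (set DRY_RUN=0 to really run them). In current gdb releases, -batch together with -ex runs the given command and exits.

```shell
# Print each command instead of running it unless DRY_RUN=0 is set.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Configure, build and install a debug-enabled server under its own prefix.
debug_build() {
    src_dir=$1   # unpacked PostgreSQL source tree (hypothetical path)
    prefix=$2    # separate prefix so the packaged server stays untouched
    (
        run cd "$src_dir" &&
        run ./configure --prefix="$prefix" --enable-cassert --enable-debug &&
        run make &&
        run make install
    )
}

# Print a backtrace from a core file produced by the given executable.
backtrace() {
    run gdb -batch -ex bt "$1" "$2"
}

debug_build ./postgresql-8.0.4 /opt/pg-debug
backtrace /opt/pg-debug/bin/postgres ./core
```

The core file has to come from the debug build itself, after reproducing the crash with it; a core from the packaged binary will not match the new executable.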

Regards,
Qingqing

#5 Thomas Chille
thomas@chille.de
In reply to: Qingqing Zhou (#4)
Re: simple query terminated by signal 11

Thanks for your tips!

Since it is repeatable on your machine, you can compile a new Postgres
build configured with "--enable-cassert" (enable assertions in the code) and
"--enable-debug" (build with debugging symbols). Then run it on
your data and "bt" the core dump.

I will try to find out the reason for that behavior.

For now I was able to drop the damaged table and restore it from an older
backup, so everything works fine again.
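
A drop-and-restore of a single table could look roughly like the sketch below. It assumes the backup is a pg_dump archive (custom or tar format) that pg_restore can read; a plain SQL dump would have to be loaded with psql instead. The dump file name backup.dump and the function name are hypothetical, and the script only prints the commands unless DRY_RUN=0 is set.

```shell
# Print each command instead of running it unless DRY_RUN=0 is set.
# (The printed form drops the quoting around the SQL text.)
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Drop one table and reload it from a pg_dump archive.
restore_table() {
    db=$1
    table=$2
    dump=$3
    run psql -d "$db" -c "DROP TABLE $table" &&
    run pg_restore -d "$db" -t "$table" "$dump"
}

restore_table backoffice_db hst_sales_report backup.dump
```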

regards,
thomas!