URGENT! pg_dump doesn't work!

Started by Wim · over 23 years ago · 9 messages · general
#1 Wim
wdh@belbone.be

Hello guys,

I have a problem with my postgres 7.2.1 database.
I can't perform a pg_dump on my database...
The message I get back is:

pg_dump: query to obtain list of tables failed: server closed the
connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
pg_dump failed on belbonedb_v2, exiting

When I connect to the database and do:

belbonedb_v2=# \dt networks

I get:

ERROR: AllocSetFree: cannot find block containing chunk 4aee70

Can I fix this error?

Thanx!

Wim

#2 Nigel J. Andrews
nandrews@investsystems.co.uk
In reply to: Wim (#1)
Re: URGENT! pg_dump doesn't work!

On Mon, 22 Jul 2002, Wim wrote:

Hello guys,

I have a problem with my postgres 7.2.1 database.
I can't perform a pg_dump on my database...
The message I get back is:

pg_dump: query to obtain list of tables failed: server closed the
connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
pg_dump failed on belbonedb_v2, exiting

When I connect to the database and do:

belbonedb_v2=# \dt networks

I get:

ERROR: AllocSetFree: cannot find block containing chunk 4aee70

Can I fix this error?

Is this perhaps another of those hardware errors that seem to be turning up at
the moment?

So Wim, did you have improper shutdowns? Are you confident in your memory and
hard disk(s)?

--
Nigel J. Andrews
Director

---
Logictree Systems Limited
Computer Consultants

#3 Wim
wdh@belbone.be
In reply to: Nigel J. Andrews (#2)
Re: URGENT! pg_dump doesn't work!

Nigel J. Andrews wrote:

On Mon, 22 Jul 2002, Wim wrote:

snip...

Is this perhaps another of those hardware errors that seem to be turning up at
the moment?

So Wim, did you have improper shutdowns? Are you confident in your memory and
hard disk(s)?

The database is never killed with -9 and I have no problems with my
hard disks or memory...
Is it a bug that can be fixed? I can create a DB with the same tables
and do a 'copy from/to' to transfer the data.
Because it is a large DB (tables with more than 1 million rows), I would
do this if I have no other option left...
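
For each table, that 'copy from/to' fallback would look something like this (the table name and file path are only illustrative, and in 7.2 a COPY to a server-side file has to be run as the database superuser):

```sql
-- In the damaged database: write the table out to a flat file on the server.
COPY networks TO '/tmp/networks.copy';

-- In the freshly created database, after recreating the table:
COPY networks FROM '/tmp/networks.copy';
```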

Thanx!

Wim

#4 Nigel J. Andrews
nandrews@investsystems.co.uk
In reply to: Wim (#3)
Re: URGENT! pg_dump doesn't work!

On Mon, 22 Jul 2002, Wim wrote:

Nigel J. Andrews wrote:

On Mon, 22 Jul 2002, Wim wrote:

snip...

Is this perhaps another of those hardware errors that seem to be turning up at
the moment?

So Wim, did you have improper shutdowns? Are you confident in your memory and
hard disk(s)?

The database is never killed with -9 and I have no problems with my
hard disks or memory...
Is it a bug that can be fixed? I can create a DB with the same tables
and do a 'copy from/to' to transfer the data.
Because it is a large DB (tables with more than 1 million rows), I would
do this if I have no other option left...

When you say you can copy from the tables, I presume you have tried this
and succeeded.

Have you checked the server log to see that it is giving the same message as
you see in psql?

What about the value given in the error message (4aee70): is it always the
same value? Does it look like a reasonable address within a program's data
space on your system?

Having looked at the code, it seems that somewhere something is trying to
free a memory chunk bigger than the chunk limit (ALLOC_CHUNK_LIMIT, 8 kB I
believe from the comments) that has either already been freed or has never
been allocated. Therefore it's sounding a little like some pointer is being
trashed somewhere. If you could obtain a stack trace from the backend it
might be useful. Look in the directories under your data directory for core
files. You may need to enable core file dumping with something like
'ulimit -c unlimited' before starting your server. Alternatively, start psql
and use gdb to attach to the backend process serving it, and obtain the
backtrace when it faults.
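
Concretely, enabling core dumps might look like this (the paths in the comments are only illustrative; adjust them for your installation):

```shell
# Raise the core file size limit in the shell that will launch the postmaster,
# so the backends inherit it.
ulimit -c unlimited
ulimit -c   # should now report "unlimited"

# Then restart the server from this same shell, e.g. (illustrative path):
#   pg_ctl -D /usr/local/pgsql/data restart
# After a crash, look for core files under the data directory and run, e.g.:
#   gdb /usr/local/pgsql/bin/postgres <path to core file>
#   (gdb) bt
```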

--
Nigel J. Andrews
Director

---
Logictree Systems Limited
Computer Consultants

#5 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Wim (#1)
Re: URGENT! pg_dump doesn't work!

Wim <wdh@belbone.be> writes:

I have a problem with my postgres 7.2.1 database.
I can't perform a pg_dump on my database...
The message I get back is:

Odd. I am thinking this might be a corrupted-data problem.

If you are lucky, the corruption is in an index, and you can fix it
by rebuilding the system indexes. Read the REINDEX reference page
(carefully; it's a convoluted procedure).
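
For 7.2 the gist of that procedure is to stop the postmaster and run a standalone backend that ignores the (possibly corrupt) system indexes; a sketch only, with an illustrative data directory path, and the reference page is the authority here:

```sql
-- Issued from a standalone backend started with something like:
--   postgres -D /usr/local/pgsql/data -O -P belbonedb_v2
-- (-P makes the backend ignore system indexes when reading system
--  tables; -O allows system table modifications)
REINDEX DATABASE belbonedb_v2 FORCE;
```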

regards, tom lane

#6 Nigel J. Andrews
nandrews@investsystems.co.uk
In reply to: Tom Lane (#5)
Re: URGENT! pg_dump doesn't work!

On Mon, 22 Jul 2002, Tom Lane wrote:

Wim <wdh@belbone.be> writes:

I have a problem with my postgres 7.2.1 database.
I can't perform a pg_dump on my database...
The message I get back is:

Odd. I am thinking this might be a corrupted-data problem.

If you are lucky, the corruption is in an index, and you can fix it
by rebuilding the system indexes. Read the REINDEX reference page
(carefully, it's a convoluted procedure)

Tom,

That's why I went down the hardware question. Given that the error reported
by Wim seems to be a memory allocation (actually freeing) fault, would data
corruption in any persistent storage object be suspect?

I suppose it's worth checking that the error persists across a backend
restart. Wim?

--
Nigel J. Andrews
Director

---
Logictree Systems Limited
Computer Consultants

#7 Nigel J. Andrews
nandrews@investsystems.co.uk
In reply to: Nigel J. Andrews (#6)
Re: [NOVICE] URGENT! pg_dump doesn't work! (fwd)

---------- Forwarded message ----------
Date: Tue, 23 Jul 2002 14:39:50 +0100 (BST)
From: Nigel J. Andrews <nandrews@investsystems.co.uk>
To: Wim <wdh@belbone.be>
Subject: Re: [NOVICE] [GENERAL] URGENT! pg_dump doesn't work!

On Tue, 23 Jul 2002, Wim wrote:

I looked for core files in my data directory, but couldn't find any... I
installed gdb, since I run postgres on Solaris 9.
I have no experience with gdb, so could you tell me how to get a trace
when postgres fails?

What you need to do is:

1) start psql
- to connect to your backend, which I assume is running or you already know
how to do

2) identify the backend (postgres) process your psql session from 1) is
attached to
- use ps ax | grep postgres to list all backend processes (possibly ps -fe
instead of ps ax). If your database is quiet and only your psql session is
using it, then the relevant process is obvious. If your database is busy you
may find it easier to connect to it as a user that is not normally used,
for example the database superuser (usually postgres), as this information
should appear in the process listing.

3) start gdb using:
gdb <path to postgres backend binary> <process id from step 2>
- this attaches gdb to the process serving your psql session.

4) at the gdb prompt type 'c'
- this makes the backend process continue normal operations (gdb stopped it
when it attached to it)

5) in your psql session issue the statement that causes the fault

6) when the backend generates its fault you should have a prompt in your gdb
session. Typing 'bt' at the gdb prompt will give you a stack trace.
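
Condensed into one session, the steps above look roughly like this (the pid and the binary path are illustrative):

```
$ psql belbonedb_v2                        # step 1: start a session
$ ps ax | grep postgres                    # step 2: note the backend pid, say 12345
$ gdb /usr/local/pgsql/bin/postgres 12345  # step 3: attach
(gdb) c                                    # step 4: let the backend continue
  -- in the psql session: \dt networks     # step 5: trigger the fault
(gdb) bt                                   # step 6: stack trace at the fault
```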

Now, the complications.

1) To get a useful stack trace you may need to have postgres built with
debugging enabled, which would require a reconfigure and build from source if
not already done.

2) I'm starting to think I've taken you down the wrong route with this. I'm
no expert on the internals, but I'm not convinced the backend is generating a
fault such that you would get a core file or be able to do the final stage in
the above steps. My apologies for wasting your time if that's the case.
However, as you've already gone to the trouble of installing gdb, it may be
easiest to give the above instructions a go and see if you do indeed get a
gdb prompt in the final step. Whether this works or not, you should also
attempt to obtain the stack trace from the pg_dump attempt. There should be a
way to start pg_dump such that you have time to determine the backend process
for it; however, reading the manpage I can only see -W as a possibility.

One other question that would be interesting perhaps is whether the problem you
are experiencing persists across a backend restart.

Hopefully someone like Tom or Bruce can provide some guidance. Tom already
suggested data corruption.

--
Nigel J. Andrews
Director

---
Logictree Systems Limited
Computer Consultants

#8 Wim
wdh@belbone.be
In reply to: Nigel J. Andrews (#7)
Re: [NOVICE] URGENT! pg_dump doesn't work! (fwd)

Nigel J. Andrews wrote:

snip...

One other question that would be interesting perhaps is whether the problem you
are experiencing persists across a backend restart.

Hopefully someone like Tom or Bruce can provide some guidance. Tom already
suggested data corruption.

This is partial output with debugging level 3 enabled...

DEBUG: reaping dead processes
DEBUG: child process (pid 10917) was terminated by signal 10
DEBUG: server process (pid 10917) was terminated by signal 10
DEBUG: terminating any other active server processes
DEBUG: CleanupProc: sending SIGQUIT to process 10899
NOTICE: Message from PostgreSQL backend:
The Postmaster has informed me that some other backend
died abnormally and possibly corrupted shared memory.
I have rolled back the current transaction and am
going to terminate your database system connection and exit.
Please reconnect to the database system and repeat your query.
DEBUG: CleanupProc: sending SIGQUIT to process 10898
NOTICE: Message from PostgreSQL backend:
The Postmaster has informed me that some other backend
died abnormally and possibly corrupted shared memory.
I have rolled back the current transaction and am
going to terminate your database system connection and exit.
Please reconnect to the database system and repeat your query.
DEBUG: child process (pid 10898) exited with exit code 1
DEBUG: child process (pid 10899) exited with exit code 1
DEBUG: all server processes terminated; reinitializing shared memory
and semaphores
DEBUG: shmem_exit(0)
invoking IpcMemoryCreate(size=1441792)
DEBUG: database system was interrupted at 2002-07-23 15:52:57 CEST
DEBUG: checkpoint record is at 1/1888F010
DEBUG: redo record is at 1/1888F010; undo record is at 0/0; shutdown TRUE
DEBUG: next transaction id: 647512; next oid: 12041996
DEBUG: database system was not properly shut down; automatic recovery
in progress
DEBUG: redo starts at 1/1888F050
DEBUG: reaping dead processes
DEBUG: ReadRecord: record with zero length at 1/18904CE8
DEBUG: redo done at 1/18904CC0
DEBUG: database system is ready
DEBUG: proc_exit(0)
DEBUG: shmem_exit(0)
DEBUG: exit(0)
DEBUG: reaping dead processes

I have trouble with corrupted data on two databases... each on a
different machine and with different data and structure...
With this DB, I can't do a pg_dumpall, and with the other, I have
problems with pg_clog, where some files are missing...
It all began when I upgraded to version 7.2.1... I reported the problem
I mention below a few weeks ago, but fixed it with a pg_dumpall and
a recreation of the database...

This is the message from the other DB...

FATAL 2: open of /pgdata/pg_clog/0700 failed: No such file or directory
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
The connection to the server was lost. Attempting reset: NOTICE:
Message from PostgreSQL backend:
The Postmaster has informed me that some other backend
died abnormally and possibly corrupted shared memory.
I have rolled back the current transaction and am
going to terminate your database system connection and exit.
Please reconnect to the database system and repeat your query.
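
One workaround sometimes suggested on the lists for a missing pg_clog segment, at your own risk and only after taking a file-level backup of the data directory, is to create the missing segment filled with zeroes so the transactions it covers simply read as not committed; a clog segment is 32 pages of 8 kB, i.e. 256 kB:

```
# Illustrative only: zero status bits mean "not committed", so rows written
# by those transactions become invisible -- back up the data directory first.
dd if=/dev/zero of=/pgdata/pg_clog/0700 bs=8k count=32
```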

Cheers!

Wim

#9 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Nigel J. Andrews (#6)
Re: URGENT! pg_dump doesn't work!

"Nigel J. Andrews" <nandrews@investsystems.co.uk> writes:

That's why I went down the hardware question. Given that the error reported
by Wim seems to be a memory allocation (actually freeing) fault, would data
corruption in any persistent storage object be suspect?

I am thinking a corrupted length word in a varlena data item (text,
varchar, array, etc) might possibly lead to that sort of error.
It's just an educated guess though.

regards, tom lane