beta3 vacuum crash
I haven't tried everything to recover from this yet, but will quickly try to document the
crash before I lose track of what exactly went into it and what I did: Basically I deleted
a table and then ran vacuum verbose, with the net result that I cannot connect to this
database anymore with the error:
frank@kelis:/usr/local/httpd/htdocs > psql mpi
psql: FATAL 1: Index 'pg_trigger_tgrelid_index' does not exist
This is, fortunately, not the production system but my development machine. I was going to
go live with this in a couple of week's time on beta3. Should I reconsider and move back
to 7.03 (I'd hate to cuz I'll have rows bigger than 32K, potentially . . . )?
The vacuum went like this:
------------------------------- begin vacuum -------------------------------
mpi=# drop table wimis;
DROP
mpi=# vacuum verbose;
NOTICE: --Relation pg_type--
NOTICE: Pages 3: Changed 2, reaped 2, Empty 0, New 0; Tup 159: Vac 16, Keep/VTL 0/0,
Crash 0, UnUsed 0, MinLen 106, MaxLen 109; Re-using: Free/Avail. Space 6296/156;
EndEmpty/Avail. Pages 0/1. CPU 0.00s/0.00u sec.
NOTICE: Index pg_type_oid_index: Pages 2; Tuples 159: Deleted 16. CPU 0.00s/0.00u sec.
NOTICE: Index pg_type_typname_index: Pages 2; Tuples 159: Deleted 16. CPU 0.00s/0.00u
sec.
NOTICE: Rel pg_type: Pages: 3 --> 3; Tuple(s) moved: 1. CPU 0.01s/0.00u sec.
NOTICE: Index pg_type_oid_index: Pages 2; Tuples 159: Deleted 1. CPU 0.00s/0.00u sec.
NOTICE: Index pg_type_typname_index: Pages 2; Tuples 159: Deleted 1. CPU 0.00s/0.00u sec.
NOTICE: --Relation pg_attribute--
NOTICE: Pages 16: Changed 9, reaped 8, Empty 0, New 0; Tup 1021: Vac 160, Keep/VTL 0/0,
Crash 0, UnUsed 0, MinLen 98, MaxLen 98; Re-using: Free/Avail. Space 16480/16480;
EndEmpty/Avail. Pages 0/8. CPU 0.00s/0.00u sec.
NOTICE: Index pg_attribute_relid_attnam_index: Pages 16; Tuples 1021: Deleted 160. CPU
0.00s/0.01u sec.
NOTICE: Index pg_attribute_relid_attnum_index: Pages 8; Tuples 1021: Deleted 160. CPU
0.00s/0.00u sec.
NOTICE: Rel pg_attribute: Pages: 16 --> 14; Tuple(s) moved: 43. CPU 0.01s/0.01u sec.
NOTICE: Index pg_attribute_relid_attnam_index: Pages 16; Tuples 1021: Deleted 43. CPU
0.00s/0.00u sec.
NOTICE: Index pg_attribute_relid_attnum_index: Pages 8; Tuples 1021: Deleted 43. CPU
0.00s/0.00u sec.
NOTICE: --Relation pg_class--
NOTICE: Pages 7: Changed 1, reaped 7, Empty 0, New 0; Tup 136: Vac 257, Keep/VTL 0/0,
Crash 0, UnUsed 0, MinLen 115, MaxLen 160; Re-using: Free/Avail. Space 38880/31944;
EndEmpty/Avail. Pages 0/6. CPU 0.00s/0.00u sec.
NOTICE: Index pg_class_oid_index: Pages 2; Tuples 136: Deleted 257. CPU 0.00s/0.01u sec.
NOTICE: Index pg_class_relname_index: Pages 6; Tuples 136: Deleted 257. CPU 0.00s/0.00u
sec.
NOTICE: Rel pg_class: Pages: 7 --> 3; Tuple(s) moved: 76. CPU 0.01s/0.01u sec.
pqReadData() -- backend closed the channel unexpectedly.
This probably means the backend terminated abnormally
before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.
!# \q
------------------------------- end vacuum -------------------------------
The log says (I'm running the backend with -d 2):
------------------------------- begin log -------------------------------
DEBUG: query: vacuum verbose;
DEBUG: ProcessUtility: vacuum verbose;
NOTICE: --Relation pg_type--
NOTICE: Pages 3: Changed 2, reaped 2, Empty 0, New 0; Tup 159: Vac 16, Keep/VTL 0/0,
Crash 0, UnUsed 0, MinLen 106, MaxLen 109; Re-using: Free/Avail. Space 6296/156;
EndEmpty/Avail. Pages 0/1. CPU 0.00s/0.00u sec.
NOTICE: Index pg_type_oid_index: Pages 2; Tuples 159: Deleted 16. CPU 0.00s/0.00u sec.
NOTICE: Index pg_type_typname_index: Pages 2; Tuples 159: Deleted 16. CPU 0.00s/0.00u
sec.
NOTICE: Rel pg_type: Pages: 3 --> 3; Tuple(s) moved: 1. CPU 0.01s/0.00u sec.
NOTICE: Index pg_type_oid_index: Pages 2; Tuples 159: Deleted 1. CPU 0.00s/0.00u sec.
NOTICE: Index pg_type_typname_index: Pages 2; Tuples 159: Deleted 1. CPU 0.00s/0.00u sec.
NOTICE: --Relation pg_attribute--
NOTICE: Pages 16: Changed 9, reaped 8, Empty 0, New 0; Tup 1021: Vac 160, Keep/VTL 0/0,
Crash 0, UnUsed 0, MinLen 98, MaxLen 98; Re-using: Free/Avail. Space 16480/16480;
EndEmpty/Avail. Pages 0/8. CPU 0.00s/0.00u sec.
NOTICE: Index pg_attribute_relid_attnam_index: Pages 16; Tuples 1021: Deleted 160. CPU
0.00s/0.01u sec.
NOTICE: Index pg_attribute_relid_attnum_index: Pages 8; Tuples 1021: Deleted 160. CPU
0.00s/0.00u sec.
NOTICE: Rel pg_attribute: Pages: 16 --> 14; Tuple(s) moved: 43. CPU 0.01s/0.01u sec.
NOTICE: Index pg_attribute_relid_attnam_index: Pages 16; Tuples 1021: Deleted 43. CPU
0.00s/0.00u sec.
NOTICE: Index pg_attribute_relid_attnum_index: Pages 8; Tuples 1021: Deleted 43. CPU
0.00s/0.00u sec.
NOTICE: --Relation pg_class--
NOTICE: Pages 7: Changed 1, reaped 7, Empty 0, New 0; Tup 136: Vac 257, Keep/VTL 0/0,
Crash 0, UnUsed 0, MinLen 115, MaxLen 160; Re-using: Free/Avail. Space 38880/31944;
EndEmpty/Avail. Pages 0/6. CPU 0.00s/0.00u sec.
NOTICE: Index pg_class_oid_index: Pages 2; Tuples 136: Deleted 257. CPU 0.00s/0.01u sec.
NOTICE: Index pg_class_relname_index: Pages 6; Tuples 136: Deleted 257. CPU 0.00s/0.00u
sec.
NOTICE: Rel pg_class: Pages: 7 --> 3; Tuple(s) moved: 76. CPU 0.01s/0.01u sec.
/usr/local/pgsql/bin/postmaster: reaping dead processes...
/usr/local/pgsql/bin/postmaster: CleanupProc: pid 29832 exited with status 11
Server process (pid 29832) exited with status 11 at Sat Jan 20 21:46:22 2001
Terminating any active server processes...
/usr/local/pgsql/bin/postmaster: CleanupProc: sending SIGUSR1 to process 29335
/usr/local/pgsql/bin/postmaster: CleanupProc: sending SIGUSR1 to process 29330
NOTICE: Message from PostgreSQL backend:
The Postmaster has informed me that some other backend died abnormally and
possibly corrupted shared memory.
I have rolled back the current transaction and am going to terminate your database
system connection and exit.
Please reconnect to the database system and repeat your query.
/usr/local/pgsql/bin/postmaster: ServerLoop: handling reading 5
/usr/local/pgsql/bin/postmaster: ServerLoop: handling reading 5
The Data Base System is in recovery mode
/usr/local/pgsql/bin/postmaster: ServerLoop: handling writing 5
NOTICE: Message from PostgreSQL backend:
The Postmaster has informed me that some other backend died abnormally and
possibly corrupted shared memory.
I have rolled back the current transaction and am going to terminate your database
system connection and exit.
Please reconnect to the database system and repeat your query.
/usr/local/pgsql/bin/postmaster: reaping dead processes...
/usr/local/pgsql/bin/postmaster: CleanupProc: pid 29335 exited with status 256
/usr/local/pgsql/bin/postmaster: CleanupProc: sending SIGUSR1 to process 29330
/usr/local/pgsql/bin/postmaster: CleanupProc: pid 29330 exited with status 256
Server processes were terminated at Sat Jan 20 21:46:22 2001
Reinitializing shared memory and semaphores
invoking IpcMemoryCreate(size=17334272)
/usr/local/pgsql/bin/postmaster: reaping dead processes...
DEBUG: starting up
DEBUG: database system was interrupted at 2001-01-20 21:43:20
DEBUG: CheckPoint record at (0, 9309416)
DEBUG: Redo record at (0, 9309416); Undo record at (0, 0); Shutdown FALSE
DEBUG: NextTransactionId: 36262; NextOid: 38319
DEBUG: database system was not properly shut down; automatic recovery in progress...
DEBUG: redo starts at (0, 9309472)
DEBUG: redo done at (0, 9391148)
DEBUG: database system is in production state
------------------------------- end log -------------------------------
- Frank
BTW: I posted this earlier today from a different account, which is not
subscribed, and the message did not come through. I am assuming now that
non-subscribers can't post. If this is incorrect, and it eventually
comes through twice, I apologize and promise not to do it again!
Frank Joerdens <frank@joerdens.de> writes:
I haven't tried everything to recover from this yet, but will quickly
try to document the crash before I lose track of what exactly went
into it and what I did: Basically I deleted a table and then ran
vacuum verbose, with the net result that I cannot connect to this
database anymore with the error:
Any chance of a backtrace from the core dump file? The log only tells
us that the vacuum process crashed, which isn't much to go on ...
regards, tom lane
On Sat, Jan 20, 2001 at 05:35:41PM -0500, Tom Lane wrote:
Frank Joerdens <frank@joerdens.de> writes:
I haven't tried everything to recover from this yet, but will quickly
try to document the crash before I lose track of what exactly went
into it and what I did: Basically I deleted a table and then ran
vacuum verbose, with the net result that I cannot connect to this
database anymore with the error:Any chance of a backtrace from the core dump file? The log only tells
us that the vacuum process crashed, which isn't much to go on ...
Silly me, this was in fact a December 13 snapshot. I have 3 machines
with an identical setup, two of which I already had moved to beta3. On
those, the error does not occur. On the December 13 version, I can
reproduce the error, which obviously has been fixed.
Sorry about the confusion.
- Frank