Bug #671: server corrupt
Dmitry Riachtchentsev (diamondrain@mail.ru) reports a bug with a severity of 1
The lower the number the more severe it is.
Short Description
server corrupt
Long Description
The follow short sequence of SQL requests leads to server corrupt:
1. Run psql and type:
begin;
CREATE SEQUENCE B;
commit;
\q
2. After exiting run psql again and type:
begin;
SELECT nextval('B');
DROP SEQUENCE B;
CREATE SEQUENCE B1;
commit;
Here I get: "NOTICE: LockRelease: no such lock"
Then type:
begin;
CREATE SEQUENCE B2;
Here I get messages:
---------------------------------------------------------------
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
The connection to the server was lost. Attempting reset: NOTICE: Message from PostgreSQL backend:
The Postmaster has informed me that some other backend
died abnormally and possibly corrupted shared memory.
I have rolled back the current transaction and am
going to terminate your database system connection and exit.
Please reconnect to the database system and repeat your query.
Failed.
---------------------------------------------------------------
Then until restarting the server, connection to the server can not be established.
The database log is:
---------------------------------------------------------------
DEBUG: database system was shut down at 2002-05-21 11:40:09 EDT
DEBUG: checkpoint record is at 0/276B98
DEBUG: redo record is at 0/276B98; undo record is at 0/0; shutdown TRUE
DEBUG: next transaction id: 1421; next oid: 24991
DEBUG: database system is ready
DEBUG: pq_recvbuf: unexpected EOF on client connection
NOTICE: LockRelease: no such lock
DEBUG: server process (pid 10918) was terminated by signal 10
DEBUG: terminating any other active server processes
NOTICE: Message from PostgreSQL backend:
The Postmaster has informed me that some other backend
died abnormally and possibly corrupted shared memory.
I have rolled back the current transaction and am
going to terminate your database system connection and exit.
Please reconnect to the database system and repeat your query.
NOTICE: Message from PostgreSQL backend:
The Postmaster has informed me that some other backend
died abnormally and possibly corrupted shared memory.
I have rolled back the current transaction and am
going to terminate your database system connection and exit.
Please reconnect to the database system and repeat your query.
NOTICE: Message from PostgreSQL backend:
The Postmaster has informed me that some other backend
died abnormally and possibly corrupted shared memory.
I have rolled back the current transaction and am
going to terminate your database system connection and exit.
Please reconnect to the database system and repeat your query.
NOTICE: Message from PostgreSQL backend:
The Postmaster has informed me that some other backend
died abnormally and possibly corrupted shared memory.
I have rolled back the current transaction and am
going to terminate your database system connection and exit.
Please reconnect to the database system and repeat your query.
NOTICE: Message from PostgreSQL backend:
The Postmaster has informed me that some other backend
died abnormally and possibly corrupted shared memory.
I have rolled back the current transaction and am
going to terminate your database system connection and exit.
Please reconnect to the database system and repeat your query.
NOTICE: Message from PostgreSQL backend:
The Postmaster has informed me that some other backend
died abnormally and possibly corrupted shared memory.
I have rolled back the current transaction and am
going to terminate your database system connection and exit.
Please reconnect to the database system and repeat your query.
NOTICE: Message from PostgreSQL backend:
The Postmaster has informed me that some other backend
died abnormally and possibly corrupted shared memory.
I have rolled back the current transaction and am
going to terminate your database system connection and exit.
Please reconnect to the database system and repeat your query.
NOTICE: Message from PostgreSQL backend:
The Postmaster has informed me that some other backend
died abnormally and possibly corrupted shared memory.
I have rolled back the current transaction and am
going to terminate your database system connection and exit.
Please reconnect to the database system and repeat your query.
DEBUG: all server processes terminated; reinitializing shared memory and semaphores
DEBUG: database system was interrupted at 2002-05-21 11:40:46 EDT
DEBUG: checkpoint record is at 0/276B98
DEBUG: redo record is at 0/276B98; undo record is at 0/0; shutdown TRUE
DEBUG: next transaction id: 1421; next oid: 24991
DEBUG: database system was not properly shut down; automatic recovery in progress
DEBUG: redo starts at 0/276BD8
DEBUG: ReadRecord: record with zero length at 0/28F640
DEBUG: redo done at 0/28F618
DEBUG: database system is ready
---------------------------------------------------------------
PostgreSQL version is 7.2.1
Operation system is SunOS 5.8
Is my sequence of requests legal?
Is there a patch that fixs this problem?
If there is no patch, what is the root of the problem? Is there a kit of rules to avoid this situation?
I am ready to provide you any more required information to reproduce and trace this situation. Please e-mail me.
Thanks
Dmitry
Sample Code
No file was uploaded with this report
pgsql-bugs@postgresql.org writes:
The follow short sequence of SQL requests leads to server corrupt:
Looks like a bug to me too. I can still replicate the "NOTICE:
LockRelease: no such lock" in current sources, but it seems that
someone's already fixed the core dump; CVS tip does not crash.
Will look into it more...
regards, tom lane
pgsql-bugs@postgresql.org writes:
Is there a patch that fixs this problem?
If there is no patch, what is the root of the problem? Is there a kit of rules to avoid this situation?
After more detailed investigation, I find both the notice and the
subsequent crash have the same cause: after nextval(), sequence.c is
holding a reference to an open relation cache entry for the sequence,
which it intends to close at end of transaction. When you drop the
sequence later in the same transaction, the cache entry goes away,
leaving sequence.c with a dangling pointer. Its attempt to close the
cache entry at commit not only causes the NOTICE complaint from the lock
manager, but also modifies memory that may by now have been assigned to
something else. The crash in your example was due to another cache
entry having been corrupted in this way.
A proper fix will require revising the way sequence.c holds its state;
that strikes me as too large a change to risk back-patching into 7.2.*
--- especially for a bug that has gone unnoticed for many years. AFAIK
sequence.c has always been implemented like this, and thus has always
had this problem.
I will fix it for 7.3, but in the meantime I suggest not dropping a
sequence that you've nextval'd, currval'd, or setval'd earlier in the
same transaction. As a stopgap defense I will probably back-patch
something to make it error out if you try.
regards, tom lane
From the explanation of Tom Lane, it seems we could get around the problem
in this way: avoid using "nextval" etc. on the sequence to be dropped in the
same transaction. The details can be like this. Support we want to drop a
sequence sA and would like to get the next available sA sequence value
before we want to drop it. (This is the situation where Dmitry and I found
this bug.) We would do
1. exec statement "SELECT last_value from sA" to get the last value of this
sequence.
2. increment this last value
3. drop the sequence sA.
We have seen that this fixed the problem.
I am not sure we can absolutely guarantee that the last value we get this
way would be the same as calling "nextval", since another request might
change the sequence we want to drop (depending on transaction mode?).
However we can live with it since dropping sequence does not happen often at
all.
But my question is: will this completely get around the problem of server
corrupt? In particular, if in a transaction, the "nextval" is called on a
sequence different from the sequence we are trying to drop, would there be a
problem? My tests showed NO problem in this situation. Also the analysis
from Tom Lane seems confirming this. But I am still not sure on this.
Thanks,
Heng Sun
Import Notes
Resolved by subject fallback
"Heng Sun" <sunheng@hotmail.com> writes:
But my question is: will this completely get around the problem of server
corrupt? In particular, if in a transaction, the "nextval" is called on a
sequence different from the sequence we are trying to drop, would there be a
problem? My tests showed NO problem in this situation. Also the analysis
from Tom Lane seems confirming this. But I am still not sure on this.
I believe there is no problem in that case. The AccessShareLock held by
the other guy will actually hold off your attempt to drop the sequence
until he commits. The reason we can see the bug in the single-backend
case is that your own AccessShareLock won't block you from getting the
exclusive lock needed to drop the sequence.
regards, tom lane