Open 6.5 items
SELECT * FROM test WHERE test IN (SELECT * FROM test) fails with strange error
When creating a table with either type inet or type cidr as a primary/unique
key, "198.68.123.0/24" and "198.68.123.0/27" are considered equal
Fix function pointer calls to take Datum args for char and int2 args (ecpg)
Regression test for new Numeric type
Large Object memory problems
refint problems
invalidate cache on aborted transaction
spinlock stuck problem
benchmark performance problem
Make psql \help, man pages, and sgml reflect changes in grammar
Markup sql.sgml, Stefan's intro to SQL
Generate Admin, User, Programmer hardcopy postscript
Generate INSTALL and HISTORY from sgml sources.
Update ref/lock.sgml, ref/set.sgml to reflect MVCC and locking changes.
--
Bruce Momjian | http://www.op.net/~candle
maillist@candle.pha.pa.us | (610) 853-3000
+ If your life is a hard drive, | 830 Blythe Avenue
+ Christ can be your backup. | Drexel Hill, Pennsylvania 19026
spinlock stuck problem
This time I have tested on another, slower machine with less memory. Things
seem to be getting worse. I got:
LockAcquire: xid table corrupted
This comes from:
    /*
     * Find or create an xid entry with this tag
     */
    result = (XIDLookupEnt *) hash_search(xidTable, (Pointer) &item,
                                          HASH_ENTER, &found);
    if (!result)
    {
        elog(NOTICE, "LockAcquire: xid table corrupted");
        return STATUS_ERROR;
    }
As you can see, the acquired master lock is never released, and all
backends get stuck. (Of course, the corrupted xid table is a problem
too.)
Another error was:
out of free buffers: time to abort !
I will do more testing...
---
Tatsuo Ishii
Tatsuo Ishii <t-ishii@sra.co.jp> writes:
LockAcquire: xid table corrupted
This comes from:
    /*
     * Find or create an xid entry with this tag
     */
    result = (XIDLookupEnt *) hash_search(xidTable, (Pointer) &item,
                                          HASH_ENTER, &found);
    if (!result)
    {
        elog(NOTICE, "LockAcquire: xid table corrupted");
        return STATUS_ERROR;
    }
As you can see the acquired master lock never released, and all
backends get stuck. (of course, corrupted xid table is a problem too
Actually, corrupted xid table is *the* problem --- whatever happens
after that is just collateral damage. (The elog should likely be
elog(FATAL) not NOTICE...)
If I recall the dynahash.c code correctly, a null return value
indicates either damage to the structure of the table (ie someone
stomped on memory that didn't belong to them) or running out of memory
to add entries to the table. The latter should be impossible if we
sized shared memory correctly. Perhaps the table size estimation code
has been obsoleted by recent changes?
regards, tom lane
Thus spake Bruce Momjian
When creating a table with either type inet or type cidr as a primary/unique
key, "198.68.123.0/24" and "198.68.123.0/27" are considered equal
So have we decided that this is still to be fixed? If so, it's an easy fix
but we have to decide which of the following is true.
198.68.123.0/24 < 198.68.123.0/27
198.68.123.0/24 > 198.68.123.0/27
Maybe deciding that should be the TODO item. :-)
--
D'Arcy J.M. Cain <darcy@{druid|vex}.net> | Democracy is three wolves
http://www.druid.net/darcy/ | and a sheep voting on
+1 416 424 2871 (DoD#0082) (eNTP) | what's for dinner.
Tom Lane wrote:
If I recall the dynahash.c code correctly, a null return value
indicates either damage to the structure of the table (ie someone
stomped on memory that didn't belong to them) or running out of memory
to add entries to the table. The latter should be impossible if we
Quite different cases and should result in different reactions.
If structure is corrupted then only abort() is proper thing.
If running out of memory then elog(ERROR) is enough.
sized shared memory correctly. Perhaps the table size estimation code
has been obsoleted by recent changes?
lock.h:
/* ----------------------
* The following defines are used to estimate how much shared
* memory the lock manager is going to require.
* See LockShmemSize() in lock.c.
*
* NLOCKS_PER_XACT - The number of unique locks acquired in a transaction
* NLOCKENTS - The maximum number of lock entries in the lock table.
* ----------------------
*/
#define NLOCKS_PER_XACT 40
^^
Isn't it too low?
#define NLOCKENTS(maxBackends) (NLOCKS_PER_XACT*(maxBackends))
And now - LockShmemSize() in lock.c:
/* lockHash table */
size += hash_estimate_size(NLOCKENTS(maxBackends),
^^^^^^^^^^^^^^^^^^^^^^
SHMEM_LOCKTAB_KEYSIZE,
SHMEM_LOCKTAB_DATASIZE);
/* xidHash table */
size += hash_estimate_size(maxBackends,
^^^^^^^^^^^
SHMEM_XIDTAB_KEYSIZE,
SHMEM_XIDTAB_DATASIZE);
Why is just maxBackends used here? NLOCKENTS should be used too
(each transaction lock requires its own xidHash entry).
Vadim
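The sizing arithmetic under discussion can be sketched as follows. Here `estimate_size` is a hypothetical stand-in for `hash_estimate_size` (the real function also accounts for bucket and segment overhead), and the constants mirror lock.h; this is only an illustration of why both tables should scale with NLOCKENTS:

```c
#include <assert.h>
#include <stddef.h>

/* Mirrors the lock.h macros discussed above (64 per Vadim's change). */
#define NLOCKS_PER_XACT 64
#define NLOCKENTS(maxBackends) (NLOCKS_PER_XACT * (maxBackends))

/* Hypothetical stand-in for hash_estimate_size(): crude per-entry
 * cost of key + data + one pointer of overhead. */
static size_t
estimate_size(long nentries, size_t keysize, size_t datasize)
{
    return (size_t) nentries * (keysize + datasize + sizeof(void *));
}

static size_t
lock_shmem_size(int maxBackends, size_t keysize, size_t datasize)
{
    size_t size = 0;

    /* lockHash table */
    size += estimate_size(NLOCKENTS(maxBackends), keysize, datasize);
    /* xidHash table: one entry per transaction lock, per Vadim's
     * point, so NLOCKENTS here as well, not just maxBackends */
    size += estimate_size(NLOCKENTS(maxBackends), keysize, datasize);
    return size;
}
```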
Vadim Mikheev <vadim@krs.ru> writes:
If I recall the dynahash.c code correctly, a null return value
indicates either damage to the structure of the table (ie someone
stomped on memory that didn't belong to them) or running out of memory
to add entries to the table. The latter should be impossible if we
Quite different cases and should result in different reactions.
I agree; will see about cleaning up hash_search's call convention after
6.5 is done. Actually, maybe I should do it now? I'm not convinced yet
whether the reports we're seeing are due to memory clobber or running
out of space... fixing this may be the easiest way to find out.
#define NLOCKS_PER_XACT 40
^^
Isn't it too low?
You tell me ... that was the number that was in the 6.4 code, but I
have no idea if it's right or not. (Does MVCC require more locks
than the old stuff?) What is a good upper bound on the number
of concurrently existing locks?
/* xidHash table */
size += hash_estimate_size(maxBackends,
^^^^^^^^^^^
SHMEM_XIDTAB_KEYSIZE,
SHMEM_XIDTAB_DATASIZE);
Why just maxBackends is here? NLOCKENTS should be used too
(each transaction lock requieres own xidhash entry).
Should it be NLOCKENTS(maxBackends) xid entries, or do you mean
NLOCKENTS(maxBackends) + maxBackends? Feel free to stick in any
estimates that you like better --- what's there now is an interpretation
of what the 6.4 code was trying to do (but it was sufficiently buggy and
unreadable that it was probably coming out with different numbers in
the end...)
regards, tom lane
"D'Arcy" "J.M." Cain <darcy@druid.net> writes:
but we have to decide which of the following is true.
198.68.123.0/24 < 198.68.123.0/27
198.68.123.0/24 > 198.68.123.0/27
I'd say the former, on the same principle that 'abc' < 'abcd'.
Think of the addresses as being bit strings of the specified length,
and compare them the same way character strings are compared.
But if Vixie's got a different opinion, I defer to him...
regards, tom lane
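Tom's rule can be sketched in C. Here `prefix_cmp` is a hypothetical helper, not the actual network_cmp code: it treats an IPv4 value as the first `bits` bits of its address and compares like character strings, so a shorter prefix that matches sorts first ('abc' < 'abcd'):

```c
#include <assert.h>
#include <stdint.h>

/* Compare two network values as bit strings of the given lengths.
 * Returns <0, 0, >0 like strcmp. Illustrative sketch only. */
static int
prefix_cmp(uint32_t addr_a, int bits_a, uint32_t addr_b, int bits_b)
{
    int      common = bits_a < bits_b ? bits_a : bits_b;
    uint32_t mask = common == 0 ? 0 : 0xFFFFFFFFu << (32 - common);

    /* compare the bits both values actually have */
    if ((addr_a & mask) != (addr_b & mask))
        return (addr_a & mask) < (addr_b & mask) ? -1 : 1;
    /* same leading bits: the shorter prefix sorts first */
    return (bits_a > bits_b) - (bits_a < bits_b);
}
```

Under this rule 198.68.123.0/24 (address 0xC6447B00) sorts before 198.68.123.0/27, matching Tom's suggestion.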
Tom Lane wrote:
Vadim Mikheev <vadim@krs.ru> writes:
If I recall the dynahash.c code correctly, a null return value
indicates either damage to the structure of the table (ie someone
stomped on memory that didn't belong to them) or running out of memory
to add entries to the table. The latter should be impossible if we
Quite different cases and should result in different reactions.
I agree; will see about cleaning up hash_search's call convention after
6.5 is done. Actually, maybe I should do it now? I'm not convinced yet
whether the reports we're seeing are due to memory clobber or running
out of space... fixing this may be the easiest way to find out.
Imho, we have to fix it in some way before 6.5
Either by changing dynahash.c (to return 0x1 if table is
corrupted and 0x0 if out of space) or by changing
elog(NOTICE) to elog(ERROR).
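The convention Vadim proposes could look like this. The sentinel values and `classify_hash_result` are illustrative, not the actual dynahash.c interface; the point is only that the two failure modes become distinguishable to the caller:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sketch: hash_search() returns a distinguishable
 * sentinel for "table corrupted" vs. NULL for "out of space", so
 * the caller can abort() in one case and elog(ERROR) in the other. */
#define HASH_CORRUPTED  ((void *) 0x1)  /* structure damaged */
#define HASH_OUT_OF_MEM ((void *) 0x0)  /* no room for new entry */

static int
classify_hash_result(void *result)
{
    if (result == HASH_CORRUPTED)
        return -1;              /* caller should abort() */
    if (result == HASH_OUT_OF_MEM)
        return 0;               /* caller should elog(ERROR) */
    return 1;                   /* valid entry pointer */
}
```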
#define NLOCKS_PER_XACT 40
^^
Isn't it too low?
You tell me ... that was the number that was in the 6.4 code, but I
have no idea if it's right or not. (Does MVCC require more locks
than the old stuff?) What is a good upper bound on the number
of concurrently existing locks?
Probably yes, because writers can continue to work and lock
other tables instead of sleeping on the first lock due to a concurrent
select. I'll change it to 64, but this should be a configurable
thing.
/* xidHash table */
size += hash_estimate_size(maxBackends,
^^^^^^^^^^^
SHMEM_XIDTAB_KEYSIZE,
SHMEM_XIDTAB_DATASIZE);
Why is just maxBackends used here? NLOCKENTS should be used too
(each transaction lock requires its own xidHash entry).
Should it be NLOCKENTS(maxBackends) xid entries, or do you mean
NLOCKENTS(maxBackends) + maxBackends? Feel free to stick in any
estimates that you like better --- what's there now is an interpretation
of what the 6.4 code was trying to do (but it was sufficiently buggy and
unreadable that it was probably coming out with different numbers in
the end...)
Just NLOCKENTS(maxBackends) - I'll change it now.
Vadim
Thus spake Tom Lane
"D'Arcy" "J.M." Cain <darcy@druid.net> writes:
but we have to decide which of the following is true.
198.68.123.0/24 < 198.68.123.0/27
198.68.123.0/24 > 198.68.123.0/27
I'd say the former, on the same principle that 'abc' < 'abcd'.
And, in fact, that's what happens if you use the operators. The only
place they are equal is when sorting them so they can't be used as
primary keys. I guess there is no argument about the sorting order
if we think they should be sorted. There is still the question of
whether or not they should be sorted. There seems to be tacit agreement,
but could we have a little more discussion? The question is: when inet
or cidr is used as the primary key on a table, should they be considered
equal? In fact, think about the question separately, as we may want
different behaviour for each type. Here is my breakdown of the question.
For inet type, the value specifies primarily, I think, the host but
also carries information about its place on the network. Given an inet
type you can extract the host, broadcast, netmask and even the cidr
that it is part of. So, 198.68.123.0/24 and 198.68.123.0/27 really
refer to the same host but on different networks. Since a host can only
be on one network, there is an argument that they can't both be used
as the primary key in the same table.
A cidr type is primarily a network. In fact, some valid inet values
aren't even valid cidr. So, the question is, if one network is part
of another then should it be possible to have both as a primary key?
Of course, both of these beg the real question, should either of these
types be used as a primary key, but that is a database design question.
Think of the addresses as being bit strings of the specified length,
and compare them the same way character strings are compared.
Not sure that that clarifies it but we do have the code to order them
in any case. We just need to decide whether we want to.
But if Vixie's got a different opinion, I defer to him...
Paul's code orders them without regard to netmask which implies "no"
as the answer to the question but his original code only referred to
what we eventually called the cidr type. The question would still
be open for the inet type anyway.
"D'Arcy" "J.M." Cain <darcy@druid.net> writes:
And, in fact, that's what happens if you use the operators. The only
place they are equal is when sorting them so they can't be used as
primary keys.
Huh? Indexes and operators are the same thing --- or more specifically,
indexes rely on operators to compare keys. I don't see how it's even
*possible* that an index would think that two keys are equal when the
underlying = operator says they are not.
A little experimentation shows that's indeed what's happening, though.
Weird. Is this a deliberate effect, and if so how did you achieve it?
It looks like what could be a serious bug to me.
I guess there is no argument about the sorting order
if we think they should be sorted. There is still the question of
whether or not they should be sorted. There seems to be tacit agreement
but could we have a little more discussion. The question is, when inet
or cidr is used as the primary key on a table, should they be considered
equal. In fact, think about the question separately as we may want a
different behaviour for each.
I'd argue that plain indexing ought not try to do anything especially
subtle --- in particular it ought not vary from the behavior of the
comparison operators for the type. If someone wants a table wherein you
can't enter two spellings of the same hostname, the right way would be
to construct a unique functional index using a function that reduces the
INET type into the simpler form. A good analogy might be a text field
where you don't want any two entries to be equal on a case-insensitive
basis. You don't up and change the behavior of indexing to be
case-insensitive, you say
CREATE UNIQUE INDEX foo_f1_key ON foo (lower(f1) text_ops);
regards, tom lane
I wrote:
A little experimentation shows that's indeed what's happening, though.
Weird. Is this a deliberate effect, and if so how did you achieve it?
Oh, I see it: the network_cmp function is deliberately inconsistent with
the regular comparison functions on network values.
This is *very bad*. Indexes depend on both the operators and the cmp
support function. You cannot have inconsistent behavior between these
functions, or indexing will misbehave. Do I need to gin up an example
where it fails?
regards, tom lane
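The invariant Tom states can be written down as a small consistency check. Here `my_cmp`, `my_lt`, and `my_eq` are toy stand-ins for a type's cmp support function and its operators; the sketch just shows what "consistent" means for any pair of values:

```c
#include <assert.h>

/* Toy operators for a type (plain ints here). */
static int my_lt(int a, int b) { return a < b; }
static int my_eq(int a, int b) { return a == b; }

/* Toy cmp support function: negative/zero/positive like strcmp. */
static int
my_cmp(int a, int b)
{
    return (a > b) - (a < b);
}

/* An index is only safe if, for every pair, the cmp function and the
 * operators agree: cmp < 0 iff a < b, cmp == 0 iff a = b. */
static int
consistent(int a, int b)
{
    int c = my_cmp(a, b);

    return (c < 0) == my_lt(a, b) && (c == 0) == my_eq(a, b);
}
```

The inet/cidr problem above is exactly a pair for which `consistent` would return false: the = operator says unequal while the cmp function says equal.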
Hello all,
-----Original Message-----
From: owner-pgsql-hackers@postgreSQL.org
[mailto:owner-pgsql-hackers@postgreSQL.org]On Behalf Of Vadim Mikheev
Sent: Saturday, May 29, 1999 2:51 PM
To: Tom Lane
Cc: t-ishii@sra.co.jp; PostgreSQL-development
Subject: Re: [HACKERS] Open 6.5 items
Tom Lane wrote:
Vadim Mikheev <vadim@krs.ru> writes:
If I recall the dynahash.c code correctly, a null return value
indicates either damage to the structure of the table (ie someone
stomped on memory that didn't belong to them) or running out of memory
to add entries to the table. The latter should be impossible if we
Quite different cases and should result in different reactions.
I agree; will see about cleaning up hash_search's call convention after
6.5 is done. Actually, maybe I should do it now? I'm not convinced yet
whether the reports we're seeing are due to memory clobber or running
out of space... fixing this may be the easiest way to find out.
Imho, we have to fix it in some way before 6.5
Either by changing dynahash.c (to return 0x1 if table is
corrupted and 0x0 if out of space) or by changing
elog(NOTICE) to elog(ERROR).
Another case exists which causes stuck spinlock abort.
    status = WaitOnLock(lockmethod, lock, lockmode);

    /*
     * Check the xid entry status, in case something in the ipc
     * communication doesn't work correctly.
     */
    if (!((result->nHolding > 0) && (result->holders[lockmode] > 0)))
    {
        XID_PRINT_AUX("LockAcquire: INCONSISTENT ", result);
        LOCK_PRINT_AUX("LockAcquire: INCONSISTENT ", lock, lockmode);
        /* Should we retry ? */
        return FALSE;
    }
This case returns without releasing LockMgrLock, and doesn't even call
elog().
As far as I can see, different entries in xidHash have the same key when
the above case occurs. Moreover, xidHash has been in an abnormal state
since the number of xidHash entries exceeded 256.
Is this bug solved by Vadim's change of maxBackends to
NLOCKENTS(maxBackends), or by Tom's change to the hash code?
As for my test case, xidHash is filled with XactLockTable entries which
have been acquired by XactLockTableWait().
Could those entries be released immediately after they are acquired ?
Thanks.
Hiroshi Inoue
Inoue@tpf.co.jp
-----Original Message-----
From: owner-pgsql-hackers@postgreSQL.org
[mailto:owner-pgsql-hackers@postgreSQL.org]On Behalf Of Bruce Momjian
Sent: Friday, May 28, 1999 1:58 PM
To: PostgreSQL-development
Subject: [HACKERS] Open 6.5 items
SELECT * FROM test WHERE test IN (SELECT * FROM test) fails with
strange error
When creating a table with either type inet or type cidr as a
primary,unique
key, the "198.68.123.0/24" and "198.68.123.0/27" are considered equal
Fix function pointer calls to take Datum args for char and int2 args(ecgs)
Regression test for new Numeric type
Large Object memory problems
refint problems
invalidate cache on aborted transaction
spinlock stuck problem
benchmark performance problem
Make psql \help, man pages, and sgml reflect changes in grammar
Markup sql.sgml, Stefan's intro to SQL
Generate Admin, User, Programmer hardcopy postscript
Generate INSTALL and HISTORY from sgml sources.
Update ref/lock.sgml, ref/set.sgml to reflect MVCC and locking changes.
What about mdtruncate() for multi-segment relations?
AFAIK, it has not been solved yet.
Thanks.
Hiroshi Inoue
Inoue@tpf.co.jp
Hiroshi Inoue wrote:
As far as I can see, different entries in xidHash have the same key when
the above case occurs. Moreover, xidHash has been in an abnormal state
since the number of xidHash entries exceeded 256.
Is this bug solved by Vadim's change of maxBackends to
NLOCKENTS(maxBackends), or by Tom's change to the hash code?
Should be fixed now.
As for my test case,xidHash is filled with XactLockTable entries which have
been acquired by XactLockTableWait().
Could those entries be released immediately after they are acquired ?
Oops. Thanks! Must be released.
Vadim
Make psql \help, man pages, and sgml reflect changes in grammar
Markup sql.sgml, Stefan's intro to SQL
Generate Admin, User, Programmer hardcopy postscript
Generate INSTALL and HISTORY from sgml sources.
Update ref/lock.sgml, ref/set.sgml to reflect MVCC and locking changes.
What about mdtruncate() for multi-segment relations?
AFAIK,it has not been solved yet.
I thought we decided that file descriptors are kept by backends, and are
still accessible while new backends don't see the files. Correct?
-----Original Message-----
From: Bruce Momjian [mailto:maillist@candle.pha.pa.us]
Sent: Monday, May 31, 1999 11:15 AM
To: Hiroshi Inoue
Cc: PostgreSQL-development
Subject: Re: [HACKERS] Open 6.5 items
Make psql \help, man pages, and sgml reflect changes in grammar
Markup sql.sgml, Stefan's intro to SQL
Generate Admin, User, Programmer hardcopy postscript
Generate INSTALL and HISTORY from sgml sources.
Update ref/lock.sgml, ref/set.sgml to reflect MVCC and locking changes.
What about mdtruncate() for multi-segment relations?
AFAIK, it has not been solved yet.
I thought we decided that file descriptors are kept by backends, and are
still accessible while new backends don't see the files. Correct?
Yes, other backends could write to unlinked files, which would
vanish before long.
I think it's more secure to truncate useless segments to size 0
than to unlink the segments, though vacuum would never remove
useless segments.
Thanks.
Hiroshi Inoue
Inoue@tpf.co.jp
I thought we decided that file descriptors are kept by backends, and are
still accessible while new backends don't see the files. Correct?
Yes, other backends could write to unlinked files, which would
vanish before long.
I think it's more secure to truncate useless segments to size 0
than to unlink the segments, though vacuum would never remove
useless segments.
If you truncate, other backends will see the data gone, and will be
writing into the middle of an empty file. Better to remove.
I thought we decided that file descriptors are kept by
backends, and are
still accessible while new backends don't see the files. Correct?
Yes, other backends could write to unlinked files, which would
vanish before long.
I think it's more secure to truncate useless segments to size 0
than to unlink the segments, though vacuum would never remove
useless segments.
If you truncate, other backends will see the data gone, and will be
writing into the middle of an empty file. Better to remove.
I couldn't explain more because of my poor English, sorry.
But my test case usually causes a backend abort.
My test case is:
While 1 or more sessions frequently insert/update a table,
vacuum the table.
After vacuum, those sessions abort with the message
ERROR: cannot open segment .. of relation ...
This ERROR finally causes a spinlock freeze, as I reported in the posting
[HACKERS] spinlock freeze ? (Re: INSERT/UPDATE waiting (another
example)).
Comments ?
Thanks.
Hiroshi Inoue
Inoue@tpf.co.jp
I couldn't explain more because of my poor English, sorry.
But my test case usually causes a backend abort.
My test case is:
While 1 or more sessions frequently insert/update a table,
vacuum the table.
After vacuum, those sessions abort with the message
ERROR: cannot open segment .. of relation ...
This ERROR finally causes a spinlock freeze, as I reported in the posting
[HACKERS] spinlock freeze ? (Re: INSERT/UPDATE waiting (another
example)).
Comments ?
OK, I buy that. How will truncate fix things? Isn't that going to be
strange too? It's hard to imagine how we are going to modify these things.
I am now leaning to the truncate option, especially considering that
usually only the last segment is going to be truncated.
Tom Lane wrote:
Vadim Mikheev <vadim@krs.ru> writes:
If I recall the dynahash.c code correctly, a null return value
indicates either damage to the structure of the table (ie someone
stomped on memory that didn't belong to them) or running out of memory
to add entries to the table. The latter should be impossible if we
Quite different cases and should result in different reactions.
I agree; will see about cleaning up hash_search's call convention after
6.5 is done. Actually, maybe I should do it now? I'm not convinced yet
whether the reports we're seeing are due to memory clobber or running
out of space... fixing this may be the easiest way to find out.
Imho, we have to fix it in some way before 6.5
Either by changing dynahash.c (to return 0x1 if table is
corrupted and 0x0 if out of space) or by changing
elog(NOTICE) to elog(ERROR).
#define NLOCKS_PER_XACT 40
^^
Isn't it too low?
You tell me ... that was the number that was in the 6.4 code, but I
have no idea if it's right or not. (Does MVCC require more locks
than the old stuff?) What is a good upper bound on the number
of concurrently existing locks?
Probably yes, because writers can continue to work and lock
other tables instead of sleeping on the first lock due to a concurrent
select. I'll change it to 64, but this should be a configurable
thing.
/* xidHash table */
size += hash_estimate_size(maxBackends,
^^^^^^^^^^^
SHMEM_XIDTAB_KEYSIZE,
SHMEM_XIDTAB_DATASIZE);
Why is just maxBackends used here? NLOCKENTS should be used too
(each transaction lock requires its own xidHash entry).
Should it be NLOCKENTS(maxBackends) xid entries, or do you mean
NLOCKENTS(maxBackends) + maxBackends? Feel free to stick in any
estimates that you like better --- what's there now is an interpretation
of what the 6.4 code was trying to do (but it was sufficiently buggy and
unreadable that it was probably coming out with different numbers in
the end...)
Just NLOCKENTS(maxBackends) - I'll change it now.
I have just done a cvs update and saw your changes. I tried the same
testing as I did before (64 concurrent connections, each
connection executing 100 transactions), but it failed again.
(1) without -B 1024, it failed: out of free buffers: time to abort!
(2) with -B 1024, it went into stuck spin lock
So I looked into sources a little bit, and made a minor change to
include/storage/lock.h:
#define INIT_TABLE_SIZE 100
to:
#define INIT_TABLE_SIZE 4096
then restarted postmaster with -B 1024 (this will prevent
out-of-free-buffers problem, I guess). Now everything seems to work
great!
I suspect that the huge INIT_TABLE_SIZE prevented dynamic expansion of
the hash tables, and it seems there's something wrong in the routines
responsible for that.
Comments?
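Tatsuo's suspicion is easy to reproduce in miniature: with a large enough initial size a table never expands, so any bug hiding in the expansion path never fires. Here is a toy open-addressing table (not the dynahash.c directory/segment design; all names are illustrative):

```c
#include <assert.h>
#include <stdlib.h>

/* Simplified open-addressing hash table that doubles when the load
 * factor would exceed 0.5. A key of 0 marks an empty bucket. */
typedef struct
{
    long   *keys;
    int     nbuckets;
    int     nentries;
    int     nexpansions;    /* how often the expansion path ran */
} ToyTable;

static void
toy_init(ToyTable *t, int nbuckets)
{
    t->keys = calloc(nbuckets, sizeof(long));
    t->nbuckets = nbuckets;
    t->nentries = 0;
    t->nexpansions = 0;
}

static void
toy_insert(ToyTable *t, long key)   /* key must be nonzero */
{
    int     i;

    if (t->nentries + 1 > t->nbuckets / 2)
    {
        /* expansion path: double the table and rehash old entries */
        long   *old = t->keys;
        int     oldn = t->nbuckets;
        int     nexp = t->nexpansions + 1;

        toy_init(t, oldn * 2);
        t->nexpansions = nexp;
        for (i = 0; i < oldn; i++)
            if (old[i] != 0)
                toy_insert(t, old[i]);
        free(old);
    }
    i = (int) (key % t->nbuckets);
    while (t->keys[i] != 0)          /* linear probing */
        i = (i + 1) % t->nbuckets;
    t->keys[i] = key;
    t->nentries++;
}
```

Starting from a small table, a handful of inserts forces several expansions; starting from a large one, the same inserts never touch the expansion code at all, which matches the observed behavior of the INIT_TABLE_SIZE change masking the failure rather than fixing it.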
--
Tatsuo Ishii