Open 6.5 items

Started by Bruce Momjianover 26 years ago57 messages

maillist@candle.pha.pa.us

over 26 years ago

SELECT * FROM test WHERE test IN (SELECT * FROM test) fails with strange error
When creating a table with either type inet or type cidr as a primary,unique
key, the "198.68.123.0/24" and "198.68.123.0/27" are considered equal
Fix function pointer calls to take Datum args for char and int2 args(ecgs)
Regression test for new Numeric type
Large Object memory problems
refint problems
invalidate cache on aborted transaction
spinlock stuck problem
benchmark performance problem

Make psql \help, man pages, and sgml reflect changes in grammar
Markup sql.sgml, Stefan's intro to SQL
Generate Admin, User, Programmer hardcopy postscript
Generate INSTALL and HISTORY from sgml sources.
Update ref/lock.sgml, ref/set.sgml to reflect MVCC and locking changes.

-- 
  Bruce Momjian                        |  http://www.op.net/~candle
  maillist@candle.pha.pa.us            |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026

Tatsuo Ishii

t-ishii@sra.co.jp

over 26 years ago

In reply to: Bruce Momjian (#1)

Re: [HACKERS] Open 6.5 items

spinlock stuck problem

This time I have tested on another slower/less memory machine. Seems
things getting worse. I got:

LockAcquire: xid table corrupted

This comes from:

/*
* Find or create an xid entry with this tag
*/
result = (XIDLookupEnt *) hash_search(xidTable, (Pointer) &item,

HASH_ENTER, &found);
if (!result)
{
elog(NOTICE, "LockAcquire: xid table corrupted");
return STATUS_ERROR;
}

As you can see the aquired master lock never released, and all
backends get stucked. (of course, corrupted xid table is a problem too
).

Another error was:

out of free buffers: time to abort !

I will do more testing...
---
Tatsuo Ishii

Import Notes

Reply to msg id not found: YourmessageofFri28May1999005809-0400.199905280458.AAA15840@candle.pha.pa.us | Resolved by subject fallback

Tom Lane

tgl@sss.pgh.pa.us

over 26 years ago

In reply to: Tatsuo Ishii (#2)

Re: [HACKERS] Open 6.5 items

Tatsuo Ishii <t-ishii@sra.co.jp> writes:

LockAcquire: xid table corrupted

This comes from:

/*
* Find or create an xid entry with this tag
*/
result = (XIDLookupEnt *) hash_search(xidTable, (Pointer) &item,

HASH_ENTER, &found);
if (!result)
{
elog(NOTICE, "LockAcquire: xid table corrupted");
return STATUS_ERROR;
}

As you can see the aquired master lock never released, and all
backends get stucked. (of course, corrupted xid table is a problem too

Actually, corrupted xid table is *the* problem --- whatever happens
after that is just collateral damage. (The elog should likely be
elog(FATAL) not NOTICE...)

If I recall the dynahash.c code correctly, a null return value
indicates either damage to the structure of the table (ie someone
stomped on memory that didn't belong to them) or running out of memory
to add entries to the table. The latter should be impossible if we
sized shared memory correctly. Perhaps the table size estimation code
has been obsoleted by recent changes?

regards, tom lane

Import Notes

Reply to msg id not found: YourmessageofFri28May1999142022+0900199905280520.OAA19906@srapc451.sra.co.jp | Resolved by subject fallback

J.M.

darcy@druid.net

over 26 years ago

In reply to: Bruce Momjian (#1)

Re: [HACKERS] Open 6.5 items

Thus spake Bruce Momjian

When creating a table with either type inet or type cidr as a primary,unique
key, the "198.68.123.0/24" and "198.68.123.0/27" are considered equal

So have we decided that this is still to be fixed? If so, it's an easy fix
but we have to decide which of the following is true.

198.68.123.0/24 < 198.68.123.0/27
198.68.123.0/24 > 198.68.123.0/27

Maybe deciding that should be the TODO item. :-)

-- 
D'Arcy J.M. Cain <darcy@{druid|vex}.net>   |  Democracy is three wolves
http://www.druid.net/darcy/                |  and a sheep voting on
+1 416 424 2871     (DoD#0082)    (eNTP)   |  what's for dinner.

Vadim Mikheev

vadim@krs.ru

over 26 years ago

In reply to: Tom Lane (#3)

Re: [HACKERS] Open 6.5 items

Tom Lane wrote:

If I recall the dynahash.c code correctly, a null return value
indicates either damage to the structure of the table (ie someone
stomped on memory that didn't belong to them) or running out of memory
to add entries to the table. The latter should be impossible if we

Quite different cases and should result in different reactions.
If structure is corrupted then only abort() is proper thing.
If running out of memory then elog(ERROR) is enough.

sized shared memory correctly. Perhaps the table size estimation code
has been obsoleted by recent changes?

lock.h:

/* ----------------------
* The following defines are used to estimate how much shared
* memory the lock manager is going to require.
* See LockShmemSize() in lock.c.
*
* NLOCKS_PER_XACT - The number of unique locks acquired in a transaction
* NLOCKENTS - The maximum number of lock entries in the lock table.
* ----------------------
*/
#define NLOCKS_PER_XACT 40
^^
Isn't it too low?

#define NLOCKENTS(maxBackends) (NLOCKS_PER_XACT*(maxBackends))

And now - LockShmemSize() in lock.c:

/* lockHash table */
size += hash_estimate_size(NLOCKENTS(maxBackends),
^^^^^^^^^^^^^^^^^^^^^^
SHMEM_LOCKTAB_KEYSIZE,
SHMEM_LOCKTAB_DATASIZE);

/* xidHash table */
size += hash_estimate_size(maxBackends,
^^^^^^^^^^^
SHMEM_XIDTAB_KEYSIZE,
SHMEM_XIDTAB_DATASIZE);

Why just maxBackends is here? NLOCKENTS should be used too
(each transaction lock requieres own xidhash entry).

Vadim

Tom Lane

tgl@sss.pgh.pa.us

over 26 years ago

In reply to: Vadim Mikheev (#5)

Re: [HACKERS] Open 6.5 items

Vadim Mikheev <vadim@krs.ru> writes:

If I recall the dynahash.c code correctly, a null return value
indicates either damage to the structure of the table (ie someone
stomped on memory that didn't belong to them) or running out of memory
to add entries to the table. The latter should be impossible if we

Quite different cases and should result in different reactions.

I agree; will see about cleaning up hash_search's call convention after
6.5 is done. Actually, maybe I should do it now? I'm not convinced yet
whether the reports we're seeing are due to memory clobber or running
out of space... fixing this may be the easiest way to find out.

#define NLOCKS_PER_XACT 40
^^
Isn't it too low?

You tell me ... that was the number that was in the 6.4 code, but I
have no idea if it's right or not. (Does MVCC require more locks
than the old stuff?) What is a good upper bound on the number
of concurrently existing locks?

/* xidHash table */
size += hash_estimate_size(maxBackends,
^^^^^^^^^^^
SHMEM_XIDTAB_KEYSIZE,
SHMEM_XIDTAB_DATASIZE);

Why just maxBackends is here? NLOCKENTS should be used too
(each transaction lock requieres own xidhash entry).

Should it be NLOCKENTS(maxBackends) xid entries, or do you mean
NLOCKENTS(maxBackends) + maxBackends? Feel free to stick in any
estimates that you like better --- what's there now is an interpretation
of what the 6.4 code was trying to do (but it was sufficiently buggy and
unreadable that it was probably coming out with different numbers in
the end...)

regards, tom lane

Import Notes

Reply to msg id not found: YourmessageofFri28May1999183105+0800374E7069.CA880C1@krs.ru | Resolved by subject fallback

Tom Lane

tgl@sss.pgh.pa.us

over 26 years ago

In reply to: Tom Lane (#6)

Re: [HACKERS] Open 6.5 items

"D'Arcy" "J.M." Cain <darcy@druid.net> writes:

but we have to decide which of the following is true.

198.68.123.0/24 < 198.68.123.0/27
198.68.123.0/24 > 198.68.123.0/27

I'd say the former, on the same principle that 'abc' < 'abcd'.
Think of the addresses as being bit strings of the specified length,
and compare them the same way character strings are compared.

But if Vixie's got a different opinion, I defer to him...

regards, tom lane

Import Notes

Reply to msg id not found: YourmessageofFri28May1999032137-0400m10nGxV-0000bIC@druid.net | Resolved by subject fallback

Vadim Mikheev

vadim@krs.ru

over 26 years ago

In reply to: Tom Lane (#6)

Re: [HACKERS] Open 6.5 items

Tom Lane wrote:

Vadim Mikheev <vadim@krs.ru> writes:

If I recall the dynahash.c code correctly, a null return value
indicates either damage to the structure of the table (ie someone
stomped on memory that didn't belong to them) or running out of memory
to add entries to the table. The latter should be impossible if we

Quite different cases and should result in different reactions.

I agree; will see about cleaning up hash_search's call convention after
6.5 is done. Actually, maybe I should do it now? I'm not convinced yet
whether the reports we're seeing are due to memory clobber or running
out of space... fixing this may be the easiest way to find out.

Imho, we have to fix it in some way before 6.5
Either by changing dynahash.c (to return 0x1 if table is
corrupted and 0x0 if out of space) or by changing
elog(NOTICE) to elog(ERROR).

#define NLOCKS_PER_XACT 40
^^
Isn't it too low?

You tell me ... that was the number that was in the 6.4 code, but I
have no idea if it's right or not. (Does MVCC require more locks
than the old stuff?) What is a good upper bound on the number
of concurrently existing locks?

Probably yes, because of writers can continue to work and lock
other tables instead of sleeping of first lock due to concurrent
select. I'll change it to 64, but this should be configurable
thing.

/* xidHash table */
size += hash_estimate_size(maxBackends,
^^^^^^^^^^^
SHMEM_XIDTAB_KEYSIZE,
SHMEM_XIDTAB_DATASIZE);

Why just maxBackends is here? NLOCKENTS should be used too
(each transaction lock requieres own xidhash entry).

Should it be NLOCKENTS(maxBackends) xid entries, or do you mean
NLOCKENTS(maxBackends) + maxBackends? Feel free to stick in any
estimates that you like better --- what's there now is an interpretation
of what the 6.4 code was trying to do (but it was sufficiently buggy and
unreadable that it was probably coming out with different numbers in
the end...)

Just NLOCKENTS(maxBackends) - I'll change it now.

Vadim

J.M.

darcy@druid.net

over 26 years ago

In reply to: Tom Lane (#7)

Re: [HACKERS] Open 6.5 items

Thus spake Tom Lane

"D'Arcy" "J.M." Cain <darcy@druid.net> writes:

but we have to decide which of the following is true.

198.68.123.0/24 < 198.68.123.0/27
198.68.123.0/24 > 198.68.123.0/27

I'd say the former, on the same principle that 'abc' < 'abcd'.

And, in fact, that's what happens if you use the operators. The only
place they are equal is when sorting them so they can't be used as
primary keys. I guess there is no argument about the sorting order
if we think they should be sorted. There is still the question of
whether or not they should be sorted. There seems to be tacit sgreement
but could we have a little more discussion. The question is, when inet
or cidr is used as the primary key on a table, should they be considered
equal. In fact, think about the question separately as we may want a
different behaviour for each. Here is my breakdown of the question.

For inet type, the value specifies primarily, I think, the host but
also carries information about its place on the network. Given an inet
type you can extract the host, broadcast, netmask and even the cidr
that it is part of. So, 198.68.123.0/24 and 198.68.123.0/27 really
refer to the same host but on different networks. Since a host can only
be on one network, there is an argument that they can't both be used
as the primary key in the same table.

A cidr type is primarily a network. In fact, some valid inet values
aren't even valid cidr. So, the question is, if one network is part
of another then should it be possible to have both as a primary key?

Of course, both of these beg the real question, should either of these
types be used as a primary key, but that is a database design question.

Think of the addresses as being bit strings of the specified length,
and compare them the same way character strings are compared.

Not sure that that clarifies it but we do have the code to order them
in any case. We just need to decide whether we want to.

But if Vixie's got a different opinion, I defer to him...

Paul's code orders them without regard to netmask which implies "no"
as the answer to the question but his original code only referred to
what we eventually called the cidr type. The question would still
be open for the inet type anyway.

-- 
D'Arcy J.M. Cain <darcy@{druid|vex}.net>   |  Democracy is three wolves
http://www.druid.net/darcy/                |  and a sheep voting on
+1 416 424 2871     (DoD#0082)    (eNTP)   |  what's for dinner.

#10

Tom Lane

tgl@sss.pgh.pa.us

over 26 years ago

In reply to: J.M. (#9)

Re: [HACKERS] Open 6.5 items

"D'Arcy" "J.M." Cain <darcy@druid.net> writes:

And, in fact, that's what happens if you use the operators. The only
place they are equal is when sorting them so they can't be used as
primary keys.

Huh? Indexes and operators are the same thing --- or more specifically,
indexes rely on operators to compare keys. I don't see how it's even
*possible* that an index would think that two keys are equal when the
underlying = operator says they are not.

A little experimentation shows that's indeed what's happening, though.
Weird. Is this a deliberate effect, and if so how did you achieve it?
It looks like what could be a serious bug to me.

I guess there is no argument about the sorting order
if we think they should be sorted. There is still the question of
whether or not they should be sorted. There seems to be tacit sgreement
but could we have a little more discussion. The question is, when inet
or cidr is used as the primary key on a table, should they be considered
equal. In fact, think about the question separately as we may want a
different behaviour for each.

I'd argue that plain indexing ought not try to do anything especially
subtle --- in particular it ought not vary from the behavior of the
comparison operators for the type. If someone wants a table wherein you
can't enter two spellings of the same hostname, the right way would be
to construct a unique functional index using a function that reduces the
INET type into the simpler form. A good analogy might be a text field
where you don't want any two entries to be equal on a case-insensitive
basis. You don't up and change the behavior of indexing to be
case-insensitive, you say
CREATE UNIQUE INDEX foo_f1_key ON foo (lower(f1) text_ops);

regards, tom lane

Import Notes

Reply to msg id not found: YourmessageofSat29May1999080742-0400m10nhtu-0000bIC@druid.net | Resolved by subject fallback

#11

Tom Lane

tgl@sss.pgh.pa.us

over 26 years ago

In reply to: Tom Lane (#10)

Re: [HACKERS] Open 6.5 items

I wrote:

A little experimentation shows that's indeed what's happening, though.
Weird. Is this a deliberate effect, and if so how did you achieve it?

Oh, I see it: the network_cmp function is deliberately inconsistent with
the regular comparison functions on network values.

This is *very bad*. Indexes depend on both the operators and the cmp
support function. You cannot have inconsistent behavior between these
functions, or indexing will misbehave. Do I need to gin up an example
where it fails?

regards, tom lane

Import Notes

Reply to msg id not found: YourmessageofSat29May1999112834-040018135.927991714@sss.pgh.pa.us | Resolved by subject fallback

#12

Hiroshi Inoue

Inoue@tpf.co.jp

over 26 years ago

In reply to: Vadim Mikheev (#8)

RE: [HACKERS] Open 6.5 items

Hello all,

-----Original Message-----
From: owner-pgsql-hackers@postgreSQL.org
[mailto:owner-pgsql-hackers@postgreSQL.org]On Behalf Of Vadim Mikheev
Sent: Saturday, May 29, 1999 2:51 PM
To: Tom Lane
Cc: t-ishii@sra.co.jp; PostgreSQL-development
Subject: Re: [HACKERS] Open 6.5 items

Tom Lane wrote:

Vadim Mikheev <vadim@krs.ru> writes:

If I recall the dynahash.c code correctly, a null return value
indicates either damage to the structure of the table (ie someone
stomped on memory that didn't belong to them) or running out

of memory

to add entries to the table. The latter should be impossible if we

Quite different cases and should result in different reactions.

I agree; will see about cleaning up hash_search's call convention after
6.5 is done. Actually, maybe I should do it now? I'm not convinced yet
whether the reports we're seeing are due to memory clobber or running
out of space... fixing this may be the easiest way to find out.

Imho, we have to fix it in some way before 6.5
Either by changing dynahash.c (to return 0x1 if table is
corrupted and 0x0 if out of space) or by changing
elog(NOTICE) to elog(ERROR).

Another case exists which causes stuck spinlock abort.

status = WaitOnLock(lockmethod, lock, lockmode);

/*
* Check the xid entry status, in case something in the ipc
* communication doesn't work correctly.
*/
if (!((result->nHolding > 0) && (result->holders[lockmode] >
0))
)
{
XID_PRINT_AUX("LockAcquire: INCONSISTENT ", result);
LOCK_PRINT_AUX("LockAcquire: INCONSISTENT ", lock,
lockm
ode);
/* Should we retry ? */
return FALSE;

This case returns without releasing LockMgrLock and doesn't call even
elog().
As far as I see,different entries in xidHash have a same key when above
case occurs. Moreover xidHash has been in abnormal state since the
number of xidHash entries exceeded 256.

Is this bug solved by change maxBackends->NLOCKENTS(maxBackends)
by Vadim or the change about hash by Tom ?

As for my test case,xidHash is filled with XactLockTable entries which have
been acquired by XactLockTableWait().
Could those entries be released immediately after they are acquired ?

Thanks.

Hiroshi Inoue
Inoue@tpf.co.jp

#13

Hiroshi Inoue

Inoue@tpf.co.jp

over 26 years ago

In reply to: Bruce Momjian (#1)

RE: [HACKERS] Open 6.5 items

-----Original Message-----
From: owner-pgsql-hackers@postgreSQL.org
[mailto:owner-pgsql-hackers@postgreSQL.org]On Behalf Of Bruce Momjian
Sent: Friday, May 28, 1999 1:58 PM
To: PostgreSQL-development
Subject: [HACKERS] Open 6.5 items

SELECT * FROM test WHERE test IN (SELECT * FROM test) fails with
strange error
When creating a table with either type inet or type cidr as a
primary,unique
key, the "198.68.123.0/24" and "198.68.123.0/27" are considered equal
Fix function pointer calls to take Datum args for char and int2 args(ecgs)
Regression test for new Numeric type
Large Object memory problems
refint problems
invalidate cache on aborted transaction
spinlock stuck problem
benchmark performance problem

Make psql \help, man pages, and sgml reflect changes in grammar
Markup sql.sgml, Stefan's intro to SQL
Generate Admin, User, Programmer hardcopy postscript
Generate INSTALL and HISTORY from sgml sources.
Update ref/lock.sgml, ref/set.sgml to reflect MVCC and locking changes.

What about mdtruncate() for multi-segments relation ?
AFAIK,it has not been solved yet.

Thanks.

Hiroshi Inoue
Inoue@tpf.co.jp

#14

Vadim Mikheev

vadim@krs.ru

over 26 years ago

In reply to: Hiroshi Inoue (#12)

Re: [HACKERS] Open 6.5 items

Hiroshi Inoue wrote:

As far as I see,different entries in xidHash have a same key when above
case occurs. Moreover xidHash has been in abnormal state since the
number of xidHash entries exceeded 256.

Is this bug solved by change maxBackends->NLOCKENTS(maxBackends)
by Vadim or the change about hash by Tom ?

Should be fixed now.

As for my test case,xidHash is filled with XactLockTable entries which have
been acquired by XactLockTableWait().
Could those entries be released immediately after they are acquired ?

Ops. Thanks! Must be released.

Vadim

#15

Bruce Momjian

maillist@candle.pha.pa.us

over 26 years ago

In reply to: Hiroshi Inoue (#13)

Re: [HACKERS] Open 6.5 items

Make psql \help, man pages, and sgml reflect changes in grammar
Markup sql.sgml, Stefan's intro to SQL
Generate Admin, User, Programmer hardcopy postscript
Generate INSTALL and HISTORY from sgml sources.
Update ref/lock.sgml, ref/set.sgml to reflect MVCC and locking changes.

What about mdtruncate() for multi-segments relation ?
AFAIK,it has not been solved yet.

I thought we decided that file descriptors are kept by backends, and are
still accessable while new backends don't see the files. Correct?

-- 
  Bruce Momjian                        |  http://www.op.net/~candle
  maillist@candle.pha.pa.us            |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026

#16

Hiroshi Inoue

Inoue@tpf.co.jp

over 26 years ago

In reply to: Bruce Momjian (#15)

RE: [HACKERS] Open 6.5 items

-----Original Message-----
From: Bruce Momjian [mailto:maillist@candle.pha.pa.us]
Sent: Monday, May 31, 1999 11:15 AM
To: Hiroshi Inoue
Cc: PostgreSQL-development
Subject: Re: [HACKERS] Open 6.5 items

Make psql \help, man pages, and sgml reflect changes in grammar
Markup sql.sgml, Stefan's intro to SQL
Generate Admin, User, Programmer hardcopy postscript
Generate INSTALL and HISTORY from sgml sources.
Update ref/lock.sgml, ref/set.sgml to reflect MVCC and

locking changes.

What about mdtruncate() for multi-segments relation ?
AFAIK,it has not been solved yet.

I thought we decided that file descriptors are kept by backends, and are
still accessable while new backends don't see the files. Correct?

Yes,other backends could write to unliked files which would be
vanished before long.
I think it's more secure to truncate useless segments to size 0
than unlinking the segments though vacuum would never remove
useless segments.

Thanks.

Hiroshi Inoue
Inoue@tpf.co.jp

#17

Bruce Momjian

maillist@candle.pha.pa.us

over 26 years ago

In reply to: Hiroshi Inoue (#16)

Re: [HACKERS] Open 6.5 items

I thought we decided that file descriptors are kept by backends, and are
still accessable while new backends don't see the files. Correct?

Yes,other backends could write to unliked files which would be
vanished before long.
I think it's more secure to truncate useless segments to size 0
than unlinking the segments though vacuum would never remove
useless segments.

If you truncate, other backends will see the data gone, and will be
writing into the middle of an empty file. Better to remove.

-- 
  Bruce Momjian                        |  http://www.op.net/~candle
  maillist@candle.pha.pa.us            |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026

#18

Hiroshi Inoue

Inoue@tpf.co.jp

over 26 years ago

In reply to: Bruce Momjian (#17)

RE: [HACKERS] Open 6.5 items

I thought we decided that file descriptors are kept by

backends, and are

still accessable while new backends don't see the files. Correct?

Yes,other backends could write to unliked files which would be
vanished before long.
I think it's more secure to truncate useless segments to size 0
than unlinking the segments though vacuum would never remove
useless segments.

If you truncate, other backends will see the data gone, and will be
writing into the middle of an empty file. Better to remove.

I couldn't explain more because of my poor English,sorry.

But my test case usually causes backend abort.
My test case is
While 1 or more sessions frequently insert/update a table,
vacuum the table.

After vacuum, those sessions abort with message
ERROR: cannot open segment .. of relation ...

This ERROR finally causes spinlock freeze as I reported in a posting
[HACKERS] spinlock freeze ?(Re: INSERT/UPDATE waiting (another
example)).

Comments ?

Thanks.

Hiroshi Inoue
Inoue@tpf.co.jp

#19

Bruce Momjian

maillist@candle.pha.pa.us

over 26 years ago

In reply to: Hiroshi Inoue (#18)

Re: [HACKERS] Open 6.5 items

I couldn't explain more because of my poor English,sorry.

But my test case usually causes backend abort.
My test case is
While 1 or more sessions frequently insert/update a table,
vacuum the table.

After vacuum, those sessions abort with message
ERROR: cannot open segment .. of relation ...

This ERROR finally causes spinlock freeze as I reported in a posting
[HACKERS] spinlock freeze ?(Re: INSERT/UPDATE waiting (another
example)).

Comments ?

OK, I buy that. How will truncate fix things? Isn't that going to be
strange too. Hard to imagine how we are going to modify these things.
I am now leaning to the truncate option, especially considering that
usually only the last segment is going to be truncated.

-- 
  Bruce Momjian                        |  http://www.op.net/~candle
  maillist@candle.pha.pa.us            |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026

#20

Tatsuo Ishii

t-ishii@sra.co.jp

over 26 years ago

In reply to: Bruce Momjian (#19)

Re: [HACKERS] Open 6.5 items

Tom Lane wrote:

Vadim Mikheev <vadim@krs.ru> writes:

If I recall the dynahash.c code correctly, a null return value
indicates either damage to the structure of the table (ie someone
stomped on memory that didn't belong to them) or running out of memory
to add entries to the table. The latter should be impossible if we

Quite different cases and should result in different reactions.

I agree; will see about cleaning up hash_search's call convention after
6.5 is done. Actually, maybe I should do it now? I'm not convinced yet
whether the reports we're seeing are due to memory clobber or running
out of space... fixing this may be the easiest way to find out.

Imho, we have to fix it in some way before 6.5
Either by changing dynahash.c (to return 0x1 if table is
corrupted and 0x0 if out of space) or by changing
elog(NOTICE) to elog(ERROR).

#define NLOCKS_PER_XACT 40
^^
Isn't it too low?

You tell me ... that was the number that was in the 6.4 code, but I
have no idea if it's right or not. (Does MVCC require more locks
than the old stuff?) What is a good upper bound on the number
of concurrently existing locks?

Probably yes, because of writers can continue to work and lock
other tables instead of sleeping of first lock due to concurrent
select. I'll change it to 64, but this should be configurable
thing.

/* xidHash table */
size += hash_estimate_size(maxBackends,
^^^^^^^^^^^
SHMEM_XIDTAB_KEYSIZE,
SHMEM_XIDTAB_DATASIZE);

Why just maxBackends is here? NLOCKENTS should be used too
(each transaction lock requieres own xidhash entry).

Should it be NLOCKENTS(maxBackends) xid entries, or do you mean
NLOCKENTS(maxBackends) + maxBackends? Feel free to stick in any
estimates that you like better --- what's there now is an interpretation
of what the 6.4 code was trying to do (but it was sufficiently buggy and
unreadable that it was probably coming out with different numbers in
the end...)

Just NLOCKENTS(maxBackends) - I'll change it now.

I have just done cvs update and saw your changes. I tried the same
testing as I did before (64 conccurrent connections, and each
connection excutes 100 transactions), but it failed again.

(1) without -B 1024, it failed: out of free buffers: time to abort!

(2) with -B 1024, it went into stuck spin lock

So I looked into sources a little bit, and made a minor change to
include/storage/lock.h:

#define INIT_TABLE_SIZE 100

to:

#define INIT_TABLE_SIZE 4096

then restarted postmaster with -B 1024 (this will prevent
out-of-free-buffers problem, I guess). Now everything seems to work
great!

I suspect that huge INIT_TABLE_SIZE prevented dynamic expanding the
hash tables and seems there's something wrong in the routines
responsible for that.

Comments?
--
Tatsuo Ishii

Import Notes

Reply to msg id not found: YourmessageofSat29May1999135113+0800.374F8051.B990D982@krs.ru | Resolved by subject fallback

#21

Vadim Mikheev

vadim@krs.ru

over 26 years ago

In reply to: Tatsuo Ishii (#20)

Re: [HACKERS] Open 6.5 items

Tatsuo Ishii wrote:

I have just done cvs update and saw your changes. I tried the same
testing as I did before (64 conccurrent connections, and each
connection excutes 100 transactions), but it failed again.

(1) without -B 1024, it failed: out of free buffers: time to abort!

(2) with -B 1024, it went into stuck spin lock

So I looked into sources a little bit, and made a minor change to
include/storage/lock.h:

#define INIT_TABLE_SIZE 100

to:

#define INIT_TABLE_SIZE 4096

then restarted postmaster with -B 1024 (this will prevent
out-of-free-buffers problem, I guess). Now everything seems to work
great!

I suspect that huge INIT_TABLE_SIZE prevented dynamic expanding the
hash tables and seems there's something wrong in the routines
responsible for that.

Seems like that -:(

If we'll not fix expand hash code before 6.5 release then
I would recommend to don't use INIT_TABLE_SIZE in

lockMethodTable->lockHash = (HTAB *) ShmemInitHash(shmemName,
INIT_TABLE_SIZE, MAX_TABLE_SIZE,
&info, hash_flags);

and

lockMethodTable->xidHash = (HTAB *) ShmemInitHash(shmemName,
INIT_TABLE_SIZE, MAX_TABLE_SIZE,
&info, hash_flags);

but use NLOCKENTS(maxBackends) instead.

Vadim

#22

Tom Lane

tgl@sss.pgh.pa.us

over 26 years ago

In reply to: Vadim Mikheev (#21)

Re: [HACKERS] Open 6.5 items

Vadim Mikheev <vadim@krs.ru> writes:

As for my test case,xidHash is filled with XactLockTable entries which have
been acquired by XactLockTableWait().
Could those entries be released immediately after they are acquired ?

Ops. Thanks! Must be released.

Does this account for the "ShmemAlloc: out of memory" errors we've been
seeing? I spent a good deal of time yesterday grovelling through all
the calls to ShmemAlloc, and concluded that (unless there is a memory
stomp somewhere) it has to be caused by one of the shared hashtables
growing well beyond its size estimate.

I did find that the PROC structures are not counted in the initial
sizing of the shared memory block. This is no problem at the default
limit of 32 backends, but could get to be an issue for hundreds of
backends. I will add that item to the size estimate today.

regards, tom lane

Import Notes

Reply to msg id not found: YourmessageofMon31May1999094155+08003751E8E3.A20A7393@krs.ru | Resolved by subject fallback

#23

Tom Lane

tgl@sss.pgh.pa.us

over 26 years ago

In reply to: Tom Lane (#22)

Re: [HACKERS] Open 6.5 items

Vadim Mikheev <vadim@krs.ru> writes:

I suspect that huge INIT_TABLE_SIZE prevented dynamic expanding the
hash tables and seems there's something wrong in the routines
responsible for that.

Seems like that -:(

OK, as the last guy to touch dynahash.c I suppose this is my
bailiwick... I will look into it today.

If we'll not fix expand hash code before 6.5 release then
I would recommend to don't use INIT_TABLE_SIZE in

If we can't figure out what's really wrong then that might be
a good kluge solution to get 6.5 out the door. But I'd rather
try to fix it right first.

Can anyone send me a test case script that reproduces the problem?

regards, tom lane

Import Notes

Reply to msg id not found: YourmessageofMon31May1999173352+080037525780.99FAC65B@krs.ru | Resolved by subject fallback

#24

Tom Lane

tgl@sss.pgh.pa.us

over 26 years ago

In reply to: Tom Lane (#23)

Re: [HACKERS] Open 6.5 items

I wrote:

I suspect that huge INIT_TABLE_SIZE prevented dynamic expanding the
hash tables and seems there's something wrong in the routines
responsible for that.

OK, as the last guy to touch dynahash.c I suppose this is my
bailiwick... I will look into it today.

It's amazing how much easier it is to see a bug when you know it must be
there ;-).

I discovered that the hashtable expansion routine would mess up in the
case that all the records in the bucket being split ended up in the new
bucket rather than the old. In that case it forgot to clear the old
bucket's chain header, with the result that all the records appeared to
be in both buckets at once. This would not be a big problem until and
unless the first record in the chain got deleted --- it would only be
correctly removed from the new bucket, leaving the old bucket's chain
header pointing at a now-free record (and failing to link to any records
that had been added to the shared chain on its behalf in the meanwhile).
Disaster ensues.

An actual failure via this path seems somewhat improbable, but that
may just explain why we hadn't seen it happen very much before...

I have committed a fix in dynahash.c. Hiroshi and Tatsuo, would you
please grab latest sources and see whether the problems you are
observing are fixed?

regards, tom lane

Import Notes

Reply to msg id not found: YourmessageofMon31May1999112250-04009001.928164170@sss.pgh.pa.us | Resolved by subject fallback

#25

Tom Lane

tgl@sss.pgh.pa.us

over 26 years ago

In reply to: Tom Lane (#24)

Re: [HACKERS] Open 6.5 items

Tatsuo Ishii <t-ishii@sra.co.jp> writes:

I have just done cvs update and saw your changes. I tried the same
testing as I did before (64 conccurrent connections, and each
connection excutes 100 transactions), but it failed again.

(1) without -B 1024, it failed: out of free buffers: time to abort!

Right now, the postmaster will let you set any combination of -B and -N
you please. But it seems obvious that there is some minimum number of
buffers per backend below which things aren't going to work very well.
I wonder whether the postmaster startup code ought to enforce a minimum
ratio, say -B at least twice -N ? I have no idea what an appropriate
limit would be, however. Vadim, do you have any thoughts?

regards, tom lane

Import Notes

Reply to msg id not found: YourmessageofMon31May1999172446+0900199905310824.RAA21429@srapc451.sra.co.jp | Resolved by subject fallback

#26

Tom Lane

tgl@sss.pgh.pa.us

over 26 years ago

In reply to: Tom Lane (#25)

Re: [HACKERS] Open 6.5 items

I said:

I did find that the PROC structures are not counted in the initial
sizing of the shared memory block.

Er ... yes they are ... never mind ...

At this point I am not going to make any further changes to the
shared-mem code unless Hiroshi and Tatsuo report that there are still
problems.

I would still like to alter the calling conventions for hash_search
and hash_seq, which are ugly and also dependent on static state
variables. But as far as I can tell right now, those are just code
beautification items rather than repairs for currently-existing bugs.
So it seems best not to risk breaking anything with so little testing
time left in the 6.5 cycle. I will put them on my to-do-for-6.6 list
instead.

regards, tom lane

Import Notes

Reply to msg id not found: YourmessageofMon31May1999110318-04008928.928162998@sss.pgh.pa.us | Resolved by subject fallback

#27

Vadim Mikheev

vadim@krs.ru

over 26 years ago

In reply to: Tom Lane (#22)

Re: [HACKERS] Open 6.5 items

Tom Lane wrote:

Vadim Mikheev <vadim@krs.ru> writes:

As for my test case,xidHash is filled with XactLockTable entries which have
been acquired by XactLockTableWait().
Could those entries be released immediately after they are acquired ?

Ops. Thanks! Must be released.

Does this account for the "ShmemAlloc: out of memory" errors we've been
seeing? I spent a good deal of time yesterday grovelling through all

Yes.

Vadim

#28

Vadim Mikheev

vadim@krs.ru

over 26 years ago

In reply to: Tom Lane (#25)

Re: [HACKERS] Open 6.5 items

Tom Lane wrote:

Tatsuo Ishii <t-ishii@sra.co.jp> writes:

I have just done cvs update and saw your changes. I tried the same
testing as I did before (64 conccurrent connections, and each
connection excutes 100 transactions), but it failed again.

(1) without -B 1024, it failed: out of free buffers: time to abort!

Right now, the postmaster will let you set any combination of -B and -N
you please. But it seems obvious that there is some minimum number of
buffers per backend below which things aren't going to work very well.
I wonder whether the postmaster startup code ought to enforce a minimum
ratio, say -B at least twice -N ? I have no idea what an appropriate

^^^^^^^^^^^^^^
It's enough for select from single table using index, so it's
probably good ratio.

limit would be, however. Vadim, do you have any thoughts?

Vadim

#29

Tatsuo Ishii

t-ishii@sra.co.jp

over 26 years ago

In reply to: Vadim Mikheev (#28)

Re: [HACKERS] Open 6.5 items

Tom,

It's amazing how much easier it is to see a bug when you know it must be
there ;-).

I discovered that the hashtable expansion routine would mess up in the
case that all the records in the bucket being split ended up in the new
bucket rather than the old. In that case it forgot to clear the old
bucket's chain header, with the result that all the records appeared to
be in both buckets at once. This would not be a big problem until and
unless the first record in the chain got deleted --- it would only be
correctly removed from the new bucket, leaving the old bucket's chain
header pointing at a now-free record (and failing to link to any records
that had been added to the shared chain on its behalf in the meanwhile).
Disaster ensues.

An actual failure via this path seems somewhat improbable, but that
may just explain why we hadn't seen it happen very much before...

I have committed a fix in dynahash.c. Hiroshi and Tatsuo, would you
please grab latest sources and see whether the problems you are
observing are fixed?

Bingo! Your fix seems to solve the problem! Now 64 concurrent
transactions ran 100 transactions each without any problem. Thanks.

BTW, the script I'm using for the heavy load testing is written in
Java(not written by me). Do you want to try it?

(1) without -B 1024, it failed: out of free buffers: time to abort!

Right now, the postmaster will let you set any combination of -B and -N
you please. But it seems obvious that there is some minimum number of
buffers per backend below which things aren't going to work very well.
I wonder whether the postmaster startup code ought to enforce a minimum
ratio, say -B at least twice -N ? I have no idea what an appropriate
limit would be, however. Vadim, do you have any thoughts?

Just a few questions.

o I observed the backend processes grew ~10MB with -B 1024. Is this
normal?

o Is it possible to let the backend wait for free buffers in case of
insufficient shared buffers? (with reasonable retries, of course)
--
Tatsuo Ishii

Import Notes

Reply to msg id not found: YourmessageofMon31May1999131350-0400.16165.928170830@sss.pgh.pa.us | Resolved by subject fallback

#30

Vadim Mikheev

vadim@krs.ru

over 26 years ago

In reply to: Tatsuo Ishii (#29)

Re: [HACKERS] Open 6.5 items

Tatsuo Ishii wrote:

Just a few questions.

o I observed the backend processes grew ~10MB with -B 1024. Is this
normal?

Backend attaches to 1024*8K + other shmem, so probably
ps takes it into account.

o Is it possible to let the backend wait for free buffers in case of
insufficient shared buffers? (with reasonable retries, of course)

Yes, but not now.

Vadim

#31

Tatsuo Ishii

t-ishii@sra.co.jp

over 26 years ago

In reply to: Vadim Mikheev (#30)

Re: [HACKERS] Open 6.5 items

Just a few questions.

o I observed the backend processes grew ~10MB with -B 1024. Is this
normal?

Backend attaches to 1024*8K + other shmem, so probably
ps takes it into account.

Oh, I see.

o Is it possible to let the backend wait for free buffers in case of
insufficient shared buffers? (with reasonable retries, of course)

Yes, but not now.

Agreed.
--
Tatsuo Ishii

Import Notes

Reply to msg id not found: YourmessageofTue01Jun1999154045+0800.37538E7D.DC56F3A7@krs.ru | Resolved by subject fallback

#32

Tatsuo Ishii

t-ishii@sra.co.jp

over 26 years ago

In reply to: Tatsuo Ishii (#31)

Re: [HACKERS] Open 6.5 items

[forwarded to hackers list]

Bingo! Your fix seems to solve the problem! Now 64 concurrent
transactions ran 100 transactions each without any problem. Thanks.

~~~~~~~~~~~~users

BTW, the script I'm using for the heavy load testing is written in
Java(not written by me). Do you want to try it?

I am doing some benchmarks and would really appreciate if you could
let me have your Java routines.

It's "JDBCBench" available from:

http://www.worldserver.com/mm.mysql/performance/

Seems it is originally made for MySQL, can be used with PostgreSQL and
other commercial dbms including Oracle, however.

I like it since it:

o automatically creates test data
o simulates heavy loads with the specified number of users and
transactions per user from the command line

I noticed minor bugs with JDBCBench 1.0. Also I added begin/end so
that the set of operations are performed in a transaction. Here are
diffs: (please make sure that the file is in Unix format. seems the
original file is in DOS format.)

*** JDBCBench.java.orig	Tue Jun  1 17:31:11 1999
--- JDBCBench.java	Tue Jun  1 17:32:04 1999
***************
*** 18,24 ****
      public final static int TELLER  = 0;
      public final static int BRANCH  = 1;
      public final static int ACCOUNT = 2;
!

      private Connection Conn = null;
--- 18,24 ----
      public final static int TELLER  = 0;
      public final static int BRANCH  = 1;
      public final static int ACCOUNT = 2;
!   static String DBUrl = "";

private Connection Conn = null;
***************
*** 40,46 ****
public static void main(String[] Args)
{
String DriverName = "";
! String DBUrl = "";
boolean initialize_dataset = false;

        for (int i = 0; i < Args.length; i++) {
--- 40,46 ----
      public static void main(String[] Args)
      {
        String DriverName = "";
! 
        boolean initialize_dataset = false;

        for (int i = 0; i < Args.length; i++) {
***************
*** 286,291 ****
--- 286,299 ----

          public void run()
          {
+ 	  Connection myC = null;
+ 	  try {
+ 	    myC = DriverManager.getConnection(DBUrl);
+ 	  }
+ 	  catch (Exception E) {
+ 	    System.out.println(E.getMessage());
+ 	    E.printStackTrace();
+ 	  }
              while (ntrans-- > 0) {

int account = JDBCBench.getRandomID(ACCOUNT);
***************
*** 293,299 ****
int teller = JDBCBench.getRandomID(TELLER);
int delta = JDBCBench.getRandomInt(0,1000);

!                 doOne(account, branch, teller, delta);
                  incrementTransactionCount();
              }
              reportDone();
--- 301,307 ----
                  int teller  = JDBCBench.getRandomID(TELLER);
                  int delta   = JDBCBench.getRandomInt(0,1000);

! doOne(myC, account, branch, teller, delta);
incrementTransactionCount();
}
reportDone();
***************
*** 303,320 ****
* doOne() - Executes a single TPC BM B transaction.
*/

! int doOne(int bid, int tid, int aid, int delta)
{
try {
! Statement Stmt = Conn.createStatement();

! String Query = "UPDATE accounts ";
Query+= "SET Abalance = Abalance + " + delta + " ";
Query+= "WHERE Aid = " + aid;

                  Stmt.executeUpdate(Query);
                  Stmt.clearWarnings();
!                        
                  Query = "SELECT Abalance ";
                  Query+= "FROM   accounts ";
                  Query+= "WHERE  Aid = " + aid;
--- 311,334 ----
           *  doOne() - Executes a single TPC BM B transaction.
           */

! int doOne(Connection myC, int aid, int bid, int tid, int delta)
{
+ int aBalance = 0;
try {
! String Query;
! Statement Stmt = myC.createStatement();

! Stmt.executeUpdate("begin");
! Stmt.clearWarnings();
!
! Query = "UPDATE accounts ";
Query+= "SET Abalance = Abalance + " + delta + " ";
Query+= "WHERE Aid = " + aid;

Stmt.executeUpdate(Query);
Stmt.clearWarnings();
!
!
Query = "SELECT Abalance ";
Query+= "FROM accounts ";
Query+= "WHERE Aid = " + aid;
***************
*** 322,333 ****
ResultSet RS = Stmt.executeQuery(Query);
Stmt.clearWarnings();

! int aBalance = 0;

                  while (RS.next()) {
                      aBalance = RS.getInt(1);
                  }
!         
                  Query = "UPDATE tellers ";
                  Query+= "SET    Tbalance = Tbalance + " + delta + " ";
                  Query+= "WHERE  Tid = " + tid;
--- 336,348 ----
                  ResultSet RS = Stmt.executeQuery(Query);
                  Stmt.clearWarnings();

! aBalance = 0;

+ 
                  while (RS.next()) {
                      aBalance = RS.getInt(1);
                  }
! 
                  Query = "UPDATE tellers ";
                  Query+= "SET    Tbalance = Tbalance + " + delta + " ";
                  Query+= "WHERE  Tid = " + tid;
***************
*** 350,355 ****
--- 365,373 ----
                  Query+= delta + ")";

                  Stmt.executeUpdate(Query);
+                 Stmt.clearWarnings();
+ 
+                 Stmt.executeUpdate("end");
                  Stmt.clearWarnings();

return aBalance;

Import Notes

Reply to msg id not found: YourmessageofTue01Jun1999091336+0200.37538820.ED3F2589@flame.co.za | Resolved by subject fallback

#33

Hiroshi Inoue

Inoue@tpf.co.jp

over 26 years ago

In reply to: Tom Lane (#24)

RE: [HACKERS] Open 6.5 items

-----Original Message-----
From: Tom Lane [mailto:tgl@sss.pgh.pa.us]
Sent: Tuesday, June 01, 1999 2:14 AM
To: Hiroshi Inoue; t-ishii@sra.co.jp
Cc: PostgreSQL-development
Subject: Re: [HACKERS] Open 6.5 items

I wrote:

I suspect that huge INIT_TABLE_SIZE prevented dynamic expanding the
hash tables and seems there's something wrong in the routines
responsible for that.

OK, as the last guy to touch dynahash.c I suppose this is my
bailiwick... I will look into it today.

It's amazing how much easier it is to see a bug when you know it must be
there ;-).

[snip]

I have committed a fix in dynahash.c. Hiroshi and Tatsuo, would you
please grab latest sources and see whether the problems you are
observing are fixed?

It works fine.
The number of xidHash entry exceeded 600 but spinlock error didn't
occur.

However,when I did vacuum while testing I got the following error
message.
ERROR: Child itemid marked as unused

TransactionId-s of tuples in update chain may be out of order.

Thanks.

Hiroshi Inoue
Inoue@tpf.co.jp

#34

Hiroshi Inoue

Inoue@tpf.co.jp

over 26 years ago

In reply to: Bruce Momjian (#19)

RE: [HACKERS] Open 6.5 items

-----Original Message-----
From: owner-pgsql-hackers@postgreSQL.org
[mailto:owner-pgsql-hackers@postgreSQL.org]On Behalf Of Bruce Momjian
Sent: Monday, May 31, 1999 3:15 PM
To: Hiroshi Inoue
Cc: PostgreSQL-development
Subject: Re: [HACKERS] Open 6.5 items

I couldn't explain more because of my poor English,sorry.

But my test case usually causes backend abort.
My test case is
While 1 or more sessions frequently insert/update a table,
vacuum the table.

After vacuum, those sessions abort with message
ERROR: cannot open segment .. of relation ...

This ERROR finally causes spinlock freeze as I reported in a posting
[HACKERS] spinlock freeze ?(Re: INSERT/UPDATE waiting (another
example)).

Comments ?

OK, I buy that. How will truncate fix things? Isn't that going to be
strange too. Hard to imagine how we are going to modify these things.
I am now leaning to the truncate option, especially considering that
usually only the last segment is going to be truncated.

I made a patch on trial.

1) Useless segments are never removed by my implementation
because I call FileTruncate() instead of File(Name)Unlink().
2) mdfd_lstbcnt member of MdfdVec was abused in mdnblocks().
I am maintaining the value of mdfd_lstbcnt unwillingly.
Is it preferable to get rid of mdfd_lstbcnt completely ?

I'm not sure that this patch has no problem.
Please check and comment on my patch.

I don't have > 1G disk space.
Could someone run under > 1G environment ?

As Ole Gjerde mentioned,current implementation by his old
patch is not right. His new patch seems right if vacuum is
executed alone.
Please run vacuum while other concurrent sessions are
reading or writing,if you would see the difference.

Thanks.

Hiroshi Inoue
Inoue@tpf.co.jp

*** storage/smgr/md.c.orig	Wed May 26 16:05:02 1999
--- storage/smgr/md.c	Wed Jun  2 15:35:35 1999
***************
*** 674,684 ****
  	segno = 0;
  	for (;;)
  	{
! 		if (v->mdfd_lstbcnt == RELSEG_SIZE
! 			|| (nblocks = _mdnblocks(v->mdfd_vfd, BLCKSZ)) == RELSEG_SIZE)
  		{
- 
- 			v->mdfd_lstbcnt = RELSEG_SIZE;
  			segno++;

  			if (v->mdfd_chain == (MdfdVec *) NULL)
--- 674,685 ----
  	segno = 0;
  	for (;;)
  	{
! 		nblocks = _mdnblocks(v->mdfd_vfd, BLCKSZ);
! 		if (nblocks > RELSEG_SIZE)
! 			elog(FATAL, "segment too big in mdnblocks!");
! 		v->mdfd_lstbcnt = nblocks;
! 		if (nblocks == RELSEG_SIZE)
  		{
  			segno++;

if (v->mdfd_chain == (MdfdVec *) NULL)
***************
*** 714,745 ****
int curnblk,
i,
oldsegno,
! newsegno;
! char fname[NAMEDATALEN];
! char tname[NAMEDATALEN + 10];

curnblk = mdnblocks(reln);
oldsegno = curnblk / RELSEG_SIZE;
newsegno = nblocks / RELSEG_SIZE;

- StrNCpy(fname, RelationGetRelationName(reln)->data, NAMEDATALEN);
-
- if (newsegno < oldsegno)
- {
- for (i = (newsegno + 1);; i++)
- {
- sprintf(tname, "%s.%d", fname, i);
- if (FileNameUnlink(tname) < 0)
- break;
- }
- }
#endif

fd = RelationGetFile(reln);
v = &Md_fdvec[fd];

if (FileTruncate(v->mdfd_vfd, nblocks * BLCKSZ) < 0)
return -1;

return nblocks;

--- 715,766 ----
  	int			curnblk,
  				i,
  				oldsegno,
! 				newsegno,
! 				lastsegblocks;
! 	MdfdVec			**varray;

curnblk = mdnblocks(reln);
+ if (nblocks > curnblk)
+ return -1;
oldsegno = curnblk / RELSEG_SIZE;
newsegno = nblocks / RELSEG_SIZE;

#endif

fd = RelationGetFile(reln);
v = &Md_fdvec[fd];

+ #ifndef LET_OS_MANAGE_FILESIZE
+ 	varray = (MdfdVec **)palloc((oldsegno + 1) * sizeof(MdfdVec *));
+ 	for (i = 0; i <= oldsegno; i++)
+ 	{
+ 		if (!v)
+ 			elog(ERROR,"segment isn't open in mdtruncate!");
+ 		varray[i] = v;
+ 		v = v->mdfd_chain;
+ 	}
+ 	for (i = oldsegno; i > newsegno; i--)
+ 	{
+ 		v = varray[i];
+ 		if (FileTruncate(v->mdfd_vfd, 0) < 0)
+ 		{
+ 			pfree(varray);
+ 			return -1;
+ 		}
+ 		v->mdfd_lstbcnt = 0;
+ 	}
+ 	/* Calculate the # of blocks in the last segment */
+ 	lastsegblocks = nblocks - (newsegno * RELSEG_SIZE);
+ 	v = varray[newsegno];
+ 	pfree(varray);
+ 	if (FileTruncate(v->mdfd_vfd, lastsegblocks * BLCKSZ) < 0)
+ 		return -1;
+ 	v->mdfd_lstbcnt = lastsegblocks;
+ #else
  	if (FileTruncate(v->mdfd_vfd, nblocks * BLCKSZ) < 0)
  		return -1;
+ 	v->mdfd_lstbcnt = nblocks;
+ #endif

return nblocks;

#35

Vadim Mikheev

vadim@krs.ru

over 26 years ago

In reply to: Hiroshi Inoue (#33)

Re: [HACKERS] Open 6.5 items

Hiroshi Inoue wrote:

However,when I did vacuum while testing I got the following error
message.
ERROR: Child itemid marked as unused

TransactionId-s of tuples in update chain may be out of order.

I see... Need 1-2 days to fix this -:(

Vadim

#36

Vadim Mikheev

vadim@krs.ru

over 26 years ago

In reply to: Hiroshi Inoue (#33)

Re: [HACKERS] Open 6.5 items

Hiroshi Inoue wrote:

However,when I did vacuum while testing I got the following error
message.
ERROR: Child itemid marked as unused

TransactionId-s of tuples in update chain may be out of order.

1. Fix and explanation in xact.c:CommitTransaction():

RecordTransactionCommit();

/*
* Let others know about no transaction in progress by me.
* Note that this must be done _before_ releasing locks we hold
* and SpinAcquire(ShmemIndexLock) is required - or bad (too high)
* XmaxRecent value might be used by vacuum: UPDATE with xid 0 is
* blocked by xid 1' UPDATE, xid 1 is doing commit while xid 2
* gets snapshot - if xid 2' GetSnapshotData sees xid 1 as running
* then it must see xid 0 as running as well or XmaxRecent = 1
* might be used by concurrent vacuum causing
* ERROR: Child itemid marked as unused
* This bug was reported by Hiroshi Inoue and I was able to reproduce
* it with 3 sessions and gdb. - vadim 06/03/99
*/
if (MyProc != (PROC *) NULL)
{
SpinAcquire(ShmemIndexLock);
MyProc->xid = InvalidTransactionId;
MyProc->xmin = InvalidTransactionId;
SpinRelease(ShmemIndexLock);
}

2. It was possible to get two versions of the same row from
select. Fixed by moving MyProc->xid assignment from
StartTransaction() inside GetNewTransactionId().

Thanks, Hiroshi! And please run your tests - I used just
3 sessions and gdb.

Vadim

#37

Tom Lane

tgl@sss.pgh.pa.us

over 26 years ago

In reply to: Vadim Mikheev (#36)

Re: [HACKERS] Open 6.5 items

Vadim Mikheev <vadim@krs.ru> writes:

ERROR: Child itemid marked as unused

[ is fixed ]

Great! Vadim (also Hiroshi and Tatsuo), how many bugs remain on your
must-fix-for-6.5 lists? I was just wondering over in the "Freezing
docs" thread whether we had any problems severe enough to justify
delaying the release. It sounds like at least one such problem is
gone...

regards, tom lane

Import Notes

Reply to msg id not found: YourmessageofThu03Jun1999214458+0800375686DA.B584296E@krs.ru | Resolved by subject fallback

#38

Vadim Mikheev

vadim@krs.ru

over 26 years ago

In reply to: Tom Lane (#37)

Re: [HACKERS] Open 6.5 items

Tom Lane wrote:

Vadim Mikheev <vadim@krs.ru> writes:

ERROR: Child itemid marked as unused

[ is fixed ]

Great! Vadim (also Hiroshi and Tatsuo), how many bugs remain on your
must-fix-for-6.5 lists? I was just wondering over in the "Freezing
docs" thread whether we had any problems severe enough to justify
delaying the release. It sounds like at least one such problem is
gone...

No one in mine.

There are still some bad things, but they are old:

1. elog(NOTICE) in lock manager when locking was not succeeded.
Hope that our recent changes will reduce possibility of this.

Hiroshi wrote:

spinlock io_in_progress_lock of a buffer page is not
released by operations called by elog() such as
ProcReleaseSpins(),ResetBufferPool() etc.

I tried to fix this before 6.4 but without success (don't
remember why).

It seems elog(FATAL) doesn't release allocated buffer pages.
It's OK ?
AFAIC elog(FATAL) causes proc_exit(0) and proc_exit() doesn't
call ResetBufferPool().

Seems to me that elog(FATAL) should call siglongjmp(Warn_restart, 1),
like elog(ERROR), but force exit in tcop main loop after
AbortCurrentTransaction(). AbortTransaction() does pretty nice
things like RelationPurgeLocalRelation(false) and DestroyNoNameRels()...

Vadim

#39

Hiroshi Inoue

Inoue@tpf.co.jp

over 26 years ago

In reply to: Vadim Mikheev (#36)

RE: [HACKERS] Open 6.5 items

-----Original Message-----
From: root@sunpine.krs.ru [mailto:root@sunpine.krs.ru]On Behalf Of Vadim
Mikheev
Sent: Thursday, June 03, 1999 10:45 PM
To: Hiroshi Inoue
Cc: Tom Lane; t-ishii@sra.co.jp; PostgreSQL-development
Subject: Re: [HACKERS] Open 6.5 items

Hiroshi Inoue wrote:

However,when I did vacuum while testing I got the following error
message.
ERROR: Child itemid marked as unused

TransactionId-s of tuples in update chain may be out of order.

1. Fix and explanation in xact.c:CommitTransaction():

RecordTransactionCommit();

/*
* Let others know about no transaction in progress by me.
* Note that this must be done _before_ releasing locks we hold
* and SpinAcquire(ShmemIndexLock) is required - or bad (too high)
* XmaxRecent value might be used by vacuum: UPDATE with xid 0 is
* blocked by xid 1' UPDATE, xid 1 is doing commit while xid 2
* gets snapshot - if xid 2' GetSnapshotData sees xid 1 as running
* then it must see xid 0 as running as well or XmaxRecent = 1
* might be used by concurrent vacuum causing
* ERROR: Child itemid marked as unused
* This bug was reported by Hiroshi Inoue and I was able to reproduce
* it with 3 sessions and gdb. - vadim 06/03/99
*/
if (MyProc != (PROC *) NULL)
{
SpinAcquire(ShmemIndexLock);
MyProc->xid = InvalidTransactionId;
MyProc->xmin = InvalidTransactionId;
SpinRelease(ShmemIndexLock);
}

2. It was possible to get two versions of the same row from
select. Fixed by moving MyProc->xid assignment from
StartTransaction() inside GetNewTransactionId().

Thanks, Hiroshi! And please run your tests - I used just
3 sessions and gdb.

Unfortunately,the error still occurs(I changed xact.c as above
by hand OK ?).

It seems there are cases that tuples are updated by older
transactions than their xmin-s and only some tuples in the middle
of update chain may be deleted.

I have no idea to fix this now.
It's OK for me to leave this unsolved because those cases would
rarely occur.

Thanks.

Hiroshi Inoue
Inoue@tpf.co.jp

#40

Hiroshi Inoue

Inoue@tpf.co.jp

over 26 years ago

In reply to: Vadim Mikheev (#38)

RE: [HACKERS] Open 6.5 items

-----Original Message-----
From: root@sunpine.krs.ru [mailto:root@sunpine.krs.ru]On Behalf Of Vadim
Mikheev
Sent: Friday, June 04, 1999 2:24 AM
To: Tom Lane
Cc: Hiroshi Inoue; t-ishii@sra.co.jp; PostgreSQL-development
Subject: Re: [HACKERS] Open 6.5 items

Tom Lane wrote:

Vadim Mikheev <vadim@krs.ru> writes:

ERROR: Child itemid marked as unused

[ is fixed ]

Great! Vadim (also Hiroshi and Tatsuo), how many bugs remain on your
must-fix-for-6.5 lists? I was just wondering over in the "Freezing
docs" thread whether we had any problems severe enough to justify
delaying the release. It sounds like at least one such problem is
gone...

No one in mine.

[snip]

Hiroshi wrote:

2.

spinlock io_in_progress_lock of a buffer page is not
released by operations called by elog() such as
ProcReleaseSpins(),ResetBufferPool() etc.

I tried to fix this before 6.4 but without success (don't
remember why).

This is not in my must-fix-for-6.5 lists.
For the present this is caused by other bugs not by bufmgr/smgr itself
and an easy fix may introduce other bugs.

And on segmented relations.

Ole Gjerde who provided the patch for current implementation of
mdtruncate() sayz.
"First, please reverse my patch to mdtruncate() in md.c as soon as
possible. It does not work properly in some cases."

I also recommend to reverse his patch to mdtruncate().

Though we could not shrink segmented relations by old implementation
the result by vacuum would never be inconsistent(?).

I think we don't have enough time to fix this.

Thanks.

Hiroshi Inoue
Inoue@tpf.co.jp

#41

Vadim Mikheev

vadim@krs.ru

over 26 years ago

In reply to: Hiroshi Inoue (#39)

Re: [HACKERS] Open 6.5 items

Hiroshi Inoue wrote:

2. It was possible to get two versions of the same row from
select. Fixed by moving MyProc->xid assignment from
StartTransaction() inside GetNewTransactionId().

Thanks, Hiroshi! And please run your tests - I used just
3 sessions and gdb.

Unfortunately,the error still occurs(I changed xact.c as above
by hand OK ?).

Did you add 2. changes too?
Also, I made some changes in shmem.c:GetSnapshotData() but seems
that they are not relevant. Could you post me your xact.c and
varsup.c or grab current sources?

It seems there are cases that tuples are updated by older
transactions than their xmin-s and only some tuples in the middle
of update chain may be deleted.

I have no idea to fix this now.
It's OK for me to leave this unsolved because those cases would
rarely occur.

I would like to see it fixed anyway.

Vadim

#42

Hiroshi Inoue

Inoue@tpf.co.jp

over 26 years ago

In reply to: Vadim Mikheev (#41)

RE: [HACKERS] Open 6.5 items

-----Original Message-----
From: root@sunpine.krs.ru [mailto:root@sunpine.krs.ru]On Behalf Of Vadim
Mikheev
Sent: Friday, June 04, 1999 5:47 PM
To: Hiroshi Inoue
Cc: Tom Lane; t-ishii@sra.co.jp; PostgreSQL-development
Subject: Re: [HACKERS] Open 6.5 items

Hiroshi Inoue wrote:

2. It was possible to get two versions of the same row from
select. Fixed by moving MyProc->xid assignment from
StartTransaction() inside GetNewTransactionId().

Thanks, Hiroshi! And please run your tests - I used just
3 sessions and gdb.

Unfortunately,the error still occurs(I changed xact.c as above
by hand OK ?).

Did you add 2. changes too?
Also, I made some changes in shmem.c:GetSnapshotData() but seems
that they are not relevant. Could you post me your xact.c and
varsup.c or grab current sources?

OK I would grab current sources and retry.

Thanks.

Hiroshi Inoue
Inoue@tpf.co.jp

#43

Bruce Momjian

maillist@candle.pha.pa.us

over 26 years ago

In reply to: Hiroshi Inoue (#40)

Re: [HACKERS] Open 6.5 items

Hiroshi wrote:

2.

spinlock io_in_progress_lock of a buffer page is not
released by operations called by elog() such as
ProcReleaseSpins(),ResetBufferPool() etc.

I tried to fix this before 6.4 but without success (don't
remember why).

This is not in my must-fix-for-6.5 lists.
For the present this is caused by other bugs not by bufmgr/smgr itself
and an easy fix may introduce other bugs.

And on segmented relations.

Ole Gjerde who provided the patch for current implementation of
mdtruncate() sayz.
"First, please reverse my patch to mdtruncate() in md.c as soon as
possible. It does not work properly in some cases."

I also recommend to reverse his patch to mdtruncate().

Though we could not shrink segmented relations by old implementation
the result by vacuum would never be inconsistent(?).

I think we don't have enough time to fix this.

So what do we put in its place when we reverse out the patch?

-- 
  Bruce Momjian                        |  http://www.op.net/~candle
  maillist@candle.pha.pa.us            |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026

#44

Tom Lane

tgl@sss.pgh.pa.us

over 26 years ago

In reply to: Bruce Momjian (#43)

Re: [HACKERS] Open 6.5 items

Vadim Mikheev <vadim@krs.ru> writes:

Tom Lane wrote:

Right now, the postmaster will let you set any combination of -B and -N
you please. But it seems obvious that there is some minimum number of
buffers per backend below which things aren't going to work very well.
I wonder whether the postmaster startup code ought to enforce a minimum
ratio, say -B at least twice -N ? I have no idea what an appropriate

^^^^^^^^^^^^^^
It's enough for select from single table using index, so it's
probably good ratio.

I've added a check in postmaster.c to require -B to be at least twice
-N, per this discussion. It also enforces a minimum -B of 16 no matter
how small -N is. I pulled that number out of the air --- anyone have an
idea whether some other number would be better?

The current default values of -B 64, -N 32 are not affected. However,
since the -N default is easily configurable, I wonder whether we should
move the default -B value into config.h as well...

regards, tom lane

Import Notes

Reply to msg id not found: YourmessageofTue01Jun1999102907+080037534572.8E77A45E@krs.ru | Resolved by subject fallback

#45

Vadim Mikheev

vadim@krs.ru

over 26 years ago

In reply to: Tom Lane (#44)

Re: [HACKERS] Open 6.5 items

Hiroshi Inoue wrote:

I have downloaded snapshot(99/06/04) and retried.
But the result didn't change.

I run Hiroshi' test locally and got the same ERROR.
Now I see that my changes in xact.c can't fix
XmaxRecent vacuum problem, though they avoid ability
to get the same tuple twice.

I need in additional time... Jun 9th...

Vadim

Import Notes

Reference msg id not found: 000401beae78cccc44402801007e@cadzone.tpf.co.jp | Resolved by subject fallback

#46

Hiroshi Inoue

Inoue@tpf.co.jp

over 26 years ago

In reply to: Bruce Momjian (#43)

RE: [HACKERS] Open 6.5 items

-----Original Message-----
From: Bruce Momjian [mailto:maillist@candle.pha.pa.us]
Sent: Friday, June 04, 1999 11:37 PM
To: Hiroshi Inoue
Cc: Vadim Mikheev; Tom Lane; t-ishii@sra.co.jp; PostgreSQL-development
Subject: Re: [HACKERS] Open 6.5 items

Hiroshi wrote:

And on segmented relations.

Ole Gjerde who provided the patch for current implementation of
mdtruncate() sayz.
"First, please reverse my patch to mdtruncate() in md.c as soon as
possible. It does not work properly in some cases."

I also recommend to reverse his patch to mdtruncate().

Though we could not shrink segmented relations by old implementation
the result by vacuum would never be inconsistent(?).

I think we don't have enough time to fix this.

So what do we put in its place when we reverse out the patch?

Future TODO items ?

As far as I see,there's no consensus of opinion whether we would
remove useless segments(I also think it's preferable if possible) or
we would only truncate the segments(as my trial patch does).

Only Bruce and Ole objected to my opinion and no one agreed
with me.
How do other people who would use segmented relations think ?

Thanks.

Hiroshi Inoue
Inoue@tpf.co.p

#47

Bruce Momjian

maillist@candle.pha.pa.us

over 26 years ago

In reply to: Hiroshi Inoue (#46)

Re: [HACKERS] Open 6.5 items

Future TODO items ?

As far as I see,there's no consensus of opinion whether we would
remove useless segments(I also think it's preferable if possible) or
we would only truncate the segments(as my trial patch does).

Only Bruce and Ole objected to my opinion and no one agreed
with me.
How do other people who would use segmented relations think ?

I liked unlinking because it allowed old backends to still see the
segments if they still have open file descriptors, and new backends can
see there is no file there. That seemed nice, but you clearly
demostrated it caused major problems. Maybe truncation is the answer.
I don't know, but we need to resolve this for 6.5. I can't imagine us
focusing on this like we have in the past few weeks. Let's just figure
out an answer. I am on IRC now if someone can get on to discuss this. I
will even phone someone in US or Canada to discuss it.

What is it on the backend that causes some backend to think there is
another segment. Does it just go off the end of the max segment size
and try to open another, or do we store the number of segments
somewhere. I thought it was the former in sgml() area. I honestly don't
care if the segment files stay around if that is going to be a reliable
solution.

-- 
  Bruce Momjian                        |  http://www.op.net/~candle
  maillist@candle.pha.pa.us            |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026

#48

The Hermit Hacker

scrappy@hub.org

over 26 years ago

In reply to: Bruce Momjian (#47)

Re: [HACKERS] Open 6.5 items

On Sun, 6 Jun 1999, Bruce Momjian wrote:

Future TODO items ?

As far as I see,there's no consensus of opinion whether we would
remove useless segments(I also think it's preferable if possible) or
we would only truncate the segments(as my trial patch does).

Only Bruce and Ole objected to my opinion and no one agreed
with me.
How do other people who would use segmented relations think ?

I liked unlinking because it allowed old backends to still see the
segments if they still have open file descriptors, and new backends can
see there is no file there. That seemed nice, but you clearly
demostrated it caused major problems. Maybe truncation is the answer.
I don't know, but we need to resolve this for 6.5. I can't imagine us
focusing on this like we have in the past few weeks. Let's just figure
out an answer. I am on IRC now if someone can get on to discuss this. I
will even phone someone in US or Canada to discuss it.

What is it on the backend that causes some backend to think there is
another segment. Does it just go off the end of the max segment size
and try to open another, or do we store the number of segments
somewhere. I thought it was the former in sgml() area. I honestly don't
care if the segment files stay around if that is going to be a reliable
solution.

Other then the inode being used, what is wrong with a zero-length segment
file?

Marc G. Fournier ICQ#7615664 IRC Nick: Scrappy
Systems Administrator @ hub.org
primary: scrappy@hub.org secondary: scrappy@{freebsd|postgresql}.org

#49

Bruce Momjian

maillist@candle.pha.pa.us

over 26 years ago

In reply to: The Hermit Hacker (#48)

Re: [HACKERS] Open 6.5 items

What is it on the backend that causes some backend to think there is
another segment. Does it just go off the end of the max segment size
and try to open another, or do we store the number of segments
somewhere. I thought it was the former in sgml() area. I honestly don't
care if the segment files stay around if that is going to be a reliable
solution.

Other then the inode being used, what is wrong with a zero-length segment
file?

Nothing is wrong with it. I just thought it would be more reliable to
unlink it, but now am considering I was wrong.

-- 
  Bruce Momjian                        |  http://www.op.net/~candle
  maillist@candle.pha.pa.us            |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026

#50

The Hermit Hacker

scrappy@hub.org

over 26 years ago

In reply to: Bruce Momjian (#49)

Re: [HACKERS] Open 6.5 items

On Mon, 7 Jun 1999, Bruce Momjian wrote:

What is it on the backend that causes some backend to think there is
another segment. Does it just go off the end of the max segment size
and try to open another, or do we store the number of segments
somewhere. I thought it was the former in sgml() area. I honestly don't
care if the segment files stay around if that is going to be a reliable
solution.

Other then the inode being used, what is wrong with a zero-length segment
file?

Nothing is wrong with it. I just thought it would be more reliable to
unlink it, but now am considering I was wrong.

Just a thought, but if you left it zero length, the dba could use it as a
means for estimating disk space requirements? :) buff.0 buff.1 is zero
lenght, but buff.2 isn't, we know that we've filled 2x1gig buffers plus a
little bit, so can allocate space accordingly? :)

I'm groping here, help me out ... :)

Systems Administrator @ hub.org
primary: scrappy@hub.org secondary: scrappy@{freebsd|postgresql}.org

#51

Bruce Momjian

maillist@candle.pha.pa.us

over 26 years ago

In reply to: The Hermit Hacker (#50)

Re: [HACKERS] Open 6.5 items

Nothing is wrong with it. I just thought it would be more reliable to
unlink it, but now am considering I was wrong.

Just a thought, but if you left it zero length, the dba could use it as a
means for estimating disk space requirements? :) buff.0 buff.1 is zero
lenght, but buff.2 isn't, we know that we've filled 2x1gig buffers plus a
little bit, so can allocate space accordingly? :)

I'm groping here, help me out ... :)

One reason to do truncate is that if it is a symbolic link to another
driver, that link will stay, while unlink will not, and will recreate on
on the same drive.

-- 
  Bruce Momjian                        |  http://www.op.net/~candle
  maillist@candle.pha.pa.us            |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026

#52

ZEUGSWETTER Andreas IZ5

Andreas.Zeugswetter@telecom.at

over 26 years ago

In reply to: Bruce Momjian (#51)

Re: [HACKERS] Open 6.5 items

One reason to do truncate is that if it is a symbolic link to another
driver, that link will stay, while unlink will not, and will recreate on
on the same drive.

That is the first good argument in favor of truncate other than
coding issues. I like it. It is actually very good :-)

My only concern with leaving the files around would be, that there is
a chance for other backends to write where they should not.

Andreas

Import Notes

Resolved by subject fallback

#53

Vadim Mikheev

vadim@krs.ru

over 26 years ago

In reply to: ZEUGSWETTER Andreas IZ5 (#52)

Re: [HACKERS] Open 6.5 items

ZEUGSWETTER Andreas IZ5 wrote:

One reason to do truncate is that if it is a symbolic link to another
driver, that link will stay, while unlink will not, and will recreate on
on the same drive.

That is the first good argument in favor of truncate other than
coding issues. I like it. It is actually very good :-)

My only concern with leaving the files around would be, that there is
a chance for other backends to write where they should not.

Seems that shouldn't be problem here:

- only vacuum trancates relations;
- vacuum locks a relation in Access Exclusive mode;
- hepa_beginscan (used by both SeqScan and IndexScan)
tries to lock a relation in Access Share mode _before_
call to RelationGetNumberOfBlocks()->smgrnblocks();
- INSERT acquires Row Exclusive lock on relation
before doing anything;

- so, anyone should get valid number of blocks.

But mdmgr should be tested...

Vadim

#54

Ole Gjerde

gjerde@icebox.org

over 26 years ago

In reply to: Hiroshi Inoue (#34)

RE: [HACKERS] Open 6.5 items

On Wed, 2 Jun 1999, Hiroshi Inoue wrote:

I made a patch on trial.
1) Useless segments are never removed by my implementation
because I call FileTruncate() instead of File(Name)Unlink().
I'm not sure that this patch has no problem.
Please check and comment on my patch.

I have tried it, and it seems to work.

As Ole Gjerde mentioned,current implementation by his old
patch is not right. His new patch seems right if vacuum is
executed alone.

Yes, my first patch was horribly wrong :)
The second one, as you mention, only works right if no reading or writing
is going on.

I'll talk more about the new patch in later emails.

Ole Gjerde

#55

Ole Gjerde

gjerde@icebox.org

over 26 years ago

In reply to: Bruce Momjian (#47)

Re: [HACKERS] Open 6.5 items

On Sun, 6 Jun 1999, Bruce Momjian wrote:

I liked unlinking because it allowed old backends to still see the
segments if they still have open file descriptors, and new backends can
see there is no file there. That seemed nice, but you clearly
demostrated it caused major problems. Maybe truncation is the answer.
I don't know, but we need to resolve this for 6.5. I can't imagine us
focusing on this like we have in the past few weeks. Let's just figure
out an answer. I am on IRC now if someone can get on to discuss this. I
will even phone someone in US or Canada to discuss it.

Personally, I think the right thing is to unlink the unused segments. For
the most part keeping them around is not going to cause any problems, but
I can't really think of any good reasons to keep them around. Keeping the
database directories clean is a good thing in my opinion.

What is it on the backend that causes some backend to think there is
another segment. Does it just go off the end of the max segment size
and try to open another, or do we store the number of segments
somewhere. I thought it was the former in sgml() area. I honestly don't
care if the segment files stay around if that is going to be a reliable
solution.

The new patch from Hiroshi Inoue <Inoue@tpf.co.jp> works. I believe it is
a reliable solution, I just don't agree it's the right one. That is
probably just a matter of opinion however. As his patch doesn't have any
immediate problems, so I vote for that to be included in 6.5.

Thanks,
Ole Gjerde

#56

Bruce Momjian

maillist@candle.pha.pa.us

over 26 years ago

In reply to: Vadim Mikheev (#21)

Re: [HACKERS] Open 6.5 items

Folks, do we have anything to revisit here?

Tatsuo Ishii wrote:

I have just done cvs update and saw your changes. I tried the same
testing as I did before (64 conccurrent connections, and each
connection excutes 100 transactions), but it failed again.

(1) without -B 1024, it failed: out of free buffers: time to abort!

(2) with -B 1024, it went into stuck spin lock

So I looked into sources a little bit, and made a minor change to
include/storage/lock.h:

#define INIT_TABLE_SIZE 100

to:

#define INIT_TABLE_SIZE 4096

then restarted postmaster with -B 1024 (this will prevent
out-of-free-buffers problem, I guess). Now everything seems to work
great!

I suspect that huge INIT_TABLE_SIZE prevented dynamic expanding the
hash tables and seems there's something wrong in the routines
responsible for that.

Seems like that -:(

If we'll not fix expand hash code before 6.5 release then
I would recommend to don't use INIT_TABLE_SIZE in

lockMethodTable->lockHash = (HTAB *) ShmemInitHash(shmemName,
INIT_TABLE_SIZE, MAX_TABLE_SIZE,
&info, hash_flags);

and

lockMethodTable->xidHash = (HTAB *) ShmemInitHash(shmemName,
INIT_TABLE_SIZE, MAX_TABLE_SIZE,
&info, hash_flags);

but use NLOCKENTS(maxBackends) instead.

Vadim

-- 
  Bruce Momjian                        |  http://www.op.net/~candle
  maillist@candle.pha.pa.us            |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026

#57

Tom Lane

tgl@sss.pgh.pa.us

over 26 years ago

In reply to: Bruce Momjian (#56)

Re: [HACKERS] Open 6.5 items

Bruce Momjian <maillist@candle.pha.pa.us> writes:

Folks, do we have anything to revisit here?

I believe it is fixed --- I went back and found some more bugs in
dynahash.c after seeing Tatsuo's report ;-)

regards, tom lane

Tatsuo Ishii wrote:

I have just done cvs update and saw your changes. I tried the same
testing as I did before (64 conccurrent connections, and each
connection excutes 100 transactions), but it failed again.

(1) without -B 1024, it failed: out of free buffers: time to abort!

(2) with -B 1024, it went into stuck spin lock

So I looked into sources a little bit, and made a minor change to
include/storage/lock.h:

#define INIT_TABLE_SIZE 100

to:

#define INIT_TABLE_SIZE 4096

then restarted postmaster with -B 1024 (this will prevent
out-of-free-buffers problem, I guess). Now everything seems to work
great!

I suspect that huge INIT_TABLE_SIZE prevented dynamic expanding the
hash tables and seems there's something wrong in the routines
responsible for that.

Seems like that -:(

If we'll not fix expand hash code before 6.5 release then
I would recommend to don't use INIT_TABLE_SIZE in

lockMethodTable-> lockHash = (HTAB *) ShmemInitHash(shmemName,

INIT_TABLE_SIZE, MAX_TABLE_SIZE,
&info, hash_flags);

and

lockMethodTable-> xidHash = (HTAB *) ShmemInitHash(shmemName,

Show quoted text

INIT_TABLE_SIZE, MAX_TABLE_SIZE,
&info, hash_flags);

but use NLOCKENTS(maxBackends) instead.

Vadim

Import Notes

Reply to msg id not found: YourmessageofWed7Jul1999192043-0400199907072320.TAA13276@candle.pha.pa.us | Resolved by subject fallback