Re: [PATCHES] patches for 6.2.1p6

Started by Bruce Momjianalmost 28 years ago37 messages
#1Bruce Momjian
maillist@candle.pha.pa.us

Hi hackers,

I have old patches for version 6.2.1p6 which fix some problems and add
new features. Here is a short description of each patch file:

assert.patch

adds a switch to turn on/off the assert checking if enabled at compile
time. You can now compile postgres with assert checking and disable it
at runtime in a production environment.

async-unlisten.patch

declares Async_Unlisten() external so that it can be called by user
modules.

exec-limit.patch

removes the #ifdef NOT_USED around ExecutorLimit(). It is used.

exitpg.patch

limits recursive calls to exitpg() preventing an infinite loop
if an error is found inside exitpg.

libpgtcl-listen.patch

Just a change from upper to lowercase of an sql command in libpgtcl,
totally harmless.

new-locks.patch

After long studying and many debugging sessions I have finally
understood how the low level locks work.
I have completely rewritten lock.c cleaning up the code and adding
better assert checking. I have also added some fields to the lock
and xid tags for better support of user locks. This patch includes
also a patch submitted by Bruce Momjian which changes the handling
of lock priorities. It can however be disabled if an option is set
in pg_options, see tprintf.patch (Bruce patch works by building
the queue in reverse priority order, my old patch kept the queue in
decreasing order and traversed it from the other side).

pg-flush.patch

removes an unnecessary flush in libpq reducing network traffic and
increasing performance.

relname.patch

an utility which returns the relname corresponding to a given oid.
Useful for debug messages (see vacum.patch).

sequence.patch

added a setval() function which enables othe owner of a sequence
to set its value without need to delete and recreate it.

sinval.patch

fixes a problem in SI cache which causes table overflow if some
backend is idle for a long time while other backends keep adding
entries.
It uses the new signal handling implemented in tprintf.patch.
I have also increacasesed the max number of backends from 32 to 64 and
the table size from 1000 to 5000.

spin-lock.patch

I'm not sure if this is really useful, but it seems stupid to have
a backend wasting cpu cycles in a busy loop while the process which
should release the lock is waiting for the cpu. So I added a call
to process_yield() if the spin lock can't obtained.
This has been implemented and tested only on Linux. I don't know if
other OS have process_yield(). If someone can check please do it.

tprintf.patch

adds functions and macros which implement a conditional trace package
with the ability to change flags and numeric options of running
backends at runtime.
Options/flags can be specified in the command line and/or read from
the file pg_options in the data directory.
Running backends can be forced to update their options from this file
by sending them a SIGHUP signal (this is the convention used by most
unix daemons so I changed the meaning of SIGHUP).
Options can be debugging flags used by the trace package or any other
numeric value used by the backend, for example the deadlock_timeout.
Having flags and options specified at runtime and changed while the
backends are running can greatly simplify the debugging and tuning
of the database. New options can be defined in utils/misc/trace.c and
include/utils/trace.h. As an example of the usage of this package
see lock.c and proc.c which make use of new runtime options.

Old files using int flags or variables can be easily changed to
use the new package by substituting the old variable with a #define
like in the following example:

/* int my_flag = 0; */
#include "trace.h"
#define my_flag pg_options[OPT_MYFLAG]

I have done it in postgres.c and some other files and now I can turn
on/off any single debug flag on the fly with a simple shell script.
I have removed the IpcConfigTip() from ipc.c, it should better be
described in the postgres manual instead of being printed on stderr.

This patch provides also a new format of debugging messages which
are always in a single line with a timestamp and the backend pid:

#timestamp #pid #message
980127.17:52:14.173 [29271] StartTransactionCommand
980127.17:52:14.174 [29271] ProcessUtility: drop table t;
980127.17:52:14.186 [29271] SIIncNumEntries: table is 70% full
980127.17:52:14.186 [29286] Async_NotifyHandler
980127.17:52:14.186 [29286] Waking up sleeping backend process
980127.19:52:14.292 [29286] Async_NotifyFrontEnd
980127.19:52:14.413 [29286] Async_NotifyFrontEnd done
980127.19:52:14.466 [29286] Async_NotifyHandler done

This improves the readability of the log and allows one to understand
exactly which backend is doing what and at which time. It also makes
easier to write simple awk or perl scripts which monitor the log to
detect database errors or problem, or to compute transaction times.

The patch changes also the meaning of signals used by postgres, as
described by the following table:

postmaster backend

SIGHUP kill(*,sighup) read_pg_options
SIGINT kill(*,sigint), die die
SIGCHLD reaper -
SIGTTIN ignored -
SIGTTOU ignored -
SIGQUIT die handle_warn
SIGTERM kill(*,sigterm), kill(*,9), die die
SIGCONT dumpstatus -
SIGPIPE ignored die
SIGFPE - FloatExceptionHandler
SIGTSTP - ignored (alive test)
SIGUSR1 kill(*,sigusr1), die quickdie
SIGUSR2 kill(*,sigusr2) Async_NotifyHandler
(also SI buffer flush)

The main changes to the old implementation are SIGQUIT instead of
SIGHUP to handle warns, SIGHUP to reread pg_options and redirection
to all backends of SIGHUP, SIGINT, SIGTERM, SIGUSR1 and SIGUSR2.
In this way some of the signals sent to the postmaster can be sent
automatically to all the backends. To shut down postgres one needs
only to send a SIGTERM to postmaster and it will stop automatically
all the backends. This new signal handling mechanism is also used
to prevent SI cache table overflows: when a backend detects the SI
table full at 70% it simply sends a signal to the postmaster which
will wake up all idle backends and make them flush the cache.

vacuum.patch

adds a debug message to vacuum that prints the name of a table or
index *before* vacuuming it, if the verbose keyword is set.
This is useful to know which table is causing troubles if a
vacuum all crashes. Currently table information is printed only
at the end of each vacuum operation and is never printed if the
vacuum crashes.

--
Massimo Dal Zotto

Massimo, now that 6.3 is released, any chance of getting these patches
against the 6.3 source code?

-- 
Bruce Momjian                          |  830 Blythe Avenue
maillist@candle.pha.pa.us              |  Drexel Hill, Pennsylvania 19026
  +  If your life is a hard drive,     |  (610) 353-9879(w)
  +  Christ can be your backup.        |  (610) 853-3000(h)
#2Noname
dg@illustra.com
In reply to: Bruce Momjian (#1)
Re: [HACKERS] Re: [PATCHES] patches for 6.2.1p6

Hi hackers,

I have old patches for version 6.2.1p6 which fix some problems and add
new features. Here is a short description of each patch file:

spin-lock.patch

I'm not sure if this is really useful, but it seems stupid to have
a backend wasting cpu cycles in a busy loop while the process which
should release the lock is waiting for the cpu. So I added a call
to process_yield() if the spin lock can't obtained.
This has been implemented and tested only on Linux. I don't know if
other OS have process_yield(). If someone can check please do it.

The generic way to do this is

select( NULL_FDSET, NULL_FDSET, NULL_FDSET, &delaytime, NULL);

Delay time may be 0, but a random value between 0 and say 30 msec seems
to be optimal. Hard busy wait spinlocks cause huge performance problems with
heavily loaded systems and lots of postgres backends. Basically one backend
ends up with the lock and gets scheduled out holding it, every else queues
up busywaiting behind this one. But the backend holding the lock cannot
release it until all the other backeds waiting on the lock exhaust a full
timeslice busywaiting. Get 20 of these guys going (like on a busy website) and
the system pretty much stops doing any work at all.

I say we should get this in as soon as we can.

--
Massimo Dal Zotto

Massimo, now that 6.3 is released, any chance of getting these patches
against the 6.3 source code?

--
Bruce Momjian | 830 Blythe Avenue
maillist@candle.pha.pa.us | Drexel Hill, Pennsylvania 19026

-dg

David Gould dg@illustra.com 510.628.3783 or 510.305.9468
Informix Software (No, really) 300 Lakeside Drive Oakland, CA 94612
- I realize now that irony has no place in business communications.

#3The Hermit Hacker
scrappy@hub.org
In reply to: Noname (#2)
Re: [HACKERS] Re: [PATCHES] patches for 6.2.1p6

On Mon, 16 Mar 1998, David Gould wrote:

The generic way to do this is

select( NULL_FDSET, NULL_FDSET, NULL_FDSET, &delaytime, NULL);

Delay time may be 0, but a random value between 0 and say 30 msec seems
to be optimal. Hard busy wait spinlocks cause huge performance problems with
heavily loaded systems and lots of postgres backends. Basically one backend
ends up with the lock and gets scheduled out holding it, every else queues
up busywaiting behind this one. But the backend holding the lock cannot
release it until all the other backeds waiting on the lock exhaust a full
timeslice busywaiting. Get 20 of these guys going (like on a busy website) and
the system pretty much stops doing any work at all.

I say we should get this in as soon as we can.

Can you submit an appropriate patch that can be included in the mega-patch
to be created on Sunday?

Thanks...

Marc G. Fournier
Systems Administrator @ hub.org
primary: scrappy@hub.org secondary: scrappy@{freebsd|postgresql}.org

#4Bruce Momjian
maillist@candle.pha.pa.us
In reply to: The Hermit Hacker (#3)
Re: [HACKERS] Re: [PATCHES] patches for 6.2.1p6

On Mon, 16 Mar 1998, David Gould wrote:

The generic way to do this is

select( NULL_FDSET, NULL_FDSET, NULL_FDSET, &delaytime, NULL);

Delay time may be 0, but a random value between 0 and say 30 msec seems
to be optimal. Hard busy wait spinlocks cause huge performance problems with
heavily loaded systems and lots of postgres backends. Basically one backend
ends up with the lock and gets scheduled out holding it, every else queues
up busywaiting behind this one. But the backend holding the lock cannot
release it until all the other backeds waiting on the lock exhaust a full
timeslice busywaiting. Get 20 of these guys going (like on a busy website) and
the system pretty much stops doing any work at all.

I say we should get this in as soon as we can.

Can you submit an appropriate patch that can be included in the mega-patch
to be created on Sunday?

Just a warning that this is not going to be easy. We have OS-specific
code for spinlocks in include/storage/s_lock.h and
backend/storage/buffer/s_lock.c. So each S_LOCK macro call has to have
its test-and-set logic de-coupled with its while-lock-fail-try-again
logic. Most of them are easy, but some like VAX:

#define S_LOCK(addr) __asm__("1: bbssi $0,(%0),1b": :"r"(addr))

are hard to de-couple. Now, I did not know we supported NetBSD on VAX.
Does it work, anyone? Can I remove it?

This is going to be pretty tough to test on every platform we support,
so if it is done now, it will have to be done carefully.

-- 
Bruce Momjian                          |  830 Blythe Avenue
maillist@candle.pha.pa.us              |  Drexel Hill, Pennsylvania 19026
  +  If your life is a hard drive,     |  (610) 353-9879(w)
  +  Christ can be your backup.        |  (610) 853-3000(h)
#5Thomas G. Lockhart
lockhart@alumni.caltech.edu
In reply to: Bruce Momjian (#4)
Re: [HACKERS] Re: [PATCHES] patches for 6.2.1p6

Can you submit an appropriate patch that can be included in the
mega-patch to be created on Sunday?

Just a warning that this is not going to be easy. We have OS-specific
code for spinlocks in include/storage/s_lock.h and
backend/storage/buffer/s_lock.c. So each S_LOCK macro call has to
have its test-and-set logic de-coupled with its
while-lock-fail-try-again logic.
Most of them are easy, but some like VAX:

#define S_LOCK(addr) __asm__("1: bbssi $0,(%0),1b": :"r"(addr))

are hard to de-couple. Now, I did not know we supported NetBSD on
VAX. Does it work, anyone? Can I remove it?

NetBSD on VAX in on our supported list, and was verified for v6.3 by Tom
Helbekkmo.

This is going to be pretty tough to test on every platform we support,
so if it is done now, it will have to be done carefully.

Is this behavior in v6.2.x? In any case, if it is anything but minimally
trivial, it should be given a test on every supported platform, since it
hits the heart of the platform-specific code, doesn't it? Seems like it
should be put into the CVS tree and shaken out until the next release...

- Tom

#6The Hermit Hacker
scrappy@hub.org
In reply to: Thomas G. Lockhart (#5)
Re: [HACKERS] Re: [PATCHES] patches for 6.2.1p6

On Tue, 17 Mar 1998, Thomas G. Lockhart wrote:

This is going to be pretty tough to test on every platform we support,
so if it is done now, it will have to be done carefully.

Is this behavior in v6.2.x? In any case, if it is anything but minimally
trivial, it should be given a test on every supported platform, since it
hits the heart of the platform-specific code, doesn't it? Seems like it
should be put into the CVS tree and shaken out until the next release...

Not realizing what was involved, I have to agree here...*after* I
get a post-release patch out on Sunday? :)

Marc G. Fournier
Systems Administrator @ hub.org
primary: scrappy@hub.org secondary: scrappy@{freebsd|postgresql}.org

#7Bruce Momjian
maillist@candle.pha.pa.us
In reply to: Thomas G. Lockhart (#5)
Re: [HACKERS] Re: [PATCHES] patches for 6.2.1p6

Can you submit an appropriate patch that can be included in the
mega-patch to be created on Sunday?

are hard to de-couple. Now, I did not know we supported NetBSD on
VAX. Does it work, anyone? Can I remove it?

NetBSD on VAX in on our supported list, and was verified for v6.3 by Tom
Helbekkmo.

This is going to be pretty tough to test on every platform we support,
so if it is done now, it will have to be done carefully.

Is this behavior in v6.2.x? In any case, if it is anything but minimally
trivial, it should be given a test on every supported platform, since it
hits the heart of the platform-specific code, doesn't it? Seems like it
should be put into the CVS tree and shaken out until the next release...

Yea, that is what I was hinting at.

-- 
Bruce Momjian                          |  830 Blythe Avenue
maillist@candle.pha.pa.us              |  Drexel Hill, Pennsylvania 19026
  +  If your life is a hard drive,     |  (610) 353-9879(w)
  +  Christ can be your backup.        |  (610) 853-3000(h)
#8Noname
dg@illustra.com
In reply to: Bruce Momjian (#7)
Re: [HACKERS] Re: [PATCHES] patches for 6.2.1p6

Can you submit an appropriate patch that can be included in the
mega-patch to be created on Sunday?

are hard to de-couple. Now, I did not know we supported NetBSD on
VAX. Does it work, anyone? Can I remove it?

NetBSD on VAX in on our supported list, and was verified for v6.3 by Tom
Helbekkmo.

This is going to be pretty tough to test on every platform we support,
so if it is done now, it will have to be done carefully.

Is this behavior in v6.2.x? In any case, if it is anything but minimally
trivial, it should be given a test on every supported platform, since it
hits the heart of the platform-specific code, doesn't it? Seems like it
should be put into the CVS tree and shaken out until the next release...

Yea, that is what I was hinting at.

--
Bruce Momjian | 830 Blythe Avenue
maillist@candle.pha.pa.us | Drexel Hill, Pennsylvania 19026

I tend to agree but am willing to compromise.

Can we do only the easy platforms at this time and then fix the others later?

Since S_LOCK is a macro, it could be

#define S_LOCK s_lock_with_backoff

on the easy platforms and

#define S_LOCK original_definition

on the tricky or hard to test platforms

If this will work, I am willing to hack this together tomorrow.
What is the time frame for accepting a patch like this?

-dg

David Gould dg@illustra.com 510.628.3783 or 510.305.9468
Informix Software (No, really) 300 Lakeside Drive Oakland, CA 94612
- I realize now that irony has no place in business communications.

#9Thomas G. Lockhart
lockhart@alumni.caltech.edu
In reply to: Noname (#8)
Re: [HACKERS] Re: [PATCHES] patches for 6.2.1p6

David Gould wrote:

Can you submit an appropriate patch that can be included in the
mega-patch to be created on Sunday?

are hard to de-couple. Now, I did not know we supported NetBSD on
VAX. Does it work, anyone? Can I remove it?

NetBSD on VAX in on our supported list, and was verified for v6.3 by Tom
Helbekkmo.

This is going to be pretty tough to test on every platform we support,
so if it is done now, it will have to be done carefully.

Is this behavior in v6.2.x? In any case, if it is anything but minimally
trivial, it should be given a test on every supported platform, since it
hits the heart of the platform-specific code, doesn't it? Seems like it
should be put into the CVS tree and shaken out until the next release...

Yea, that is what I was hinting at.

--
Bruce Momjian | 830 Blythe Avenue
maillist@candle.pha.pa.us | Drexel Hill, Pennsylvania 19026

I tend to agree but am willing to compromise.

Can we do only the easy platforms at this time
If this will work, I am willing to hack this together tomorrow.
What is the time frame for accepting a patch like this?

What do you think it would take to exercise the patch heavily enough on
any of the affected platforms to ensure that it behaves no worse than
the current code? Is a test case easy to set up and run? If so, you can
probably get ~5 platforms tested in the next few days, and reduce the
risk of including this in the mega-patch.

Alternatively, we could do this the day _after_ the mega-patch, so that
the two are decoupled, and have a nice "slock patch" a few days later.

At the moment, there aren't any large backend patches waiting to go (I
think Vadim is still sleeping to recover from v6.3 :)

- Tom

#10The Hermit Hacker
scrappy@hub.org
In reply to: Noname (#8)
Re: [HACKERS] Re: [PATCHES] patches for 6.2.1p6

On Mon, 16 Mar 1998, David Gould wrote:

If this will work, I am willing to hack this together tomorrow.
What is the time frame for accepting a patch like this?

Assuming that its *clean* (clean meaning that Bruce fully approves
of it, as this is his area of the code...well, one of them
*grin*)...tomorrow would be great :) If Bruce has *any* doubts though, it
doesn't go in until after I do the patch I want to do...

Marc G. Fournier
Systems Administrator @ hub.org
primary: scrappy@hub.org secondary: scrappy@{freebsd|postgresql}.org

#11Noname
dg@illustra.com
In reply to: Thomas G. Lockhart (#9)
Re: [HACKERS] Re: [PATCHES] patches for 6.2.1p6

David Gould wrote:

Can you submit an appropriate patch that can be included in the
mega-patch to be created on Sunday?

are hard to de-couple. Now, I did not know we supported NetBSD on
VAX. Does it work, anyone? Can I remove it?

NetBSD on VAX in on our supported list, and was verified for v6.3 by Tom
Helbekkmo.

This is going to be pretty tough to test on every platform we support,
so if it is done now, it will have to be done carefully.

Is this behavior in v6.2.x? In any case, if it is anything but minimally
trivial, it should be given a test on every supported platform, since it
hits the heart of the platform-specific code, doesn't it? Seems like it
should be put into the CVS tree and shaken out until the next release...

Yea, that is what I was hinting at.

--
Bruce Momjian | 830 Blythe Avenue
maillist@candle.pha.pa.us | Drexel Hill, Pennsylvania 19026

I tend to agree but am willing to compromise.

Can we do only the easy platforms at this time
If this will work, I am willing to hack this together tomorrow.
What is the time frame for accepting a patch like this?

What do you think it would take to exercise the patch heavily enough on
any of the affected platforms to ensure that it behaves no worse than
the current code? Is a test case easy to set up and run? If so, you can
probably get ~5 platforms tested in the next few days, and reduce the
risk of including this in the mega-patch.

Simple but clumsy test:

Multiple backends all doing single row inserts into private tables. Run it
for a fixed time period and add up the total number of rows. Do this for both
a small (eg 2) and large (eg 20 or 40) number of backends.

psuedo perl /sh / sql follows
# driver <number of users> <delay time> <run time>
synchtime = timenow
starttime = synctime + delaytime;
endtime = starttime + runtime
for user 1 to number of users
singleuser synchtime delay_time run_time

# singleuser <userno> <starttime> <endtime>
exec sql 'create table user$user (i int);'

sleep until start time
n = 1;
while timenow < endtime
exec sql "insert into user$user values($n);"

print "user $user inserted $n"

The figure of merit is the total number of inserts. Oh and that the system
does not fall over.

But, it may be best to leave this until after the mega patch. I am not sure
I want to share the blame ;-).

Just to fill me in, where does the mega patch fall in with the next release
snapshot? That is, if this misses the mega patch is it waiting until 6.4?

-dg

David Gould dg@illustra.com 510.628.3783 or 510.305.9468
Informix Software (No, really) 300 Lakeside Drive Oakland, CA 94612
- I realize now that irony has no place in business communications.

#12The Hermit Hacker
scrappy@hub.org
In reply to: Noname (#11)
Re: [HACKERS] Re: [PATCHES] patches for 6.2.1p6

On Mon, 16 Mar 1998, David Gould wrote:

Just to fill me in, where does the mega patch fall in with the next release
snapshot? That is, if this misses the mega patch is it waiting until 6.4?

That is pretty much up to you, actually. There is nothing wrong
with a clean patch for this being placed in the patches directory, if it
can be done shortly after the post-release patch. Basically, starting
April 1st (or so), development basically starts up again, so backtracking
a patch from v6.4-alpha to v6.3 release can prove difficult :(

Marc G. Fournier
Systems Administrator @ hub.org
primary: scrappy@hub.org secondary: scrappy@{freebsd|postgresql}.org

#13Bruce Momjian
maillist@candle.pha.pa.us
In reply to: The Hermit Hacker (#10)
Re: [HACKERS] Re: [PATCHES] patches for 6.2.1p6

On Mon, 16 Mar 1998, David Gould wrote:

If this will work, I am willing to hack this together tomorrow.
What is the time frame for accepting a patch like this?

Assuming that its *clean* (clean meaning that Bruce fully approves
of it, as this is his area of the code...well, one of them
*grin*)...tomorrow would be great :) If Bruce has *any* doubts though, it
doesn't go in until after I do the patch I want to do...

David, go for it. The code is all local in two files, and I think you
can basically change all the do{test-and-set} while(lock-is-false)
loops to:

do{test-and-set} while(lock-is-false && select ())

Pretty easy. No need to test multiple platforms. The ones where the
loop is integrated into the asm(), leave them for later.

-- 
Bruce Momjian                          |  830 Blythe Avenue
maillist@candle.pha.pa.us              |  Drexel Hill, Pennsylvania 19026
  +  If your life is a hard drive,     |  (610) 353-9879(w)
  +  Christ can be your backup.        |  (610) 853-3000(h)
#14Thomas G. Lockhart
lockhart@alumni.caltech.edu
In reply to: Noname (#11)
Re: [HACKERS] Re: [PATCHES] patches for 6.2.1p6

But, it may be best to leave this until after the mega patch. I am not
sure I want to share the blame ;-).

Just to fill me in, where does the mega patch fall in with the next
release snapshot? That is, if this misses the mega patch is it waiting
until 6.4?

I would guess that we could post a separate patch any time soon,
especially since the changes are apparently isolated to only a few
places in the code. In the last release, we posted ~7 patches, each
independent of the others, and generated on our local source trees. Each
of the patches was, however, fairly simple, quite often only one or a
few lines of change, and were intended as bug fixes. Also, they were
easily tested. In any case, at a minimum the regression test should be
run (and passed!).

- Tom

#15Tom Ivar Helbekkmo
tih@Hamartun.Priv.NO
In reply to: Bruce Momjian (#4)
Re: [HACKERS] Re: [PATCHES] patches for 6.2.1p6

* Bruce Momjian
|
| Just a warning that this is not going to be easy. We have OS-specific
| code for spinlocks in include/storage/s_lock.h and
| backend/storage/buffer/s_lock.c. So each S_LOCK macro call has to have
| its test-and-set logic de-coupled with its while-lock-fail-try-again
| logic. Most of them are easy, but some like VAX:
|
| #define S_LOCK(addr) __asm__("1: bbssi $0,(%0),1b": :"r"(addr))
|
| are hard to de-couple. Now, I did not know we supported NetBSD on VAX.
| Does it work, anyone? Can I remove it?

Yes, it works. No, please don't break it. Heck, I only just got it
in in time for 6.3! :-) The not-so-busy-waiting-spinlock stuff can be
put in on a platform at a time -- I'll expand the VAX version to do
the right thing once someone has done another platform, so I can see
what's the preferred way of doing it.

-tih
--
Popularity is the hallmark of mediocrity. --Niles Crane, "Frasier"

#16Noname
dg@illustra.com
In reply to: Tom Ivar Helbekkmo (#15)
Re: [HACKERS] Re: [PATCHES] patches for 6.2.1p6

* Bruce Momjian
| Just a warning that this is not going to be easy. We have OS-specific
| code for spinlocks in include/storage/s_lock.h and
| backend/storage/buffer/s_lock.c. So each S_LOCK macro call has to have
| its test-and-set logic de-coupled with its while-lock-fail-try-again
| logic. Most of them are easy, but some like VAX:
|
| #define S_LOCK(addr) __asm__("1: bbssi $0,(%0),1b": :"r"(addr))
|
| are hard to de-couple. Now, I did not know we supported NetBSD on VAX.
| Does it work, anyone? Can I remove it?

Yes, it works. No, please don't break it. Heck, I only just got it
in in time for 6.3! :-) The not-so-busy-waiting-spinlock stuff can be
put in on a platform at a time -- I'll expand the VAX version to do
the right thing once someone has done another platform, so I can see
what's the preferred way of doing it.

-tih

I won't. I hope.

Seriously, if you want to, please create a function to emulate the following:

/*
* tas(lock)
*
* Access to platform specific test_and_set functionality. Given pointer to
* lock attempts to acquire the lock atomically.
*
* Returns 0 for success, nonzero for failure.
*/
typedef slock_t unsigned char; /* or whatever works on the platform */

int tas(slock_t *lock)
{
slock_t tmp;

/* atomic, interlocked */
tmp = *lock;
*lock = -1; /* any nonzero will do here */

return (tmp != 0);
}

Given this, I can fold the VAX right into the grand scheme, just like a
normal computer (;-)).

-dg

David Gould dg@illustra.com 510.628.3783 or 510.305.9468
Informix Software (No, really) 300 Lakeside Drive Oakland, CA 94612
- I realize now that irony has no place in business communications.

#17Bruce Momjian
maillist@candle.pha.pa.us
In reply to: Tom Ivar Helbekkmo (#15)
Re: [HACKERS] Re: [PATCHES] patches for 6.2.1p6

* Bruce Momjian
|
| Just a warning that this is not going to be easy. We have OS-specific
| code for spinlocks in include/storage/s_lock.h and
| backend/storage/buffer/s_lock.c. So each S_LOCK macro call has to have
| its test-and-set logic de-coupled with its while-lock-fail-try-again
| logic. Most of them are easy, but some like VAX:
|
| #define S_LOCK(addr) __asm__("1: bbssi $0,(%0),1b": :"r"(addr))
|
| are hard to de-couple. Now, I did not know we supported NetBSD on VAX.
| Does it work, anyone? Can I remove it?

Yes, it works. No, please don't break it. Heck, I only just got it
in in time for 6.3! :-) The not-so-busy-waiting-spinlock stuff can be
put in on a platform at a time -- I'll expand the VAX version to do
the right thing once someone has done another platform, so I can see
what's the preferred way of doing it.

OK, now I know that the VAX stuff is still used and supported. Good.
We don't have good platform-specific information on NetBSD and Linux
ports.

-- 
Bruce Momjian                          |  830 Blythe Avenue
maillist@candle.pha.pa.us              |  Drexel Hill, Pennsylvania 19026
  +  If your life is a hard drive,     |  (610) 353-9879(w)
  +  Christ can be your backup.        |  (610) 853-3000(h)
#18Thomas G. Lockhart
lockhart@alumni.caltech.edu
In reply to: Bruce Momjian (#17)
Re: [HACKERS] Re: [PATCHES] patches for 6.2.1p6

OK, now I know that the VAX stuff is still used and supported. Good.
We don't have good platform-specific information on NetBSD and Linux
ports.

??

Have you checked the hardcopy or html versions of the "Supported
Platforms" section in the Administrator's Guide? _Every_ entry in this
was updated and refreshed at the end of February.

What other info should we be collecting for this?

- Tom

#19Noname
dg@illustra.com
In reply to: Thomas G. Lockhart (#18)
Re: [HACKERS] Re: [PATCHES] patches for 6.2.1p6

Right now, there is not much chance of catching a signal while waiting for
a spinlock. This is good, cause the code that waits for spinlocks tends to
be doing funny things with buffers and locks and shared stuff like that.
We don't catch signals because we don't make syscalls. But, once this goes in,
we will be calling select() a _lot_ and so it kinda becomes the likely place
for a signal to get delivered. Without thinking about it and looking at the
code a bit longer, I am not sure it is prudent to rush this in. I still want
it in as soon a possible, but possible includes free from harmful side effects.

Well, signals are handled in the backend by tcop/postgres.c. In
most/all? cases, a signal causes a longjump() out of the code and
releases locks and aborts the transaction.

I was afraid that would be the answer. Basically, this never worked. The
problem is that in an indeterminate number of places the code manipulates
multiple distinct but related structures in a sequence. To leave the server
in a consistant state all these updates need to be done in the right order
so that if the sequence in interrupted by a signal the partially updated
state is consistant. Unhappily, the original coders did not always have this
in mind. An example:

cleaning up after a scan
1 - release buffer pointed by scan descriptor
2 - release scan descriptor

If a signal is taken just after one, the abort code will see the scan
descriptor and call the cleanup for it resulting in:

cleaning up after a scan (take 2)
1 - release buffer pointed by scan descriptor
- Whoopsie, buffer already released!
2 - release scan descriptor

These sequences either _all_ have to identified and fixed, or made atomic
somehow, which is a biggish job.

Or the system has to acknowledge signals at only clearly defined points.

My preference is to have signal handlers only set a global flag to indicate
that a signal was seen. Then one just sprinkles check_for_interrupts() calls
in all the likely places: step to next node, fetch next page, call function,
etc.

The way this shows up in real life is strange unreproduceable errors on busy
systems, especially when backends are killed or transactions are forced to
abort.

Fixing this is a bit of a project, but the result is that a lot of mystery
bugs go away.

I considered the possibility that your select() could return EINTR, so I
was thinking of suggesting something like

do { } while (lock-no-set && (select(),true) )

or something like that so the return value of select is always true. I
also recommend making a S_LOCK macro, and inside the while loop, call
the OS-specific lock stuff, so you don't have to code the select() for
each platform. So you have two macros, the S_LOCK macro with the while,
and inside the while, you call S_LOCK_SPIN() which is defined for each
platform. More centralized and less error-prone.

You may like what I have done then. S_LOCK() becomes a function:

S_LOCK(lock)
{
do {
while (!S_LOCK_FREE(lock)) /* non interlocked test to avoid */
{ /* hammering the bus and caches */
select( a_semi_random_time_delay_with_backoff );
}
} while (TAS(lock)); /* TAS: test and set
}

S_LOCK_FREE(), and TAS() are platform specific macros that invoke the
platform specific implementation. There are default implementations for
all the platform interface macros, so most of the time all that is needed
is to create a tas() function in asm and the rest falls into place.

Now I have to go home and test it.

Btw, feel free to forward this to the list if you feel it is of interest. I
didn't because I didn't want to quote you from private mail.

-dg

David Gould dg@illustra.com 510.628.3783 or 510.305.9468
Informix Software (No, really) 300 Lakeside Drive Oakland, CA 94612
- I realize now that irony has no place in business communications.

#20Bruce Momjian
maillist@candle.pha.pa.us
In reply to: Thomas G. Lockhart (#18)
Re: [HACKERS] Re: [PATCHES] patches for 6.2.1p6

OK, now I know that the VAX stuff is still used and supported. Good.
We don't have good platform-specific information on NetBSD and Linux
ports.

??

Have you checked the hardcopy or html versions of the "Supported
Platforms" section in the Administrator's Guide? _Every_ entry in this
was updated and refreshed at the end of February.

What other info should we be collecting for this?

- Tom

Sorry, haven't looked there yet.

-- 
Bruce Momjian                          |  830 Blythe Avenue
maillist@candle.pha.pa.us              |  Drexel Hill, Pennsylvania 19026
  +  If your life is a hard drive,     |  (610) 353-9879(w)
  +  Christ can be your backup.        |  (610) 853-3000(h)
#21Massimo Dal Zotto
dz@cs.unitn.it
In reply to: Bruce Momjian (#1)
Re: [HACKERS] Re: [PATCHES] patches for 6.2.1p6

Hi hackers,

I have old patches for version 6.2.1p6 which fix some problems and add
new features. Here is a short description of each patch file:

...

--
Massimo Dal Zotto

Massimo, now that 6.3 is released, any chance of getting these patches
against the 6.3 source code?

I have already applied all my old patches against 6.3 except the lock
patch. The lock code has changed and I have to verify if my patches are
still compatible. It will take some time and I haven't very much.
I found also an interesting bug in the notify code. I will post the new
patches when I find some spare time to verify them.
Could you please send me some documentation on the new lock and deadlock
code in the meantime ?

Massimo Dal Zotto

+----------------------------------------------------------------------+
|  Massimo Dal Zotto                e-mail:  dz@cs.unitn.it            |
|  Via Marconi, 141                 phone:  ++39-461-534251            |
|  38057 Pergine Valsugana (TN)     www:  http://www.cs.unitn.it/~dz/  |
|  Italy                            pgp:  finger dz@tango.cs.unitn.it  |
+----------------------------------------------------------------------+
#22Bruce Momjian
maillist@candle.pha.pa.us
In reply to: Massimo Dal Zotto (#21)
Re: [HACKERS] Re: [PATCHES] patches for 6.2.1p6

I have already applied all my old patches against 6.3 except the lock
patch. The lock code has changed and I have to verify if my patches are

Great.

still compatible. It will take some time and I haven't very much.
I found also an interesting bug in the notify code. I will post the new
patches when I find some spare time to verify them.
Could you please send me some documentation on the new lock and deadlock
code in the meantime ?

Here is some comments from storage/lmgr/lock.c:DeadLockCheck(). This
was a real mind-bender for me. It finds holders of the lock it has been
waiting on, and checks to see if any of those holders is waiting on the
lock I own. If not, it then checks all the holders of locks these new
processes are waiting on, and checks to see if they are waiting on my
lock, and it continues until it has traced all backend process id's
related to my lock. I have a static checked_procs[] array that keeps
track of what I have checked so I don't loop. The code is recursive.

The new wait queue handling is documented in storage/lmgr/lock.c:ProcSleep().

---------------------------------------------------------------------------

* This code takes a list of locks a process holds, and the lock that
* the process is sleeping on, and tries to find if any of the processes
* waiting on its locks hold the lock it is waiting for. If no deadlock
* is found, it goes on to look at all the processes waiting on their locks.
*
* We have already locked the master lock before being called.
*/
bool
DeadLockCheck(SHM_QUEUE *lockQueue, LOCK *findlock, bool skip_check)

---------------------------------------------------------------------------

/*
* This is our only check to see if we found the lock we want.
*
* The lock we are waiting for is already in MyProc->lockQueue so we
* need to skip it here. We are trying to find it in someone
* else's lockQueue.
*/
if (lock == findlock && !skip_check)

---------------------------------------------------------------------------

/*
* For findlock's wait queue, we are interested in
* procs who are blocked waiting for a write-lock on
* the table we are waiting on, and already hold a
* lock on it. We first check to see if there is an
* escalation deadlock, where we hold a readlock and
* want a writelock, and someone else holds readlock
* on the same table, and wants a writelock.
*
* Basically, the test is, "Do we both hold some lock on
* findlock, and we are both waiting in the lock
* queue?"
*/

---------------------------------------------------------------------------

* For non-MyProc entries, we are looking only
* waiters, not necessarily people who already
* hold locks and are waiting. Now we check for
* cases where we have two or more tables in a
* deadlock. We do this by continuing to search
* for someone holding a lock
*/
if (DeadLockCheck(&(proc->lockQueue), findlock, false))

---------------------------------------------------------------------------

/*
* If the first entries in the waitQueue have a greater priority than
* we have, we must be a reader, and they must be a writers, and we
* must be here because the current holder is a writer or a reader but
* we don't share shared locks if a writer is waiting. We put
* ourselves after the writers. This way, we have a FIFO, but keep
* the readers together to give them decent priority, and no one
* starves. Because we group all readers together, a non-empty queue
* only has a few possible configurations:
*
* [readers] [writers] [readers][writers] [writers][readers]
* [writers][readers][writers]
*
* In a full queue, we would have a reader holding a lock, then a writer
* gets the lock, then a bunch of readers, made up of readers who
* could not share the first readlock because a writer was waiting,
* and new readers arriving while the writer had the lock.
*
*/

-- 
Bruce Momjian                          |  830 Blythe Avenue
maillist@candle.pha.pa.us              |  Drexel Hill, Pennsylvania 19026
  +  If your life is a hard drive,     |  (610) 353-9879(w)
  +  Christ can be your backup.        |  (610) 853-3000(h)
#23Massimo Dal Zotto
dz@cs.unitn.it
In reply to: Bruce Momjian (#13)
Re: [HACKERS] Re: [PATCHES] patches for 6.2.1p6

On Mon, 16 Mar 1998, David Gould wrote:

If this will work, I am willing to hack this together tomorrow.
What is the time frame for accepting a patch like this?

Assuming that its *clean* (clean meaning that Bruce fully approves
of it, as this is his area of the code...well, one of them
*grin*)...tomorrow would be great :) If Bruce has *any* doubts though, it
doesn't go in until after I do the patch I want to do...

David, go for it. The code is all local in two files, and I think you
can basically change all the do{test-and-set} while(lock-is-false)
loops to:

do{test-and-set} while(lock-is-false && select ())

Pretty easy. No need to test multiple platforms. The ones where the
loop is integrated into the asm(), leave them for later.

-- 
Bruce Momjian                          |  830 Blythe Avenue
maillist@candle.pha.pa.us              |  Drexel Hill, Pennsylvania 19026
+  If your life is a hard drive,     |  (610) 353-9879(w)
+  Christ can be your backup.        |  (610) 853-3000(h)

I'am against a generic patch using select(). If we have sched_yield() on an
architecture I don't see why dont't use it. Here is the patch for Linux.
It has been tested for two months by 100 users without any problem.
The only thing I would add is a more general configuration test in configure
to include the proper include files.

*** src/include/storage/s_lock.h.orig	Sat Oct 18 22:39:21 1997
--- src/include/storage/s_lock.h	Wed Nov 19 23:11:14 1997
***************
*** 294,300 ****
--- 294,314 ----
   */
  #if defined(NEED_I386_TAS_ASM)
+ #include <unistd.h>
+ #include <sched.h>
+ #ifdef _POSIX_PRIORITY_SCHEDULING
+ #define	S_LOCK(lock)	do \
+ 						{ \
+ 							slock_t		_res; \
+ 							do \
+ 							{ \
+ 								__asm__("xchgb %0,%1": "=q"(_res), \
+ 										"=m"(*lock):"0"(0x1)); \
+ 								if (_res) sched_yield(); \
+ 							} while (_res != 0); \
+ 						} while (0)
+ #else
  #define	S_LOCK(lock)	do \
  						{ \
  							slock_t		_res; \
***************
*** 303,308 ****
--- 317,323 ----
  				__asm__("xchgb %0,%1": "=q"(_res), "=m"(*lock):"0"(0x1)); \
  							} while (_res != 0); \
  						} while (0)
+ #endif

#define S_UNLOCK(lock) (*(lock) = 0)

Massimo Dal Zotto

+----------------------------------------------------------------------+
|  Massimo Dal Zotto                e-mail:  dz@cs.unitn.it            |
|  Via Marconi, 141                 phone:  ++39-461-534251            |
|  38057 Pergine Valsugana (TN)     www:  http://www.cs.unitn.it/~dz/  |
|  Italy                            pgp:  finger dz@tango.cs.unitn.it  |
+----------------------------------------------------------------------+
#24Bruce Momjian
maillist@candle.pha.pa.us
In reply to: Massimo Dal Zotto (#23)
Re: [HACKERS] Re: [PATCHES] patches for 6.2.1p6

I'am against a generic patch using select(). If we have sched_yield() on an
architecture I don't see why dont't use it. Here is the patch for Linux.
It has been tested for two months by 100 users without any problem.
The only thing I would add is a more general configuration test in configure
to include the proper include files.

I understand your issue. Unfortunately, only Linux has sched_yield(),
as far as I know. Perhaps we can implement sched_yield/select based on
the platform. However, if someone is holding a spinlock, does
sched_yield() give the other process enough time to finish with the
spinlock before we start checking it again. Seems select() allows us to
control the time we wait before checking again.

Also, it looks like the s_lock.h file is going to change pretty
radically from David's change, so when he is done, we can put some
OS-specific stuff if you wish.

-- 
Bruce Momjian                          |  830 Blythe Avenue
maillist@candle.pha.pa.us              |  Drexel Hill, Pennsylvania 19026
  +  If your life is a hard drive,     |  (610) 353-9879(w)
  +  Christ can be your backup.        |  (610) 853-3000(h)
#25Noname
dg@illustra.com
In reply to: Massimo Dal Zotto (#23)
Re: [HACKERS] Re: [PATCHES] patches for 6.2.1p6

Massimo Dal Zotto writes:

David, go for it. The code is all local in two files, and I think you
can basically change all the do{test-and-set} while(lock-is-false)
loops to:

do{test-and-set} while(lock-is-false && select ())

Pretty easy. No need to test multiple platforms. The ones where the
loop is integrated into the asm(), leave them for later.

--
Bruce Momjian | 830 Blythe Avenue

I'am against a generic patch using select(). If we have sched_yield() on an
architecture I don't see why dont't use it. Here is the patch for Linux.
It has been tested for two months by 100 users without any problem.
The only thing I would add is a more general configuration test in configure
to include the proper include files.

*** src/include/storage/s_lock.h.orig	Sat Oct 18 22:39:21 1997
--- src/include/storage/s_lock.h	Wed Nov 19 23:11:14 1997
***************
*** 294,300 ****
--- 294,314 ----
*/
#if defined(NEED_I386_TAS_ASM)
+ #include <unistd.h>
+ #include <sched.h>
+ #ifdef _POSIX_PRIORITY_SCHEDULING
+ #define	S_LOCK(lock)	do \
+ 					{ \
+ 						slock_t		_res; \
+ 						do \
+ 						{ \
+ 							__asm__("xchgb %0,%1": "=q"(_res), \
+ 									"=m"(*lock):"0"(0x1)); \
+ 							if (_res) sched_yield(); \
+ 						} while (_res != 0); \
+ 					} while (0)
+ #else
#define	S_LOCK(lock)	do \
{ \
slock_t		_res; \
***************
*** 303,308 ****
--- 317,323 ----
__asm__("xchgb %0,%1": "=q"(_res), "=m"(*lock):"0"(0x1)); \
} while (_res != 0); \
} while (0)
+ #endif

#define S_UNLOCK(lock) (*(lock) = 0)

+----------------------------------------------------------------------+
|  Massimo Dal Zotto                e-mail:  dz@cs.unitn.it            |
|  Via Marconi, 141                 phone:  ++39-461-534251            |
|  38057 Pergine Valsugana (TN)     www:  http://www.cs.unitn.it/~dz/  |
|  Italy                            pgp:  finger dz@tango.cs.unitn.it  |
+----------------------------------------------------------------------+

I am perfectly happy to use sched_yield() on Linux. My goal however is to
make the concept work on all platforms.

There was a recent thread on comp.os.linux.system on context switch times
of Linux vs NTthat revealed that sched_yield() is fairly costly as it causes
the scheduler to do a full scan of the process table and recalculate the
the priorities of all processes. Probably not a problem, but it should
be it should probably be benchmarked both ways.

Finally, even though this appears to work, there is a possible stability
problem with both approaches. Here is some of the discussion I had about
that with Bruce:

-----------------------------------------------------------------------
David:

Right now, there is not much chance of catching a signal while waiting for
a spinlock. This is good, cause the code that waits for spinlocks tends to
be doing funny things with buffers and locks and shared stuff like that.
We don't catch signals because we don't make syscalls. But, once this goes in,
we will be calling select() a _lot_ and so it kinda becomes the likely place
for a signal to get delivered. Without thinking about it and looking at the
code a bit longer, I am not sure it is prudent to rush this in. I still want
it in as soon a possible, but possible includes free from harmful side effects.

Bruce:

Well, signals are handled in the backend by tcop/postgres.c. In
most/all? cases, a signal causes a longjump() out of the code and
releases locks and aborts the transaction.

David:

I was afraid that would be the answer. Basically, this never worked. The
problem is that in an indeterminate number of places the code manipulates
multiple distinct but related structures in a sequence. To leave the server
in a consistant state all these updates need to be done in the right order
so that if the sequence in interrupted by a signal the partially updated
state is consistant. Unhappily, the original coders did not always have this
in mind. An example:

cleaning up after a scan
1 - release buffer pointed by scan descriptor
2 - release scan descriptor

If a signal is taken just after one, the abort code will see the scan
descriptor and call the cleanup for it resulting in:

cleaning up after a scan (take 2)
1 - release buffer pointed by scan descriptor
- Whoopsie, buffer already released!
2 - release scan descriptor

These sequences either _all_ have to identified and fixed, or made atomic
somehow, which is a biggish job.

Or the system has to acknowledge signals at only clearly defined points.
My preference is to have signal handlers only set a global flag to indicate
that a signal was seen. Then one just sprinkles tests of the flag in
all the likely places: step to next node, fetch next page, call function, etc.

The way this shows up in real life is strange unreproduceable errors on busy
systems, especially when backends are killed or transactions are forced to
abort.

Fixing this is a bit of a project, but the result is that a lot of mystery
bugs go away.

--------------------------------------------------------------------------

-dg

David Gould dg@illustra.com 510.628.3783 or 510.305.9468
Informix Software (No, really) 300 Lakeside Drive Oakland, CA 94612
- I realize now that irony has no place in business communications.

#26The Hermit Hacker
scrappy@hub.org
In reply to: Bruce Momjian (#24)
Re: [HACKERS] Re: [PATCHES] patches for 6.2.1p6

On Fri, 20 Mar 1998, Bruce Momjian wrote:

I'am against a generic patch using select(). If we have sched_yield() on an
architecture I don't see why dont't use it. Here is the patch for Linux.
It has been tested for two months by 100 users without any problem.
The only thing I would add is a more general configuration test in configure
to include the proper include files.

I understand your issue. Unfortunately, only Linux has sched_yield(),
as far as I know. Perhaps we can implement sched_yield/select based on
the platform.

What's the possibility of doing this similar to how we do some of
the other functions (dl_open comes immediately to mind)...make a
pg_sched_yield function and use that, which is built based on the various
platforms?

Right now, I don't believe we have *anything* in place, so have
pg_sched_yield() return 0 (or an equivalent) for every platform except for
Linux...

Marc G. Fournier
Systems Administrator @ hub.org
primary: scrappy@hub.org secondary: scrappy@{freebsd|postgresql}.org

#27Noname
dg@illustra.com
In reply to: The Hermit Hacker (#26)
Re: [HACKERS] Re: [PATCHES] patches for 6.2.1p6

On Fri, 20 Mar 1998, Bruce Momjian wrote:

I'am against a generic patch using select(). If we have sched_yield() on an
architecture I don't see why dont't use it. Here is the patch for Linux.
It has been tested for two months by 100 users without any problem.
The only thing I would add is a more general configuration test in configure
to include the proper include files.

I understand your issue. Unfortunately, only Linux has sched_yield(),
as far as I know. Perhaps we can implement sched_yield/select based on
the platform.

What's the possibility of doing this similar to how we do some of
the other functions (dl_open comes immediately to mind)...make a
pg_sched_yield function and use that, which is built based on the various
platforms?

Right now, I don't believe we have *anything* in place, so have
pg_sched_yield() return 0 (or an equivalent) for every platform except for
Linux...

Marc G. Fournier
Systems Administrator @ hub.org
primary: scrappy@hub.org secondary: scrappy@{freebsd|postgresql}.org

I appreciate all the help, but I think I have a solution for this. Details
next week...

-dg

David Gould dg@illustra.com 510.628.3783 or 510.305.9468
Informix Software (No, really) 300 Lakeside Drive Oakland, CA 94612
- I realize now that irony has no place in business communications.

#28Bruce Momjian
maillist@candle.pha.pa.us
In reply to: The Hermit Hacker (#26)
Re: [HACKERS] Re: [PATCHES] patches for 6.2.1p6

What's the possibility of doing this similar to how we do some of
the other functions (dl_open comes immediately to mind)...make a
pg_sched_yield function and use that, which is built based on the various
platforms?

Right now, I don't believe we have *anything* in place, so have
pg_sched_yield() return 0 (or an equivalent) for every platform except for
Linux...

Probably even easier. Just use #ifdef linux around the select or
sched_yield.

-- 
Bruce Momjian                          |  830 Blythe Avenue
maillist@candle.pha.pa.us              |  Drexel Hill, Pennsylvania 19026
  +  If your life is a hard drive,     |  (610) 353-9879(w)
  +  Christ can be your backup.        |  (610) 853-3000(h)
#29Mattias Kregert
matti@algonet.se
In reply to: The Hermit Hacker (#26)
Re: sched_yield()

The Hermit Hacker wrote:

What's the possibility of doing this similar to how we do some of
the other functions (dl_open comes immediately to mind)...make a
pg_sched_yield function and use that, which is built based on the various
platforms?

Right now, I don't believe we have *anything* in place, so have
pg_sched_yield() return 0 (or an equivalent) for every platform except for
Linux...

But sched_yield() is not Linux-specific:
-- The sched_yield() function relinquishes the processor for the
-- running process.
-- IEEE Std 1003.1b-1993, �13.3.5. (POSIX real-time standard 1003.lb)

Except from Linux, I can find references to sched_yield() in LynxOS,
DECthreads thread library, AIX 4.1 and up (libc), Solaris (thread.h
(c)1994 Sun
Microsystems), Unix98, GNU, C EXECUTIVE(r) and PSX(tm) real time kernels
...
This is just a quick search.

Perhaps we should enable sched_yield() for every OS except for... well,
what's the
name of that OS which does not have sched_yield()... FreeBSD ;)

After all, sched_yield() is five years old. Any reasonable OS should
have it.

/* m */

#30Bruce Momjian
maillist@candle.pha.pa.us
In reply to: Mattias Kregert (#29)
Re: sched_yield()

Perhaps we should enable sched_yield() for every OS except for... well,
what's the
name of that OS which does not have sched_yield()... FreeBSD ;)

After all, sched_yield() is five years old. Any reasonable OS should
have it.

Gee, I just checked and BSDI has it. However, it appears to work only
on threaded applications:

#include <pthread.h>
...
If other threads are ready to run, the sched_yield() function forces the
current thread to suspend itself temporarily and let them execute.

-- 
Bruce Momjian                          |  830 Blythe Avenue
maillist@candle.pha.pa.us              |  Drexel Hill, Pennsylvania 19026
  +  If your life is a hard drive,     |  (610) 353-9879(w)
  +  Christ can be your backup.        |  (610) 853-3000(h)
#31Tom
tom@sdf.com
In reply to: Mattias Kregert (#29)
Re: [HACKERS] Re: sched_yield()

On Sun, 22 Mar 1998, Mattias Kregert wrote:

Except from Linux, I can find references to sched_yield() in LynxOS,
DECthreads thread library, AIX 4.1 and up (libc), Solaris (thread.h
(c)1994 Sun
Microsystems), Unix98, GNU, C EXECUTIVE(r) and PSX(tm) real time kernels
...
This is just a quick search.

This seems to be part of Posix threads, which means that sched_yield
should only be callable by a thread.... I'm surprised that normal
processes can call sched_yield. In fact, in most of the environments
you've mentioned, it certainly can't be used outside a thread.

Tom

#32Andrew Martin
martin@biochemistry.ucl.ac.uk
In reply to: Tom (#31)
Re: [HACKERS] Re: sched_yield()

But sched_yield() is not Linux-specific:
-- The sched_yield() function relinquishes the processor for the
-- running process.
-- IEEE Std 1003.1b-1993, '13.3.5. (POSIX real-time standard 1003.lb)

Except from Linux, I can find references to sched_yield() in LynxOS,
DECthreads thread library, AIX 4.1 and up (libc), Solaris (thread.h
(c)1994 Sun
Microsystems), Unix98, GNU, C EXECUTIVE(r) and PSX(tm) real time kernels
...
This is just a quick search.

Perhaps we should enable sched_yield() for every OS except for... well,
what's the
name of that OS which does not have sched_yield()... FreeBSD ;)

After all, sched_yield() is five years old. Any reasonable OS should
have it.

It appears from man pages on our Irix system that Irix6 has it but Irix5
does not.

Andrew

----------------------------------------------------------------------------
Dr. Andrew C.R. Martin University College London
EMAIL: (Work) martin@biochem.ucl.ac.uk (Home) andrew@stagleys.demon.co.uk
URL: http://www.biochem.ucl.ac.uk/~martin
Tel: (Work) +44(0)171 419 3890 (Home) +44(0)1372 275775

#33Andrew Martin
martin@biochemistry.ucl.ac.uk
In reply to: Andrew Martin (#32)
Re: [HACKERS] Re: sched_yield()

Gee, I just checked and BSDI has it. However, it appears to work only
on threaded applications:

#include <pthread.h>
...
If other threads are ready to run, the sched_yield() function forces the
current thread to suspend itself temporarily and let them execute.

From the Irix6 man page:

#include <sched.h>
...
The sched_yield system call causes the calling process to relinquish the
processor to a runnable process of higher or equal priority.

Looks like it should work here, but needs a different header file from BDSI.

Andrew

----------------------------------------------------------------------------
Dr. Andrew C.R. Martin University College London
EMAIL: (Work) martin@biochem.ucl.ac.uk (Home) andrew@stagleys.demon.co.uk
URL: http://www.biochem.ucl.ac.uk/~martin
Tel: (Work) +44(0)171 419 3890 (Home) +44(0)1372 275775

#34Tom Ivar Helbekkmo
tih@Hamartun.Priv.NO
In reply to: Noname (#16)
Re: [HACKERS] Re: [PATCHES] patches for 6.2.1p6

David Gould wrote:

Seriously, if you want to, please create a function to emulate the following:

/*
* tas(lock)
*
* Access to platform specific test_and_set functionality. Given pointer to
* lock attempts to acquire the lock atomically.
*
* Returns 0 for success, nonzero for failure.
*/
typedef slock_t unsigned char; /* or whatever works on the platform */

int tas(slock_t *lock)
{
slock_t tmp;

/* atomic, interlocked */
tmp = *lock;
*lock = -1; /* any nonzero will do here */

return (tmp != 0);
}

Given this, I can fold the VAX right into the grand scheme, just like a
normal computer (;-)).

Hmpf! The true worth of a computer is a function of its weight! :-)

Sorry this took a while, but anyway, this should do it for the VAX (in
fact, it's more or less the version of the code that I figured I'd use
until Bruce asked me to bum it down maximally for performance, only
now with the return values from tas() swapped). I include the macros
that would fit the current (6.3) locking scheme:

typedef unsigned char slock_t;

int tas(slock_t *lock) {
register ret;

asm(" movl $1, r0
bbssi $0,(%1),1f
clrl r0
1: movl r0,%0"
: "=r"(ret) /* return value, in register */
: "r"(lock) /* argument, 'lock pointer', in register */
: "r0"); /* inline code uses this register */

return ret;
}

#define S_LOCK(addr) do { while (tas(addr)) ; } while (0)
#define S_UNLOCK(addr) (*(addr) = 0)
#define S_INIT_LOCK(addr) (*(addr) = 0)

-tih
--
Popularity is the hallmark of mediocrity. --Niles Crane, "Frasier"

#35Noname
dg@illustra.com
In reply to: Tom Ivar Helbekkmo (#34)
Re: [HACKERS] Re: [PATCHES] patches for 6.2.1p6

David Gould wrote:

Seriously, if you want to, please create a function to emulate the following:

/*
* tas(lock)
*
* Access to platform specific test_and_set functionality. Given pointer to
* lock attempts to acquire the lock atomically.
*
* Returns 0 for success, nonzero for failure.
*/
typedef slock_t unsigned char; /* or whatever works on the platform */

int tas(slock_t *lock)
{
slock_t tmp;

/* atomic, interlocked */
tmp = *lock;
*lock = -1; /* any nonzero will do here */

return (tmp != 0);
}

Given this, I can fold the VAX right into the grand scheme, just like a
normal computer (;-)).

Hmpf! The true worth of a computer is a function of its weight! :-)

Sorry this took a while, but anyway, this should do it for the VAX (in
fact, it's more or less the version of the code that I figured I'd use
until Bruce asked me to bum it down maximally for performance, only
now with the return values from tas() swapped). I include the macros

What do you mean "now with the return values from tas() swapped"? I think
your code looks ok, but just want to be sure we are following the same
grand plan...

that would fit the current (6.3) locking scheme:

typedef unsigned char slock_t;

int tas(slock_t *lock) {
register ret;

asm(" movl $1, r0
bbssi $0,(%1),1f
clrl r0
1: movl r0,%0"
: "=r"(ret) /* return value, in register */
: "r"(lock) /* argument, 'lock pointer', in register */
: "r0"); /* inline code uses this register */

return ret;
}

#define S_LOCK(addr) do { while (tas(addr)) ; } while (0)
#define S_UNLOCK(addr) (*(addr) = 0)
#define S_INIT_LOCK(addr) (*(addr) = 0)

-tih
--

Thanks, this is just what I was looking for. I will fold it in to my changes.
I have gotten a little snowed under with other tasks, but I expect to finalize
my patch next week and will post it.

-dg

David Gould dg@illustra.com 510.628.3783 or 510.305.9468
Informix Software (No, really) 300 Lakeside Drive Oakland, CA 94612
- Linux. Not because it is free. Because it is better.

#36Tom Ivar Helbekkmo
tih@Hamartun.Priv.NO
In reply to: Noname (#35)
Re: [HACKERS] Re: [PATCHES] patches for 6.2.1p6

* David Gould
|
| What do you mean "now with the return values from tas() swapped"? I think
| your code looks ok, but just want to be sure we are following the same
| grand plan...

I just meant that my original code (which has been posted before) had
the tas() function implemented so that it returned 0 on failure, not
on success, as you asked for. Thus, I had to swap the sense of the
return value. In practice, I changed

clrl r0 ; clear register r0
bbssi $0,(%1),1f ; branch on bit set else set
incl r0 ; increment register r0
1: movl r0,%0 ; return register r0

[...]

#define S_LOCK(addr) do { while (!tas(addr)) ; } while (0)

...into...

movl $1, r0 ; set register r0 to 1
bbssi $0,(%1),1f ; branch on bit set else set
clrl r0 ; clear register r0
1: movl r0,%0 ; return register r0

[...]

#define S_LOCK(addr) do { while (tas(addr)) ; } while (0)

-tih
--
Popularity is the hallmark of mediocrity. --Niles Crane, "Frasier"

#37Noname
dg@illustra.com
In reply to: Tom Ivar Helbekkmo (#36)
Re: [HACKERS] Re: [PATCHES] patches for 6.2.1p6

Tom Ivar Helbekkmo:

* David Gould
|
| What do you mean "now with the return values from tas() swapped"? I think
| your code looks ok, but just want to be sure we are following the same
| grand plan...

I just meant that my original code (which has been posted before) had
the tas() function implemented so that it returned 0 on failure, not
on success, as you asked for. Thus, I had to swap the sense of the
return value. In practice, I changed

clrl r0 ; clear register r0
bbssi $0,(%1),1f ; branch on bit set else set
incl r0 ; increment register r0
1: movl r0,%0 ; return register r0

[...]

#define S_LOCK(addr) do { while (!tas(addr)) ; } while (0)

...into...

movl $1, r0 ; set register r0 to 1
bbssi $0,(%1),1f ; branch on bit set else set
clrl r0 ; clear register r0
1: movl r0,%0 ; return register r0

[...]

#define S_LOCK(addr) do { while (tas(addr)) ; } while (0)

Thats what I thought, but it has been a few years (not saying how many ;-) )
since I wrote any Macro-32 so I figured I should check.

The tas() definition is not to return success or failure so much as to
return the _previous_state_ of the lock. So if you test_and_set the lock
the test part returns true if it was previously locked and false if it was
unlocked. In either case, it is locked after the tas() (the set part).
Only, if it was previously unlocked, someone else owns the lock so we
have to wait for them to unlock it.

-dg

David Gould dg@illustra.com 510.628.3783 or 510.305.9468
Informix Software (No, really) 300 Lakeside Drive Oakland, CA 94612
- Linux. Not because it is free. Because it is better.