Threaded PostgreSQL server
Are there any plans to merge the sources from the experimental threaded
server and the forked server so that a compile switch could choose the
model?
If someone wanted to submit appropriate patches for the v7.3 development
tree, that merge cleanly, I can't see why this wouldn't be a good thing
...
On Mon, 4 Feb 2002, Dann Corbit wrote:
Are there any plans to merge the sources from the experimental threaded
server and the forked server so that a compile switch could choose the
model?
I would love to see this happen but they are already quite different and
drifting further apart every day. I am trying to integrate parts of the real
PostgreSQL into threaded postgres as time permits.
I think threaded postgres could serve as a vehicle for testing the
relative value of using threads, but trying to merge patches would be a
major task. I found an interesting marketing white paper that covers
PostgreSQL, Illustra, Informix, DSA (using threads), and Datablade
extensions. If nothing else, it shows that the PostgreSQL extension model
can be used in a threaded environment.
www.databaseassociates.com/pdf/infobj.pdf
Myron Scott
mkscott@sacadia.com
On Mon, 4 Feb 2002, Marc G. Fournier wrote:
If someone wanted to submit appropriate patches for the v7.3 development
tree, that merge cleanly, I can't see why this wouldn't be a good thing
...
On Mon, 4 Feb 2002, Dann Corbit wrote:
Are there any plans to merge the sources from the experimental threaded
server and the forked server so that a compile switch could choose the
model?
I would have to contend that the two will never be merged into one
source base. If the threaded server is done correctly, then many of the
internal structures and logic will be radically different. I have to
commend Mr. Scott for continuing on with this work when it was pretty
obvious from previous discussions that this would not be "well received".
--
//========================================================\\
|| D. Hageman <dhageman@dracken.com> ||
\\========================================================//
If someone wanted to submit appropriate patches for the v7.3 development
tree, that merge cleanly, I can't see why this wouldn't be a good thing
...
I thought that the one thread instead of one process per client model
would only be an advantage for the "native Windows port" ?
Imho a useful threaded model on unix would involve a separation of threads
and clients. ( 1 CPU thread per physical CPU, several IO threads)
But that would involve a complete redesign.
Andreas
Are there any plans to merge the sources from the experimental threaded
server and the forked server so that a compile switch could choose the
model?
"Marc G. Fournier" <scrappy@hub.org> writes:
If someone wanted to submit appropriate patches for the v7.3 development
tree, that merge cleanly, I can't see why this wouldn't be a good thing
...
I would resist it. I do not think we need the portability and
reliability headaches that would come with it. Furthermore,
an #ifdef'd implementation would be the worst of all possible
worlds, as it would do major damage to readability of the code.
regards, tom lane
Dann Corbit wrote:
Are there any plans to merge the sources from the experimental threaded
server and the forked server so that a compile switch could choose the
model?
Just a question, to clarify my thinking. Does the current experimental
threaded server disable the multi-process model? Or does it *add* the functionality
as a compile switch? (That would be the reverse of what you described.)
I think this matters when evaluating the resistance to going multithreaded.
If they disabled the original method, I agree with Tom. If they *merged* both
flawlessly, I would try to consider it for the current tree.
Any comments?
Regards,
Haroldo.
On Tue, 5 Feb 2002, Haroldo Stenger wrote:
Dann Corbit wrote:
Are there any plans to merge the sources from the experimental threaded
server and the forked server so that a compile switch could choose the
model?
Just a question, to clarify my thinking. Does the current experimental
threaded server disable the multi-process model? Or does it *add* the functionality
as a compile switch? (That would be the reverse of what you described.)
I think this matters when evaluating the resistance to going multithreaded.
If they disabled the original method, I agree with Tom. If they *merged* both
flawlessly, I would try to consider it for the current tree.
Any comments?
That's kinda what I was hoping ... is it something that could be
seamlessly integrated to have minimal impact on the code itself ... even
if there was some way of having a 'thread.c' vs 'non-thread.c' that could
be link'd in, with wrapper functions?
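To make the 'thread.c' vs 'non-thread.c' idea concrete, here is a minimal sketch in C. Everything in it is hypothetical (`worker_run`, `worker_fn`, the `USE_THREADS` macro, and the `add_one` sample job are invented for illustration and do not come from either source tree). It only shows that callers can be written against one wrapper while the back end behind it is either fork() or a POSIX thread. Both halves are shown in one listing for brevity; in practice each half would sit in its own file and be chosen at link time.

```c
#include <assert.h>
#include <stdint.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>
#ifdef USE_THREADS
#include <pthread.h>
#endif

/* Hypothetical wrapper interface: callers never see fork() or pthreads. */
typedef int (*worker_fn)(void *arg);

/* Run fn(arg) in a new worker, wait for it, and store its result
 * (truncated to 0..255 so both back ends behave the same).
 * Returns 0 on success, -1 if the worker could not be started or reaped. */
int worker_run(worker_fn fn, void *arg, int *result);

#ifdef USE_THREADS              /* this half would live in thread.c */

struct call { worker_fn fn; void *arg; };

static void *trampoline(void *p)
{
    struct call *c = p;
    return (void *) (intptr_t) (c->fn(c->arg) & 0xff);
}

int worker_run(worker_fn fn, void *arg, int *result)
{
    struct call c = { fn, arg };
    pthread_t tid;
    void *ret;

    assert(fn != NULL && result != NULL);
    if (pthread_create(&tid, NULL, trampoline, &c) != 0)
        return -1;
    if (pthread_join(tid, &ret) != 0)
        return -1;
    *result = (int) (intptr_t) ret;
    return 0;
}

#else                           /* this half would live in non-thread.c */

int worker_run(worker_fn fn, void *arg, int *result)
{
    int status;
    pid_t pid;

    assert(fn != NULL && result != NULL);
    pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0)
        _exit(fn(arg) & 0xff);  /* child: report the result via exit code */
    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
        return -1;
    *result = WEXITSTATUS(status);
    return 0;
}

#endif

/* Sample job used to exercise the wrapper. */
int add_one(void *arg)
{
    return *(int *) arg + 1;
}
```

Calling `worker_run(add_one, &in, &out)` with `in = 41` leaves 42 in `out` under either back end. The sketch deliberately ignores the hard part raised elsewhere in this thread: a forked worker gets its own copy of every global, while a threaded worker shares them.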
Then again, has anyone looked at the apache project? Apache2 has several
"process models" ... prefork being one (like ours), or a 'worker', which
is a prefork/threaded model where you can have n child processes, with m
'threads' inside of each ... not sure if something like that could be
retrofit'd into what we have, but ... ?
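For readers unfamiliar with the 'worker' shape described above, the following is a rough sketch in C of n child processes with m threads inside each. It is not Apache's code and not a proposal for the backend; `run_worker_model`, `child_main`, `thread_main`, and the two limits are names made up for this illustration. Each thread just records that it ran, and each child reports how many of its threads finished through its exit status.

```c
#include <assert.h>
#include <pthread.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define MAX_PROCS   16
#define MAX_THREADS 64   /* keeps a child's count within an 8-bit exit status */

/* Stand-in for real per-connection work: just mark this slot as done. */
static void *thread_main(void *arg)
{
    *(int *) arg = 1;
    return NULL;
}

/* Runs inside one child process: start the threads, join them, count them. */
static int child_main(int nthreads)
{
    pthread_t tids[MAX_THREADS];
    int done[MAX_THREADS] = {0};
    int started = 0, finished = 0;

    assert(nthreads <= MAX_THREADS);
    while (started < nthreads &&
           pthread_create(&tids[started], NULL, thread_main, &done[started]) == 0)
        started++;
    for (int i = 0; i < started; i++)
        if (pthread_join(tids[i], NULL) == 0)
            finished += done[i];
    return finished;
}

/* Fork nprocs children, each running nthreads threads.
 * Returns the total number of threads that completed across all children. */
int run_worker_model(int nprocs, int nthreads)
{
    pid_t pids[MAX_PROCS];
    int spawned = 0, total = 0;

    if (nprocs > MAX_PROCS)
        nprocs = MAX_PROCS;
    if (nthreads > MAX_THREADS)
        nthreads = MAX_THREADS;

    for (int p = 0; p < nprocs; p++) {
        pid_t pid = fork();
        if (pid == 0)
            _exit(child_main(nthreads));   /* child never returns to caller */
        if (pid > 0)
            pids[spawned++] = pid;
    }
    for (int i = 0; i < spawned; i++) {
        int status;
        if (waitpid(pids[i], &status, 0) > 0 && WIFEXITED(status))
            total += WEXITSTATUS(status);
    }
    return total;
}
```

With 2 processes and 3 threads each, `run_worker_model(2, 3)` should account for 6 threads. A crash in one thread would take down only its own child process, which is the usual argument for this hybrid over a single fully threaded process.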
-----Original Message-----
From: Marc G. Fournier [mailto:scrappy@hub.org]
Sent: Tuesday, February 05, 2002 11:37 AM
To: Haroldo Stenger
Cc: Dann Corbit; Tom Lane; pgsql-hackers@postgresql.org
Subject: Re: [HACKERS] Threaded PostgreSQL server
[snip]
That's kinda what I was hoping ... is it something that could be
seamlessly integrated to have minimal impact on the code itself ... even
if there was some way of having a 'thread.c' vs 'non-thread.c' that could
be link'd in, with wrapper functions?
Then again, has anyone looked at the apache project? Apache2 has several
"process models" ... prefork being one (like ours), or a 'worker', which
is a prefork/threaded model where you can have n child processes, with m
'threads' inside of each ... not sure if something like that could be
retrofit'd into what we have, but ... ?
It could be done, but it might be an effort. As an example, the ACE
project:
http://www.cs.wustl.edu/~schmidt/ACE.html
has a number of easily selected threading models. It is also portable to an
enormous number of platforms (including all flavors of UNIX). However, it
is C++ rather than C, and so that particular transition would probably be
pretty traumatic if someone tried to use ACE as a toolset. But at least it
does demonstrate that such a thing is feasible. As a "for instance" you can
look at the Jaws web server (which is both open source and very much faster
than the Apache server). It can easily be built with many different threading
models.
On Tue, 5 Feb 2002, Haroldo Stenger wrote:
Just a question, to clarify my thinking. Does the current experimental
threaded server disable the multi-process model? Or does it *add* the functionality
as a compile switch? (That would be the reverse of what you described.)
Currently, the experimental threaded postgres can have multiple processes using
multiple threads with the same shared memory. There is no forking
involved in the process though. Shared memory, mutexes, and conditional
locks go global or private to the process based on a run-time flag.
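One way such a run-time flag could work with POSIX threads is sketched below. This is an assumption about the general technique, not a description of the experimental server's code: `lock_init`, `cond_init`, and `pshared_value` are hypothetical helper names, while the `pthread_*attr_setpshared` calls are the standard POSIX ones.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Map the run-time flag onto the POSIX "process-shared" attribute value. */
static int pshared_value(bool shared)
{
    return shared ? PTHREAD_PROCESS_SHARED : PTHREAD_PROCESS_PRIVATE;
}

/* Initialise a mutex as process-shared or process-private.
 * A process-shared mutex must itself be placed in memory that all the
 * participating processes map (for example a shared memory segment).
 * Returns 0 on success or a pthread error number. */
int lock_init(pthread_mutex_t *m, bool shared)
{
    pthread_mutexattr_t attr;
    int rc;

    assert(m != NULL);
    rc = pthread_mutexattr_init(&attr);
    if (rc != 0)
        return rc;
    rc = pthread_mutexattr_setpshared(&attr, pshared_value(shared));
    if (rc == 0)
        rc = pthread_mutex_init(m, &attr);
    pthread_mutexattr_destroy(&attr);
    return rc;
}

/* Same idea for a condition variable. */
int cond_init(pthread_cond_t *c, bool shared)
{
    pthread_condattr_t attr;
    int rc;

    assert(c != NULL);
    rc = pthread_condattr_init(&attr);
    if (rc != 0)
        return rc;
    rc = pthread_condattr_setpshared(&attr, pshared_value(shared));
    if (rc == 0)
        rc = pthread_cond_init(c, &attr);
    pthread_condattr_destroy(&attr);
    return rc;
}
```

With helpers like these, the rest of the locking code can stay identical in both modes; only the initialisation path looks at the flag.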
That's kinda what I was hoping ... is it something that could be
seamlessly integrated to have minimal impact on the code itself ... even
if there was some way of having a 'thread.c' vs 'non-thread.c' that could
be link'd in, with wrapper functions?
The first basic problem is that global variables are scattered throughout
the source as well as some static stack variables. Hunting these down and
finding a home for them is, in and of itself, a major task. For example,
flex produces code that is not thread safe, so you have to modify that too.
The current workaround in exper. threaded postgres is not pretty: one
"environment" structure that holds all the normal postgres globals in
thread local storage. This makes compile time choices impractical I
think.
Cheers,
Myron
mkscott@sacadia.com
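The "environment" structure approach can be pictured roughly as follows. This is only a sketch of the general pattern using POSIX thread-specific data; the `Env` fields, `GetEnv`, and `env_demo_thread` are invented for illustration and are not taken from the threaded postgres source.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical stand-ins for variables that are plain globals today. */
typedef struct Env {
    int  query_depth;
    long current_xid;
    char *debug_query_string;
} Env;

static pthread_key_t  env_key;
static pthread_once_t env_once = PTHREAD_ONCE_INIT;

/* Create the key once; free() releases a thread's Env when it exits. */
static void env_make_key(void)
{
    pthread_key_create(&env_key, free);
}

/* Return the calling thread's private Env, creating it on first use.
 * Returns NULL only if memory or the key slot cannot be obtained. */
Env *GetEnv(void)
{
    Env *env;

    pthread_once(&env_once, env_make_key);
    env = pthread_getspecific(env_key);
    if (env == NULL) {
        env = calloc(1, sizeof(Env));
        if (env != NULL && pthread_setspecific(env_key, env) != 0) {
            free(env);
            env = NULL;
        }
    }
    return env;
}

/* Demo thread: changes its own copy, which no other thread can see. */
void *env_demo_thread(void *unused)
{
    Env *env = GetEnv();

    (void) unused;
    assert(env != NULL);
    env->query_depth = 99;
    return NULL;
}
```

Every former `query_depth` reference becomes `GetEnv()->query_depth`, which hints at why this touches so much of the source and why it is hard to hide behind a compile-time switch.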
On Tuesday 5 February 2002 20:36, Marc G. Fournier wrote:
Apache2 has several "process models" ... prefork being one (like ours), or
a 'worker', which is a prefork/threaded model where you can have n child
processes, with m 'threads' inside of each ... not sure if something like
that could be retrofit'd into what we have, but ... ?
Why not try to link Cygwin statically?
Best regards,
Jean-Michel POURE
On Tue, 5 Feb 2002 mkscott@sacadia.com wrote:
The first basic problem is that global variables are scattered
throughout the source as well as some static stack variables. Hunting
these down and finding a home for them is, in and of itself, a major
task. For example, flex produces code that is not thread safe, you have
to modify that too. The current workaround in exper. threaded
postgres is not pretty, one "environment" structure that holds all the
normal postgres globals in thread local storage. This makes compile
time choices impractical I think.
Okay, but this has been discussed in the past concerning threading ... the
first piece of work that would have to be done was 'cleaning the code' so that
it was thread-safe ...
Basically, if we were to look at moving *towards* a fork/thread model in
the future, what can we learn and incorporate from the work already being
done? How much of the work in the threaded server is cleaning up the code
to be thread-safe, that would benefit the base code itself and start us
down that path?
Right now, from everything I've heard, making the code thread-safe is one
big onerous task ... but if we were to start incorporating changes from
the 'thread work' that is being done now, into the base server, and ppl
start thinking thread-safe when they are coding new stuff, over time, this
task becomes smaller ...
On Tue, Feb 05, 2002 at 03:36:41PM -0400, Marc G. Fournier wrote:
Then again, has anyone looked at the apache project? Apache2 has several
"process models" ... prefork being one (like ours), or a 'worker', which
is a prefork/threaded model where you can have n child processes, with m
'threads' inside of each ... not sure if something like that could be
retrofit'd into what we have, but ... ?
We could even use the nice Apache Portable Runtime, which is a
platform-independent layer over threading/networking/shm/etc (there's a
summary here: http://apr.apache.org/docs/apr/modules.html).
This might improve PostgreSQL on non-UNIX platforms, namely Win32.
However, I think using threads is only a good idea if it gets us a
substantial performance increase. From what I've seen, that isn't the
case; and even if the time to create a connection is a bottleneck, there
are other, more conservative ways of improving it (e.g. pre-forking,
persistent backends, and IIRC some work Tom Lane was doing to reduce
backend startup time).
And given the complexity and reduced reliability that threads bring, I
think the only advantage would be buzzword-compliance -- which isn't a
priority, personally.
Cheers,
Neil
nconway@klamath.dyndns.org (Neil Conway) writes:
However, I think using threads is only a good idea if it gets us a
substantial performance increase. From what I've seen, that isn't the
case; and even if the time to create a connection is a bottleneck, there
are other, more conservative ways of improving it (e.g. pre-forking,
persistent backends, and IIRC some work Tom Lane was doing to reduce
backend startup time).
The one place where it could be a clear win would be in splitting
single very large queries over multiple CPUs. This would probably
require an even larger redesign of the whole system than moving to a
query-per-thread rather than per-process model. I think "real"
multi-master replication and clustering is a better goal in the short
term...
-Doug
--
Let us cross over the river, and rest under the shade of the trees.
--T. J. Jackson, 1863
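As a toy picture of what "splitting a single large query over multiple CPUs" means at its simplest, the sketch below sums an array by giving each thread its own slice and combining the partial results. It is an illustration of the partition-and-combine idea only, under the assumption of POSIX threads; `parallel_sum`, `Slice`, and `MAX_WORKERS` are hypothetical names, and a real executor would also have to deal with planning, locking, and snapshots.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define MAX_WORKERS 16

typedef struct Slice {
    const long *vals;   /* first element of this worker's share */
    size_t      len;    /* number of elements in the share */
    long        partial;
} Slice;

static void *sum_slice(void *arg)
{
    Slice *s = arg;
    long acc = 0;

    for (size_t i = 0; i < s->len; i++)
        acc += s->vals[i];
    s->partial = acc;
    return NULL;
}

/* Sum n values using up to nworkers threads; the last worker also takes
 * any remainder. If a thread cannot be started, its slice is summed
 * inline so the result is still complete. */
long parallel_sum(const long *vals, size_t n, int nworkers)
{
    pthread_t tids[MAX_WORKERS];
    Slice     slices[MAX_WORKERS];
    int       started[MAX_WORKERS];
    size_t    chunk, offset = 0;
    long      total = 0;

    assert(vals != NULL || n == 0);
    if (nworkers < 1)
        nworkers = 1;
    if (nworkers > MAX_WORKERS)
        nworkers = MAX_WORKERS;
    chunk = n / (size_t) nworkers;

    for (int w = 0; w < nworkers; w++) {
        size_t len = (w == nworkers - 1) ? n - offset : chunk;

        slices[w] = (Slice) { vals + offset, len, 0 };
        offset += len;
        started[w] = (pthread_create(&tids[w], NULL, sum_slice, &slices[w]) == 0);
        if (!started[w])
            sum_slice(&slices[w]);
    }
    for (int w = 0; w < nworkers; w++) {
        if (started[w])
            pthread_join(tids[w], NULL);
        total += slices[w].partial;
    }
    return total;
}
```

The workers share nothing but read-only input, which is what makes this case easy; a query plan rarely decomposes so cleanly, hence the "even larger redesign" mentioned above.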
Doug McNaught wrote:
The one place where it could be a clear win would be in splitting
single very large queries over multiple CPUs. This would probably
require an even larger redesign of the whole system than moving to a
query-per-thread rather than per-process model. I think "real"
multi-master replication and clustering is a better goal in the short
term...
Agreed.
Though, starting to think & code thread safe would be nice too.
Regards,
Haroldo.
On Wed, 6 Feb 2002, Marc G. Fournier wrote:
Right now, from everything I've heard, making the code thread-safe is one
big onerous task ... but if we were to start incorporating changes from
the 'thread work' that is being done now, into the base server, and ppl
start thinking thread-safe when they are coding new stuff, over time, this
task becomes smaller ...
I agree, once the move is made to thread-safe it becomes much easier to
maintain thread-safe code. I also very much like the idea of multiple
thread/process models that could be chosen from. I think the question has
always been the initial cost vs. benefit. The group has not seen much to be
gained for the amount of initial work involved. After working with the
code, I too felt it wasn't worth it.
After revisiting the threaded code after a long break I now see some real
benefits to threading. For example, I was able to incorporate Tom Lane's
lazy_vacuum code to do relation clean up automatically when a threshold of
page writes occurred. I was also able to use the freespace information to
be shared among threads in the process without touching shared mem. As a
result, a pgbench run with 20 clients and over 1,000,000 transactions
maintained a more or less constant tps with manual vacuum commands and far
less heap expansion. You can do this with processes (planned for 7.3 I
think) but I think it was much easier with threads. Other things may open up with
threads as well like Java stored procedures. Anyway, now I think it is
worth it.
Myron
mkscott@sacadia.com
Haroldo Stenger writes:
Though, starting to think & code thread safe would be nice too.
The thing about thread-safeness is that it's only actually useful when
you're using threads. Otherwise it wastes everybody's time -- the
programmer's, the computer's, and the user's.
--
Peter Eisentraut peter_e@gmx.net
On Wed, 6 Feb 2002 mkscott@sacadia.com wrote:
After revisiting the threaded code after a long break I now see some
real benefits to threading. For example, I was able to incorporate Tom
Lane's lazy_vacuum code to do relation clean up automatically when a
threshold of page writes occurred. I was also able to use the freespace
information to be shared among threads in the process without touching
shared mem. As a result, a pgbench run with 20 clients and over
1,000,000 transactions maintained a more or less constant tps with manual
vacuum commands and far less heap expansion. You can do this with
processes (planned for 7.3 I think) but I think it was much easier with
threads. Other things may open up with threads as well like Java stored
procedures. Anyway, now I think it is worth it.
Are there code clean-ups that have gone into the thread'd code that could
be incorporated into the existing code base that would start us down that
path? For instance, based on my limited understanding of threaded servers, I
believe that 'global variables' are generally considered "A Real Bad
Thing" ... in one of your emails, you mentioned:
"The first basic problem is that global variables are scattered throughout
the source as well as some static stack variables. Hunting these down and
finding a home for them is, in and of itself, a major task. For example,
flex produces code that is not thread safe, you have to modify that too.
The current workaround in exper. threaded postgres is not pretty, one
"environment" structure that holds all the normal postgres globals in
thread local storage. This makes compile time choices impractical I
think."
Now, what is a 'clean' solution to this? Making sure that all variables
are passed through to various functions, maybe through a struct construct?
So, can we start there and work our way through the code? Start simple
... take one of the global(s), put it into the struct and take it out of
global space and make sure that it's passed appropriately through all the
required functions ... add in the next one, and do another trace?
Someone, or a group of ppl, with thread knowledge needs to start this
forward ... once the clean up begins, even without any thread code thrown
in, it shouldn't be too difficult to keep it clean to go to 'the next
step', no?
On Wed, 6 Feb 2002, Peter Eisentraut wrote:
Haroldo Stenger writes:
Though, starting to think & code thread safe would be nice too.
The thing about thread-safeness is that it's only actually useful when
you're using threads. Otherwise it wastes everybody's time -- the
programmer's, the computer's, and the user's.
The thing is, there are several areas where using threads would be a
benefit, from what I've read on this list over the years ... as time goes
on, fewer and fewer of the OSs in use lack threads, so we have to
start *somewhere* to work towards that sort of hybrid system ...
Peter Eisentraut wrote:
Haroldo Stenger writes:
Though, starting to think & code thread safe would be nice too.
The thing about thread-safeness is that it's only actually useful when
you're using threads. Otherwise it wastes everybody's time -- the
programmer's, the computer's, and the user's.
Yes, I see. The scenario in which I see it being useful is planning to add
multi-threading in, say, PG v7.5, and preparing the road for it. But maybe it's
a worthless effort; many developers are pointing that out. Let's forget about threads
for now.
By the way, my original question about how well integrated the multi-threading fork
is remained unanswered. I will assume it went threading-only, dropping
the original behaviour for good, which leads me not to consider threading a
viable option (for now).
Regards,
Haroldo.