Reasoning behind process instead of thread based arch?
Hello!
I have a couple of final (I hope, for your sake) questions regarding
PostgreSQL.
I understand PostgreSQL uses processes rather than threads. I found this
statement in the archives:
"The developers agree that multiple processes provide
more benefits (mostly in stability and robustness) than costs (more
connection startup costs). The startup costs are easily overcome by
using connection pooling.
"
Please explain why it is more stable and robust? More from the above
statement:
"Also, each query can only use one processor; a single query can't be
executed in parallel across many CPUs. However, several queries running
concurrently will be spread across the available CPUs."
And it is because of the PostgreSQL process architecture that a query
can't be executed across multiple CPUs, right? Although I wonder if this
is also the case in MySQL. Their manual only says that each connection is
a thread.
Also, MySQL has a library for embedded applications; they say:
"We also provide MySQL Server as an embedded multi-threaded library that
you can link into your application to get a smaller, faster,
easier-to-manage product."
Does PostgreSQL offer anything similar?
Thank you for your time.
Tim
nd02tsk@student.hig.se writes:
"The developers agree that multiple processes provide
more benefits (mostly in stability and robustness) than costs (more
connection startup costs). The startup costs are easily overcome by
using connection pooling.
"
Please explain why it is more stable and robust?
Because threads share the same memory space, a runaway thread can
corrupt the entire system by writing to the wrong part of memory.
With separate processes, the only data that is shared is that which is
meant to be shared, which reduces the potential for such damage.
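To make the difference concrete, here is a minimal Python sketch. It is not PostgreSQL code; the function names are made up for illustration, and `os.fork` assumes a POSIX system. A thread's stray write lands in the caller's data, while a forked child only changes its own private copy:

```python
import os
import threading


def mutate_in_thread():
    """A thread shares the caller's address space, so its write is visible."""
    data = [0, 0, 0]

    def stray_write():
        data[1] = 99  # simulates a runaway write into shared state

    worker = threading.Thread(target=stray_write)
    worker.start()
    worker.join()
    return data


def mutate_in_child_process():
    """A forked child works on its own copy; the parent's data is untouched."""
    data = [0, 0, 0]
    pid = os.fork()  # POSIX only (assumption)
    if pid == 0:
        data[1] = 99  # changes only the child's private copy
        os._exit(0)   # leave the child immediately
    os.waitpid(pid, 0)
    return data


if __name__ == "__main__":
    print(mutate_in_thread())         # [0, 99, 0]
    print(mutate_in_child_process())  # [0, 0, 0]
```

In a real server the "stray write" would be a bad pointer in C rather than a deliberate assignment, but the visibility rules are the same.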
"Also, each query can only use one processor; a single query can't be
executed in parallel across many CPUs. However, several queries running
concurrently will be spread across the available CPUs."
And it is because of the PostgreSQL process architecture that a query
can't be executed across multiple CPUs, right?
There's no theoretical reason that a query couldn't be split across
multiple helper processes, but no one's implemented that feature--it
would be a pretty major job.
Also, MySQL has a library for embedded applications; they say:
"We also provide MySQL Server as an embedded multi-threaded library that
you can link into your application to get a smaller, faster,
easier-to-manage product."
Does PostgreSQL offer anything similar?
No. See the archives for extensive discussion of why PG doesn't do
this.
-Doug
On Wed, 2004-10-27 at 09:56, nd02tsk@student.hig.se wrote:
Hello!
I have a couple of final (I hope, for your sake) questions regarding
PostgreSQL.
I understand PostgreSQL uses processes rather than threads. I found this
statement in the archives:
"The developers agree that multiple processes provide
more benefits (mostly in stability and robustness) than costs (more
connection startup costs). The startup costs are easily overcome by
using connection pooling.
"
Please explain why it is more stable and robust? More from the above
statement:
This question shows up every six months or so. You might want to search
the archives (I use Google to do that, but YMMV with the PostgreSQL
site's search engine).
Basically, there are a few issues with threading that rear their ugly
heads.
One: Not all OSes' thread libraries are created equal. A couple of years
ago there was a nasty bug in one of the BSDs that caused MySQL to crash,
and it drove them nuts. So programming a threaded implementation means
you have to deal with thread libraries of varying quality and
robustness.
Two: If a single process in a multi-process application crashes, that
process alone dies. The buffer is flushed, and all the other child
processes continue happily along. In a multi-threaded environment, when
one thread dies, they all die.
Three: Multi-threaded applications can be prone to race conditions that
are VERY hard to troubleshoot, especially if they occur only once in
every million or so occurrences of the triggering event.
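Point two can be illustrated with a small Python sketch. This is only an analogy, not how the postmaster works: `run_workers`, `CRASH` and `WORK` are names invented here, and each "backend" is simply a separate Python interpreter. One worker dies from an abort signal while its siblings finish normally:

```python
import subprocess
import sys

CRASH = "import os; os.abort()"   # dies from SIGABRT, like a hard crash
WORK = "print(sum(range(10)))"    # a well-behaved worker


def run_workers(programs):
    """Start one OS process per program and collect (exit code, output)."""
    procs = [
        subprocess.Popen(
            [sys.executable, "-c", source],
            stdout=subprocess.PIPE,
            stderr=subprocess.DEVNULL,
            text=True,
        )
        for source in programs
    ]
    results = []
    for proc in procs:
        out, _ = proc.communicate()
        results.append((proc.returncode, out.strip()))
    return results


if __name__ == "__main__":
    for code, out in run_workers([WORK, CRASH, WORK]):
        print(code, repr(out))
```

The crashed worker reports a non-zero exit status; the other two still return their result.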
On some operating systems, like Windows and Solaris, processes are
expensive, while threads are cheap, so to speak. This is not the case
in Linux or BSD, where the differences are much smaller, and the
multi-process design suffers no great disadvantage.
"Also, each query can only use one processor; a single query can't be
executed in parallel across many CPUs. However, several queries running
concurrently will be spread across the available CPUs."
And it is because of the PostgreSQL process architecture that a query
can't be executed across multiple CPUs, right? Although I wonder if this
is the case in MySQL. It only says in their manual that each connection
is a thread.
Actually, if it were converted to multi-threaded tomorrow, it would
still be true, because the PostgreSQL engine isn't designed to split
queries into constituent parts to be executed by separate threads or
processes. Conversely, if one wished to implement it, one could likely
patch PostgreSQL to hand parts of a query to child processes of the
current backend process (grandchild processes, so to speak), which would
allow a single query to use multiple CPUs.
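As a rough illustration of that idea (a toy, not a proposal for the PostgreSQL executor), the Python sketch below splits one aggregate, a sum over a range standing in for a table scan, across helper processes and combines the partial results. `HELPER` and `parallel_sum` are hypothetical names:

```python
import subprocess
import sys

# Each helper process aggregates one slice of the "table" and prints its partial sum.
HELPER = (
    "import sys; lo, hi = int(sys.argv[1]), int(sys.argv[2]); "
    "print(sum(range(lo, hi)))"
)


def parallel_sum(n, workers):
    """Sum 0..n-1 by handing contiguous slices to separate helper processes."""
    if n <= 0:
        return 0
    step = -(-n // workers)  # ceiling division: rows per helper
    procs = []
    for lo in range(0, n, step):
        hi = min(lo + step, n)
        procs.append(
            subprocess.Popen(
                [sys.executable, "-c", HELPER, str(lo), str(hi)],
                stdout=subprocess.PIPE,
                text=True,
            )
        )
    # The parent plays the role of the session's own process: it only
    # combines the partial results.
    return sum(int(proc.communicate()[0]) for proc in procs)


if __name__ == "__main__":
    print(parallel_sum(1000, 4))
```

The helpers run concurrently, so the operating system is free to place them on different CPUs.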
Also, MySQL has a library for embedded applications; they say:
"We also provide MySQL Server as an embedded multi-threaded library that
you can link into your application to get a smaller, faster,
easier-to-manage product."
Does PostgreSQL offer anything similar?
No, because in that design, if your application crashes, so does, by
extension, your database. Now, if I had to choose which database to
have crash in the middle of transactions, I'd pick PostgreSQL, since it
is more robust about crash recovery. Still, it's generally considered a
bad thing to have a database crash mid-transaction.
That's another subject that shows up every x months: an embedded version
of PostgreSQL. Basically, the suggestion is to use something like
SQLite, which is built to be embedded and therefore has a much smaller
footprint than PostgreSQL could ever hope to achieve. No one wants
their embedded library using up gobs of RAM and disk space when it's
just handling one thread / process doing one thing. It's like
delivering pizzas with a Ferrari: you could do it, it just wouldn't make
a lot of sense.
On Wed, Oct 27, 2004 at 05:56:16PM +0200, nd02tsk@student.hig.se wrote:
I understand PostgreSQL uses processes rather than threads. I found this
statement in the archives:
"The developers agree that multiple processes provide
more benefits (mostly in stability and robustness) than costs (more
connection startup costs). The startup costs are easily overcome by
using connection pooling."
Please explain why it is more stable and robust?
I can't speak for the developers, but here are my thoughts:
A critical problem in a thread could terminate the entire process
or corrupt its data. If the database server were threaded, such
problems would affect the entire server. With each connection
handled by a separate process, a critical error is more likely to
affect only the connection that had the problem; the rest of the
server survives unscathed.
"Also, each query can only use one processor; a single query can't be
executed in parallel across many CPUs. However, several queries running
concurrently will be spread across the available CPUs."
And it is because of the PostgreSQL process architecture that a query
can't be executed across multiple CPUs, right? Although I wonder if this
is the case in MySQL. It only says in their manual that each connection
is a thread.
I don't know if MySQL can use multiple threads for a single query;
it might simply be using one thread per connection instead of one
process per connection. If that's the case, then queries executed
by a particular connection are still single-threaded, the same as
in PostgreSQL.
A database that uses a separate process for each connection could
still employ multiple threads within each process if somebody could
figure out a way to distribute a query amongst the threads. I don't
know what the PostgreSQL developers' thoughts on that are.
A disadvantage of threads is that some systems (e.g., FreeBSD 4)
implement threads in userland, so the threads of one process don't take
advantage of multiple CPUs. On such systems, using multiple processes
makes better use of additional CPUs.
--
Michael Fuhr
http://www.fuhr.org/~mfuhr/
On some operating systems, like Windows and Solaris, processes are
expensive, while threads are cheap, so to speak. This is not the case
in Linux or BSD, where the differences are much smaller, and the
multi-process design suffers no great disadvantage.
Even on Windows or Solaris you can use techniques like persistent
connections or connection pooling to eliminate the process overhead.
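A pool does not have to be elaborate. The Python sketch below is an assumption-laden illustration: `ConnectionPool` is a made-up class, and `connect` stands for whatever function your driver uses to open a connection. The expensive connection (and hence backend process) startup happens once per pooled connection, and callers then borrow and return connections:

```python
import queue
from contextlib import contextmanager


class ConnectionPool:
    """Hand out a fixed set of pre-opened connections (hypothetical helper)."""

    def __init__(self, connect, size):
        self._idle = queue.Queue()
        for _ in range(size):
            # Pay the startup cost once, up front.
            self._idle.put(connect())

    @contextmanager
    def connection(self):
        conn = self._idle.get()  # blocks while all connections are in use
        try:
            yield conn
        finally:
            self._idle.put(conn)  # return it for the next caller
```

Used as `with pool.connection() as conn: ...`, ten requests against a pool of two open only two connections in total.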
Actually, if it were converted to multi-threaded tomorrow, it would
still be true, because the PostgreSQL engine isn't designed to split
queries into constituent parts to be executed by separate threads or
processes. Conversely, if one wished to implement it, one could likely
patch PostgreSQL to hand parts of a query to child processes of the
current backend process (grandchild processes, so to speak), which would
allow a single query to use multiple CPUs.
I would be curious as to what this would actually gain. Of course there
are corner cases but I rarely find that it is the CPU that is doing all
the work, thus splitting the query may not do you any good.
In theory I guess being able to break it up and execute it on different
CPUs could produce the results faster, but I wonder if it would
be a large enough benefit to even notice?
"We also provide MySQL Server as an embedded multi-threaded library that
you can link into your application to get a smaller, faster,
easier-to-manage product."
Does PostgreSQL offer anything similar?
No, it isn't really designed to do that. Likewise, Oracle is not a
database you would embed.
pick PostgreSQL, it's generally considered a bad thing to have a
database crash mid transaction. PostgreSQL is more robust about crash
recovery, but still...
That's another subject that shows up every x months, an embedded version
of PostgreSQL. Basically, the suggestion is to use something like
SQLite, which is built to be embedded, and therefore has a much lower
footprint than PostgreSQL could ever hope to achieve. No one wants
their embedded library using up gobs of RAM and disk space when it's
just handling one thread / process doing one thing. It's like
delivering pizzas with a Ferrari, you could do it, it just wouldn't make
a lot of sense.
--
Command Prompt, Inc., home of PostgreSQL Replication, and plPHP.
Postgresql support, programming shared hosting and dedicated hosting.
+1-503-667-4564 - jd@commandprompt.com - http://www.commandprompt.com
Mammoth PostgreSQL Replicator. Integrated Replication for PostgreSQL
Two: If a
single process in a multi-process application crashes, that process
alone dies. The buffer is flushed, and all the other child processes
continue happily along. In a multi-threaded environment, when one
thread dies, they all die.
So this means that if a single connection thread dies in MySQL, all
connections die?
Seems rather serious. I am doubtful that is how they have implemented it.
On Wed, Oct 27, 2004 at 07:47:03PM +0200, nd02tsk@student.hig.se wrote:
Two: If a
single process in a multi-process application crashes, that process
alone dies. The buffer is flushed, and all the other child processes
continue happily along. In a multi-threaded environment, when one
thread dies, they all die.
So this means that if a single connection thread dies in MySQL, all
connections die?
Seems rather serious. I am doubtful that is how they have implemented it.
It's part of the design of threads. If a thread does an invalid memory
lookup, it's the *process* (i.e. all threads) that receives the signal
and it's the *process* that dies.
Just like a SIGSTOP stops all threads and a SIGTERM terminates them
all. Signals are shared between threads. Now, you could of course catch
these signals, but you only have one address space shared between all
the threads, so if you want to exit to get a new process image (because
something is corrupted), you have to close all connections.
And indeed, the one MySQL server I can see is running four threads. Nasty.
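The default behaviour described here is easy to observe. In the Python sketch below (an illustration only; `CHILD` and `thread_crash_takes_down_process` are invented names), a background thread aborts, and the main thread of the same process never gets to print its message because the whole process is terminated:

```python
import subprocess
import sys

# Program run in a separate interpreter: a background thread aborts while
# the main thread is still busy.
CHILD = (
    "import os, threading, time\n"
    "threading.Thread(target=os.abort).start()\n"
    "time.sleep(5)\n"
    "print('main thread survived')\n"
)


def thread_crash_takes_down_process():
    """Return (exit code, stdout) of a process in which one thread aborts."""
    done = subprocess.run(
        [sys.executable, "-c", CHILD],
        stdout=subprocess.PIPE,
        stderr=subprocess.DEVNULL,
        text=True,
    )
    return done.returncode, done.stdout


if __name__ == "__main__":
    print(thread_crash_takes_down_process())
```

No signal handler is installed in this sketch, so it shows only the unhandled case.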
--
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/
Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
tool for doing 5% of the work and then sitting around waiting for someone
else to do the other 95% so you can sue them.
nd02tsk@student.hig.se wrote:
Two: If a
single process in a multi-process application crashes, that process
alone dies. The buffer is flushed, and all the other child processes
continue happily along. In a multi-threaded environment, when one
thread dies, they all die.
So this means that if a single connection thread dies in MySQL, all
connections die?
Seems rather serious. I am doubtful that is how they have implemented it.
That all depends on how you define crash. If a thread causes an
unhandled signal to be raised such as an illegal memory access or a
floating point exception, the process will die, hence killing all
threads. But a more advanced multi-threaded environment will install
handlers for such signals that will handle the error gracefully. It
might not even be necessary to kill the offending thread.
Some conditions are harder to handle than others, such as stack overflow
and out of memory, but it can be done. So to state that multi-threaded
environments in general kill all threads when one thread crashes is
not true. Having said that, I have no clue as to how advanced MySQL is
in this respect.
Regards,
Thomas Hallgren
-----Original Message-----
From: pgsql-general-owner@postgresql.org
[mailto:pgsql-general-owner@postgresql.org] On Behalf Of
Thomas Hallgren
Sent: Wednesday, October 27, 2004 11:16 AM
To: pgsql-general@postgresql.org
Subject: Re: [GENERAL] Reasoning behind process instead of
thread based
nd02tsk@student.hig.se wrote:
Two: If a
single process in a multi-process application crashes, that process
alone dies. The buffer is flushed, and all the other child processes
continue happily along. In a multi-threaded environment, when one
thread dies, they all die.
So this means that if a single connection thread dies in MySQL, all
connections die?
Seems rather serious. I am doubtful that is how they have implemented
it.
That all depends on how you define crash. If a thread causes an
unhandled signal to be raised such as an illegal memory access or a
floating point exception, the process will die, hence killing all
threads. But a more advanced multi-threaded environment will install
handlers for such signals that will handle the error gracefully. It
might not even be necessary to kill the offending thread.
Some conditions are harder to handle than others, such as stack overflow
and out of memory, but it can be done. So to state that multi-threaded
environments in general kill all threads when one thread crashes is
not true. Having said that, I have no clue as to how advanced MySQL is
in this respect.
There are clear advantages to separate process space for servers.
1. Separate threads can stomp on each other's memory space. (e.g.
imagine a wild, home-brew C function gone bad).
2. Separate processes can have separate user ids, and [hence] different
rights for file access. A threaded server will have to either be
started at the level of the highest user who will attach or will have to
impersonate the users in threads. Impersonation is very difficult to
make portable.
3. Separate processes die when they finish, releasing all resources to
the operating system. Imagine a threaded server with a teeny-tiny
memory leak, that stays up 24x7. Eventually, you will start using disk
for RAM, or even use all available disk and simply crash.
Threaded servers have one main advantage:
Threads are lightweight processes and starting a new thread is faster
than starting a new executable.
The thread advantage can be partly mitigated by pre-launching a pool of
servers.
Dann,
I'm not advocating a multi-threaded PostgreSQL server (been there, done
that :-). But I still must come to the defense of multi-threaded systems
in general.
You try to convince us that a single threaded system is better because
it is more tolerant to buggy code. That argument is valid and I agree, a
multi-threaded environment is more demanding in terms of developer
skills and code quality.
But what if I don't write crappy code or if I am prepared to take the
consequences of my bugs, what then? Maybe I really know what I'm doing
and really want to get the absolute best performance out of my server.
There are clear advantages to separate process space for servers.
1. Separate threads can stomp on each other's memory space. (e.g.
imagine a wild, home-brew C function gone bad).
Not all servers allow home-brewed C functions. And even when they do,
not all home-brewers will write crappy code. This is only a clear
advantage when buggy code is executed.
2. Separate processes can have separate user ids, and [hence] different
rights for file access. A threaded server will have to either be
started at the level of the highest user who will attach or will have to
impersonate the users in threads. Impersonation is very difficult to
make portable.
Yes, this is true and a valid advantage if you ever want to access external
and private files. Such access is normally discouraged though, since you
are outside of the boundaries of your transaction.
3. Separate processes die when they finish, releasing all resources to
the operating system. Imagine a threaded server with a teeny-tiny
memory leak, that stays up 24x7. Eventually, you will start using disk
for ram, or even use all available disk and simply crash.
Sure, but a memory leak is a serious bug and most leaks will have a
negative impact on single threaded systems as well. I'm sure you will
find memory leak examples that are fatal only in a multi-threaded 24x7
environment but they are probably very few overall.
Threaded servers have one main advantage:
Threads are lightweight processes and starting a new thread is faster
than starting a new executable.
A few more off the top of my head:
1. Threads communicate much faster than processes (applies to locking
and parallel query processing).
2. All threads in a process can share a common set of optimized query plans.
3. All threads can share lots of data cached in memory (static but
frequently accessed tables etc.).
4. In environments built using garbage collection, all threads can share
the same heap of garbage collected data.
5. A multi-threaded system can apply in-memory heuristics for self
adjusting heaps and other optimizations.
6. And lastly, my favorite; a multi-threaded system can be easily
integrated with, and make full use of, a multi-threaded virtual
execution environment such as a Java VM.
...
Regards,
Thomas Hallgren
On Wed, Oct 27, 2004 at 10:07:48PM +0200, Thomas Hallgren wrote:
Threaded servers have one main advantage:
Threads are lightweight processes and starting a new thread is faster
than starting a new executable.
A few more off the top of my head:
A lot of these advantages are due to sharing an address space, right?
Well, the processes in PostgreSQL share address space, just not *all*
of it. They communicate via this shared memory.
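A small Python sketch of "shared, but only the part that is meant to be shared" follows. It is an illustration rather than PostgreSQL's mechanism: the function name is made up, and it assumes a Unix system where `mmap.mmap(-1, ...)` creates an anonymous shared mapping and `os.fork` is available. The child's update to the mapped region is visible to the parent, while all other memory stays private:

```python
import mmap
import os
import struct


def counter_via_shared_memory(increment):
    """Let a forked child update an 8-byte counter living in shared memory."""
    shared = mmap.mmap(-1, 8)  # anonymous mapping, shared across fork on Unix
    shared[:8] = struct.pack("q", 0)

    pid = os.fork()  # POSIX only (assumption)
    if pid == 0:
        value = struct.unpack("q", shared[:8])[0]
        shared[:8] = struct.pack("q", value + increment)
        os._exit(0)

    os.waitpid(pid, 0)
    result = struct.unpack("q", shared[:8])[0]
    shared.close()
    return result


if __name__ == "__main__":
    print(counter_via_shared_memory(5))
```

With several concurrent writers the counter would also need a lock placed in the shared region; the sketch has a single writer and waits for it to finish.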
1. Threads communicate much faster than processes (applies to locking
and parallel query processing).
2. All threads in a process can share a common set of optimized query plans.
PostgreSQL could do this too, but I don't think anyone's looked into
sharing query plans; it's probably quite difficult.
3. All threads can share lots of data cached in memory (static but
frequently accessed tables etc.).
Table data is already shared. If two backends are manipulating the same
table, they can lock directly via shared memory rather than some OS
primitive.
4. In environments built using garbage collection, all threads can share
the same heap of garbage collected data.
5. A multi-threaded system can apply in-memory heuristics for self
adjusting heaps and other optimizations.
6. And lastly, my favorite; a multi-threaded system can be easily
integrated with, and make full use of, a multi-threaded virtual
execution environment such as a Java VM.
I can't really comment on these.
I think PostgreSQL has nicely combined the benefits of shared memory
with the robustness of multiple processes...
--
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/
Martijn van Oosterhout <kleptog@svana.org> writes:
... Signals are shared between threads. Now, you could of course catch
these signals but you only have one address space shared between all
the threads, so if you want to exit to get a new process image (because
something is corrupted), you have to close all connections.
Right. Depending on your OS you may be able to catch a signal that
would kill a thread and keep it from killing the whole process, but
this still leaves you with a process memory space that may or may not
be corrupted. Continuing in that situation is not cool, at least not
according to the Postgres project's notions of reliable software design.
It should be pointed out that when we get a hard backend crash, Postgres
will forcibly terminate all the backends and reinitialize; which means
that in terms of letting concurrent sessions keep going, we are not any
more forgiving than a single-address-space multithreaded server. The
real bottom line here is that we have good prospects of confining the
damage done by the failed process: it's unlikely that anything bad will
happen to already-committed data on disk or that any other sessions will
return wrong answers to their clients before we are able to kill them.
It'd be a lot harder to say that with any assurance for a multithreaded
server.
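The policy described above (any hard crash means everyone is stopped and the server reinitializes) can be sketched in a few lines of Python. This is a toy supervisor under stated assumptions, not the postmaster's actual logic; `supervise` and its return strings are invented for illustration:

```python
import subprocess
import sys
import time


def supervise(programs, poll_interval=0.05):
    """Run workers; if any one crashes, terminate the rest and report it."""
    procs = [
        subprocess.Popen([sys.executable, "-c", source], stderr=subprocess.DEVNULL)
        for source in programs
    ]
    while True:
        codes = [proc.poll() for proc in procs]
        if any(code not in (None, 0) for code in codes):
            # One worker died abnormally: do not trust shared state any more.
            for proc in procs:
                if proc.poll() is None:
                    proc.terminate()
            for proc in procs:
                proc.wait()
            return "reinitialize"
        if all(code == 0 for code in codes):
            return "clean shutdown"
        time.sleep(poll_interval)


if __name__ == "__main__":
    print(supervise(["import time; time.sleep(30)", "import os; os.abort()"]))
```

The surviving worker is stopped by the supervisor, from outside, rather than being trusted to keep running next to possibly damaged shared state.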
regards, tom lane
Martijn van Oosterhout wrote:
A lot of these advantages are due to sharing an address space, right?
Well, the processes in PostgreSQL share address space, just not *all*
of it. They communicate via this shared memory.
Which is a different beast altogether. The inter-process mutex handling
that you need to synchronize shared memory access is much more expensive
than the mechanisms used to synchronize threads.
2. All threads in a process can share a common set of optimized query plans.
PostgreSQL could do this too, but I don't think anyone's looked into
sharing query plans, probably quite difficult.
Perhaps. It depends on the design. If the plans are immutable once
generated, it should not be that difficult. But managing the mutable
area where the plans are cached again calls for expensive inter-process
synchronization.
Table data is already shared. If two backends are manipulating the same
table, they can lock directly via shared memory rather than some OS
primitive.
Sure, some functionality can be achieved using shared memory. But it
consumes more resources and the mutexes are a lot slower.
I think PostgreSQL has nicely combined the benefits of shared memory
with the robustness of multiple processes...
So do I. I've learned to really like PostgreSQL and the way it's built,
and as I said in my previous mail, I'm not advocating a switch. I just
react to the unfair bashing of multi-threaded systems.
Regards,
Thomas Hallgren
Tom Lane wrote:
Right. Depending on your OS you may be able to catch a signal that
would kill a thread and keep it from killing the whole process, but
this still leaves you with a process memory space that may or may not
be corrupted. Continuing in that situation is not cool, at least not
according to the Postgres project's notions of reliable software design.
There can't be any "may or may not" involved. You must of course know
what went wrong.
It is very common that you either get a null pointer exception (attempt
to access address zero), that your stack will hit a write protected page
(stack overflow), or that you get some sort of arithmetic exception.
These conditions can be trapped and gracefully handled. The signal
handler must be able to check the cause of the exception. This usually
involves stack unwinding and investigating the state of the CPU at the
point where the signal was generated. The process must be terminated if
the reason is not a recognized one.
Out of memory can be managed using thread-local allocation areas
(similar to MemoryContext) and killing a thread based on some criterion
when no more memory is available. The thread chosen could be the one that
encountered the problem, the one that consumes the most memory, the
one that was least recently active, or something else.
It should be pointed out that when we get a hard backend crash, Postgres
will forcibly terminate all the backends and reinitialize; which means
that in terms of letting concurrent sessions keep going, we are not any
more forgiving than a single-address-space multithreaded server. The
real bottom line here is that we have good prospects of confining the
damage done by the failed process: it's unlikely that anything bad will
happen to already-committed data on disk or that any other sessions will
return wrong answers to their clients before we are able to kill them.
It'd be a lot harder to say that with any assurance for a multithreaded
server.
I'm not sure I follow. You will be able to bring all threads of one
process to a halt much faster than you can kill a number of external
processes. Killing the multithreaded process is more like pulling the plug.
Regards,
Thomas Hallgren
Thomas Hallgren <thhal@mailblocks.com> writes:
Tom Lane wrote:
Right. Depending on your OS you may be able to catch a signal that
would kill a thread and keep it from killing the whole process, but
this still leaves you with a process memory space that may or may not
be corrupted.
It is very common that you either get a null pointer exception (attempt
to access address zero), that your stack will hit a write protected page
(stack overflow), or that you get some sort of arithmetic exception.
These conditions can be trapped and gracefully handled.
That argument has zilch to do with the question at hand. If you use a
coding style in which these things should be considered recoverable
errors, then setting up a signal handler to recover from them works
about the same whether the process is multi-threaded or not. The point
I was trying to make is that when an unrecognized trap occurs, you have
to assume not only that the current thread of execution is a lost cause,
but that it may have clobbered any memory it can get its hands on.
I'm not sure I follow. You will be able to bring all threads of one
process to a halt much faster than you can kill a number of external
processes.
Speed is not even a factor in this discussion; or do you habitually
spend time optimizing cases that aren't supposed to happen? The point
here is circumscribing how much can go wrong before you realize you're
in trouble.
regards, tom lane
Tom Lane wrote:
That argument has zilch to do with the question at hand. If you use a
coding style in which these things should be considered recoverable
errors, then setting up a signal handler to recover from them works
about the same whether the process is multi-threaded or not. The point
I was trying to make is that when an unrecognized trap occurs, you have
to assume not only that the current thread of execution is a lost cause,
but that it may have clobbered any memory it can get its hands on.
I'm just arguing that far from all signals are caused by unrecoverable
errors and that threads causing them can be killed individually and
gracefully.
I can go further and say that in some multi-threaded environments you as
a developer don't even have the opportunity to corrupt memory. In such
environments the recognized traps are the only ones you encounter unless
the environment is corrupt in itself. In addition, there are a number of
techniques that can be used to make it impossible for the threads to
unintentionally interfere with each other's memory.
I'm not at all contesting the fact that a single-threaded server
architecture is more bug-tolerant and in some ways easier to manage.
What I'm trying to say is that it is very possible to write even better,
yet very reliable servers using a multi-threaded architecture and high
quality code.
... The point here is circumscribing how much can go wrong before you
realize you're in trouble.
Ok now I do follow. With respect to my last comment about speed, I guess
it's long overdue to kill this thread now. Let's hope the forum stays
intact :-)
Regards,
Thomas Hallgren
On Thu, Oct 28, 2004 at 12:13:41AM +0200, Thomas Hallgren wrote:
Martijn van Oosterhout wrote:
A lot of these advantages are due to sharing an address space, right?
Well, the processes in PostgreSQL share address space, just not *all*
of it. They communicate via this shared memory.
Which is a different beast altogether. The inter-process mutex handling
that you need to synchronize shared memory access is much more expensive
than the mechanisms used to synchronize threads.
Now you've piqued my curiosity. You have two threads of control (either
two processes or two threads) which share a piece of memory. How can
the threads synchronise more easily than processes? What other feature
is there? AFAIK the futexes used by Linux threads are just as applicable
and fast between two processes as between two threads. All that is
required is some shared memory.
Or are you suggesting the only difference is in switching time (which
is not that significant)?
Also, I admit that on some operating systems, threads are much faster
than processes, but I'm talking specifically about Linux here.
Thanks in advance,
--
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/
nd02tsk@student.hig.se wrote:
So Thomas, you say you like the PostgreSQL process based model better
than the threaded one used by MySQL. But you sound like the opposite. I'd
like to know why you like processes more.
Ok, let me try and explain why I can be perceived as a scatterbrain :-).
PostgreSQL is a very stable and well functioning product. It is one of
the few databases out there that has a well documented way of adding
plugins written in C, and quite a few plugins exist today. You have all
the server side languages (PL/pgSQL, PL/Perl, PL/Tcl, PL/Java, etc.)
and a plethora of custom functions and other utilities. Most of this is
beyond the control of the PostgreSQL core team since it's not part of
the core product. It would be extremely hard to convert everything into
a multi-threaded environment and it would be even harder to maintain the
very high quality that would be required.
I think PostgreSQL, in its current shape, is ideal for a distributed,
Open Source based conglomerate of products. The high quality core firmly
controlled by the core team, in conjunction with all surrounding
features, brings you DBMS functionality that is otherwise unheard of in
the free software market. I believe that this advantage is very much due
to the simple and bug-resilient single-threaded design of
PostgreSQL.
My only regret is that PL/Java, of which I'm the father, is confined
to one connection only. But that too has some advantages in terms of
simplicity and reliability.
So much for PostgreSQL.
At present, I'm part of a team that develops a very reliable
multi-threaded system (a Java VM). In this role, I've learned a lot
about how high performance thread based systems can be made. If people
on this list want to dismiss multi-threaded systems, I feel they should
do it based on facts. It's more than possible to build a great
multi-threaded server. It is my belief that as PostgreSQL gets more
representation in the high end market where the advantages of
multi-threaded solutions get more and more apparent, it will find that
the competition from a performance standpoint is sometimes overwhelming.
I can't say anything about MySQL robustness because I haven't used it
much. Perhaps the code quality is indeed below what is required for a
multi-threaded system, perhaps not. I choose PostgreSQL over MySQL
because MySQL lacks some of the features that I feel are essential,
because it does some things dead wrong, and because it is dual licensed.
Hope that cleared up some of the confusion.
Regards,
Thomas Hallgren
Martijn van Oosterhout wrote:
Now you've piqued my curiosity. You have two threads of control (either
two processes or two threads) which share a piece of memory. How can
the threads synchronise more easily than processes? What other feature
is there? AFAIK the futexes used by Linux threads are just as applicable
and fast between two processes as between two threads. All that is
required is some shared memory.
Agree. On Linux, this is not a big issue. Linux is rather special
though, since the whole kernel is built in a way that more or less puts
an equal sign between a process and a thread. This is changing though.
Don't know what relevance that will have on this issue.
Shared memory and multiple processes have other negative impacts on
performance, since you force the CPU to jump between different address
spaces. Switching between those address spaces will decrease the CPU
cache hit rate. You might think this is esoteric and irrelevant, but the
fact is, cache misses are extremely expensive and the problem is
increasing. While CPU speed has increased 152 times or so since the
80's, the speed of memory has only quadrupled.
Or are you suggesting the only difference is in switching time (which
is not that significant).
"not that significant" all depends on how often you need to switch. On
most OSes, a process switch is significantly slower than switching
between threads (again, Linux may be an exception to the rule).
Regards,
Thomas Hallgren
[processes vs threads stuff deleted]
In any modern and reasonable Unix-like OS, there's very little difference
between the multi-process or the multi-thread model. _Default_ behaviour
is different, e.g. memory is shared by default for threads, but processes
can share memory as well. There are very few features threads have
that processes don't, and vice versa. And if the OS is good enough,
there are hardly any performance issues.
I think that it would be interesting to discuss the multi (process/thread)
model vs the mono (process/thread) model. Mono as in _one_ single process/thread
per CPU, not one per session. That is, moving all the "scheduling"
between sessions entirely to userspace. The server gains almost complete
control over the data structures allocated per session, and the resources
allocated _to_ sessions.
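A very small Python sketch of the mono model: one thread of control, with sessions scheduled entirely in user space. Generators stand in for sessions here and every `yield` is a point where a session voluntarily gives up the CPU; `session` and `run_round_robin` are illustrative names, not part of any server:

```python
from collections import deque


def session(name, steps):
    """A toy session: does `steps` units of work, yielding after each one."""
    for i in range(steps):
        yield f"{name}:{i}"  # hand control back to the scheduler


def run_round_robin(sessions):
    """Run all sessions to completion in one thread, one step at a time."""
    ready = deque(sessions)
    trace = []
    while ready:
        current = ready.popleft()
        try:
            trace.append(next(current))
            ready.append(current)  # not finished: back of the queue
        except StopIteration:
            pass                   # session finished: drop it
    return trace


if __name__ == "__main__":
    print(run_round_robin([session("a", 2), session("b", 3)]))
    # ['a:0', 'b:0', 'a:1', 'b:1', 'b:2']
```

Because the server itself decides when to switch, it also decides how much memory and time each session gets, which is the control the paragraph above is after. The price is that a session that never yields stalls everyone.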
I bet this is very theoretical since it'd require a complete redesign
of some core stuff. And I have strong concerns about portability. Still,
it could be interesting.
.TM.
--
____/ ____/ /
/ / / Marco Colombo
___/ ___ / / Technical Manager
/ / / ESI s.r.l.
_____/ _____/ _/ Colombo@ESI.it