Reasoning behind process instead of thread based arch?

Started by Nonameover 21 years ago32 messagesgeneral

Jump to latest

Noname

nd02tsk@student.hig.se

over 21 years ago

Hello!

I have a couple of final ( I hope, for your sake ) questions regarding
PostgreSQL.

I understand PostgreSQL uses processes rather than threads. I found this
statement in the archives:

"The developers agree that multiple processes provide
more benefits (mostly in stability and robustness) than costs (more
connection startup costs). The startup costs are easily overcome by
using connection pooling.
"

Please explain why it is more stable and robust? More from the above
statement:

"Also, each query can only use one processor; a single query can't be
executed in parallel across many CPUs. However, several queries running
concurrently will be spread across the available CPUs."

And it is because of the PostgreSQL process architecture that a query
can't be executed by many CPU:s right? Although I wonder if this is the
case in MySQL. It only says in their manual that each connection is a
thread.

Also, MySQL has a library for embedded aplications, the say:

"We also provide MySQL Server as an embedded multi-threaded library that
you can link into your application to get a smaller, faster,
easier-to-manage product."

Do PostgreSQL offer anything similar?

Thank you for your time.

Tim

Doug McNaught

doug@mcnaught.org

over 21 years ago

In reply to: Noname (#1)

Re: Reasoning behind process instead of thread based

nd02tsk@student.hig.se writes:

"The developers agree that multiple processes provide
more benefits (mostly in stability and robustness) than costs (more
connection startup costs). The startup costs are easily overcome by
using connection pooling.
"

Please explain why it is more stable and robust?

Because threads share the same memory space, a runaway thread can
corrupt the entire system by writing to the wrong part of memory.
With separate processes, the only data that is shared is that which is
meant to be shared, which reduces the potential for such damage.

"Also, each query can only use one processor; a single query can't be
executed in parallel across many CPUs. However, several queries running
concurrently will be spread across the available CPUs."

And it is because of the PostgreSQL process architecture that a query
can't be executed by many CPU:s right?

There's no theoretical reason that a query couldn't be split across
multiple helper processes, but no one's implemented that feature--it
would be a pretty major job.

Also, MySQL has a library for embedded aplications, the say:

"We also provide MySQL Server as an embedded multi-threaded library that
you can link into your application to get a smaller, faster,
easier-to-manage product."

Do PostgreSQL offer anything similar?

No. See the archives for extensive discussion of why PG doesn't do
this.

-Doug

Scott Marlowe

smarlowe@qwest.net

over 21 years ago

In reply to: Noname (#1)

Re: Reasoning behind process instead of thread based

On Wed, 2004-10-27 at 09:56, nd02tsk@student.hig.se wrote:

Hello!

I have a couple of final ( I hope, for your sake ) questions regarding
PostgreSQL.

I understand PostgreSQL uses processes rather than threads. I found this
statement in the archives:

"The developers agree that multiple processes provide
more benefits (mostly in stability and robustness) than costs (more
connection startup costs). The startup costs are easily overcome by
using connection pooling.
"

Please explain why it is more stable and robust? More from the above
statement:

This question shows up every 6 months or so. You might wanna search the
archives (I use google to do that, but YMMV with the postgresql site's
search engine.)

Basically, there are a few issues with threading that pop up their ugly
heads. One: Not all OSes thread libraries are created equal. There
was a nasty bug in one of the BSDs that causes MySQL to crash a couple
years ago that drove them nuts. So programming a threaded
implementation means you have the vagaries of different levels of
quality and robustness of thread libraries to deal with. Two: If a
single process in a multi-process application crashes, that process
alone dies. The buffer is flushed, and all the other child processes
continue happily along. In a multi-threaded environment, when one
thread dies, they all die. Three: Multi-threaded applications can be
prone to race conditions that are VERY hard to troubleshoot, especially
if they occur once every million or so times the triggering event
happens.

On some operating systems, like Windows and Solaris, processes are
expensive, while threads are cheap, so to speak. this is not the case
in Linux or BSD, where the differences are much smaller, and the
multi-process design suffers no great disadvantage.

"Also, each query can only use one processor; a single query can't be
executed in parallel across many CPUs. However, several queries running
concurrently will be spread across the available CPUs."

And it is because of the PostgreSQL process architecture that a query
can't be executed by many CPU:s right? Although I wonder if this is the
case in MySQL. It only says in their manual that each connection is a
thread.

Actually, if it were converted to multi-threaded tomorrow, it would
still be true, because the postgresql engine isn't designed to split off
queries into constituent parts to be executed by seperate threads or
processes. Conversely, if one wished to implement it, one could likely
patch postgresql to break up parts of queries to different child
processes of the current child process (grand child processes so to
speak) that would allow a query to hit multiple CPUs.

Also, MySQL has a library for embedded aplications, the say:

"We also provide MySQL Server as an embedded multi-threaded library that
you can link into your application to get a smaller, faster,
easier-to-manage product."

Do PostgreSQL offer anything similar?

No, because in that design, if your application crashes, so does, by
extension, your database. Now, I'd argue that if I had to choose
between which database to have crash in the middle of transactions, I'd
pick PostgreSQL, it's generally considered a bad thing to have a
database crash mid transaction. PostgreSQL is more robust about crash
recovery, but still...

That's another subject that shows up every x months, an embedded version
of PostgreSQL. Basically, the suggestion is to use something like
SQLlite, which is built to be embedded, and therefore has a much lower
footprint than PostgreSQL could ever hope to achieve. No one wants
their embedded library using up gobs of RAM and disk space when it's
just handling one thread / process doing one thing. It's like
delivering Pizzas with a Ferrari, you could do it, it just eouldn't make
a lot of sense.