PostgreSQL process architecture question.

Started by Amber over 17 years ago | 8 messages | general

guxiaobo1982@hotmail.com

over 17 years ago

We know PostgreSQL uses one dedicated server process to serve each client connection. What we want to know is whether PostgreSQL uses multiple threads inside agent processes to take advantage of multiple CPUs. At our site we have only a few concurrent connections, so what occurs inside an agent process is very important to us.

Scott Marlowe

scott.marlowe@gmail.com

over 17 years ago

In reply to: Amber (#1)

Re: PostgreSQL process architecture question.

On Tue, Sep 9, 2008 at 9:35 AM, Amber <guxiaobo1982@hotmail.com> wrote:

> We know PostgreSQL uses one dedicated server process to serve each client
> connection. What we want to know is whether PostgreSQL uses multiple threads
> inside agent processes to take advantage of multiple CPUs. At our site we
> have only a few concurrent connections, so what occurs inside an agent
> process is very important to us.

No, it doesn't. One connection gets one process, which uses one CPU at a time.

Andrew Sullivan

ajs@commandprompt.com

over 17 years ago

In reply to: Amber (#1)

Re: PostgreSQL process architecture question.

On Tue, Sep 09, 2008 at 11:35:56PM +0800, Amber wrote:

> We know PostgreSQL uses one dedicated server process to serve each
> client connection. What we want to know is whether PostgreSQL uses
> multiple threads inside agent processes to take advantage of multiple
> CPUs.

No. Note that "threading" is not automatically necessary to get more
than one processor to work on a single query. But at the moment,
Postgres doesn't do that either.
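Andrew's point that threads are not required can be illustrated with plain processes. The sketch below is hypothetical and is not something PostgreSQL did at the time: `multiprocess_sum` splits an array scan across several forked child processes, each of which sends a partial result back over a pipe, so several CPUs share one piece of work without any threading. It assumes a POSIX system (`fork`, `pipe`, `waitpid`).

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define MAX_WORKERS 64

/* Sum a[lo..hi) serially; this is the work each child process does. */
static long sum_range(const long *a, size_t lo, size_t hi)
{
    long s = 0;
    for (size_t i = lo; i < hi; i++)
        s += a[i];
    return s;
}

/*
 * Hypothetical helper: sum an array using nworkers child processes and no
 * threads. fork() gives each child a copy-on-write view of the array; each
 * child sums one slice and sends its partial sum back over its own pipe.
 * Error handling is minimal (exit on failure) because this is a sketch.
 */
long multiprocess_sum(const long *a, size_t n, int nworkers)
{
    int fds[MAX_WORKERS][2];
    pid_t pids[MAX_WORKERS];

    assert(nworkers > 0 && nworkers <= MAX_WORKERS);
    size_t chunk = (n + (size_t)nworkers - 1) / (size_t)nworkers;

    for (int w = 0; w < nworkers; w++) {
        size_t lo = (size_t)w * chunk;
        if (lo > n)
            lo = n;
        size_t hi = (lo + chunk < n) ? lo + chunk : n;

        if (pipe(fds[w]) != 0) {
            perror("pipe");
            exit(EXIT_FAILURE);
        }
        pids[w] = fork();
        if (pids[w] < 0) {
            perror("fork");
            exit(EXIT_FAILURE);
        }
        if (pids[w] == 0) {                /* child: sum one slice, then exit */
            close(fds[w][0]);
            long part = sum_range(a, lo, hi);
            ssize_t wr = write(fds[w][1], &part, sizeof part);
            _exit(wr == (ssize_t)sizeof part ? 0 : 1);
        }
        close(fds[w][1]);                  /* parent keeps only the read end */
    }

    long total = 0;
    for (int w = 0; w < nworkers; w++) {
        long part = 0;
        if (read(fds[w][0], &part, sizeof part) != (ssize_t)sizeof part) {
            perror("read");
            exit(EXIT_FAILURE);
        }
        close(fds[w][0]);
        waitpid(pids[w], NULL, 0);         /* reap the child */
        total += part;
    }
    return total;
}
```

Called as `multiprocess_sum(a, n, 4)`, the result equals a serial sum; the price is one `fork` and one pipe per worker, which only pays off when each slice represents substantial work.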

A
--
Andrew Sullivan
ajs@commandprompt.com
+1 503 667 4564 x104
http://www.commandprompt.com/

Holger Hoffstaette

holger@wizards.de

over 17 years ago

In reply to: Amber (#1)

Re: PostgreSQL process architecture question.

On Tue, 09 Sep 2008 10:07:32 -0600, Scott Marlowe wrote:

> On Tue, Sep 9, 2008 at 9:35 AM, Amber <guxiaobo1982@hotmail.com> wrote:
>
>> We know PostgreSQL uses one dedicated server process to serve each client
>> connection. What we want to know is whether PostgreSQL uses multiple threads
>> inside agent processes to take advantage of multiple CPUs. At our site we
>> have only a few concurrent connections, so what occurs inside an agent
>> process is very important to us.
>
> No, it doesn't. One connection gets one process, which uses one CPU at a time.

I understand the history, technical reasons, and motivation for this, but I
want to ask whether anybody has thought about using OpenMP for careful
parallelization of per-process work sections. Scanning large (e.g. already
locked) arrays, parallel sweeps, or calculations might benefit from
parallelization without requiring a fully threaded design. Such an
approach could retain the per-process isolation model yet still reap
multicore benefits. To boot, OpenMP is pretty easy to use and comes with
gcc.
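A minimal sketch of the kind of per-process OpenMP loop Holger describes, assuming a plain in-memory array that nothing else modifies while the loop runs. `count_above` is a hypothetical function, not PostgreSQL code. Its loop body calls no other functions, which matters because anything invoked inside a parallel region would have to be thread-safe.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical example: count the elements of an in-memory array that
 * exceed a threshold. The only shared state is the counter, and the
 * reduction clause gives each OpenMP thread a private copy that is summed
 * when the loop ends, so no locking is needed.
 */
long count_above(const int *vals, long n, int threshold)
{
    long count = 0;

    #pragma omp parallel for reduction(+:count)
    for (long i = 0; i < n; i++) {
        if (vals[i] > threshold)
            count++;
    }
    return count;
}
```

Built with `gcc -fopenmp`, the iterations are spread over the available cores; built without that flag, gcc ignores the pragma and the same loop runs serially with an identical result, which is part of what makes OpenMP attractive for parallelizing isolated sections incrementally.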

Since I don't know much about PG's internals and their data dependencies,
this might well be a dumb idea, but I figured asking couldn't hurt. :)

regards
Holger

Amber

guxiaobo1982@hotmail.com

over 17 years ago

In reply to: Holger Hoffstaette (#4)

Re: PostgreSQL process architecture question.

That's exactly our situation: we have 4 CPUs with 4 cores each, 16 cores in total, but only 4 to 8 concurrent users, who regularly run complex queries. In that situation we can't use all of our CPU resources to speed up response time.
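As a rough upper bound on that situation, assuming each user runs one query at a time and each backend keeps at most one core busy (Scott's point that one connection uses one CPU at a time):

```latex
\text{peak CPU utilization} \le \frac{\min(N_{\text{users}},\, N_{\text{cores}})}{N_{\text{cores}}}
  = \frac{4}{16} = 25\% \quad\text{to}\quad \frac{8}{16} = 50\%
```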


Scott Marlowe

scott.marlowe@gmail.com

over 17 years ago

In reply to: Amber (#5)

Re: PostgreSQL process architecture question.

On Tue, Sep 9, 2008 at 11:17 PM, 小波顾 <guxiaobo1982@hotmail.com> wrote:

> That's exactly our situation: we have 4 CPUs with 4 cores each, 16
> cores in total, but only 4 to 8 concurrent users, who regularly run
> complex queries. In that situation we can't use all of our CPU
> resources to speed up response time.

Unless you have either a small data set or a very powerful RAID array,
most of the time you won't be CPU-bound anyway. But it would be nice to
see some work come out to parallelize some of the work done in the
back end.

Chris Browne

cbbrowne@acm.org

over 17 years ago

In reply to: Amber (#1)

Re: PostgreSQL process architecture question.

guxiaobo1982@hotmail.com ("Amber") writes:

> We know PostgreSQL uses one dedicated server process to serve each
> client connection. What we want to know is whether PostgreSQL uses
> multiple threads inside agent processes to take advantage of
> multiple CPUs. At our site we have only a few concurrent
> connections, so what occurs inside an agent process is very
> important to us.

No, PostgreSQL does not attempt to make any use of threading at this
time. The FAQ describes this quite nicely:

http://wiki.postgresql.org/wiki/Developer_FAQ#Why_don.27t_you_use_threads.2C_raw_devices.2C_async-I.2FO.2C_.3Cinsert_your_favorite_wizz-bang_feature_here.3E.3F

"Why don't you use threads, raw devices, async-I/O, <insert your favorite wizz-bang feature here>?

There is always a temptation to use the newest operating system features as soon as they arrive. We resist that temptation.

First, we support 15+ operating systems, so any new feature has to be
well established before we will consider it. Second, most new
wizz-bang features don't provide dramatic improvements. Third, they
usually have some downside, such as decreased reliability or
additional code required. Therefore, we don't rush to use new features
but rather wait for the feature to be established, then ask for
testing to show that a measurable improvement is possible.

As an example, threads are not currently used in the backend code because:

* Historically, threads were unsupported and buggy.
* An error in one backend can corrupt other backends.
* Speed improvements using threads are small compared to the remaining backend startup time.
* The backend code would be more complex.

So, we are not ignorant of new features. It is just that we are
cautious about their adoption. The TODO list often contains links to
discussions showing our reasoning in these areas."
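The second bullet in that list is about memory isolation. The sketch below (hypothetical names, POSIX `fork` assumed) shows why separate processes contain a stray write or crash that would be fatal to a threaded server. It only covers private memory: anything a backend writes into shared memory is still visible to every other backend.

```c
#include <assert.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical stand-in for memory that is private to one backend process. */
static int backend_state = 42;

/*
 * Fork a child "backend" that makes a stray write to its copy of
 * backend_state and then crashes via abort(). fork() gives the child its
 * own copy-on-write address space, so the parent's variable is unchanged
 * and the parent keeps running. Had the same code run in a thread, the
 * stray write would hit memory shared by every thread, and abort() would
 * take down the whole process.
 *
 * Returns the parent's value of backend_state afterwards (or -1 if fork
 * or waitpid fails) and sets *child_crashed to 1 if the child died from
 * a signal, 0 otherwise.
 */
int isolation_demo(int *child_crashed)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        backend_state = -1;   /* simulated stray write in the child */
        abort();              /* simulated crash of the child */
    }
    int status = 0;
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    *child_crashed = WIFSIGNALED(status) ? 1 : 0;
    return backend_state;
}
```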
--
select 'cbbrowne' || '@' || 'cbbrowne.com';
http://cbbrowne.com/info/oses.html

Reece Hart

reece@harts.net

over 17 years ago

In reply to: Scott Marlowe (#6)

Re: PostgreSQL process architecture question.

On Wed, 2008-09-10 at 00:02 -0600, Scott Marlowe wrote:

> Unless you have either a small data set or a very powerful RAID array,
> most of the time you won't be CPU-bound anyway. But it would be nice to
> see some work come out to parallelize some of the work done in the
> back end.

I would have agreed with this several years ago, but many folks now buy
enough RAM to reduce the impact of I/O. We're routinely CPU-bound on
small queries, and even on some large ones, on a 32 GB, 16-core Opteron
box that serves a ~200 GB database (tables plus indexes, as stored on disk).

Does anyone know of research or references on query optimizers that include
parallelization as part of the cost estimate? I can envision how
PostgreSQL might parallelize a query plan that was optimized under the
assumption of one core. However, I wonder whether CPU and I/O costs are
sufficient for efficient parallel query optimization; presumably
contention for memory (for parallel sorts, say) becomes critical.
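Reece's question can be made concrete with a toy cost function. The sketch below is hypothetical and is not PostgreSQL's planner: it assumes I/O cost does not shrink as workers are added, CPU cost divides evenly across workers, and each extra worker adds a fixed startup and coordination overhead. It deliberately ignores the memory contention Reece raises, which is exactly what such a simple model would miss.

```c
#include <assert.h>

/*
 * Hypothetical cost model (not PostgreSQL's):
 *   cost(w) = io_cost + cpu_cost / w + worker_overhead * (w - 1)
 * I/O is treated as non-parallelizable, CPU work splits evenly across
 * w workers, and every worker beyond the first adds a fixed overhead.
 * Memory contention is not modelled.
 */
double parallel_cost(double io_cost, double cpu_cost,
                     double worker_overhead, int workers)
{
    assert(workers >= 1);
    return io_cost + cpu_cost / workers + worker_overhead * (workers - 1);
}

/* Return the worker count in [1, max_workers] with the lowest estimated cost. */
int best_worker_count(double io_cost, double cpu_cost,
                      double worker_overhead, int max_workers)
{
    int best = 1;
    double best_cost = parallel_cost(io_cost, cpu_cost, worker_overhead, 1);

    for (int w = 2; w <= max_workers; w++) {
        double c = parallel_cost(io_cost, cpu_cost, worker_overhead, w);
        if (c < best_cost) {          /* strict: ties keep fewer workers */
            best_cost = c;
            best = w;
        }
    }
    return best;
}
```

With io_cost = 100, cpu_cost = 1600 and worker_overhead = 25, the minimum falls at 8 workers (cost 475); with io_cost = 1000 and cpu_cost = 10, a single worker is cheapest, which matches Scott's point that I/O-bound queries gain little from extra cores.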

-Reece

--
Reece Hart, http://harts.net/reece/, GPG:0x25EC91A0