Parallel query execution

Started by Bruce Momjianover 13 years ago139 messageshackers

bruce@momjian.us

over 13 years ago

I mentioned last year that I wanted to start working on parallelism:

https://wiki.postgresql.org/wiki/Parallel_Query_Execution

Years ago I added thread-safety to libpq. Recently I added two parallel
execution paths to pg_upgrade. The first parallel path allows execution
of external binaries pg_dump and psql (to restore). The second parallel
path does copy/link by calling fork/thread-safe C functions. I was able
to do each in 2-3 days.

I believe it is time to start adding parallel execution to the backend.
We already have some parallelism in the backend:
effective_io_concurrency and helper processes. I think it is time we
start to consider additional options.

Parallelism isn't going to help all queries, in fact it might be just a
small subset, but it will be the larger queries. The pg_upgrade
parallelism only helps clusters with multiple databases or tablespaces,
but the improvements are significant.

I have summarized my ideas by updating our Parallel Query Execution wiki
page:

https://wiki.postgresql.org/wiki/Parallel_Query_Execution

Please consider updating the page yourself or posting your ideas to this
thread. Thanks.

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ It's impossible for everything to be true. +

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Stephen Frost

sfrost@snowman.net

over 13 years ago

In reply to: Bruce Momjian (#1)

Re: Parallel query execution

* Bruce Momjian (bruce@momjian.us) wrote:

Parallelism isn't going to help all queries, in fact it might be just a
small subset, but it will be the larger queries. The pg_upgrade
parallelism only helps clusters with multiple databases or tablespaces,
but the improvements are significant.

This would be fantastic and I'd like to help. Parallel query and real
partitioning are two of our biggest holes for OLAP and data warehouse
users.

Please consider updating the page yourself or posting your ideas to this
thread. Thanks.

Will do.

Thanks,

Stephen

Peter Geoghegan

pg@bowt.ie

over 13 years ago

In reply to: Bruce Momjian (#1)

Re: Parallel query execution

On 15 January 2013 22:14, Bruce Momjian <bruce@momjian.us> wrote:

I believe it is time to start adding parallel execution to the backend.
We already have some parallelism in the backend:
effective_io_concurrency and helper processes. I think it is time we
start to consider additional options.

A few months back, I remarked [1]/messages/by-id/CAEYLb_VeZpKDX54VEx3X30oy_UOTh89XoejJW6aucjjiUjskXw@mail.gmail.com that speeding up sorting using
pipelining and asynchronous I/O was probably parallelism low-hanging
fruit. That hasn't changed, though I personally still don't have the
bandwidth to look into it in a serious way.

[1]: /messages/by-id/CAEYLb_VeZpKDX54VEx3X30oy_UOTh89XoejJW6aucjjiUjskXw@mail.gmail.com

--
Peter Geoghegan http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Bruce Momjian

bruce@momjian.us

over 13 years ago

In reply to: Peter Geoghegan (#3)

Re: Parallel query execution

On Tue, Jan 15, 2013 at 10:39:10PM +0000, Peter Geoghegan wrote:

On 15 January 2013 22:14, Bruce Momjian <bruce@momjian.us> wrote:

I believe it is time to start adding parallel execution to the backend.
We already have some parallelism in the backend:
effective_io_concurrency and helper processes. I think it is time we
start to consider additional options.

A few months back, I remarked [1] that speeding up sorting using
pipelining and asynchronous I/O was probably parallelism low-hanging
fruit. That hasn't changed, though I personally still don't have the
bandwidth to look into it in a serious way.

[1] /messages/by-id/CAEYLb_VeZpKDX54VEx3X30oy_UOTh89XoejJW6aucjjiUjskXw@mail.gmail.com

OK, I added the link to the wiki.

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ It's impossible for everything to be true. +

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Simon Riggs

simon@2ndQuadrant.com

over 13 years ago

In reply to: Bruce Momjian (#1)

Re: Parallel query execution

On 15 January 2013 22:14, Bruce Momjian <bruce@momjian.us> wrote:

I mentioned last year that I wanted to start working on parallelism:

We don't normally begin discussing topics for next release just as a
CF is starting.

Why is this being discussed now?

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Bruce Momjian

bruce@momjian.us

over 13 years ago

In reply to: Simon Riggs (#5)

Re: Parallel query execution

On Tue, Jan 15, 2013 at 10:53:29PM +0000, Simon Riggs wrote:

On 15 January 2013 22:14, Bruce Momjian <bruce@momjian.us> wrote:

I mentioned last year that I wanted to start working on parallelism:

We don't normally begin discussing topics for next release just as a
CF is starting.

Why is this being discussed now?

It is for 9.4 and will take months. I didn't think there was a better
time. We don't usually discuss features during beta testing.

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ It's impossible for everything to be true. +

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Simon Riggs

simon@2ndQuadrant.com

over 13 years ago

In reply to: Bruce Momjian (#6)

Re: Parallel query execution

On 15 January 2013 22:55, Bruce Momjian <bruce@momjian.us> wrote:

Why is this being discussed now?

It is for 9.4 and will take months. I didn't think there was a better
time. We don't usually discuss features during beta testing.

Bruce, there are many, many patches on the queue. How will we ever get
to beta testing if we begin open ended discussions on next release?

If we can't finish what we've started for 9.3, why talk about 9.4?

Yes, its a great topic for discussion, but there are better times.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Bruce Momjian

bruce@momjian.us

over 13 years ago

In reply to: Simon Riggs (#7)

Re: Parallel query execution

On Tue, Jan 15, 2013 at 11:01:04PM +0000, Simon Riggs wrote:

On 15 January 2013 22:55, Bruce Momjian <bruce@momjian.us> wrote:

Why is this being discussed now?

It is for 9.4 and will take months. I didn't think there was a better
time. We don't usually discuss features during beta testing.

Bruce, there are many, many patches on the queue. How will we ever get
to beta testing if we begin open ended discussions on next release?

If we can't finish what we've started for 9.3, why talk about 9.4?

Yes, its a great topic for discussion, but there are better times.

Like when? I don't remember a policy of not discussing things now.
Does anyone else remember this? Are you saying feature discussion is
only between commit-fests? Is this written down anywhere? I only
remember beta-time as a time not to discuss features.

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ It's impossible for everything to be true. +

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Gavin Flower

GavinFlower@archidevsys.co.nz

over 13 years ago

In reply to: Bruce Momjian (#1)

Re: Parallel query execution

On 16/01/13 11:14, Bruce Momjian wrote:

I mentioned last year that I wanted to start working on parallelism:

https://wiki.postgresql.org/wiki/Parallel_Query_Execution

Years ago I added thread-safety to libpq. Recently I added two parallel
execution paths to pg_upgrade. The first parallel path allows execution
of external binaries pg_dump and psql (to restore). The second parallel
path does copy/link by calling fork/thread-safe C functions. I was able
to do each in 2-3 days.

I believe it is time to start adding parallel execution to the backend.
We already have some parallelism in the backend:
effective_io_concurrency and helper processes. I think it is time we
start to consider additional options.

Parallelism isn't going to help all queries, in fact it might be just a
small subset, but it will be the larger queries. The pg_upgrade
parallelism only helps clusters with multiple databases or tablespaces,
but the improvements are significant.

I have summarized my ideas by updating our Parallel Query Execution wiki
page:

https://wiki.postgresql.org/wiki/Parallel_Query_Execution

Please consider updating the page yourself or posting your ideas to this
thread. Thanks.

Hmm...

How about being aware of multiple spindles - so if the requested data
covers multiple spindles, then data could be extracted in parallel. This
may, or may not, involve multiple I/O channels?

On large multiple processor machines, there are different blocks of
memory that might be accessed at different speeds depending on the
processor. Possibly a mechanism could be used to split a transaction
over multiple processors to ensure the fastest memory is used?

Once a selection of rows has been made, then if there is a lot of
reformatting going on, then could this be done in parallel? I can of
think of 2 very simplistic strategies: (A) use a different processor
core for each column, or (B) farm out sets of rows to different cores.
I am sure in reality, there are more subtleties and aspects of both the
strategies will be used in a hybrid fashion along with other approaches.

I expect that before any parallel algorithm is invoked, then some sort
of threshold needs to be exceeded to make it worth while. Different
aspects of the parallel algorithm may have their own thresholds. It may
not be worth applying a parallel algorithm for 10 rows from a simple
table, but selecting 10,000 records from multiple tables each over 10
million rows using joins may benefit for more extreme parallelism.

I expect that UNIONs, as well as the processing of partitioned tables,
may be amenable to parallel processing.

Cheers,
Gavin

#10

Bruce Momjian

bruce@momjian.us

over 13 years ago

In reply to: Gavin Flower (#9)

Re: Parallel query execution

On Wed, Jan 16, 2013 at 12:03:50PM +1300, Gavin Flower wrote:

On 16/01/13 11:14, Bruce Momjian wrote:

I mentioned last year that I wanted to start working on parallelism:

https://wiki.postgresql.org/wiki/Parallel_Query_Execution

Years ago I added thread-safety to libpq. Recently I added two parallel
execution paths to pg_upgrade. The first parallel path allows execution
of external binaries pg_dump and psql (to restore). The second parallel
path does copy/link by calling fork/thread-safe C functions. I was able
to do each in 2-3 days.

I believe it is time to start adding parallel execution to the backend.
We already have some parallelism in the backend:
effective_io_concurrency and helper processes. I think it is time we
start to consider additional options.

Parallelism isn't going to help all queries, in fact it might be just a
small subset, but it will be the larger queries. The pg_upgrade
parallelism only helps clusters with multiple databases or tablespaces,
but the improvements are significant.

I have summarized my ideas by updating our Parallel Query Execution wiki
page:

https://wiki.postgresql.org/wiki/Parallel_Query_Execution

Please consider updating the page yourself or posting your ideas to this
thread. Thanks.

Hmm...

How about being aware of multiple spindles - so if the requested data covers
multiple spindles, then data could be extracted in parallel. This may, or may
not, involve multiple I/O channels?

Well, we usually label these as tablespaces. I don't know if
spindle-level is a reasonable level to add.

On large multiple processor machines, there are different blocks of memory that
might be accessed at different speeds depending on the processor. Possibly a
mechanism could be used to split a transaction over multiple processors to
ensure the fastest memory is used?

That seems too far-out for an initial approach.

Once a selection of rows has been made, then if there is a lot of reformatting
going on, then could this be done in parallel? I can of think of 2 very
simplistic strategies: (A) use a different processor core for each column, or
(B) farm out sets of rows to different cores. I am sure in reality, there are
more subtleties and aspects of both the strategies will be used in a hybrid
fashion along with other approaches.

Probably #2, but that is going to require having some of modules
thread/fork-safe, and that is going to be tricky.

I expect that before any parallel algorithm is invoked, then some sort of
threshold needs to be exceeded to make it worth while. Different aspects of
the parallel algorithm may have their own thresholds. It may not be worth
applying a parallel algorithm for 10 rows from a simple table, but selecting
10,000 records from multiple tables each over 10 million rows using joins may
benefit for more extreme parallelism.

Right, I bet we will need some way to control when the overhead of
parallel execution is worth it.

I expect that UNIONs, as well as the processing of partitioned tables, may be
amenable to parallel processing.

Interesting idea on UNION.

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ It's impossible for everything to be true. +

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#11

Stephen Frost

sfrost@snowman.net

over 13 years ago

In reply to: Gavin Flower (#9)

Re: Parallel query execution

* Gavin Flower (GavinFlower@archidevsys.co.nz) wrote:

How about being aware of multiple spindles - so if the requested
data covers multiple spindles, then data could be extracted in
parallel. This may, or may not, involve multiple I/O channels?

Yes, this should dovetail with partitioning and tablespaces to pick up
on exactly that. We're implementing our own poor-man's parallelism
using exactly this to use as much of the CPU and I/O bandwidth as we
can. I have every confidence that it could be done better and be
simpler for us if it was handled in the backend.

On large multiple processor machines, there are different blocks of
memory that might be accessed at different speeds depending on the
processor. Possibly a mechanism could be used to split a transaction
over multiple processors to ensure the fastest memory is used?

Let's work on getting it working on the h/w that PG is most commonly
deployed on first.. I agree that we don't want to paint ourselves into
a corner with this, but I don't think massive NUMA systems are what we
should focus on first (are you familiar with any that run PG today..?).
I don't expect we're going to be trying to fight with the Linux (or
whatever) kernel over what threads run on what processors with access to
what memory on small-NUMA systems (x86-based).

Once a selection of rows has been made, then if there is a lot of
reformatting going on, then could this be done in parallel? I can
of think of 2 very simplistic strategies: (A) use a different
processor core for each column, or (B) farm out sets of rows to
different cores. I am sure in reality, there are more subtleties
and aspects of both the strategies will be used in a hybrid fashion
along with other approaches.

Given our row-based storage architecture, I can't imagine we'd do
anything other than take a row-based approach to this.. I would think
we'd do two things: parallelize based on partitioning, and parallelize
seqscan's across the individual heap files which are split on a per-1G
boundary already. Perhaps we can generalize that and scale it based on
the number of available processors and the size of the relation but I
could see advantages in matching up with what the kernel thinks are
independent files.

I expect that before any parallel algorithm is invoked, then some
sort of threshold needs to be exceeded to make it worth while.

Certainly. That's need to be included in the optimization model to
support this.

Thanks,

Stephen

#12

Bruce Momjian

bruce@momjian.us

over 13 years ago

In reply to: Stephen Frost (#11)

Re: Parallel query execution

On Tue, Jan 15, 2013 at 06:15:57PM -0500, Stephen Frost wrote:

* Gavin Flower (GavinFlower@archidevsys.co.nz) wrote:

How about being aware of multiple spindles - so if the requested
data covers multiple spindles, then data could be extracted in
parallel. This may, or may not, involve multiple I/O channels?

Yes, this should dovetail with partitioning and tablespaces to pick up
on exactly that. We're implementing our own poor-man's parallelism
using exactly this to use as much of the CPU and I/O bandwidth as we
can. I have every confidence that it could be done better and be
simpler for us if it was handled in the backend.

Yes, I have listed tablespaces and partitions as possible parallel
options on the wiki.

On large multiple processor machines, there are different blocks of
memory that might be accessed at different speeds depending on the
processor. Possibly a mechanism could be used to split a transaction
over multiple processors to ensure the fastest memory is used?

Let's work on getting it working on the h/w that PG is most commonly
deployed on first.. I agree that we don't want to paint ourselves into
a corner with this, but I don't think massive NUMA systems are what we
should focus on first (are you familiar with any that run PG today..?).
I don't expect we're going to be trying to fight with the Linux (or
whatever) kernel over what threads run on what processors with access to
what memory on small-NUMA systems (x86-based).

Agreed.

Once a selection of rows has been made, then if there is a lot of
reformatting going on, then could this be done in parallel? I can
of think of 2 very simplistic strategies: (A) use a different
processor core for each column, or (B) farm out sets of rows to
different cores. I am sure in reality, there are more subtleties
and aspects of both the strategies will be used in a hybrid fashion
along with other approaches.

Given our row-based storage architecture, I can't imagine we'd do
anything other than take a row-based approach to this.. I would think
we'd do two things: parallelize based on partitioning, and parallelize
seqscan's across the individual heap files which are split on a per-1G
boundary already. Perhaps we can generalize that and scale it based on
the number of available processors and the size of the relation but I
could see advantages in matching up with what the kernel thinks are
independent files.

The 1GB idea is interesting. I found in pg_upgrade that file copy would
just overwhelm the I/O channel, and that doing multiple copies on the
same device had no win, but those were pure I/O operations --- a
sequential scan might be enough of a mix of I/O and CPU that parallelism
might help.

I expect that before any parallel algorithm is invoked, then some
sort of threshold needs to be exceeded to make it worth while.

Certainly. That's need to be included in the optimization model to
support this.

I have updated the wiki to reflect the ideas mentioned above.

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ It's impossible for everything to be true. +

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#13

Michael Paquier

michael@paquier.xyz

over 13 years ago

In reply to: Bruce Momjian (#1)

Re: Parallel query execution

On Wed, Jan 16, 2013 at 7:14 AM, Bruce Momjian <bruce@momjian.us> wrote:

I mentioned last year that I wanted to start working on parallelism:

https://wiki.postgresql.org/wiki/Parallel_Query_Execution

Years ago I added thread-safety to libpq. Recently I added two parallel
execution paths to pg_upgrade. The first parallel path allows execution
of external binaries pg_dump and psql (to restore). The second parallel
path does copy/link by calling fork/thread-safe C functions. I was able
to do each in 2-3 days.

I believe it is time to start adding parallel execution to the backend.
We already have some parallelism in the backend:
effective_io_concurrency and helper processes. I think it is time we
start to consider additional options.

Parallelism isn't going to help all queries, in fact it might be just a
small subset, but it will be the larger queries. The pg_upgrade
parallelism only helps clusters with multiple databases or tablespaces,
but the improvements are significant.

I have summarized my ideas by updating our Parallel Query Execution wiki
page:

https://wiki.postgresql.org/wiki/Parallel_Query_Execution

Please consider updating the page yourself or posting your ideas to this
thread. Thanks.

Honestly that would be a great feature, and I would be happy helping
working on it.
Taking advantage of parallelism in a server with multiple core, especially
for things like large sorting operations would be great.
Just thinking loudly, but wouldn't it be the role of the planner to
determine if such or such query is worth using parallelism? The executor
would then be in charge of actually firing the tasks in parallel that
planner has determined necessary to do.
--
Michael Paquier
http://michael.otacoo.com

#14

Bruce Momjian

bruce@momjian.us

over 13 years ago

In reply to: Michael Paquier (#13)

Re: Parallel query execution

On Wed, Jan 16, 2013 at 09:11:20AM +0900, Michael Paquier wrote:

On Wed, Jan 16, 2013 at 7:14 AM, Bruce Momjian <bruce@momjian.us> wrote:

I mentioned last year that I wanted to start working on parallelism:

https://wiki.postgresql.org/wiki/Parallel_Query_Execution

Years ago I added thread-safety to libpq. Recently I added two parallel
execution paths to pg_upgrade. The first parallel path allows execution
of external binaries pg_dump and psql (to restore). The second parallel
path does copy/link by calling fork/thread-safe C functions. I was able
to do each in 2-3 days.

I believe it is time to start adding parallel execution to the backend.
We already have some parallelism in the backend:
effective_io_concurrency and helper processes. I think it is time we
start to consider additional options.

Parallelism isn't going to help all queries, in fact it might be just a
small subset, but it will be the larger queries. The pg_upgrade
parallelism only helps clusters with multiple databases or tablespaces,
but the improvements are significant.

I have summarized my ideas by updating our Parallel Query Execution wiki
page:

https://wiki.postgresql.org/wiki/Parallel_Query_Execution

Please consider updating the page yourself or posting your ideas to this
thread. Thanks.

Honestly that would be a great feature, and I would be happy helping working on
it.
Taking advantage of parallelism in a server with multiple core, especially for
things like large sorting operations would be great.
Just thinking loudly, but wouldn't it be the role of the planner to determine
if such or such query is worth using parallelism? The executor would then be in
charge of actually firing the tasks in parallel that planner has determined
necessary to do.

Yes, it would probably be driven off of the optimizer statistics.

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ It's impossible for everything to be true. +

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#15

Claudio Freire

klaussfreire@gmail.com

over 13 years ago

In reply to: Bruce Momjian (#12)

Re: Parallel query execution

On Tue, Jan 15, 2013 at 8:19 PM, Bruce Momjian <bruce@momjian.us> wrote:

Given our row-based storage architecture, I can't imagine we'd do
anything other than take a row-based approach to this.. I would think
we'd do two things: parallelize based on partitioning, and parallelize
seqscan's across the individual heap files which are split on a per-1G
boundary already. Perhaps we can generalize that and scale it based on
the number of available processors and the size of the relation but I
could see advantages in matching up with what the kernel thinks are
independent files.

The 1GB idea is interesting. I found in pg_upgrade that file copy would
just overwhelm the I/O channel, and that doing multiple copies on the
same device had no win, but those were pure I/O operations --- a
sequential scan might be enough of a mix of I/O and CPU that parallelism
might help.

AFAIR, synchroscans were introduced because multiple large sequential
scans were counterproductive (big time).

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#16

Stephen Frost

sfrost@snowman.net

over 13 years ago

In reply to: Claudio Freire (#15)

Re: Parallel query execution

* Claudio Freire (klaussfreire@gmail.com) wrote:

On Tue, Jan 15, 2013 at 8:19 PM, Bruce Momjian <bruce@momjian.us> wrote:

The 1GB idea is interesting. I found in pg_upgrade that file copy would
just overwhelm the I/O channel, and that doing multiple copies on the
same device had no win, but those were pure I/O operations --- a
sequential scan might be enough of a mix of I/O and CPU that parallelism
might help.

AFAIR, synchroscans were introduced because multiple large sequential
scans were counterproductive (big time).

Sequentially scanning the *same* data over and over is certainly
counterprouctive. Synchroscans fixed that, yes. That's not what we're
talking about though- we're talking about scanning and processing
independent sets of data using multiple processes. It's certainly
possible that in some cases that won't be as good, but there will be
quite a few cases where it's much, much better.

Consider a very complicated function running against each row which
makes the CPU the bottleneck instead of the i/o system. That type of a
query will never run faster than a single CPU in a single-process
environment, regardless of if you have synch-scans or not, while in a
multi-process environment you'll take advantage of the extra CPUs which
are available and use more of the I/O bandwidth that isn't yet
exhausted.

Thanks,

Stephen

#17

Claudio Freire

klaussfreire@gmail.com

over 13 years ago

In reply to: Stephen Frost (#16)

Re: Parallel query execution

On Wed, Jan 16, 2013 at 12:13 AM, Stephen Frost <sfrost@snowman.net> wrote:

* Claudio Freire (klaussfreire@gmail.com) wrote:

On Tue, Jan 15, 2013 at 8:19 PM, Bruce Momjian <bruce@momjian.us> wrote:

The 1GB idea is interesting. I found in pg_upgrade that file copy would
just overwhelm the I/O channel, and that doing multiple copies on the
same device had no win, but those were pure I/O operations --- a
sequential scan might be enough of a mix of I/O and CPU that parallelism
might help.

AFAIR, synchroscans were introduced because multiple large sequential
scans were counterproductive (big time).

Sequentially scanning the *same* data over and over is certainly
counterprouctive. Synchroscans fixed that, yes. That's not what we're
talking about though- we're talking about scanning and processing
independent sets of data using multiple processes.

I don't see the difference. Blocks are blocks (unless they're cached).

It's certainly
possible that in some cases that won't be as good

If memory serves me correctly (and it does, I suffered it a lot), the
performance hit is quite considerable. Enough to make it "a lot worse"
rather than "not as good".

but there will be
quite a few cases where it's much, much better.

Just cached segments.

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#18

Stephen Frost

sfrost@snowman.net

over 13 years ago

In reply to: Claudio Freire (#17)

Re: Parallel query execution

* Claudio Freire (klaussfreire@gmail.com) wrote:

On Wed, Jan 16, 2013 at 12:13 AM, Stephen Frost <sfrost@snowman.net> wrote:

Sequentially scanning the *same* data over and over is certainly
counterprouctive. Synchroscans fixed that, yes. That's not what we're
talking about though- we're talking about scanning and processing
independent sets of data using multiple processes.

I don't see the difference. Blocks are blocks (unless they're cached).

Not quite. Having to go out to the kernel isn't free. Additionally,
the seq scans used to pollute our shared buffers prior to
synch-scanning, which didn't help things.

It's certainly
possible that in some cases that won't be as good

If memory serves me correctly (and it does, I suffered it a lot), the
performance hit is quite considerable. Enough to make it "a lot worse"
rather than "not as good".

I feel like we must not be communicating very well.

If the CPU is pegged at 100% and the I/O system is at 20%, adding
another CPU at 100% will bring the I/O load up to 40% and you're now
processing data twice as fast overall. If you're running a single CPU
at 20% and your I/O system is at 100%, then adding another CPU isn't
going to help and may even degrade performance by causing problems for
the I/O system. The goal of the optimizer will be to model the plan to
account for exactly that, as best it can.

but there will be
quite a few cases where it's much, much better.

Just cached segments.

No, certainly not just cached segments. Any situation where the CPU is
the bottleneck.

Thanks,

Stephen

#19

Josh Berkus

josh@agliodbs.com

over 13 years ago

In reply to: Claudio Freire (#17)

Re: Parallel query execution

but there will be
quite a few cases where it's much, much better.

Just cached segments.

Actually, thanks to much faster storage (think SSD, SAN), it's easily
possible for PostgreSQL to become CPU-limited on a seq scan query, even
when reading from disk.

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#20

Stephen Frost

sfrost@snowman.net

over 13 years ago

In reply to: Josh Berkus (#19)

Re: Parallel query execution

* Josh Berkus (josh@agliodbs.com) wrote:

Actually, thanks to much faster storage (think SSD, SAN), it's easily
possible for PostgreSQL to become CPU-limited on a seq scan query, even
when reading from disk.

Particularly with a complex filter being applied or if it's feeding into
something above that's expensive..

Thanks,

Stephen

#21

Josh Berkus

josh@agliodbs.com

over 13 years ago

In reply to: Stephen Frost (#20)

#22

Michael Paquier

michael@paquier.xyz

over 13 years ago

In reply to: Josh Berkus (#21)

#23

Bruce Momjian

bruce@momjian.us

over 13 years ago

In reply to: Michael Paquier (#22)

#24

Michael Paquier

michael@paquier.xyz

over 13 years ago

In reply to: Bruce Momjian (#23)

#25

Claudio Freire

klaussfreire@gmail.com

over 13 years ago

In reply to: Stephen Frost (#18)

#26

Alvaro Herrera

alvherre@2ndquadrant.com

over 13 years ago

In reply to: Bruce Momjian (#23)

#27

Jeff Janes

jeff.janes@gmail.com

over 13 years ago

In reply to: Simon Riggs (#7)

#28

Tom Lane

tgl@sss.pgh.pa.us

over 13 years ago

In reply to: Alvaro Herrera (#26)

#29

Abhijit Menon-Sen

ams@2ndQuadrant.com

over 13 years ago

In reply to: Tom Lane (#28)

#30

Daniel Farina

daniel@heroku.com

over 13 years ago

In reply to: Tom Lane (#28)

#31

Magnus Hagander

magnus@hagander.net

over 13 years ago

In reply to: Bruce Momjian (#8)

#32

Andres Freund

andres@anarazel.de

over 13 years ago

In reply to: Tom Lane (#28)

#33

Magnus Hagander

magnus@hagander.net

over 13 years ago

In reply to: Abhijit Menon-Sen (#29)

#34

Simon Riggs

simon@2ndQuadrant.com

over 13 years ago

In reply to: Abhijit Menon-Sen (#29)

#35

Magnus Hagander

magnus@hagander.net

over 13 years ago

In reply to: Simon Riggs (#34)

#36

Robert Haas

robertmhaas@gmail.com

over 13 years ago

In reply to: Tom Lane (#28)

#37

Robert Haas

robertmhaas@gmail.com

over 13 years ago

In reply to: Magnus Hagander (#31)

#38

Abhijit Menon-Sen

ams@2ndQuadrant.com

over 13 years ago

In reply to: Magnus Hagander (#33)

#39

Stephen Frost

sfrost@snowman.net

over 13 years ago

In reply to: Claudio Freire (#25)

#40

Stephen Frost

sfrost@snowman.net

over 13 years ago

In reply to: Tom Lane (#28)

#41

Stephen Frost

sfrost@snowman.net

over 13 years ago

In reply to: Daniel Farina (#30)

#42

Boszormenyi Zoltan

zb@cybertec.at

over 13 years ago

In reply to: Abhijit Menon-Sen (#38)

#43

Alvaro Herrera

alvherre@2ndquadrant.com

over 13 years ago

In reply to: Simon Riggs (#34)

#44

Stephen Frost

sfrost@snowman.net

over 13 years ago

In reply to: Alvaro Herrera (#43)

#45

Andrew Dunstan

andrew@dunslane.net

over 13 years ago

In reply to: Bruce Momjian (#23)

#46

Abhijit Menon-Sen

ams@2ndQuadrant.com

over 13 years ago

In reply to: Stephen Frost (#44)

#47

Craig Ringer

craig@2ndquadrant.com

over 13 years ago

In reply to: Stephen Frost (#44)

#48

Craig Ringer

craig@2ndquadrant.com

over 13 years ago

In reply to: Simon Riggs (#34)

#49

Stephen Frost

sfrost@snowman.net

over 13 years ago

In reply to: Abhijit Menon-Sen (#46)

#50

Claudio Freire

klaussfreire@gmail.com

over 13 years ago

In reply to: Stephen Frost (#39)

#51

Noah Misch

noah@leadboat.com

over 13 years ago

In reply to: Magnus Hagander (#33)

#52

Noah Misch

noah@leadboat.com

over 13 years ago

In reply to: Stephen Frost (#41)

#53

Bruce Momjian

bruce@momjian.us

over 13 years ago

In reply to: Michael Paquier (#24)

#54

Bruce Momjian

bruce@momjian.us

over 13 years ago

In reply to: Alvaro Herrera (#26)

#55

Bruce Momjian

bruce@momjian.us

over 13 years ago

In reply to: Robert Haas (#37)

#56

Bruce Momjian

bruce@momjian.us

over 13 years ago

In reply to: Andrew Dunstan (#45)

#57

Andrew Dunstan

andrew@dunslane.net

over 13 years ago

In reply to: Bruce Momjian (#56)

#58

Josh Berkus

josh@agliodbs.com

over 13 years ago

In reply to: Simon Riggs (#34)

#59

Josh Berkus

josh@agliodbs.com

over 13 years ago

In reply to: Bruce Momjian (#55)

#60

Bruce Momjian

bruce@momjian.us

over 13 years ago

In reply to: Josh Berkus (#59)

#61

Josh Berkus

josh@agliodbs.com

over 13 years ago

In reply to: Bruce Momjian (#60)

#62

Tom Lane

tgl@sss.pgh.pa.us

over 13 years ago

In reply to: Stephen Frost (#49)

#63

Josh Berkus

josh@agliodbs.com

over 13 years ago

In reply to: Tom Lane (#62)

#64

Stephen Frost

sfrost@snowman.net

over 13 years ago

In reply to: Bruce Momjian (#53)

#65

Tom Lane

tgl@sss.pgh.pa.us

over 13 years ago

In reply to: Josh Berkus (#63)

#66

Bruce Momjian

bruce@momjian.us

over 13 years ago

In reply to: Tom Lane (#65)

#67

Pavel Stehule

pavel.stehule@gmail.com

over 13 years ago

In reply to: Stephen Frost (#64)

#68

Bruce Momjian

bruce@momjian.us

over 13 years ago

In reply to: Pavel Stehule (#67)

#69

Dickson S. Guedes

listas@guedesoft.net

over 13 years ago

In reply to: Bruce Momjian (#68)

#70

Bruce Momjian

bruce@momjian.us

over 13 years ago

In reply to: Dickson S. Guedes (#69)

#71

Robert Haas

robertmhaas@gmail.com

over 13 years ago

In reply to: Josh Berkus (#58)

#72

Jeff Janes

jeff.janes@gmail.com

over 13 years ago

In reply to: Stephen Frost (#11)

#73

Jeff Janes

jeff.janes@gmail.com

over 13 years ago

In reply to: Gavin Flower (#9)

#74

Simon Riggs

simon@2ndQuadrant.com

over 13 years ago

In reply to: Tom Lane (#62)

#75

Claudio Freire

klaussfreire@gmail.com

over 13 years ago

In reply to: Jeff Janes (#73)

#76

Bruce Momjian

bruce@momjian.us

over 13 years ago

In reply to: Jeff Janes (#72)

#77

Claudio Freire

klaussfreire@gmail.com

over 13 years ago

In reply to: Bruce Momjian (#76)

#78

Bruce Momjian

bruce@momjian.us

over 13 years ago

In reply to: Claudio Freire (#77)

#79

Jeff Janes

jeff.janes@gmail.com

over 13 years ago

In reply to: Stephen Frost (#64)

#80

Tom Lane

tgl@sss.pgh.pa.us

over 13 years ago

In reply to: Simon Riggs (#74)

#81

Craig Ringer

craig@2ndquadrant.com

over 13 years ago

In reply to: Robert Haas (#71)

#82

Abhijit Menon-Sen

ams@2ndQuadrant.com

over 13 years ago

In reply to: Tom Lane (#80)

#83

Peter Eisentraut

peter_e@gmx.net

over 13 years ago

In reply to: Tom Lane (#65)

#84

Tom Lane

tgl@sss.pgh.pa.us

over 13 years ago

In reply to: Peter Eisentraut (#83)

#85

Craig Ringer

craig@2ndquadrant.com

over 13 years ago

In reply to: Abhijit Menon-Sen (#82)

#86

Michael Paquier

michael@paquier.xyz

over 13 years ago

In reply to: Craig Ringer (#85)

#87

Abhijit Menon-Sen

ams@2ndQuadrant.com

over 13 years ago

In reply to: Michael Paquier (#86)

#88

Pavan Deolasee

pavan.deolasee@gmail.com

over 13 years ago

In reply to: Abhijit Menon-Sen (#29)

#89

Magnus Hagander

magnus@hagander.net

over 13 years ago

In reply to: Abhijit Menon-Sen (#87)

#90

Magnus Hagander

magnus@hagander.net

over 13 years ago

In reply to: Pavan Deolasee (#88)

#91

Craig Ringer

craig@2ndquadrant.com

over 13 years ago

In reply to: Magnus Hagander (#89)

#92

Robert Haas

robertmhaas@gmail.com

over 13 years ago

In reply to: Craig Ringer (#91)

#93

Jeff Janes

jeff.janes@gmail.com

over 13 years ago

In reply to: Magnus Hagander (#90)

#94

Robert Haas

robertmhaas@gmail.com

over 13 years ago

In reply to: Jeff Janes (#93)

#95

Simon Riggs

simon@2ndQuadrant.com

over 13 years ago

In reply to: Robert Haas (#94)

#96

Pavel Stehule

pavel.stehule@gmail.com

over 13 years ago

In reply to: Simon Riggs (#95)

#97

Robert Haas

robertmhaas@gmail.com

over 13 years ago

In reply to: Simon Riggs (#95)

#98

Jeff Janes

jeff.janes@gmail.com

over 13 years ago

In reply to: Simon Riggs (#95)

#99

Robert Haas

robertmhaas@gmail.com

over 13 years ago

In reply to: Jeff Janes (#98)

#100

Andrew Dunstan

andrew@dunslane.net

over 13 years ago

In reply to: Robert Haas (#99)

#101

Robert Haas

robertmhaas@gmail.com

over 13 years ago

In reply to: Andrew Dunstan (#100)

#102

Tom Lane

tgl@sss.pgh.pa.us

over 13 years ago

In reply to: Robert Haas (#99)

#103

Heikki Linnakangas

heikki.linnakangas@enterprisedb.com

over 13 years ago

In reply to: Jeff Janes (#98)

#104

Josh Berkus

josh@agliodbs.com

over 13 years ago

In reply to: Heikki Linnakangas (#103)

#105

Tom Lane

tgl@sss.pgh.pa.us

over 13 years ago

In reply to: Josh Berkus (#104)

#106

Josh Berkus

josh@agliodbs.com

over 13 years ago

In reply to: Tom Lane (#105)

#107

Robert Haas

robertmhaas@gmail.com

over 13 years ago

In reply to: Josh Berkus (#106)

#108

Dimitri Fontaine

dimitri@2ndQuadrant.fr

over 13 years ago

In reply to: Robert Haas (#99)

#109

Phil Sorber

phil@omniti.com

over 13 years ago

In reply to: Abhijit Menon-Sen (#38)

#110

Robert Haas

robertmhaas@gmail.com

over 13 years ago

In reply to: Phil Sorber (#109)

#111

Robert Haas

robertmhaas@gmail.com

over 13 years ago

In reply to: Dimitri Fontaine (#108)

#112

Josh Berkus

josh@agliodbs.com

over 13 years ago

In reply to: Robert Haas (#111)

#113

Phil Sorber

phil@omniti.com

over 13 years ago

In reply to: Robert Haas (#111)

#114

Phil Sorber

phil@omniti.com

over 13 years ago

In reply to: Josh Berkus (#112)

#115

Robert Haas

robertmhaas@gmail.com

over 13 years ago

In reply to: Josh Berkus (#112)

#116

Pavan Deolasee

pavan.deolasee@gmail.com

over 13 years ago

In reply to: Phil Sorber (#114)

#117

Tom Lane

tgl@sss.pgh.pa.us

over 13 years ago

In reply to: Pavan Deolasee (#116)

#118

Magnus Hagander

magnus@hagander.net

over 13 years ago

In reply to: Robert Haas (#110)

#119

Dimitri Fontaine

dimitri@2ndQuadrant.fr

over 13 years ago

In reply to: Tom Lane (#117)

#120

Pavan Deolasee

pavan.deolasee@gmail.com

over 13 years ago

In reply to: Tom Lane (#117)

#121

Dimitri Fontaine

dimitri@2ndQuadrant.fr

over 13 years ago

In reply to: Robert Haas (#111)

#122

Pavel Stehule

pavel.stehule@gmail.com

over 13 years ago

In reply to: Pavan Deolasee (#120)

#123

Andrew Dunstan

andrew@dunslane.net

over 13 years ago

In reply to: Tom Lane (#117)

#124

Gavin Flower

GavinFlower@archidevsys.co.nz

over 13 years ago

In reply to: Dimitri Fontaine (#119)

#125

Robert Haas

robertmhaas@gmail.com

over 13 years ago

In reply to: Tom Lane (#117)

#126

Andres Freund

andres@anarazel.de

over 13 years ago

In reply to: Robert Haas (#125)

#127

Bruce Momjian

bruce@momjian.us

over 13 years ago

In reply to: Tom Lane (#105)

#128

Josh Berkus

josh@agliodbs.com

over 13 years ago

In reply to: Andres Freund (#126)

#129

Joshua D. Drake

jd@commandprompt.com

over 13 years ago

In reply to: Josh Berkus (#128)

#130

Stephen Frost

sfrost@snowman.net

over 13 years ago

In reply to: Robert Haas (#125)

#131

Phil Sorber

phil@omniti.com

over 13 years ago

In reply to: Stephen Frost (#130)

#132

Robert Haas

robertmhaas@gmail.com

over 13 years ago

In reply to: Phil Sorber (#131)

#133

Heikki Linnakangas

heikki.linnakangas@enterprisedb.com

over 13 years ago

In reply to: Stephen Frost (#130)

#134

Stephen Frost

sfrost@snowman.net

over 13 years ago

In reply to: Heikki Linnakangas (#133)

#135

Pavel Stehule

pavel.stehule@gmail.com

over 13 years ago

In reply to: Stephen Frost (#134)

#136

Pavan Deolasee

pavan.deolasee@gmail.com

over 13 years ago

In reply to: Robert Haas (#125)

#137

Amit Kapila

amit.kapila16@gmail.com

over 13 years ago

In reply to: Stephen Frost (#134)

#138

Paul Ramsey

pramsey@cleverelephant.ca

over 13 years ago

In reply to: Bruce Momjian (#1)

#139

Bruce Momjian

bruce@momjian.us

over 13 years ago

In reply to: Paul Ramsey (#138)