Parallel query execution

Started by Bruce Momjianabout 13 years ago139 messageshackers
Jump to latest
#1Bruce Momjian
bruce@momjian.us

I mentioned last year that I wanted to start working on parallelism:

https://wiki.postgresql.org/wiki/Parallel_Query_Execution

Years ago I added thread-safety to libpq. Recently I added two parallel
execution paths to pg_upgrade. The first parallel path allows execution
of external binaries pg_dump and psql (to restore). The second parallel
path does copy/link by calling fork/thread-safe C functions. I was able
to do each in 2-3 days.

I believe it is time to start adding parallel execution to the backend.
We already have some parallelism in the backend:
effective_io_concurrency and helper processes. I think it is time we
start to consider additional options.

Parallelism isn't going to help all queries, in fact it might be just a
small subset, but it will be the larger queries. The pg_upgrade
parallelism only helps clusters with multiple databases or tablespaces,
but the improvements are significant.

I have summarized my ideas by updating our Parallel Query Execution wiki
page:

https://wiki.postgresql.org/wiki/Parallel_Query_Execution

Please consider updating the page yourself or posting your ideas to this
thread. Thanks.

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ It's impossible for everything to be true. +

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#2Stephen Frost
sfrost@snowman.net
In reply to: Bruce Momjian (#1)
Re: Parallel query execution

* Bruce Momjian (bruce@momjian.us) wrote:

Parallelism isn't going to help all queries, in fact it might be just a
small subset, but it will be the larger queries. The pg_upgrade
parallelism only helps clusters with multiple databases or tablespaces,
but the improvements are significant.

This would be fantastic and I'd like to help. Parallel query and real
partitioning are two of our biggest holes for OLAP and data warehouse
users.

Please consider updating the page yourself or posting your ideas to this
thread. Thanks.

Will do.

Thanks,

Stephen

In reply to: Bruce Momjian (#1)
Re: Parallel query execution

On 15 January 2013 22:14, Bruce Momjian <bruce@momjian.us> wrote:

I believe it is time to start adding parallel execution to the backend.
We already have some parallelism in the backend:
effective_io_concurrency and helper processes. I think it is time we
start to consider additional options.

A few months back, I remarked [1]/messages/by-id/CAEYLb_VeZpKDX54VEx3X30oy_UOTh89XoejJW6aucjjiUjskXw@mail.gmail.com that speeding up sorting using
pipelining and asynchronous I/O was probably parallelism low-hanging
fruit. That hasn't changed, though I personally still don't have the
bandwidth to look into it in a serious way.

[1]: /messages/by-id/CAEYLb_VeZpKDX54VEx3X30oy_UOTh89XoejJW6aucjjiUjskXw@mail.gmail.com

--
Peter Geoghegan http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#4Bruce Momjian
bruce@momjian.us
In reply to: Peter Geoghegan (#3)
Re: Parallel query execution

On Tue, Jan 15, 2013 at 10:39:10PM +0000, Peter Geoghegan wrote:

On 15 January 2013 22:14, Bruce Momjian <bruce@momjian.us> wrote:

I believe it is time to start adding parallel execution to the backend.
We already have some parallelism in the backend:
effective_io_concurrency and helper processes. I think it is time we
start to consider additional options.

A few months back, I remarked [1] that speeding up sorting using
pipelining and asynchronous I/O was probably parallelism low-hanging
fruit. That hasn't changed, though I personally still don't have the
bandwidth to look into it in a serious way.

[1] /messages/by-id/CAEYLb_VeZpKDX54VEx3X30oy_UOTh89XoejJW6aucjjiUjskXw@mail.gmail.com

OK, I added the link to the wiki.

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ It's impossible for everything to be true. +

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#5Simon Riggs
simon@2ndQuadrant.com
In reply to: Bruce Momjian (#1)
Re: Parallel query execution

On 15 January 2013 22:14, Bruce Momjian <bruce@momjian.us> wrote:

I mentioned last year that I wanted to start working on parallelism:

We don't normally begin discussing topics for next release just as a
CF is starting.

Why is this being discussed now?

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#6Bruce Momjian
bruce@momjian.us
In reply to: Simon Riggs (#5)
Re: Parallel query execution

On Tue, Jan 15, 2013 at 10:53:29PM +0000, Simon Riggs wrote:

On 15 January 2013 22:14, Bruce Momjian <bruce@momjian.us> wrote:

I mentioned last year that I wanted to start working on parallelism:

We don't normally begin discussing topics for next release just as a
CF is starting.

Why is this being discussed now?

It is for 9.4 and will take months. I didn't think there was a better
time. We don't usually discuss features during beta testing.

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ It's impossible for everything to be true. +

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#7Simon Riggs
simon@2ndQuadrant.com
In reply to: Bruce Momjian (#6)
Re: Parallel query execution

On 15 January 2013 22:55, Bruce Momjian <bruce@momjian.us> wrote:

Why is this being discussed now?

It is for 9.4 and will take months. I didn't think there was a better
time. We don't usually discuss features during beta testing.

Bruce, there are many, many patches on the queue. How will we ever get
to beta testing if we begin open ended discussions on next release?

If we can't finish what we've started for 9.3, why talk about 9.4?

Yes, its a great topic for discussion, but there are better times.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#8Bruce Momjian
bruce@momjian.us
In reply to: Simon Riggs (#7)
Re: Parallel query execution

On Tue, Jan 15, 2013 at 11:01:04PM +0000, Simon Riggs wrote:

On 15 January 2013 22:55, Bruce Momjian <bruce@momjian.us> wrote:

Why is this being discussed now?

It is for 9.4 and will take months. I didn't think there was a better
time. We don't usually discuss features during beta testing.

Bruce, there are many, many patches on the queue. How will we ever get
to beta testing if we begin open ended discussions on next release?

If we can't finish what we've started for 9.3, why talk about 9.4?

Yes, its a great topic for discussion, but there are better times.

Like when? I don't remember a policy of not discussing things now.
Does anyone else remember this? Are you saying feature discussion is
only between commit-fests? Is this written down anywhere? I only
remember beta-time as a time not to discuss features.

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ It's impossible for everything to be true. +

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#9Gavin Flower
GavinFlower@archidevsys.co.nz
In reply to: Bruce Momjian (#1)
Re: Parallel query execution

On 16/01/13 11:14, Bruce Momjian wrote:

I mentioned last year that I wanted to start working on parallelism:

https://wiki.postgresql.org/wiki/Parallel_Query_Execution

Years ago I added thread-safety to libpq. Recently I added two parallel
execution paths to pg_upgrade. The first parallel path allows execution
of external binaries pg_dump and psql (to restore). The second parallel
path does copy/link by calling fork/thread-safe C functions. I was able
to do each in 2-3 days.

I believe it is time to start adding parallel execution to the backend.
We already have some parallelism in the backend:
effective_io_concurrency and helper processes. I think it is time we
start to consider additional options.

Parallelism isn't going to help all queries, in fact it might be just a
small subset, but it will be the larger queries. The pg_upgrade
parallelism only helps clusters with multiple databases or tablespaces,
but the improvements are significant.

I have summarized my ideas by updating our Parallel Query Execution wiki
page:

https://wiki.postgresql.org/wiki/Parallel_Query_Execution

Please consider updating the page yourself or posting your ideas to this
thread. Thanks.

Hmm...

How about being aware of multiple spindles - so if the requested data
covers multiple spindles, then data could be extracted in parallel. This
may, or may not, involve multiple I/O channels?

On large multiple processor machines, there are different blocks of
memory that might be accessed at different speeds depending on the
processor. Possibly a mechanism could be used to split a transaction
over multiple processors to ensure the fastest memory is used?

Once a selection of rows has been made, then if there is a lot of
reformatting going on, then could this be done in parallel? I can of
think of 2 very simplistic strategies: (A) use a different processor
core for each column, or (B) farm out sets of rows to different cores.
I am sure in reality, there are more subtleties and aspects of both the
strategies will be used in a hybrid fashion along with other approaches.

I expect that before any parallel algorithm is invoked, then some sort
of threshold needs to be exceeded to make it worth while. Different
aspects of the parallel algorithm may have their own thresholds. It may
not be worth applying a parallel algorithm for 10 rows from a simple
table, but selecting 10,000 records from multiple tables each over 10
million rows using joins may benefit for more extreme parallelism.

I expect that UNIONs, as well as the processing of partitioned tables,
may be amenable to parallel processing.

Cheers,
Gavin

#10Bruce Momjian
bruce@momjian.us
In reply to: Gavin Flower (#9)
Re: Parallel query execution

On Wed, Jan 16, 2013 at 12:03:50PM +1300, Gavin Flower wrote:

On 16/01/13 11:14, Bruce Momjian wrote:

I mentioned last year that I wanted to start working on parallelism:

https://wiki.postgresql.org/wiki/Parallel_Query_Execution

Years ago I added thread-safety to libpq. Recently I added two parallel
execution paths to pg_upgrade. The first parallel path allows execution
of external binaries pg_dump and psql (to restore). The second parallel
path does copy/link by calling fork/thread-safe C functions. I was able
to do each in 2-3 days.

I believe it is time to start adding parallel execution to the backend.
We already have some parallelism in the backend:
effective_io_concurrency and helper processes. I think it is time we
start to consider additional options.

Parallelism isn't going to help all queries, in fact it might be just a
small subset, but it will be the larger queries. The pg_upgrade
parallelism only helps clusters with multiple databases or tablespaces,
but the improvements are significant.

I have summarized my ideas by updating our Parallel Query Execution wiki
page:

https://wiki.postgresql.org/wiki/Parallel_Query_Execution

Please consider updating the page yourself or posting your ideas to this
thread. Thanks.

Hmm...

How about being aware of multiple spindles - so if the requested data covers
multiple spindles, then data could be extracted in parallel. This may, or may
not, involve multiple I/O channels?

Well, we usually label these as tablespaces. I don't know if
spindle-level is a reasonable level to add.

On large multiple processor machines, there are different blocks of memory that
might be accessed at different speeds depending on the processor. Possibly a
mechanism could be used to split a transaction over multiple processors to
ensure the fastest memory is used?

That seems too far-out for an initial approach.

Once a selection of rows has been made, then if there is a lot of reformatting
going on, then could this be done in parallel? I can of think of 2 very
simplistic strategies: (A) use a different processor core for each column, or
(B) farm out sets of rows to different cores. I am sure in reality, there are
more subtleties and aspects of both the strategies will be used in a hybrid
fashion along with other approaches.

Probably #2, but that is going to require having some of modules
thread/fork-safe, and that is going to be tricky.

I expect that before any parallel algorithm is invoked, then some sort of
threshold needs to be exceeded to make it worth while. Different aspects of
the parallel algorithm may have their own thresholds. It may not be worth
applying a parallel algorithm for 10 rows from a simple table, but selecting
10,000 records from multiple tables each over 10 million rows using joins may
benefit for more extreme parallelism.

Right, I bet we will need some way to control when the overhead of
parallel execution is worth it.

I expect that UNIONs, as well as the processing of partitioned tables, may be
amenable to parallel processing.

Interesting idea on UNION.

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ It's impossible for everything to be true. +

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#11Stephen Frost
sfrost@snowman.net
In reply to: Gavin Flower (#9)
Re: Parallel query execution

* Gavin Flower (GavinFlower@archidevsys.co.nz) wrote:

How about being aware of multiple spindles - so if the requested
data covers multiple spindles, then data could be extracted in
parallel. This may, or may not, involve multiple I/O channels?

Yes, this should dovetail with partitioning and tablespaces to pick up
on exactly that. We're implementing our own poor-man's parallelism
using exactly this to use as much of the CPU and I/O bandwidth as we
can. I have every confidence that it could be done better and be
simpler for us if it was handled in the backend.

On large multiple processor machines, there are different blocks of
memory that might be accessed at different speeds depending on the
processor. Possibly a mechanism could be used to split a transaction
over multiple processors to ensure the fastest memory is used?

Let's work on getting it working on the h/w that PG is most commonly
deployed on first.. I agree that we don't want to paint ourselves into
a corner with this, but I don't think massive NUMA systems are what we
should focus on first (are you familiar with any that run PG today..?).
I don't expect we're going to be trying to fight with the Linux (or
whatever) kernel over what threads run on what processors with access to
what memory on small-NUMA systems (x86-based).

Once a selection of rows has been made, then if there is a lot of
reformatting going on, then could this be done in parallel? I can
of think of 2 very simplistic strategies: (A) use a different
processor core for each column, or (B) farm out sets of rows to
different cores. I am sure in reality, there are more subtleties
and aspects of both the strategies will be used in a hybrid fashion
along with other approaches.

Given our row-based storage architecture, I can't imagine we'd do
anything other than take a row-based approach to this.. I would think
we'd do two things: parallelize based on partitioning, and parallelize
seqscan's across the individual heap files which are split on a per-1G
boundary already. Perhaps we can generalize that and scale it based on
the number of available processors and the size of the relation but I
could see advantages in matching up with what the kernel thinks are
independent files.

I expect that before any parallel algorithm is invoked, then some
sort of threshold needs to be exceeded to make it worth while.

Certainly. That's need to be included in the optimization model to
support this.

Thanks,

Stephen

#12Bruce Momjian
bruce@momjian.us
In reply to: Stephen Frost (#11)
Re: Parallel query execution

On Tue, Jan 15, 2013 at 06:15:57PM -0500, Stephen Frost wrote:

* Gavin Flower (GavinFlower@archidevsys.co.nz) wrote:

How about being aware of multiple spindles - so if the requested
data covers multiple spindles, then data could be extracted in
parallel. This may, or may not, involve multiple I/O channels?

Yes, this should dovetail with partitioning and tablespaces to pick up
on exactly that. We're implementing our own poor-man's parallelism
using exactly this to use as much of the CPU and I/O bandwidth as we
can. I have every confidence that it could be done better and be
simpler for us if it was handled in the backend.

Yes, I have listed tablespaces and partitions as possible parallel
options on the wiki.

On large multiple processor machines, there are different blocks of
memory that might be accessed at different speeds depending on the
processor. Possibly a mechanism could be used to split a transaction
over multiple processors to ensure the fastest memory is used?

Let's work on getting it working on the h/w that PG is most commonly
deployed on first.. I agree that we don't want to paint ourselves into
a corner with this, but I don't think massive NUMA systems are what we
should focus on first (are you familiar with any that run PG today..?).
I don't expect we're going to be trying to fight with the Linux (or
whatever) kernel over what threads run on what processors with access to
what memory on small-NUMA systems (x86-based).

Agreed.

Once a selection of rows has been made, then if there is a lot of
reformatting going on, then could this be done in parallel? I can
of think of 2 very simplistic strategies: (A) use a different
processor core for each column, or (B) farm out sets of rows to
different cores. I am sure in reality, there are more subtleties
and aspects of both the strategies will be used in a hybrid fashion
along with other approaches.

Given our row-based storage architecture, I can't imagine we'd do
anything other than take a row-based approach to this.. I would think
we'd do two things: parallelize based on partitioning, and parallelize
seqscan's across the individual heap files which are split on a per-1G
boundary already. Perhaps we can generalize that and scale it based on
the number of available processors and the size of the relation but I
could see advantages in matching up with what the kernel thinks are
independent files.

The 1GB idea is interesting. I found in pg_upgrade that file copy would
just overwhelm the I/O channel, and that doing multiple copies on the
same device had no win, but those were pure I/O operations --- a
sequential scan might be enough of a mix of I/O and CPU that parallelism
might help.

I expect that before any parallel algorithm is invoked, then some
sort of threshold needs to be exceeded to make it worth while.

Certainly. That's need to be included in the optimization model to
support this.

I have updated the wiki to reflect the ideas mentioned above.

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ It's impossible for everything to be true. +

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#13Michael Paquier
michael@paquier.xyz
In reply to: Bruce Momjian (#1)
Re: Parallel query execution

On Wed, Jan 16, 2013 at 7:14 AM, Bruce Momjian <bruce@momjian.us> wrote:

I mentioned last year that I wanted to start working on parallelism:

https://wiki.postgresql.org/wiki/Parallel_Query_Execution

Years ago I added thread-safety to libpq. Recently I added two parallel
execution paths to pg_upgrade. The first parallel path allows execution
of external binaries pg_dump and psql (to restore). The second parallel
path does copy/link by calling fork/thread-safe C functions. I was able
to do each in 2-3 days.

I believe it is time to start adding parallel execution to the backend.
We already have some parallelism in the backend:
effective_io_concurrency and helper processes. I think it is time we
start to consider additional options.

Parallelism isn't going to help all queries, in fact it might be just a
small subset, but it will be the larger queries. The pg_upgrade
parallelism only helps clusters with multiple databases or tablespaces,
but the improvements are significant.

I have summarized my ideas by updating our Parallel Query Execution wiki
page:

https://wiki.postgresql.org/wiki/Parallel_Query_Execution

Please consider updating the page yourself or posting your ideas to this
thread. Thanks.

Honestly that would be a great feature, and I would be happy helping
working on it.
Taking advantage of parallelism in a server with multiple core, especially
for things like large sorting operations would be great.
Just thinking loudly, but wouldn't it be the role of the planner to
determine if such or such query is worth using parallelism? The executor
would then be in charge of actually firing the tasks in parallel that
planner has determined necessary to do.
--
Michael Paquier
http://michael.otacoo.com

#14Bruce Momjian
bruce@momjian.us
In reply to: Michael Paquier (#13)
Re: Parallel query execution

On Wed, Jan 16, 2013 at 09:11:20AM +0900, Michael Paquier wrote:

On Wed, Jan 16, 2013 at 7:14 AM, Bruce Momjian <bruce@momjian.us> wrote:

I mentioned last year that I wanted to start working on parallelism:

https://wiki.postgresql.org/wiki/Parallel_Query_Execution

Years ago I added thread-safety to libpq. Recently I added two parallel
execution paths to pg_upgrade. The first parallel path allows execution
of external binaries pg_dump and psql (to restore). The second parallel
path does copy/link by calling fork/thread-safe C functions. I was able
to do each in 2-3 days.

I believe it is time to start adding parallel execution to the backend.
We already have some parallelism in the backend:
effective_io_concurrency and helper processes. I think it is time we
start to consider additional options.

Parallelism isn't going to help all queries, in fact it might be just a
small subset, but it will be the larger queries. The pg_upgrade
parallelism only helps clusters with multiple databases or tablespaces,
but the improvements are significant.

I have summarized my ideas by updating our Parallel Query Execution wiki
page:

https://wiki.postgresql.org/wiki/Parallel_Query_Execution

Please consider updating the page yourself or posting your ideas to this
thread. Thanks.

Honestly that would be a great feature, and I would be happy helping working on
it.
Taking advantage of parallelism in a server with multiple core, especially for
things like large sorting operations would be great.
Just thinking loudly, but wouldn't it be the role of the planner to determine
if such or such query is worth using parallelism? The executor would then be in
charge of actually firing the tasks in parallel that planner has determined
necessary to do.

Yes, it would probably be driven off of the optimizer statistics.

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ It's impossible for everything to be true. +

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#15Claudio Freire
klaussfreire@gmail.com
In reply to: Bruce Momjian (#12)
Re: Parallel query execution

On Tue, Jan 15, 2013 at 8:19 PM, Bruce Momjian <bruce@momjian.us> wrote:

Given our row-based storage architecture, I can't imagine we'd do
anything other than take a row-based approach to this.. I would think
we'd do two things: parallelize based on partitioning, and parallelize
seqscan's across the individual heap files which are split on a per-1G
boundary already. Perhaps we can generalize that and scale it based on
the number of available processors and the size of the relation but I
could see advantages in matching up with what the kernel thinks are
independent files.

The 1GB idea is interesting. I found in pg_upgrade that file copy would
just overwhelm the I/O channel, and that doing multiple copies on the
same device had no win, but those were pure I/O operations --- a
sequential scan might be enough of a mix of I/O and CPU that parallelism
might help.

AFAIR, synchroscans were introduced because multiple large sequential
scans were counterproductive (big time).

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#16Stephen Frost
sfrost@snowman.net
In reply to: Claudio Freire (#15)
Re: Parallel query execution

* Claudio Freire (klaussfreire@gmail.com) wrote:

On Tue, Jan 15, 2013 at 8:19 PM, Bruce Momjian <bruce@momjian.us> wrote:

The 1GB idea is interesting. I found in pg_upgrade that file copy would
just overwhelm the I/O channel, and that doing multiple copies on the
same device had no win, but those were pure I/O operations --- a
sequential scan might be enough of a mix of I/O and CPU that parallelism
might help.

AFAIR, synchroscans were introduced because multiple large sequential
scans were counterproductive (big time).

Sequentially scanning the *same* data over and over is certainly
counterprouctive. Synchroscans fixed that, yes. That's not what we're
talking about though- we're talking about scanning and processing
independent sets of data using multiple processes. It's certainly
possible that in some cases that won't be as good, but there will be
quite a few cases where it's much, much better.

Consider a very complicated function running against each row which
makes the CPU the bottleneck instead of the i/o system. That type of a
query will never run faster than a single CPU in a single-process
environment, regardless of if you have synch-scans or not, while in a
multi-process environment you'll take advantage of the extra CPUs which
are available and use more of the I/O bandwidth that isn't yet
exhausted.

Thanks,

Stephen

#17Claudio Freire
klaussfreire@gmail.com
In reply to: Stephen Frost (#16)
Re: Parallel query execution

On Wed, Jan 16, 2013 at 12:13 AM, Stephen Frost <sfrost@snowman.net> wrote:

* Claudio Freire (klaussfreire@gmail.com) wrote:

On Tue, Jan 15, 2013 at 8:19 PM, Bruce Momjian <bruce@momjian.us> wrote:

The 1GB idea is interesting. I found in pg_upgrade that file copy would
just overwhelm the I/O channel, and that doing multiple copies on the
same device had no win, but those were pure I/O operations --- a
sequential scan might be enough of a mix of I/O and CPU that parallelism
might help.

AFAIR, synchroscans were introduced because multiple large sequential
scans were counterproductive (big time).

Sequentially scanning the *same* data over and over is certainly
counterprouctive. Synchroscans fixed that, yes. That's not what we're
talking about though- we're talking about scanning and processing
independent sets of data using multiple processes.

I don't see the difference. Blocks are blocks (unless they're cached).

It's certainly
possible that in some cases that won't be as good

If memory serves me correctly (and it does, I suffered it a lot), the
performance hit is quite considerable. Enough to make it "a lot worse"
rather than "not as good".

but there will be
quite a few cases where it's much, much better.

Just cached segments.

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#18Stephen Frost
sfrost@snowman.net
In reply to: Claudio Freire (#17)
Re: Parallel query execution

* Claudio Freire (klaussfreire@gmail.com) wrote:

On Wed, Jan 16, 2013 at 12:13 AM, Stephen Frost <sfrost@snowman.net> wrote:

Sequentially scanning the *same* data over and over is certainly
counterprouctive. Synchroscans fixed that, yes. That's not what we're
talking about though- we're talking about scanning and processing
independent sets of data using multiple processes.

I don't see the difference. Blocks are blocks (unless they're cached).

Not quite. Having to go out to the kernel isn't free. Additionally,
the seq scans used to pollute our shared buffers prior to
synch-scanning, which didn't help things.

It's certainly
possible that in some cases that won't be as good

If memory serves me correctly (and it does, I suffered it a lot), the
performance hit is quite considerable. Enough to make it "a lot worse"
rather than "not as good".

I feel like we must not be communicating very well.

If the CPU is pegged at 100% and the I/O system is at 20%, adding
another CPU at 100% will bring the I/O load up to 40% and you're now
processing data twice as fast overall. If you're running a single CPU
at 20% and your I/O system is at 100%, then adding another CPU isn't
going to help and may even degrade performance by causing problems for
the I/O system. The goal of the optimizer will be to model the plan to
account for exactly that, as best it can.

but there will be
quite a few cases where it's much, much better.

Just cached segments.

No, certainly not just cached segments. Any situation where the CPU is
the bottleneck.

Thanks,

Stephen

#19Josh Berkus
josh@agliodbs.com
In reply to: Claudio Freire (#17)
Re: Parallel query execution

but there will be
quite a few cases where it's much, much better.

Just cached segments.

Actually, thanks to much faster storage (think SSD, SAN), it's easily
possible for PostgreSQL to become CPU-limited on a seq scan query, even
when reading from disk.

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#20Stephen Frost
sfrost@snowman.net
In reply to: Josh Berkus (#19)
Re: Parallel query execution

* Josh Berkus (josh@agliodbs.com) wrote:

Actually, thanks to much faster storage (think SSD, SAN), it's easily
possible for PostgreSQL to become CPU-limited on a seq scan query, even
when reading from disk.

Particularly with a complex filter being applied or if it's feeding into
something above that's expensive..

Thanks,

Stephen

#21Josh Berkus
josh@agliodbs.com
In reply to: Stephen Frost (#20)
#22Michael Paquier
michael@paquier.xyz
In reply to: Josh Berkus (#21)
#23Bruce Momjian
bruce@momjian.us
In reply to: Michael Paquier (#22)
#24Michael Paquier
michael@paquier.xyz
In reply to: Bruce Momjian (#23)
#25Claudio Freire
klaussfreire@gmail.com
In reply to: Stephen Frost (#18)
#26Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Bruce Momjian (#23)
#27Jeff Janes
jeff.janes@gmail.com
In reply to: Simon Riggs (#7)
#28Tom Lane
tgl@sss.pgh.pa.us
In reply to: Alvaro Herrera (#26)
#29Abhijit Menon-Sen
ams@2ndQuadrant.com
In reply to: Tom Lane (#28)
#30Daniel Farina
daniel@heroku.com
In reply to: Tom Lane (#28)
#31Magnus Hagander
magnus@hagander.net
In reply to: Bruce Momjian (#8)
#32Andres Freund
andres@anarazel.de
In reply to: Tom Lane (#28)
#33Magnus Hagander
magnus@hagander.net
In reply to: Abhijit Menon-Sen (#29)
#34Simon Riggs
simon@2ndQuadrant.com
In reply to: Abhijit Menon-Sen (#29)
#35Magnus Hagander
magnus@hagander.net
In reply to: Simon Riggs (#34)
#36Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#28)
#37Robert Haas
robertmhaas@gmail.com
In reply to: Magnus Hagander (#31)
#38Abhijit Menon-Sen
ams@2ndQuadrant.com
In reply to: Magnus Hagander (#33)
#39Stephen Frost
sfrost@snowman.net
In reply to: Claudio Freire (#25)
#40Stephen Frost
sfrost@snowman.net
In reply to: Tom Lane (#28)
#41Stephen Frost
sfrost@snowman.net
In reply to: Daniel Farina (#30)
#42Boszormenyi Zoltan
zb@cybertec.at
In reply to: Abhijit Menon-Sen (#38)
#43Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Simon Riggs (#34)
#44Stephen Frost
sfrost@snowman.net
In reply to: Alvaro Herrera (#43)
#45Andrew Dunstan
andrew@dunslane.net
In reply to: Bruce Momjian (#23)
#46Abhijit Menon-Sen
ams@2ndQuadrant.com
In reply to: Stephen Frost (#44)
#47Craig Ringer
craig@2ndquadrant.com
In reply to: Stephen Frost (#44)
#48Craig Ringer
craig@2ndquadrant.com
In reply to: Simon Riggs (#34)
#49Stephen Frost
sfrost@snowman.net
In reply to: Abhijit Menon-Sen (#46)
#50Claudio Freire
klaussfreire@gmail.com
In reply to: Stephen Frost (#39)
#51Noah Misch
noah@leadboat.com
In reply to: Magnus Hagander (#33)
#52Noah Misch
noah@leadboat.com
In reply to: Stephen Frost (#41)
#53Bruce Momjian
bruce@momjian.us
In reply to: Michael Paquier (#24)
#54Bruce Momjian
bruce@momjian.us
In reply to: Alvaro Herrera (#26)
#55Bruce Momjian
bruce@momjian.us
In reply to: Robert Haas (#37)
#56Bruce Momjian
bruce@momjian.us
In reply to: Andrew Dunstan (#45)
#57Andrew Dunstan
andrew@dunslane.net
In reply to: Bruce Momjian (#56)
#58Josh Berkus
josh@agliodbs.com
In reply to: Simon Riggs (#34)
#59Josh Berkus
josh@agliodbs.com
In reply to: Bruce Momjian (#55)
#60Bruce Momjian
bruce@momjian.us
In reply to: Josh Berkus (#59)
#61Josh Berkus
josh@agliodbs.com
In reply to: Bruce Momjian (#60)
#62Tom Lane
tgl@sss.pgh.pa.us
In reply to: Stephen Frost (#49)
#63Josh Berkus
josh@agliodbs.com
In reply to: Tom Lane (#62)
#64Stephen Frost
sfrost@snowman.net
In reply to: Bruce Momjian (#53)
#65Tom Lane
tgl@sss.pgh.pa.us
In reply to: Josh Berkus (#63)
#66Bruce Momjian
bruce@momjian.us
In reply to: Tom Lane (#65)
#67Pavel Stehule
pavel.stehule@gmail.com
In reply to: Stephen Frost (#64)
#68Bruce Momjian
bruce@momjian.us
In reply to: Pavel Stehule (#67)
#69Dickson S. Guedes
listas@guedesoft.net
In reply to: Bruce Momjian (#68)
#70Bruce Momjian
bruce@momjian.us
In reply to: Dickson S. Guedes (#69)
#71Robert Haas
robertmhaas@gmail.com
In reply to: Josh Berkus (#58)
#72Jeff Janes
jeff.janes@gmail.com
In reply to: Stephen Frost (#11)
#73Jeff Janes
jeff.janes@gmail.com
In reply to: Gavin Flower (#9)
#74Simon Riggs
simon@2ndQuadrant.com
In reply to: Tom Lane (#62)
#75Claudio Freire
klaussfreire@gmail.com
In reply to: Jeff Janes (#73)
#76Bruce Momjian
bruce@momjian.us
In reply to: Jeff Janes (#72)
#77Claudio Freire
klaussfreire@gmail.com
In reply to: Bruce Momjian (#76)
#78Bruce Momjian
bruce@momjian.us
In reply to: Claudio Freire (#77)
#79Jeff Janes
jeff.janes@gmail.com
In reply to: Stephen Frost (#64)
#80Tom Lane
tgl@sss.pgh.pa.us
In reply to: Simon Riggs (#74)
#81Craig Ringer
craig@2ndquadrant.com
In reply to: Robert Haas (#71)
#82Abhijit Menon-Sen
ams@2ndQuadrant.com
In reply to: Tom Lane (#80)
#83Peter Eisentraut
peter_e@gmx.net
In reply to: Tom Lane (#65)
#84Tom Lane
tgl@sss.pgh.pa.us
In reply to: Peter Eisentraut (#83)
#85Craig Ringer
craig@2ndquadrant.com
In reply to: Abhijit Menon-Sen (#82)
#86Michael Paquier
michael@paquier.xyz
In reply to: Craig Ringer (#85)
#87Abhijit Menon-Sen
ams@2ndQuadrant.com
In reply to: Michael Paquier (#86)
#88Pavan Deolasee
pavan.deolasee@gmail.com
In reply to: Abhijit Menon-Sen (#29)
#89Magnus Hagander
magnus@hagander.net
In reply to: Abhijit Menon-Sen (#87)
#90Magnus Hagander
magnus@hagander.net
In reply to: Pavan Deolasee (#88)
#91Craig Ringer
craig@2ndquadrant.com
In reply to: Magnus Hagander (#89)
#92Robert Haas
robertmhaas@gmail.com
In reply to: Craig Ringer (#91)
#93Jeff Janes
jeff.janes@gmail.com
In reply to: Magnus Hagander (#90)
#94Robert Haas
robertmhaas@gmail.com
In reply to: Jeff Janes (#93)
#95Simon Riggs
simon@2ndQuadrant.com
In reply to: Robert Haas (#94)
#96Pavel Stehule
pavel.stehule@gmail.com
In reply to: Simon Riggs (#95)
#97Robert Haas
robertmhaas@gmail.com
In reply to: Simon Riggs (#95)
#98Jeff Janes
jeff.janes@gmail.com
In reply to: Simon Riggs (#95)
#99Robert Haas
robertmhaas@gmail.com
In reply to: Jeff Janes (#98)
#100Andrew Dunstan
andrew@dunslane.net
In reply to: Robert Haas (#99)
#101Robert Haas
robertmhaas@gmail.com
In reply to: Andrew Dunstan (#100)
#102Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Haas (#99)
#103Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Jeff Janes (#98)
#104Josh Berkus
josh@agliodbs.com
In reply to: Heikki Linnakangas (#103)
#105Tom Lane
tgl@sss.pgh.pa.us
In reply to: Josh Berkus (#104)
#106Josh Berkus
josh@agliodbs.com
In reply to: Tom Lane (#105)
#107Robert Haas
robertmhaas@gmail.com
In reply to: Josh Berkus (#106)
#108Dimitri Fontaine
dimitri@2ndQuadrant.fr
In reply to: Robert Haas (#99)
#109Phil Sorber
phil@omniti.com
In reply to: Abhijit Menon-Sen (#38)
#110Robert Haas
robertmhaas@gmail.com
In reply to: Phil Sorber (#109)
#111Robert Haas
robertmhaas@gmail.com
In reply to: Dimitri Fontaine (#108)
#112Josh Berkus
josh@agliodbs.com
In reply to: Robert Haas (#111)
#113Phil Sorber
phil@omniti.com
In reply to: Robert Haas (#111)
#114Phil Sorber
phil@omniti.com
In reply to: Josh Berkus (#112)
#115Robert Haas
robertmhaas@gmail.com
In reply to: Josh Berkus (#112)
#116Pavan Deolasee
pavan.deolasee@gmail.com
In reply to: Phil Sorber (#114)
#117Tom Lane
tgl@sss.pgh.pa.us
In reply to: Pavan Deolasee (#116)
#118Magnus Hagander
magnus@hagander.net
In reply to: Robert Haas (#110)
#119Dimitri Fontaine
dimitri@2ndQuadrant.fr
In reply to: Tom Lane (#117)
#120Pavan Deolasee
pavan.deolasee@gmail.com
In reply to: Tom Lane (#117)
#121Dimitri Fontaine
dimitri@2ndQuadrant.fr
In reply to: Robert Haas (#111)
#122Pavel Stehule
pavel.stehule@gmail.com
In reply to: Pavan Deolasee (#120)
#123Andrew Dunstan
andrew@dunslane.net
In reply to: Tom Lane (#117)
#124Gavin Flower
GavinFlower@archidevsys.co.nz
In reply to: Dimitri Fontaine (#119)
#125Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#117)
#126Andres Freund
andres@anarazel.de
In reply to: Robert Haas (#125)
#127Bruce Momjian
bruce@momjian.us
In reply to: Tom Lane (#105)
#128Josh Berkus
josh@agliodbs.com
In reply to: Andres Freund (#126)
#129Joshua D. Drake
jd@commandprompt.com
In reply to: Josh Berkus (#128)
#130Stephen Frost
sfrost@snowman.net
In reply to: Robert Haas (#125)
#131Phil Sorber
phil@omniti.com
In reply to: Stephen Frost (#130)
#132Robert Haas
robertmhaas@gmail.com
In reply to: Phil Sorber (#131)
#133Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Stephen Frost (#130)
#134Stephen Frost
sfrost@snowman.net
In reply to: Heikki Linnakangas (#133)
#135Pavel Stehule
pavel.stehule@gmail.com
In reply to: Stephen Frost (#134)
#136Pavan Deolasee
pavan.deolasee@gmail.com
In reply to: Robert Haas (#125)
#137Amit Kapila
amit.kapila16@gmail.com
In reply to: Stephen Frost (#134)
#138Paul Ramsey
pramsey@cleverelephant.ca
In reply to: Bruce Momjian (#1)
#139Bruce Momjian
bruce@momjian.us
In reply to: Paul Ramsey (#138)