Parallel query execution
I mentioned last year that I wanted to start working on parallelism:
https://wiki.postgresql.org/wiki/Parallel_Query_Execution
Years ago I added thread-safety to libpq. Recently I added two parallel
execution paths to pg_upgrade. The first parallel path allows execution
of external binaries pg_dump and psql (to restore). The second parallel
path does copy/link by calling fork/thread-safe C functions. I was able
to do each in 2-3 days.
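The pattern behind the first path is simple enough to sketch. Here is a rough Python illustration of the idea (dispatching independent per-database jobs to worker processes); `restore_database` and `parallel_restore` are hypothetical stand-ins for invoking pg_dump/psql, not pg_upgrade's actual C code:

```python
from multiprocessing import Pool

def restore_database(dbname):
    # Hypothetical stand-in for running pg_dump / psql against one
    # database; the real work is in external binaries, so the jobs are
    # independent of one another.
    return "restored " + dbname

def parallel_restore(dbnames, jobs=2):
    # One task per database, at most `jobs` worker processes at a time,
    # mirroring the spirit of pg_upgrade's parallel paths.
    with Pool(processes=jobs) as pool:
        return pool.map(restore_database, dbnames)  # order preserved
```

The key property is that each unit of work touches a different database, so no coordination between workers is needed.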
I believe it is time to start adding parallel execution to the backend.
We already have some parallelism in the backend:
effective_io_concurrency and helper processes. I think it is time we
start to consider additional options.
Parallelism isn't going to help all queries; in fact it might help only a
small subset, but those will be the larger queries. The pg_upgrade
parallelism only helps clusters with multiple databases or tablespaces,
but the improvements are significant.
I have summarized my ideas by updating our Parallel Query Execution wiki
page:
https://wiki.postgresql.org/wiki/Parallel_Query_Execution
Please consider updating the page yourself or posting your ideas to this
thread. Thanks.
--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com
+ It's impossible for everything to be true. +
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
* Bruce Momjian (bruce@momjian.us) wrote:
Parallelism isn't going to help all queries; in fact it might help only a
small subset, but those will be the larger queries. The pg_upgrade
parallelism only helps clusters with multiple databases or tablespaces,
but the improvements are significant.
This would be fantastic and I'd like to help. Parallel query and real
partitioning are two of our biggest holes for OLAP and data warehouse
users.
Please consider updating the page yourself or posting your ideas to this
thread. Thanks.
Will do.
Thanks,
Stephen
On 15 January 2013 22:14, Bruce Momjian <bruce@momjian.us> wrote:
I believe it is time to start adding parallel execution to the backend.
We already have some parallelism in the backend:
effective_io_concurrency and helper processes. I think it is time we
start to consider additional options.
A few months back, I remarked [1] that speeding up sorting using
pipelining and asynchronous I/O was probably parallelism low-hanging
fruit. That hasn't changed, though I personally still don't have the
bandwidth to look into it in a serious way.
[1]: /messages/by-id/CAEYLb_VeZpKDX54VEx3X30oy_UOTh89XoejJW6aucjjiUjskXw@mail.gmail.com
--
Peter Geoghegan http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services
On Tue, Jan 15, 2013 at 10:39:10PM +0000, Peter Geoghegan wrote:
On 15 January 2013 22:14, Bruce Momjian <bruce@momjian.us> wrote:
I believe it is time to start adding parallel execution to the backend.
We already have some parallelism in the backend:
effective_io_concurrency and helper processes. I think it is time we
start to consider additional options.

A few months back, I remarked [1] that speeding up sorting using
pipelining and asynchronous I/O was probably parallelism low-hanging
fruit. That hasn't changed, though I personally still don't have the
bandwidth to look into it in a serious way.

[1] /messages/by-id/CAEYLb_VeZpKDX54VEx3X30oy_UOTh89XoejJW6aucjjiUjskXw@mail.gmail.com
OK, I added the link to the wiki.
--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com
+ It's impossible for everything to be true. +
On 15 January 2013 22:14, Bruce Momjian <bruce@momjian.us> wrote:
I mentioned last year that I wanted to start working on parallelism:
We don't normally begin discussing topics for next release just as a
CF is starting.
Why is this being discussed now?
--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Tue, Jan 15, 2013 at 10:53:29PM +0000, Simon Riggs wrote:
On 15 January 2013 22:14, Bruce Momjian <bruce@momjian.us> wrote:
I mentioned last year that I wanted to start working on parallelism:
We don't normally begin discussing topics for next release just as a
CF is starting.

Why is this being discussed now?
It is for 9.4 and will take months. I didn't think there was a better
time. We don't usually discuss features during beta testing.
--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com
+ It's impossible for everything to be true. +
On 15 January 2013 22:55, Bruce Momjian <bruce@momjian.us> wrote:
Why is this being discussed now?
It is for 9.4 and will take months. I didn't think there was a better
time. We don't usually discuss features during beta testing.
Bruce, there are many, many patches on the queue. How will we ever get
to beta testing if we begin open-ended discussions on next release?
If we can't finish what we've started for 9.3, why talk about 9.4?
Yes, it's a great topic for discussion, but there are better times.
--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Tue, Jan 15, 2013 at 11:01:04PM +0000, Simon Riggs wrote:
On 15 January 2013 22:55, Bruce Momjian <bruce@momjian.us> wrote:
Why is this being discussed now?
It is for 9.4 and will take months. I didn't think there was a better
time. We don't usually discuss features during beta testing.

Bruce, there are many, many patches on the queue. How will we ever get
to beta testing if we begin open-ended discussions on next release?

If we can't finish what we've started for 9.3, why talk about 9.4?
Yes, it's a great topic for discussion, but there are better times.
Like when? I don't remember a policy of not discussing things now.
Does anyone else remember this? Are you saying feature discussion is
only between commit-fests? Is this written down anywhere? I only
remember beta-time as a time not to discuss features.
--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com
+ It's impossible for everything to be true. +
On 16/01/13 11:14, Bruce Momjian wrote:
I mentioned last year that I wanted to start working on parallelism:
https://wiki.postgresql.org/wiki/Parallel_Query_Execution
Years ago I added thread-safety to libpq. Recently I added two parallel
execution paths to pg_upgrade. The first parallel path allows execution
of external binaries pg_dump and psql (to restore). The second parallel
path does copy/link by calling fork/thread-safe C functions. I was able
to do each in 2-3 days.

I believe it is time to start adding parallel execution to the backend.
We already have some parallelism in the backend:
effective_io_concurrency and helper processes. I think it is time we
start to consider additional options.

Parallelism isn't going to help all queries; in fact it might help only a
small subset, but those will be the larger queries. The pg_upgrade
parallelism only helps clusters with multiple databases or tablespaces,
but the improvements are significant.

I have summarized my ideas by updating our Parallel Query Execution wiki
page:

https://wiki.postgresql.org/wiki/Parallel_Query_Execution
Please consider updating the page yourself or posting your ideas to this
thread. Thanks.
Hmm...
How about being aware of multiple spindles - so if the requested data
covers multiple spindles, then data could be extracted in parallel? This
may, or may not, involve multiple I/O channels.
On large multiple processor machines, there are different blocks of
memory that might be accessed at different speeds depending on the
processor. Possibly a mechanism could be used to split a transaction
over multiple processors to ensure the fastest memory is used?
Once a selection of rows has been made, then if there is a lot of
reformatting going on, could this be done in parallel? I can
think of 2 very simplistic strategies: (A) use a different processor
core for each column, or (B) farm out sets of rows to different cores.
I am sure that in reality there are more subtleties, and aspects of both
strategies will be used in a hybrid fashion along with other approaches.
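Strategy (B) is easy to sketch. This illustrative Python (not PostgreSQL code) farms contiguous chunks of rows out to worker processes; `format_chunk` is a hypothetical stand-in for the per-row reformatting work:

```python
from multiprocessing import Pool

def format_chunk(rows):
    # Hypothetical stand-in for per-row "reformatting" (e.g. output
    # conversion), applied to one batch of rows.
    return [value.upper() for value in rows]

def parallel_format(rows, workers=2, chunk_size=2):
    # Strategy (B): split the result set into contiguous chunks and
    # farm each chunk out to a different core.
    chunks = [rows[i:i + chunk_size]
              for i in range(0, len(rows), chunk_size)]
    with Pool(processes=workers) as pool:
        parts = pool.map(format_chunk, chunks)  # chunk order preserved
    return [row for part in parts for row in part]
```

Because `pool.map` returns chunks in submission order, the output row order matches the input, which matters if an ORDER BY was already applied upstream.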
I expect that before any parallel algorithm is invoked, some sort
of threshold needs to be exceeded to make it worthwhile. Different
aspects of the parallel algorithm may have their own thresholds. It may
not be worth applying a parallel algorithm for 10 rows from a simple
table, but selecting 10,000 records from multiple tables, each with over
10 million rows, using joins may benefit from more extreme parallelism.
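That kind of gating could look something like this (an illustrative sketch only; the cost numbers and the `choose_degree` helper are invented, not proposed planner code):

```python
PARALLEL_SETUP_COST = 10000  # invented cost of launching workers

def choose_degree(estimated_rows, cost_per_row, max_workers=8):
    # Gate: stay serial unless the estimated total work comfortably
    # exceeds the overhead of setting up parallel workers.
    total_cost = estimated_rows * cost_per_row
    if total_cost < PARALLEL_SETUP_COST:
        return 1
    # Scale the worker count with the amount of work, capped by the machine.
    return min(max_workers, max(2, total_cost // PARALLEL_SETUP_COST))
```

A 10-row scan stays serial, while a multi-million-row join fans out to the machine's limit; per-operator thresholds would just be different constants fed into the same shape of decision.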
I expect that UNIONs, as well as the processing of partitioned tables,
may be amenable to parallel processing.
Cheers,
Gavin
On Wed, Jan 16, 2013 at 12:03:50PM +1300, Gavin Flower wrote:
On 16/01/13 11:14, Bruce Momjian wrote:
I mentioned last year that I wanted to start working on parallelism:
https://wiki.postgresql.org/wiki/Parallel_Query_Execution
Years ago I added thread-safety to libpq. Recently I added two parallel
execution paths to pg_upgrade. The first parallel path allows execution
of external binaries pg_dump and psql (to restore). The second parallel
path does copy/link by calling fork/thread-safe C functions. I was able
to do each in 2-3 days.

I believe it is time to start adding parallel execution to the backend.
We already have some parallelism in the backend:
effective_io_concurrency and helper processes. I think it is time we
start to consider additional options.

Parallelism isn't going to help all queries; in fact it might help only a
small subset, but those will be the larger queries. The pg_upgrade
parallelism only helps clusters with multiple databases or tablespaces,
but the improvements are significant.

I have summarized my ideas by updating our Parallel Query Execution wiki
page:

https://wiki.postgresql.org/wiki/Parallel_Query_Execution
Please consider updating the page yourself or posting your ideas to this
thread. Thanks.

Hmm...
How about being aware of multiple spindles - so if the requested data covers
multiple spindles, then data could be extracted in parallel. This may, or may
not, involve multiple I/O channels?
Well, we usually label these as tablespaces. I don't know if
spindle-level is a reasonable level to add.
On large multiple processor machines, there are different blocks of memory that
might be accessed at different speeds depending on the processor. Possibly a
mechanism could be used to split a transaction over multiple processors to
ensure the fastest memory is used?
That seems too far-out for an initial approach.
Once a selection of rows has been made, then if there is a lot of reformatting
going on, could this be done in parallel? I can think of 2 very
simplistic strategies: (A) use a different processor core for each column, or
(B) farm out sets of rows to different cores. I am sure that in reality there
are more subtleties, and aspects of both strategies will be used in a hybrid
fashion along with other approaches.
Probably #2, but that is going to require having some of the modules be
thread/fork-safe, and that is going to be tricky.
I expect that before any parallel algorithm is invoked, some sort of
threshold needs to be exceeded to make it worthwhile. Different aspects of
the parallel algorithm may have their own thresholds. It may not be worth
applying a parallel algorithm for 10 rows from a simple table, but selecting
10,000 records from multiple tables, each with over 10 million rows, using
joins may benefit from more extreme parallelism.
Right, I bet we will need some way to control when the overhead of
parallel execution is worth it.
I expect that UNIONs, as well as the processing of partitioned tables, may be
amenable to parallel processing.
Interesting idea on UNION.
--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com
+ It's impossible for everything to be true. +
* Gavin Flower (GavinFlower@archidevsys.co.nz) wrote:
How about being aware of multiple spindles - so if the requested
data covers multiple spindles, then data could be extracted in
parallel. This may, or may not, involve multiple I/O channels?
Yes, this should dovetail with partitioning and tablespaces to pick up
on exactly that. We're implementing our own poor-man's parallelism
using exactly this to use as much of the CPU and I/O bandwidth as we
can. I have every confidence that it could be done better and be
simpler for us if it was handled in the backend.
On large multiple processor machines, there are different blocks of
memory that might be accessed at different speeds depending on the
processor. Possibly a mechanism could be used to split a transaction
over multiple processors to ensure the fastest memory is used?
Let's work on getting it working on the h/w that PG is most commonly
deployed on first.. I agree that we don't want to paint ourselves into
a corner with this, but I don't think massive NUMA systems are what we
should focus on first (are you familiar with any that run PG today..?).
I don't expect we're going to be trying to fight with the Linux (or
whatever) kernel over what threads run on what processors with access to
what memory on small-NUMA systems (x86-based).
Once a selection of rows has been made, then if there is a lot of
reformatting going on, could this be done in parallel? I can
think of 2 very simplistic strategies: (A) use a different
processor core for each column, or (B) farm out sets of rows to
different cores. I am sure that in reality there are more subtleties,
and aspects of both strategies will be used in a hybrid fashion
along with other approaches.
Given our row-based storage architecture, I can't imagine we'd do
anything other than take a row-based approach to this. I would think
we'd do two things: parallelize based on partitioning, and parallelize
seqscans across the individual heap files, which are split on a per-1GB
boundary already. Perhaps we can generalize that and scale it based on
the number of available processors and the size of the relation, but I
could see advantages in matching up with what the kernel thinks are
independent files.
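The per-1GB split gives a natural unit of work. A minimal sketch of carving a relation into segment-sized ranges (illustrative Python; `segment_ranges` is a hypothetical helper, and in practice each range would correspond to one of the existing heap segment files handed to a scan worker):

```python
SEGMENT_SIZE = 1 << 30  # heap files are split on 1GB boundaries

def segment_ranges(relation_size, segment_size=SEGMENT_SIZE):
    # One (offset, length) unit of work per underlying segment file;
    # each range could then be assigned to a separate scan worker.
    return [(off, min(segment_size, relation_size - off))
            for off in range(0, relation_size, segment_size)]
```

The worker count would then be the smaller of the number of segments and the number of available processors, matching the point about scaling with relation size.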
I expect that before any parallel algorithm is invoked, then some
sort of threshold needs to be exceeded to make it worthwhile.
Certainly. That needs to be included in the optimization model to
support this.
Thanks,
Stephen
On Tue, Jan 15, 2013 at 06:15:57PM -0500, Stephen Frost wrote:
* Gavin Flower (GavinFlower@archidevsys.co.nz) wrote:
How about being aware of multiple spindles - so if the requested
data covers multiple spindles, then data could be extracted in
parallel. This may, or may not, involve multiple I/O channels?

Yes, this should dovetail with partitioning and tablespaces to pick up
on exactly that. We're implementing our own poor-man's parallelism
using exactly this to use as much of the CPU and I/O bandwidth as we
can. I have every confidence that it could be done better and be
simpler for us if it was handled in the backend.
Yes, I have listed tablespaces and partitions as possible parallel
options on the wiki.
On large multiple processor machines, there are different blocks of
memory that might be accessed at different speeds depending on the
processor. Possibly a mechanism could be used to split a transaction
over multiple processors to ensure the fastest memory is used?

Let's work on getting it working on the h/w that PG is most commonly
deployed on first.. I agree that we don't want to paint ourselves into
a corner with this, but I don't think massive NUMA systems are what we
should focus on first (are you familiar with any that run PG today..?).
I don't expect we're going to be trying to fight with the Linux (or
whatever) kernel over what threads run on what processors with access to
what memory on small-NUMA systems (x86-based).
Agreed.
Once a selection of rows has been made, then if there is a lot of
reformatting going on, could this be done in parallel? I can
think of 2 very simplistic strategies: (A) use a different
processor core for each column, or (B) farm out sets of rows to
different cores. I am sure that in reality there are more subtleties,
and aspects of both strategies will be used in a hybrid fashion
along with other approaches.

Given our row-based storage architecture, I can't imagine we'd do
anything other than take a row-based approach to this.. I would think
we'd do two things: parallelize based on partitioning, and parallelize
seqscan's across the individual heap files which are split on a per-1G
boundary already. Perhaps we can generalize that and scale it based on
the number of available processors and the size of the relation but I
could see advantages in matching up with what the kernel thinks are
independent files.
The 1GB idea is interesting. I found in pg_upgrade that file copy would
just overwhelm the I/O channel, and that doing multiple copies on the
same device had no win, but those were pure I/O operations --- a
sequential scan might be enough of a mix of I/O and CPU that parallelism
might help.
I expect that before any parallel algorithm is invoked, some
sort of threshold needs to be exceeded to make it worthwhile.

Certainly. That needs to be included in the optimization model to
support this.
I have updated the wiki to reflect the ideas mentioned above.
--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com
+ It's impossible for everything to be true. +
On Wed, Jan 16, 2013 at 7:14 AM, Bruce Momjian <bruce@momjian.us> wrote:
I mentioned last year that I wanted to start working on parallelism:
https://wiki.postgresql.org/wiki/Parallel_Query_Execution
Years ago I added thread-safety to libpq. Recently I added two parallel
execution paths to pg_upgrade. The first parallel path allows execution
of external binaries pg_dump and psql (to restore). The second parallel
path does copy/link by calling fork/thread-safe C functions. I was able
to do each in 2-3 days.

I believe it is time to start adding parallel execution to the backend.
We already have some parallelism in the backend:
effective_io_concurrency and helper processes. I think it is time we
start to consider additional options.

Parallelism isn't going to help all queries; in fact it might help only a
small subset, but those will be the larger queries. The pg_upgrade
parallelism only helps clusters with multiple databases or tablespaces,
but the improvements are significant.

I have summarized my ideas by updating our Parallel Query Execution wiki
page:

https://wiki.postgresql.org/wiki/Parallel_Query_Execution
Please consider updating the page yourself or posting your ideas to this
thread. Thanks.
Honestly, that would be a great feature, and I would be happy to help
work on it.
Taking advantage of parallelism on a server with multiple cores, especially
for things like large sorting operations, would be great.
Just thinking out loud, but wouldn't it be the role of the planner to
determine whether a given query is worth parallelizing? The executor
would then be in charge of actually firing, in parallel, the tasks the
planner has determined are necessary.
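That division of labor might be sketched like this (illustrative toy code only; `plan` and `execute` are hypothetical functions, with the planner choosing a degree of parallelism from statistics and the executor simply running what the plan says):

```python
def plan(stats, row_threshold=100000, max_workers=4):
    # "Planner": use optimizer statistics to decide whether parallelism
    # is worth it and, if so, how many workers to request.
    rows = stats["estimated_rows"]
    if rows < row_threshold:
        return {"node": "SeqScan", "workers": 1}
    return {"node": "SeqScan",
            "workers": min(max_workers, rows // row_threshold)}

def execute(plan_node, run_worker):
    # "Executor": fire exactly the tasks the planner asked for; it makes
    # no costing decisions of its own.
    return [run_worker(i) for i in range(plan_node["workers"])]
```

Keeping all costing in the planner means the executor stays mechanical, which matches how the existing planner/executor boundary already works.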
--
Michael Paquier
http://michael.otacoo.com
On Wed, Jan 16, 2013 at 09:11:20AM +0900, Michael Paquier wrote:
On Wed, Jan 16, 2013 at 7:14 AM, Bruce Momjian <bruce@momjian.us> wrote:
I mentioned last year that I wanted to start working on parallelism:
https://wiki.postgresql.org/wiki/Parallel_Query_Execution
Years ago I added thread-safety to libpq. Recently I added two parallel
execution paths to pg_upgrade. The first parallel path allows execution
of external binaries pg_dump and psql (to restore). The second parallel
path does copy/link by calling fork/thread-safe C functions. I was able
to do each in 2-3 days.

I believe it is time to start adding parallel execution to the backend.
We already have some parallelism in the backend:
effective_io_concurrency and helper processes. I think it is time we
start to consider additional options.

Parallelism isn't going to help all queries; in fact it might help only a
small subset, but those will be the larger queries. The pg_upgrade
parallelism only helps clusters with multiple databases or tablespaces,
but the improvements are significant.

I have summarized my ideas by updating our Parallel Query Execution wiki
page:

https://wiki.postgresql.org/wiki/Parallel_Query_Execution
Please consider updating the page yourself or posting your ideas to this
thread. Thanks.

Honestly, that would be a great feature, and I would be happy to help
work on it.
Taking advantage of parallelism on a server with multiple cores, especially for
things like large sorting operations, would be great.
Just thinking out loud, but wouldn't it be the role of the planner to determine
whether a given query is worth parallelizing? The executor would then be in
charge of actually firing, in parallel, the tasks the planner has determined
are necessary.
Yes, it would probably be driven off the optimizer statistics.
--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com
+ It's impossible for everything to be true. +
On Tue, Jan 15, 2013 at 8:19 PM, Bruce Momjian <bruce@momjian.us> wrote:
Given our row-based storage architecture, I can't imagine we'd do
anything other than take a row-based approach to this.. I would think
we'd do two things: parallelize based on partitioning, and parallelize
seqscan's across the individual heap files which are split on a per-1G
boundary already. Perhaps we can generalize that and scale it based on
the number of available processors and the size of the relation but I
could see advantages in matching up with what the kernel thinks are
independent files.

The 1GB idea is interesting. I found in pg_upgrade that file copy would
just overwhelm the I/O channel, and that doing multiple copies on the
same device had no win, but those were pure I/O operations --- a
sequential scan might be enough of a mix of I/O and CPU that parallelism
might help.
AFAIR, synchroscans were introduced because multiple large sequential
scans were counterproductive (big time).
* Claudio Freire (klaussfreire@gmail.com) wrote:
On Tue, Jan 15, 2013 at 8:19 PM, Bruce Momjian <bruce@momjian.us> wrote:
The 1GB idea is interesting. I found in pg_upgrade that file copy would
just overwhelm the I/O channel, and that doing multiple copies on the
same device had no win, but those were pure I/O operations --- a
sequential scan might be enough of a mix of I/O and CPU that parallelism
might help.

AFAIR, synchroscans were introduced because multiple large sequential
scans were counterproductive (big time).
Sequentially scanning the *same* data over and over is certainly
counterproductive. Synchroscans fixed that, yes. That's not what we're
talking about though- we're talking about scanning and processing
independent sets of data using multiple processes. It's certainly
possible that in some cases that won't be as good, but there will be
quite a few cases where it's much, much better.
Consider a very complicated function running against each row which
makes the CPU the bottleneck instead of the I/O system. That type of
query will never run faster than a single CPU in a single-process
environment, regardless of whether you have synch-scans, while in a
multi-process environment you'll take advantage of the extra CPUs which
are available and use more of the I/O bandwidth that isn't yet
exhausted.
Thanks,
Stephen
On Wed, Jan 16, 2013 at 12:13 AM, Stephen Frost <sfrost@snowman.net> wrote:
* Claudio Freire (klaussfreire@gmail.com) wrote:
On Tue, Jan 15, 2013 at 8:19 PM, Bruce Momjian <bruce@momjian.us> wrote:
The 1GB idea is interesting. I found in pg_upgrade that file copy would
just overwhelm the I/O channel, and that doing multiple copies on the
same device had no win, but those were pure I/O operations --- a
sequential scan might be enough of a mix of I/O and CPU that parallelism
might help.

AFAIR, synchroscans were introduced because multiple large sequential
scans were counterproductive (big time).

Sequentially scanning the *same* data over and over is certainly
counterproductive. Synchroscans fixed that, yes. That's not what we're
talking about though- we're talking about scanning and processing
independent sets of data using multiple processes.
I don't see the difference. Blocks are blocks (unless they're cached).
It's certainly
possible that in some cases that won't be as good
If memory serves me correctly (and it does, I suffered it a lot), the
performance hit is quite considerable. Enough to make it "a lot worse"
rather than "not as good".
but there will be
quite a few cases where it's much, much better.
Just cached segments.
* Claudio Freire (klaussfreire@gmail.com) wrote:
On Wed, Jan 16, 2013 at 12:13 AM, Stephen Frost <sfrost@snowman.net> wrote:
Sequentially scanning the *same* data over and over is certainly
counterprouctive. Synchroscans fixed that, yes. That's not what we're
talking about though- we're talking about scanning and processing
independent sets of data using multiple processes.

I don't see the difference. Blocks are blocks (unless they're cached).
Not quite. Having to go out to the kernel isn't free. Additionally,
the seq scans used to pollute our shared buffers prior to
synch-scanning, which didn't help things.
It's certainly
possible that in some cases that won't be as good

If memory serves me correctly (and it does, I suffered it a lot), the
performance hit is quite considerable. Enough to make it "a lot worse"
rather than "not as good".
I feel like we must not be communicating very well.
If the CPU is pegged at 100% and the I/O system is at 20%, adding
another CPU at 100% will bring the I/O load up to 40% and you're now
processing data twice as fast overall. If you're running a single CPU
at 20% and your I/O system is at 100%, then adding another CPU isn't
going to help and may even degrade performance by causing problems for
the I/O system. The goal of the optimizer will be to model the plan to
account for exactly that, as best it can.
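That arithmetic can be captured in a toy model (illustrative numbers only, not a proposed costing formula): each worker saturates one CPU and consumes a fixed fraction of the I/O system's capacity, so throughput scales with the worker count until I/O is full:

```python
def relative_throughput(workers, io_fraction_per_worker=0.20):
    # Each worker pegs one CPU and uses `io_fraction_per_worker` of the
    # I/O system; once I/O saturates, extra workers stop helping.
    cpu_limit = workers
    io_limit = 1.0 / io_fraction_per_worker
    return min(cpu_limit, io_limit)
```

With 20% I/O load per worker, two workers double throughput, but beyond five workers the I/O system is the bottleneck and adding CPUs buys nothing, which is exactly the case the optimizer would need to model.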
but there will be
quite a few cases where it's much, much better.

Just cached segments.
No, certainly not just cached segments. Any situation where the CPU is
the bottleneck.
Thanks,
Stephen
but there will be
quite a few cases where it's much, much better.

Just cached segments.
Actually, thanks to much faster storage (think SSD, SAN), it's easily
possible for PostgreSQL to become CPU-limited on a seq scan query, even
when reading from disk.
--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com
* Josh Berkus (josh@agliodbs.com) wrote:
Actually, thanks to much faster storage (think SSD, SAN), it's easily
possible for PostgreSQL to become CPU-limited on a seq scan query, even
when reading from disk.
Particularly with a complex filter being applied or if it's feeding into
something above that's expensive..
Thanks,
Stephen