What limits Postgres performance when the whole database lives in cache?
Re this talk given by Michael Stonebraker:
http://slideshot.epfl.ch/play/suri_stonebraker
He makes the claim that in a modern ‘big iron’ RDBMS such as Oracle, DB2, MS SQL Server, Postgres, given enough memory that the entire database lives in cache, the server will spend 96% of its memory cycles on unproductive overhead. This includes buffer management, locking, latching (thread/CPU conflicts) and recovery (including log file reads and writes).
[Enough memory in this case assumes that for just about any business, 1TB is enough. The intent of his argument is that a server designed correctly for it would run 25x faster.]
I wondered if there are any figures or measurements on Postgres performance in this ‘enough memory’ environment to support or contest this point of view?
Regards
David M Bennett FACS
_____
Andl - A New Database Language - andl.org
On Fri, Sep 2, 2016 at 4:49 AM, dandl <david@andl.org> wrote:
Re this talk given by Michael Stonebraker:
http://slideshot.epfl.ch/play/suri_stonebraker
He makes the claim that in a modern ‘big iron’ RDBMS such as Oracle, DB2, MS
SQL Server, Postgres, given enough memory that the entire database lives in
cache, the server will spend 96% of its memory cycles on unproductive
overhead. This includes buffer management, locking, latching (thread/CPU
conflicts) and recovery (including log file reads and writes).
[Enough memory in this case assumes that for just about any business, 1TB is enough. The intent of his argument is that a server designed correctly for it would run 25x faster.]
I wondered if there are any figures or measurements on Postgres performance in this ‘enough memory’ environment to support or contest this point of view?
What limits postgresql when everything fits in memory? The fact that
it's designed to survive a power outage and not lose all your data.
Stonebraker's new stuff is cool, but it is NOT designed to survive
total power failure.
Two totally different design concepts. It's apples and oranges to compare them.
--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
On 2016-09-02 11:10:35 -0600, Scott Marlowe wrote:
On Fri, Sep 2, 2016 at 4:49 AM, dandl <david@andl.org> wrote:
Re this talk given by Michael Stonebraker:
http://slideshot.epfl.ch/play/suri_stonebraker
He makes the claim that in a modern ‘big iron’ RDBMS such as Oracle, DB2, MS
SQL Server, Postgres, given enough memory that the entire database lives in
cache, the server will spend 96% of its memory cycles on unproductive
overhead. This includes buffer management, locking, latching (thread/CPU
conflicts) and recovery (including log file reads and writes).
I think those numbers are overblown, and more PR than reality.
But there certainly are some things that can be made more efficient if
you don't care about durability and replication.
I wondered if there are any figures or measurements on Postgres performance
in this ‘enough memory’ environment to support or contest this point of
view?
I don't think that's really answerable without individual use-cases in
mind. Answering that question for analytics, operational, ... workloads
is going to look different, and the overheads are elsewhere.
I personally think that each implementation's restrictions are more
likely to be an issue than anything "fundamental".
What limits postgresql when everything fits in memory? The fact that it's designed to survive a power outage and not lose all your data.
Stonebraker's new stuff is cool, but it is NOT designed to survive total power failure.
Two totally different design concepts. It's apples and oranges to compare them.
I don't think they're that fundamentally different.
Greetings,
Andres Freund
Check out VoltDB (or ScyllaDB, which is more different) for the changes in architecture required to achieve those performance increases.
On Fri, Sep 2, 2016 at 10:32:46AM -0700, Andres Freund wrote:
On 2016-09-02 11:10:35 -0600, Scott Marlowe wrote:
On Fri, Sep 2, 2016 at 4:49 AM, dandl <david@andl.org> wrote:
Re this talk given by Michael Stonebraker:
http://slideshot.epfl.ch/play/suri_stonebraker
He makes the claim that in a modern ‘big iron’ RDBMS such as Oracle, DB2, MS
SQL Server, Postgres, given enough memory that the entire database lives in
cache, the server will spend 96% of its memory cycles on unproductive
overhead. This includes buffer management, locking, latching (thread/CPU
conflicts) and recovery (including log file reads and writes).
I think those numbers are overblown, and more PR than reality.
But there certainly are some things that can be made more efficient if
you don't care about durability and replication.
Agreed. Stonebraker measured Shore DBMS, which is an academic database:
http://research.cs.wisc.edu/shore/
If he had measured a production-quality database that had been optimized
like Postgres, I would take more stock of his "overhead" numbers.
--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com
+ As you are, so once was I. As I am, so you will be. +
+ Ancient Roman grave inscription +
On Fri, Sep 2, 2016 at 10:32 AM, Andres Freund <andres@anarazel.de> wrote:
I wondered if there are any figures or measurements on Postgres performance
in this ‘enough memory’ environment to support or contest this point of
view?
I don't think that's really answerable without individual use-cases in mind. Answering that question for analytics, operational, ... workloads is going to look different, and the overheads are elsewhere.
I personally think that each implementation's restrictions are more likely to be an issue than anything "fundamental".
+1
At one point, Stonebraker was regularly claiming that "crabbing" of
buffer locks in B-Trees was a fundamental overhead paid in systems
more or less based on System R. He did eventually start to acknowledge
that Lehman and Yao figured out a technique that made that untrue in
1981, if only barely [1], but the lesson for me was to take his claims
in this area with a generous pinch of salt.
[1]: https://www.cs.cmu.edu/~pavlo/static/papers/stonebraker-ic2e2014.pdf (See his citation 11)
--
Peter Geoghegan
Re this talk given by Michael Stonebraker:
http://slideshot.epfl.ch/play/suri_stonebraker
He makes the claim that in a modern ‘big iron’ RDBMS such as Oracle,
DB2, MS SQL Server, Postgres, given enough memory that the entire
database lives in cache, the server will spend 96% of its memory
cycles on unproductive overhead. This includes buffer management,
locking, latching (thread/CPU
conflicts) and recovery (including log file reads and writes).
[Enough memory in this case assumes that for just about any business, 1TB is enough. The intent of his argument is that a server designed correctly for it would run 25x faster.]
I wondered if there are any figures or measurements on Postgres performance in this ‘enough memory’ environment to support or contest this point of view?
What limits postgresql when everything fits in memory? The fact that it's designed to survive a power outage and not lose all your data.
Stonebraker's new stuff is cool, but it is NOT designed to survive total power failure.
I don't think this is quite true. The mechanism he proposes has a small window in which committed transactions can be lost, and this should be addressed by replication or by a small amount of UPS backup (a few seconds).
But that isn't my question: I'm asking whether anyone *knows* any comparable figures for Postgres. IOW how much performance gain might be available for different design choices.
Two totally different design concepts. It's apples and oranges to
compare them.
Not to an end user. A system that runs 10x faster on OLTP and provides all the same functionality is a direct competitor.
Regards
David M Bennett FACS
Andl - A New Database Language - andl.org
http://slideshot.epfl.ch/play/suri_stonebraker
He makes the claim that in a modern ‘big iron’ RDBMS such as
Oracle,
DB2, MS SQL Server, Postgres, given enough memory that the entire
database lives in cache, the server will spend 96% of its memory
cycles on unproductive overhead. This includes buffer management,
locking, latching (thread/CPU
conflicts) and recovery (including log file reads and writes).
I think those numbers are overblown, and more PR than reality.
Did you check out the presentation? He presents figures obtained by experiment from instrumentation. Even if it's only 90% instead of 96%, he has a point.
But there certainly are some things that can be made more efficient if
you don't care about durability and replication.
He cares plenty. Durability and high availability both rely on active replication.
I wondered if there are any figures or measurements on Postgres
performance in this ‘enough memory’ environment to support or
contest this point of view?
I don't think that's really answerable without individual use-cases in
mind. Answering that question for analytics, operational, ...
workloads is going to look different, and the overheads are elsewhere.
That's like saying: we can't give you figures for how fast your car will go, because it depends on who's driving and how many passengers. Yes, of course it does, but you can still provide figures for some specific set of conditions, and they'll be better than none at all.
I personally think that each implementation's restrictions are more
likely to be an issue than anything "fundamental".
Unlikely. But you can still obtain figures.
What limits postgresql when everything fits in memory? The fact that
it's designed to survive a power outage and not lose all your data.
Stonebraker's new stuff is cool, but it is NOT designed to survive total power failure.
Two totally different design concepts. It's apples and oranges to compare them.
I don't think they're that fundamentally different.
Agreed.
Regards
David M Bennett FACS
Andl - A New Database Language - andl.org
Agreed. Stonebraker measured Shore DBMS, which is an academic database:
http://research.cs.wisc.edu/shore/
If he had measured a production-quality database that had been
optimized like Postgres, I would take more stock of his "overhead"
numbers.
Exactly! And that's what I'm asking: has anyone produced, or does anyone know of, any figures for Postgres to set against these?
Regards
David M Bennett FACS
Andl - A New Database Language - andl.org
On Sat, Sep 3, 2016 at 10:45:47AM +1000, dandl wrote:
Agreed. Stonebraker measured Shore DBMS, which is an academic database:
http://research.cs.wisc.edu/shore/
If he had measured a production-quality database that had been optimized like Postgres, I would take more stock of his "overhead" numbers.
Exactly! And that's what I'm asking: has anyone produced, or does anyone know of, any figures for Postgres to set against these?
Uh, well, there are Postgres tools that measure the overhead of locking
on queries and stuff. I don't know any numbers myself.
--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com
+ As you are, so once was I. As I am, so you will be. +
+ Ancient Roman grave inscription +
I wondered if there are any figures or measurements on Postgres performance in this ‘enough memory’ environment to support or contest this point of view?
I don't think that's really answerable without individual use-cases in mind. Answering that question for analytics, operational, ... workloads is going to look different, and the overheads are elsewhere.
I personally think that each implementation's restrictions are more likely to be an issue than anything "fundamental".
+1
At one point, Stonebraker was regularly claiming that "crabbing" of buffer locks in B-Trees was a fundamental overhead paid in systems more or less based on System R. He did eventually start to acknowledge that Lehman and Yao figured out a technique that made that untrue in 1981, if only barely [1], but the lesson for me was to take his claims in this area with a generous pinch of salt.
[1]: https://www.cs.cmu.edu/~pavlo/static/papers/stonebraker-ic2e2014.pdf
(See his citation 11)
The paper is substantially in agreement with the presentation I quoted. If there are differences in detail, they certainly don't dominate his argument.
IMO your claim is far weaker. What specifically do you say is wrong about his current claims, and on what facts do you base it?
In any event, I am not backing his claims. I am simply asking: does anyone have any facts to support or refute his 96% claim as applied to Postgres?
Regards
David M Bennett FACS
Andl - A New Database Language - andl.org
On Fri, Sep 2, 2016 at 8:36 PM, dandl <david@andl.org> wrote:
The paper is substantially in agreement with the presentation I quoted. If there are differences in detail, they certainly don't dominate his argument.
My point is that the paper is rather light on details of the kind that
are really important. And, that it's noteworthy that Stonebraker has
in the past, during presentations, emphasized the buffer lock
crabbing/latch coupling thing *at length*, even though it's a totally
solved problem.
It's also true that Postgres has become vastly more scalable in the
past few years due to optimization that doesn't change the fundamental
nature of the system at all, so it's very easy to imagine individual
differences being more important than differences between major
classes of system.
Those are facts. You may take from them what you will.
IMO your claim is far weaker. What specifically do you say is wrong about his current claims, and on what facts do you base it?
I'm not the one making overarching conclusions. I'm not trying to
convince you of anything.
--
Peter Geoghegan
On 3 September 2016 at 04:36, dandl <david@andl.org> wrote:
In any event, I am not backing his claims. I am simply asking: does anyone have any facts to support or refute his 96% claim as applied to Postgres?
If that is scientific research, he will publish evidence. If not, it's
just words and no refutation is required.
Anybody can come here and discuss new features. Anybody. They just
need to explain their thoughts and produce evidence for their
assertions.
Come on in, database researchers, we're open to rational contributions.
--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 9/2/16 7:39 PM, dandl wrote:
I don't think this is quite true. The mechanism he proposes has a small window in which committed transactions can be lost, and this should be addressed by replication or by a small amount of UPS backup (a few seconds).
Except that's the entire point where all those kinds of solutions
*completely* part ways with Postgres. Postgres is designed to *lose
absolutely no data after a COMMIT*, potentially including requiring that
data to be synchronized out to a second server. That is worlds apart
from "we might lose a few seconds", and there's a lot of stuff Postgres
has to worry about to accomplish that. Some of that stuff can be
short-circuited if you don't care (that's what SET synchronous_commit =
off does), but there's always going to be some amount of extra work to
support synchronous_commit = local or remote_*.
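To make the trade-off concrete, the settings involved look like this (a sketch; `remote_apply` is new in 9.6 and requires synchronous_standby_names to be configured):

```sql
-- Relaxed durability: COMMIT returns without waiting for the WAL flush.
-- A crash can lose roughly the last 3 * wal_writer_delay worth of
-- commits, but never corrupts the database.
SET synchronous_commit = off;

-- Full local durability: wait for the local WAL flush (the default).
SET synchronous_commit = on;

-- With a synchronous standby configured, wait until the commit is
-- also flushed (or even applied) on the remote server.
SET synchronous_commit = remote_apply;
```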
Presumably there's more improvements that could be made to Postgres in
this area, but if you really don't care about losing seconds worth of
data and you need absolutely the best performance possible then maybe
Postgres isn't the right choice for you.
"All databases suck, each one just sucks in a different way." - Me,
circa 1999.
--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
855-TREBLE2 (855-873-2532) mobile: 512-569-9461
I don't think this is quite true. The mechanism he proposes has a
small window in which committed transactions can be lost, and this
should be addressed by replication or by a small amount of UPS backup (a few seconds).
Except that's the entire point where all those kinds of solutions *completely* part ways with Postgres. Postgres is designed to *lose
absolutely no data after a COMMIT*, potentially including requiring
that data to be synchronized out to a second server. That is worlds
apart from "we might lose a few seconds", and there's a lot of stuff
Postgres has to worry about to accomplish that. Some of that stuff can
be short-circuited if you don't care (that's what SET
synchronous_commit = off does), but there's always going to be some
amount of extra work to support synchronous_commit = local or
remote_*.
I understand that. What I'm trying to get a handle on is the magnitude of that cost and how it influences other parts of the product, specifically for Postgres. If the overhead for perfect durability were (say) 10%, few people would care about the cost. But Stonebraker puts the figure at 2500%! His presentation says that a pure relational in-memory store can beat a row store with disk fully cached in memory by 10x to 25x. [Ditto column stores beat row stores by 10x for complex queries in non-updatable data.]
So my question is not to challenge the Postgres way. It's simply to ask whether there are any known figures that would directly support or refute his claims. Does Postgres really spend 96% of its time in thumb-twiddling once the entire database resides in memory?
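Nobody in the thread points at such a measurement, but pgbench makes a crude version easy: size the dataset to fit in shared_buffers, run once with full durability and once with everything relaxed, and compare TPS. A sketch (the scale, client count and duration are arbitrary choices of mine, and fsync=off is only safe on a throwaway instance):

```
# postgresql.conf for the "relaxed" run:
#   shared_buffers     = 4GB    # scale 100 is roughly 1.5GB, fully cached
#   fsync              = off
#   synchronous_commit = off
#   full_page_writes   = off

pgbench -i -s 100 bench         # initialize the ~1.5GB dataset
pgbench -c 8 -j 8 -T 60 bench   # 8 clients for 60 seconds; note the TPS
```

The gap between the two runs bounds what the durability machinery costs for that workload; it won't split out buffer-manager versus locking overhead, but it gives a concrete number to set against the 96% claim.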
Presumably there's more improvements that could be made to Postgres in
this area, but if you really don't care about losing seconds worth of
data and you need absolutely the best performance possible then maybe
Postgres isn't the right choice for you.
Achieving durability for an in-memory database requires either UPS or active replication or both, which is an additional cost that is not needed for every application. My question precedes that one: is there a big performance gain there for the taking, or is it smoke and mirrors?
Regards
David M Bennett FACS
Andl - A New Database Language - andl.org
On Thu, Sep 8, 2016 at 2:35 AM, dandl <david@andl.org> wrote:
I understand that. What I'm trying to get a handle on is the magnitude of
that cost and how it influences other parts of the product, specifically
for Postgres. If the overhead for perfect durability were (say) 10%, few
people would care about the cost. But Stonebraker puts the figure at 2500%!
His presentation says that a pure relational in-memory store can beat a row
store with disk fully cached in memory by 10x to 25x. [Ditto column stores
beat row stores by 10x for complex queries in non-updatable data.]
VoltDB replication is synchronous in the same cluster/data center, and
asynchronous with a remote cluster/data center. As a consequence, if your
application needs to survive a data center power failure with zero data
loss, then you have to enable VoltDB's synchronous command logging (which
by the way is not available in the Community Edition — only in the
Enterprise Edition). When Stonebraker says VoltDB's throughput is 10~25x
greater, I'd guess this is with no command logging at all, and no periodic
snapshotting.
So my question is not to challenge the Postgres way. It's simply to ask
whether there are any known figures that would directly support or refute
his claims. Does Postgres really spend 96% of its time in thumb-twiddling
once the entire database resides in memory?
Alas, I've been unable to find any relevant benchmark. I'm not motivated
enough to install a PostgreSQL and VoltDB and try it for myself :-)
On 9/8/16 3:15 AM, Nicolas Grilly wrote:
So my question is not to challenge the Postgres way. It's simply to
ask whether there are any known figures that would directly support
or refute his claims. Does Postgres really spend 96% of its time in
thumb-twiddling once the entire database resides in memory?
Alas, I've been unable to find any relevant benchmark. I'm not motivated
enough to install a PostgreSQL and VoltDB and try it for myself :-)
My guess is this is a test scenario that completely favors VoltDB while
hamstringing Postgres, such as using no transaction durability at all in
VoltDB while using maximum durability in Postgres. Comparing the cost of
every COMMIT doing an fsync vs not could certainly produce a 25x
difference. There could be other cases where you'd get a 25x difference.
You need to be careful of benchmarks from commercial companies. MySQL
used to tout how fast it was compared to Postgres, using a benchmark it
created specifically for that purpose that had very little to do with
the real world. People eventually discovered that as soon as you had a
concurrent workload Postgres was actually faster.
--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
855-TREBLE2 (855-873-2532) mobile: 512-569-9461
From: Jim Nasby [mailto:Jim.Nasby@BlueTreble.com]
My guess is this is a test scenario that completely favors VoltDB
while hamstringing Postgres, such as using no transaction durability
at all in VoltDB while using maximum durability in Postgres. Comparing
the cost of every COMMIT doing an fsync vs not could certainly produce
a 25x difference. There could be other cases where you'd get a 25x
difference.
I guess my question then is: how much do you pay for that durability? If you benchmark Postgres configured for pure in-memory usage with absolutely no writes to disk (or SSD or network), where is it spending its time? Is there a lot of overhead in getting data in and out of cache buffers, in conversions, and in concurrency control?
As a case study, assume an RDBMS is required to monitor and record Internet (or phone or VHF) traffic. If the power goes off the traffic continues, and it really doesn’t matter whether you lose 60 seconds of down time or 63 seconds; in any case another instance in another data centre will pick up the slack. So the requirement is atomicity yes, but not durability. Should you bid Postgres for the job, or look elsewhere? How much slower would Postgres be than a competitor? Do we care?
You need to be careful of benchmarks from commercial companies. MySQL
used to tout how fast it was compared to Postgres, using a benchmark
it created specifically for that purpose that had very little to do
with the real world. People eventually discovered that as soon as you
had a concurrent workload Postgres was actually faster.
Of course; but at the same time insisting on including durability favours Postgres when I'm actually asking about alternatives.
Regards
David M Bennett FACS
Andl - A New Database Language - andl.org
2016-09-11 7:20 GMT+02:00 dandl <david@andl.org>:
From: Jim Nasby [mailto:Jim.Nasby@BlueTreble.com]
My guess is this is a test scenario that completely favors VoltDB
while hamstringing Postgres, such as using no transaction durability
at all in VoltDB while using maximum durability in Postgres. Comparing
the cost of every COMMIT doing an fsync vs not could certainly produce
a 25x difference. There could be other cases where you'd get a 25x
difference.
I guess my question then is: how much do you pay for that durability? If
you benchmark Postgres configured for pure in-memory usage with absolutely
no writes to disk (or SSD or network), where is it spending its time? Is
there a lot of overhead in getting data in and out of cache buffers and
conversions and in concurrency control?
It is not only about durability.
Postgres holds data in a format equal or similar to the data saved on persistent storage, so there is repeated serialization and deserialization. Some structures (like the B-tree) are designed to be simple to save, but performance there is a secondary target.
I believe the new in-memory databases can be 10-100x faster, depending on the use case, because they hold data primarily in memory and use different data structures. The performance of these databases is great when all the data stays well placed in memory all the time, but pretty bad when that rule does not hold. There is another issue: when you increase the speed of database write operations, you will probably hit file system limits and spinlock issues, which is one reason why big systems are based more and more on distributed designs.
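The serialization point is easy to see in miniature. Below is a toy of my own (nothing from the thread, and no real engine works exactly like this): the same rows kept once in a packed, on-disk-style byte format that must be unpacked on every access, and once as native in-memory objects.

```python
import struct
import timeit

# On-disk-style row: 8-byte id + 32-byte fixed-width name, packed into a
# "page" of bytes, the way a buffer-cache-oriented engine keeps tuples.
ROW = struct.Struct("<q32s")
page = bytearray()
for i in range(1000):
    page += ROW.pack(i, b"row-%d" % i)

# Memory-first representation: plain objects, no (de)serialization needed.
rows = [(i, "row-%d" % i) for i in range(1000)]

def read_packed(n):
    """Fetch row n from the byte page: every read pays an unpack/decode."""
    rid, name = ROW.unpack_from(page, n * ROW.size)
    return rid, name.rstrip(b"\x00").decode()

def read_native(n):
    """Fetch row n from the native structure: a pointer chase, no decode."""
    return rows[n]

if __name__ == "__main__":
    t_packed = timeit.timeit(lambda: read_packed(500), number=200_000)
    t_native = timeit.timeit(lambda: read_native(500), number=200_000)
    print(f"packed: {t_packed:.3f}s  native: {t_native:.3f}s")
```

The packed path is typically several times slower purely from the unpack and decode work, and a real engine also pays for buffer pins, locks and MVCC visibility checks on that path; this doesn't prove 25x, but it shows where the gap comes from.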
Regards
Pavel
From: pgsql-general-owner@postgresql.org [mailto:pgsql-general-owner@postgresql.org] On Behalf Of Pavel Stehule
I guess my question then is: how much do you pay for that durability? If you benchmark Postgres configured for pure in-memory usage with absolutely no writes to disk (or SSD or network), where is it spending its time? Is there a lot of overhead in getting data in and out of cache buffers, in conversions, and in concurrency control?
It is not about durability only.
Postgres holds data in a format equal or similar to the data saved on persistent storage, so there is repeated serialization and deserialization. Some structures (like the B-tree) are designed to be simple to save, but performance there is a secondary target.
I believe the new in-memory databases can be 10-100x faster, depending on the use case, because they hold data primarily in memory and use different data structures. The performance of these databases is great when all the data stays well placed in memory all the time, but pretty bad when that rule does not hold. There is another issue: when you increase the speed of database write operations, you will probably hit file system limits and spinlock issues, which is one reason why big systems are based more and more on distributed designs.
That’s the point I’m making, exactly. The question is: does anyone have a handle on how big that cost really is, as a guide to whether to try to do anything about it? Is it really 25x, as Stonebraker says?
Regards
David M Bennett FACS
_____
Andl - A New Database Language - andl.org