What limits Postgres performance when the whole database lives in cache?

Started by dandl · 9 years ago · 21 messages · general
#1dandl
david@andl.org

Re this talk given by Michael Stonebraker:

http://slideshot.epfl.ch/play/suri_stonebraker

He makes the claim that in a modern ‘big iron’ RDBMS such as Oracle, DB2, MS SQL Server, Postgres, given enough memory that the entire database lives in cache, the server will spend 96% of its memory cycles on unproductive overhead. This includes buffer management, locking, latching (thread/CPU conflicts) and recovery (including log file reads and writes).

[Enough memory in this case assumes that for just about any business, 1TB is enough. The intent of his argument is that a server designed correctly for it would run 25x faster.]

I wondered if there are any figures or measurements on Postgres performance in this ‘enough memory’ environment to support or contest this point of view?

Regards

David M Bennett FACS

_____

Andl - A New Database Language - andl.org

#2Scott Marlowe
scott.marlowe@gmail.com
In reply to: dandl (#1)
Re: What limits Postgres performance when the whole database lives in cache?

On Fri, Sep 2, 2016 at 4:49 AM, dandl <david@andl.org> wrote:

Re this talk given by Michael Stonebraker:

http://slideshot.epfl.ch/play/suri_stonebraker

He makes the claim that in a modern ‘big iron’ RDBMS such as Oracle, DB2, MS
SQL Server, Postgres, given enough memory that the entire database lives in
cache, the server will spend 96% of its memory cycles on unproductive
overhead. This includes buffer management, locking, latching (thread/CPU
conflicts) and recovery (including log file reads and writes).

[Enough memory in this case assumes that for just about any business, 1TB is
enough. The intent of his argument is that a server designed correctly for
it would run 25x faster.]

I wondered if there are any figures or measurements on Postgres performance
in this ‘enough memory’ environment to support or contest this point of
view?

What limits postgresql when everything fits in memory? The fact that
it's designed to survive a power outage and not lose all your data.

Stonebraker's new stuff is cool, but it is NOT designed to survive
total power failure.

Two totally different design concepts. It's apples and oranges to compare them.

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#3Andres Freund
andres@anarazel.de
In reply to: Scott Marlowe (#2)
Re: What limits Postgres performance when the whole database lives in cache?

On 2016-09-02 11:10:35 -0600, Scott Marlowe wrote:

On Fri, Sep 2, 2016 at 4:49 AM, dandl <david@andl.org> wrote:

Re this talk given by Michael Stonebraker:

http://slideshot.epfl.ch/play/suri_stonebraker

He makes the claim that in a modern ‘big iron’ RDBMS such as Oracle, DB2, MS
SQL Server, Postgres, given enough memory that the entire database lives in
cache, the server will spend 96% of its memory cycles on unproductive
overhead. This includes buffer management, locking, latching (thread/CPU
conflicts) and recovery (including log file reads and writes).

I think those numbers are overblown, and more PR than reality.

But there certainly are some things that can be made more efficient if
you don't care about durability and replication.

I wondered if there are any figures or measurements on Postgres performance
in this ‘enough memory’ environment to support or contest this point of
view?

I don't think that's really answerable without individual use-cases in
mind. Answering that question for analytics, operational, ... workloads
is going to look different, and the overheads are elsewhere.

I personally think that each implementation's restrictions are more
likely to be an issue than anything "fundamental".

What limits postgresql when everything fits in memory? The fact that
it's designed to survive a power outage and not lose all your data.

Stonebraker's new stuff is cool, but it is NOT designed to survive
total power failure.

Two totally different design concepts. It's apples and oranges to compare them.

I don't think they're that fundamentally different.

Greetings,

Andres Freund


#4Dorian Hoxha
dorian.hoxha@gmail.com
In reply to: Andres Freund (#3)
Re: What limits Postgres performance when the whole database lives in cache?

Check out VoltDB (or ScyllaDB, which is more different) for the architectural
changes required to achieve those performance increases.


#5Bruce Momjian
bruce@momjian.us
In reply to: Andres Freund (#3)
Re: What limits Postgres performance when the whole database lives in cache?

On Fri, Sep 2, 2016 at 10:32:46AM -0700, Andres Freund wrote:


I think those numbers are overblown, and more PR than reality.

But there certainly are some things that can be made more efficient if
you don't care about durability and replication.

Agreed. Stonebraker measured Shore DBMS, which is an academic database:

http://research.cs.wisc.edu/shore/

If he had measured a production-quality database that had been optimized
like Postgres, I would take more stock of his "overhead" numbers.

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ As you are, so once was I. As I am, so you will be. +
+                     Ancient Roman grave inscription +


#6Peter Geoghegan
In reply to: Andres Freund (#3)
Re: What limits Postgres performance when the whole database lives in cache?

On Fri, Sep 2, 2016 at 10:32 AM, Andres Freund <andres@anarazel.de> wrote:

I wondered if there are any figures or measurements on Postgres performance
in this ‘enough memory’ environment to support or contest this point of
view?

I don't think that's really answerable without individual use-cases in
mind. Answering that question for analytics, operational, ... workloads
is going to look different, and the overheads are elsewhere.

I personally think that each implementation's restrictions are more
likely to be an issue than anything "fundamental".

+1

At one point, Stonebraker was regularly claiming that "crabbing" of
buffer locks in B-Trees was a fundamental overhead paid in systems
more or less based on System R. He did eventually start to acknowledge
that Lehman and Yao figured out a technique that made that untrue in
1981, if only barely [1], but the lesson for me was to take his claims
in this area with a generous pinch of salt.

[1] https://www.cs.cmu.edu/~pavlo/static/papers/stonebraker-ic2e2014.pdf
(See his citation 11)
--
Peter Geoghegan


#7dandl
david@andl.org
In reply to: Scott Marlowe (#2)
Re: What limits Postgres performance when the whole database lives in cache?

Re this talk given by Michael Stonebraker:

http://slideshot.epfl.ch/play/suri_stonebraker

He makes the claim that in a modern ‘big iron’ RDBMS such as Oracle,
DB2, MS SQL Server, Postgres, given enough memory that the entire
database lives in cache, the server will spend 96% of its memory
cycles on unproductive overhead. This includes buffer management,
locking, latching (thread/CPU
conflicts) and recovery (including log file reads and writes).

[Enough memory in this case assumes that for just about any business,
1TB is enough. The intent of his argument is that a server designed
correctly for it would run 25x faster.]

I wondered if there are any figures or measurements on Postgres
performance in this ‘enough memory’ environment to support or contest
this point of view?

What limits postgresql when everything fits in memory? The fact that
it's designed to survive a power outage and not lose all your data.

Stonebraker's new stuff is cool, but it is NOT designed to survive
total power failure.

I don't think this is quite true. The mechanism he proposes has a small window in which committed transactions can be lost, and this should be addressed by replication or by a small amount of UPS backup (a few seconds).

But that isn't my question: I'm asking whether anyone *knows* any comparable figures for Postgres. IOW how much performance gain might be available for different design choices.

Two totally different design concepts. It's apples and oranges to
compare them.

Not to an end user. A system that runs 10x on OLTP and provides all the same functionality is a direct competitor.

Regards
David M Bennett FACS

Andl - A New Database Language - andl.org


#8dandl
david@andl.org
In reply to: Andres Freund (#3)
Re: What limits Postgres performance when the whole database lives in cache?

http://slideshot.epfl.ch/play/suri_stonebraker

He makes the claim that in a modern ‘big iron’ RDBMS such as Oracle,
DB2, MS SQL Server, Postgres, given enough memory that the entire
database lives in cache, the server will spend 96% of its memory
cycles on unproductive overhead. This includes buffer management,
locking, latching (thread/CPU conflicts) and recovery (including log
file reads and writes).

I think those numbers are overblown, and more PR than reality.

Did you check out the presentation? He presents figures obtained by experiment from instrumentation. Even if it's only 90% instead of 96%, he has a point.

But there certainly are some things that can be made more efficient if
you don't care about durability and replication.

He cares plenty. Durability and high availability both rely on active replication.

I wondered if there are any figures or measurements on Postgres
performance in this ‘enough memory’ environment to support or
contest this point of view?

I don't think that's really answerable without individual use-cases in
mind. Answering that question for analytics, operational, ...
workloads is going to look different, and the overheads are elsewhere.

That's like: we don't have any figures for how fast your car will go: it depends on who's driving and how many passengers. My answer is: yes, of course, but you can still provide figures for some specific set of conditions, and they'll be better than none at all.

I personally think that each implementation's restrictions are more
likely to be an issue than anything "fundamental".

Unlikely. But you can still obtain figures.

What limits postgresql when everything fits in memory? The fact that
it's designed to survive a power outage and not lose all your data.

Stonebraker's new stuff is cool, but it is NOT designed to survive
total power failure.

Two totally different design concepts. It's apples and oranges to

compare them.

I don't think they're that fundamentally different.

Agreed.

Regards
David M Bennett FACS

Andl - A New Database Language - andl.org


#9dandl
david@andl.org
In reply to: Bruce Momjian (#5)
Re: What limits Postgres performance when the whole database lives in cache?

Agreed. Stonebraker measured Shore DBMS, which is an academic
database:

http://research.cs.wisc.edu/shore/

If he had measured a production-quality database that had been
optimized like Postgres, I would take more stock of his "overhead"
numbers.

Exactly! And that's what I'm asking: has anyone done or know of any figures for Postgres, to set against these?

Regards
David M Bennett FACS

Andl - A New Database Language - andl.org


#10Bruce Momjian
bruce@momjian.us
In reply to: dandl (#9)
Re: What limits Postgres performance when the whole database lives in cache?

On Sat, Sep 3, 2016 at 10:45:47AM +1000, dandl wrote:

Agreed. Stonebraker measured Shore DBMS, which is an academic
database:

http://research.cs.wisc.edu/shore/

If he had measured a production-quality database that had been
optimized like Postgres, I would take more stock of his "overhead"
numbers.

Exactly! And that's what I'm asking: has anyone done or know of any figures for Postgres, to set against these?

Uh, well, there are Postgres tools that measure the overhead of locking
on queries and stuff. I don't know any numbers myself.

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ As you are, so once was I. As I am, so you will be. +
+                     Ancient Roman grave inscription +


#11dandl
david@andl.org
In reply to: Peter Geoghegan (#6)
Re: What limits Postgres performance when the whole database lives in cache?

I wondered if there are any figures or measurements on Postgres
performance in this ‘enough memory’ environment to support or
contest this point of view?

I don't think that's really answerable without individual use-cases in
mind. Answering that question for analytics, operational, ...
workloads is going to look different, and the overheads are elsewhere.

I personally think that each implementation's restrictions are more
likely to be an issue than anything "fundamental".

+1

At one point, Stonebraker was regularly claiming that "crabbing" of
buffer locks in B-Trees was a fundamental overhead paid in systems
more or less based on System R. He did eventually start to acknowledge
that Lehman and Yao figured out a technique that made that untrue in
1981, if only barely [1], but the lesson for me was to take his claims
in this area with a generous pinch of salt.

[1] https://www.cs.cmu.edu/~pavlo/static/papers/stonebraker-ic2e2014.pdf
(See his citation 11)

The paper is substantially in agreement with the presentation I quoted. If there are differences in detail, they certainly don't dominate his argument.

IMO your claim is far weaker. What specifically do you say is wrong about his current claims, and on what facts do you base it?

In any event, I am not backing his claims. I am simply asking: does anyone have any facts to support or refute his 96% claim as applied to Postgres?

Regards
David M Bennett FACS

Andl - A New Database Language - andl.org


#12Peter Geoghegan
In reply to: dandl (#11)
Re: What limits Postgres performance when the whole database lives in cache?

On Fri, Sep 2, 2016 at 8:36 PM, dandl <david@andl.org> wrote:

The paper is substantially in agreement with the presentation I quoted. If there are differences in detail, they certainly don't dominate his argument.

My point is that the paper is rather light on details of the kind that
are really important. And, that it's noteworthy that Stonebraker has
in the past, during presentations, emphasized the buffer lock
crabbing/latch coupling thing *at length*, even though it's a totally
solved problem.

It's also true that Postgres has become vastly more scalable in the
past few years due to optimization that doesn't change the fundamental
nature of the system at all, so it's very easy to imagine individual
differences being more important than differences between major
classes of system.

Those are facts. You may take from them what you will.

IMO your claim is far weaker. What specifically do you say is wrong about his current claims, and on what facts do you base it?

I'm not the one making overarching conclusions. I'm not trying to
convince you of anything.

--
Peter Geoghegan


#13Simon Riggs
simon@2ndQuadrant.com
In reply to: dandl (#11)
Re: What limits Postgres performance when the whole database lives in cache?

On 3 September 2016 at 04:36, dandl <david@andl.org> wrote:

In any event, I am not backing his claims. I am simply asking: does anyone have any facts to support or refute his 96% claim as applied to Postgres?

If that is scientific research he will publish evidence. If not, it's
just words and no refutation is required.

Anybody can come here and discuss new features. Anybody. They just
need to explain their thoughts and produce evidence for their
assertions.

Come on in, database researchers, we're open to rational contributions.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


#14Jim Nasby
Jim.Nasby@BlueTreble.com
In reply to: dandl (#7)
Re: What limits Postgres performance when the whole database lives in cache?

On 9/2/16 7:39 PM, dandl wrote:

I don't think this is quite true. The mechanism he proposes has a small window in which committed transactions can be lost, and this should be addressed by replication or by a small amount of UPC (a few seconds).

Except that's the entire point where all those kinds of solutions
*completely* part ways with Postgres. Postgres is designed to *lose
absolutely no data after a COMMIT*, potentially including requiring that
data to be synchronized out to a second server. That is worlds apart
from "we might lose a few seconds", and there's a lot of stuff Postgres
has to worry about to accomplish that. Some of that stuff can be
short-circuited if you don't care (that's what SET synchronous_commit =
off does), but there's always going to be some amount of extra work to
support synchronous_commit = local or remote_*.
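For concreteness, the knobs being discussed live in postgresql.conf; a minimal sketch of the relaxed-durability end of the trade-off (the setting names are real Postgres settings, the values merely illustrative):

```
# postgresql.conf -- trade a bounded window of recent commits for speed
synchronous_commit = off    # COMMIT returns before its WAL reaches disk;
                            # a crash can lose roughly the last few hundred ms
wal_writer_delay = 200ms    # how often the background WAL writer flushes

# Note: this is NOT fsync = off. With synchronous_commit = off the database
# stays consistent after a crash; fsync = off risks outright corruption.
```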

Presumably there's more improvements that could be made to Postgres in
this area, but if you really don't care about losing seconds worth of
data and you need absolutely the best performance possible then maybe
Postgres isn't the right choice for you.

"All databases suck, each one just sucks in a different way." - Me,
circa 1999.
--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
855-TREBLE2 (855-873-2532) mobile: 512-569-9461


#15dandl
david@andl.org
In reply to: Jim Nasby (#14)
Re: What limits Postgres performance when the whole database lives in cache?

I don't think this is quite true. The mechanism he proposes has a
small window in which committed transactions can be lost, and this
should be addressed by replication or by a small amount of UPS backup
(a few seconds).

Except that's the entire point where all those kinds of solutions
*completely* part ways with Postgres. Postgres is designed to *lose
absolutely no data after a COMMIT*, potentially including requiring
that data to be synchronized out to a second server. That is worlds
apart from "we might lose a few seconds", and there's a lot of stuff
Postgres has to worry about to accomplish that. Some of that stuff can
be short-circuited if you don't care (that's what SET
synchronous_commit = off does), but there's always going to be some
amount of extra work to support synchronous_commit = local or
remote_*.

I understand that. What I'm trying to get a handle on is the magnitude of that cost and how it influences other parts of the product, specifically for Postgres. If the overhead for perfect durability were (say) 10%, few people would care about the cost. But Stonebraker puts the figure at 2500%! His presentation says that a pure relational in-memory store can beat a row store with disk fully cached in memory by 10x to 25x. [Ditto column stores beat row stores by 10x for complex queries in non-updatable data.]
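As a sanity check on the arithmetic only (not on the measurement): 96% overhead and a 25x speedup are the same claim stated two ways, since useful work is then 4% of cycles and 1/0.04 = 25. A tiny illustration:

```python
# Maximum speedup from eliminating a fraction `overhead` of unproductive
# cycles (an application of Amdahl's law, everything else unchanged).
def max_speedup(overhead: float) -> float:
    return 1.0 / (1.0 - overhead)

assert round(max_speedup(0.96), 6) == 25.0  # Stonebraker's 96% -> 25x
assert round(max_speedup(0.90), 6) == 10.0  # even 90% would still mean 10x
```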

So my question is not to challenge the Postgres way. It's simply to ask whether there are any known figures that would directly support or refute his claims. Does Postgres really spend 96% of its time in thumb-twiddling once the entire database resides in memory?

Presumably there's more improvements that could be made to Postgres in
this area, but if you really don't care about losing seconds worth of
data and you need absolutely the best performance possible then maybe
Postgres isn't the right choice for you.

Achieving durability for an in-memory database requires either UPS or active replication or both, which is an additional cost that is not needed for every application. My question precedes that one: is there a big performance gain there for the taking, or is it smoke and mirrors?

Regards
David M Bennett FACS

Andl - A New Database Language - andl.org


#16Nicolas Grilly
nicolas@gardentechno.com
In reply to: dandl (#15)
Re: What limits Postgres performance when the whole database lives in cache?

On Thu, Sep 8, 2016 at 2:35 AM, dandl <david@andl.org> wrote:

I understand that. What I'm trying to get a handle on is the magnitude of
that cost and how it influences other parts of the product, specifically
for Postgres. If the overhead for perfect durability were (say) 10%, few
people would care about the cost. But Stonebraker puts the figure at 2500%!
His presentation says that a pure relational in-memory store can beat a row
store with disk fully cached in memory by 10x to 25x. [Ditto column stores
beat row stores by 10x for complex queries in non-updatable data.]

VoltDB replication is synchronous in the same cluster/data center, and
asynchronous with a remote cluster/data center. As a consequence, if your
application needs to survive a data center power failure with zero data
loss, then you have to enable VoltDB's synchronous command logging (which
by the way is not available in the Community Edition — only in the
Enterprise Edition). When Stonebraker says VoltDB's throughput is 10~25x
greater, I'd guess this is with no command logging at all, and no periodic
snapshotting.

So my question is not to challenge the Postgres way. It's simply to
ask whether there are any known figures that would directly support or
refute his claims. Does Postgres really spend 96% of its time in
thumb-twiddling once the entire database resides in memory?

Alas, I've been unable to find any relevant benchmark. I'm not motivated
enough to install PostgreSQL and VoltDB and try it for myself :-)

#17Jim Nasby
Jim.Nasby@BlueTreble.com
In reply to: Nicolas Grilly (#16)
Re: What limits Postgres performance when the whole database lives in cache?

On 9/8/16 3:15 AM, Nicolas Grilly wrote:

So my question is not to challenge the Postgres way. It's simply to
ask whether there are any known figures that would directly support
or refute his claims. Does Postgres really spend 96% of its time in
thumb-twiddling once the entire database resides in memory?

Alas, I've been unable to find any relevant benchmark. I'm not motivated
enough to install a PostgreSQL and VoltDB and try it for myself :-)

My guess is this is a test scenario that completely favors VoltDB while
hamstringing Postgres, such as using no transaction durability at all in
VoltDB while using maximum durability in Postgres. Comparing the cost of
every COMMIT doing an fsync vs not could certainly produce a 25x
difference. There could be other cases where you'd get a 25x difference.
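The fsync point is easy to demonstrate outside any database; a toy sketch (not a database benchmark — the record count and filenames are arbitrary) that writes the same records with and without an fsync per "commit":

```python
import os
import tempfile
import time

def write_records(path, n, fsync_each_commit):
    """Append n records, optionally forcing each one to stable storage."""
    with open(path, "wb") as f:
        for i in range(n):
            f.write(b"record %06d\n" % i)
            if fsync_each_commit:
                f.flush()
                os.fsync(f.fileno())   # what a durable COMMIT must do

with tempfile.TemporaryDirectory() as d:
    durable = os.path.join(d, "durable.log")
    fast = os.path.join(d, "fast.log")

    t0 = time.perf_counter()
    write_records(durable, 200, fsync_each_commit=True)
    t_durable = time.perf_counter() - t0

    t0 = time.perf_counter()
    write_records(fast, 200, fsync_each_commit=False)
    t_fast = time.perf_counter() - t0

    # Same bytes end up in both files; only the durability guarantee differs.
    assert open(durable, "rb").read() == open(fast, "rb").read()
    print("per-record fsync was %.0fx slower here" % (t_durable / max(t_fast, 1e-9)))
```

The exact ratio is hardware-dependent (rotational disk vs SSD vs battery-backed cache), which is one reason headline multipliers like 25x need the test conditions spelled out.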

You need to be careful of benchmarks from commercial companies. MySQL
used to tout how fast it was compared to Postgres, using a benchmark it
created specifically for that purpose that had very little to do with
the real world. People eventually discovered that as soon as you had a
concurrent workload Postgres was actually faster.
--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
855-TREBLE2 (855-873-2532) mobile: 512-569-9461


#18dandl
david@andl.org
In reply to: Jim Nasby (#17)
Re: What limits Postgres performance when the whole database lives in cache?

From: Jim Nasby [mailto:Jim.Nasby@BlueTreble.com]
My guess is this is a test scenario that completely favors VoltDB
while hamstringing Postgres, such as using no transaction durability
at all in VoltDB while using maximum durability in Postgres. Comparing
the cost of every COMMIT doing an fsync vs not could certainly produce
a 25x difference. There could be other cases where you'd get a 25x
difference.

I guess my question then is: how much do you pay for that durability? If you benchmark Postgres configured for pure in-memory usage with absolutely no writes to disk (or SSD or network), where is it spending its time? Is there a lot of overhead in getting data in and out of cache buffers and conversions and in concurrency control?

As a case study, assume an RDBMS is required to monitor and record Internet (or phone or VHF) traffic. If the power goes off the traffic continues, and it really doesn’t matter whether you lose 60 seconds of downtime or 63 seconds; in any case another instance in another data centre will pick up the slack. So the requirement is atomicity yes, but not durability. Should you bid Postgres for the job, or look elsewhere? How much slower would Postgres be than a competitor? Do we care?

You need to be careful of benchmarks from commercial companies. MySQL
used to tout how fast it was compared to Postgres, using a benchmark
it created specifically for that purpose that had very little to do
with the real world. People eventually discovered that as soon as you
had a concurrent workload Postgres was actually faster.

Of course; but at the same time insisting on including durability favours Postgres when I'm actually asking about alternatives.

Regards
David M Bennett FACS

Andl - A New Database Language - andl.org


#19Pavel Stehule
pavel.stehule@gmail.com
In reply to: dandl (#18)
Re: What limits Postgres performance when the whole database lives in cache?

2016-09-11 7:20 GMT+02:00 dandl <david@andl.org>:

From: Jim Nasby [mailto:Jim.Nasby@BlueTreble.com]
My guess is this is a test scenario that completely favors VoltDB
while hamstringing Postgres, such as using no transaction durability
at all in VoltDB while using maximum durability in Postgres. Comparing
the cost of every COMMIT doing an fsync vs not could certainly produce
a 25x difference. There could be other cases where you'd get a 25x
difference.

I guess my question then is: how much do you pay for that durability? If
you benchmark Postgres configured for pure in-memory usage with absolutely
no writes to disk (or SSD or network), where is it spending its time? Is
there a lot of overhead in getting data in and out of cache buffers and
conversions and in concurrency control?

It is not about durability only.

Postgres holds data in a format equal or similar to the data saved on
persistent storage, so there is repeated serialization and
deserialization. Some structures are designed to be simply saved (like
the B-tree), but performance is a secondary target.
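The serialization point can be illustrated with a toy sketch: rows kept in an on-disk-style packed format must be decoded on every access, while a native in-memory representation is used directly (the row layout here is invented for illustration; it is not Postgres's actual tuple format):

```python
import struct

ROW = struct.Struct("<qd")          # invented fixed-width row: (id, value)

# Buffer-page style: rows live as packed bytes, decoded on every access.
packed_rows = [ROW.pack(i, i * 0.5) for i in range(1000)]
# In-memory-engine style: rows live as native objects, no decode step.
native_rows = [(i, i * 0.5) for i in range(1000)]

def total_packed(rows):
    total = 0.0
    for raw in rows:
        _rowid, value = ROW.unpack(raw)   # deserialization on every read
        total += value
    return total

def total_native(rows):
    total = 0.0
    for _rowid, value in rows:
        total += value
    return total

# Identical answers; the packed path just pays a decode cost per row.
assert total_packed(packed_rows) == total_native(native_rows) == 249750.0
```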

I believe the new in-memory databases can be 10-100x faster, depending
on the use case, because they hold data primarily in memory and use
different data structures. The performance of these databases is great
when all the data is well placed in memory all the time, but pretty
bad when that rule does not hold. There is another issue: when you
increase the speed of database write operations, you will probably hit
file system limits and spin lock issues - which is one reason why big
systems are more and more based on distributed systems.

Regards

Pavel


As a case study, assume an RDBMS is required to monitor and record Internet
(or phone or VHF) traffic. If the power goes off the traffic continues, and
it really doesn’t matter whether you lose 60 seconds of down time or 63
seconds; in any case another instance in another data centre will pick up
the slack. So the requirement is atomicity yes, but not durability. Should
you bid Postgres for the job, or look elsewhere? How much slower would
Postgres be than a competitor? Do we care?

You need to be careful of benchmarks from commercial companies. MySQL
used to tout how fast it was compared to Postgres, using a benchmark
it created specifically for that purpose that had very little to do
with the real world. People eventually discovered that as soon as you
had a concurrent workload Postgres was actually faster.

Of course; but at the same time insisting on including durability favours
Postgres when I'm actually asking about alternatives.

Regards
David M Bennett FACS

Andl - A New Database Language - andl.org


#20dandl
david@andl.org
In reply to: Pavel Stehule (#19)
Re: What limits Postgres performance when the whole database lives in cache?

From: pgsql-general-owner@postgresql.org [mailto:pgsql-general-owner@postgresql.org] On Behalf Of Pavel Stehule

I guess my question then is: how much do you pay for that durability? If you benchmark Postgres configured for pure in-memory usage with absolutely no writes to disk (or SSD or network), where is it spending its time? Is there a lot of overhead in getting data in and out of cache buffers and conversions and in concurrency control?

It is not about durability only.

Postgres holds data in a format equal or similar to the data saved on persistent storage, so there is repeated serialization and deserialization. Some structures are designed to be simply saved (like the B-tree), but performance is a secondary target.

I believe the new in-memory databases can be 10-100x faster, depending on the use case, because they hold data primarily in memory and use different data structures. The performance of these databases is great when all the data is well placed in memory all the time, but pretty bad when that rule does not hold. There is another issue: when you increase the speed of database write operations, you will probably hit file system limits and spin lock issues - which is one reason why big systems are more and more based on distributed systems.

That’s the point I’m making, exactly. The question is: does anyone have a handle on how big that cost really is, as a guide to whether to try to do anything about it? Is it really 25x as Stonebraker says?

Regards

David M Bennett FACS

_____

Andl - A New Database Language - andl.org

#21Pavel Stehule
pavel.stehule@gmail.com
In reply to: dandl (#20)