Postgres as In-Memory Database?

Started by Stefan Keller over 12 years ago · 47 messages · general
#1Stefan Keller
sfkeller@gmail.com

How can Postgres be used and configured as an In-Memory Database?

Does anybody know of thoughts or presentations about this "NoSQL feature" -
beyond e.g. "Perspectives on NoSQL" from Gavin Roy at PGCon 2010?

Given, say, 128 GB memory or more, and (read-mostly) data that fits into
this, what are the hints to optimize Postgres (postgresql.conf etc.)?

-- Stefan

#2Michael Paquier
michael@paquier.xyz
In reply to: Stefan Keller (#1)
Re: Postgres as In-Memory Database?

On Sun, Nov 17, 2013 at 8:25 PM, Stefan Keller <sfkeller@gmail.com> wrote:

In this case you are trading system safety (the system will not be
crash-safe) for performance... The following parameters would be
suited:
- Improve performance by reducing the amount of data flushed:
fsync = off
synchronous_commit=off
- Reduce the size of WALs:
full_page_writes = off
- Disable the background writer:
bgwriter_lru_maxpages = 0
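Collected as a postgresql.conf fragment, these are the same settings listed above; the shared_buffers line is an added assumption for a box where the working set should live in RAM, not part of the list:

```ini
# WARNING: non-durable settings - a crash can lose or corrupt data
fsync = off
synchronous_commit = off
full_page_writes = off
bgwriter_lru_maxpages = 0   # disable the background writer

# Assumption: also size the buffer cache generously,
# e.g. on a 128 GB read-mostly box
shared_buffers = 32GB
```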
Regards,
--
Michael

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#3Edson Richter
edsonrichter@hotmail.com
In reply to: Michael Paquier (#2)
Re: Postgres as In-Memory Database?

On 17/11/2013 10:00, Michael Paquier wrote:

One question: would you please expand your answer and explain how this
would adversely affect async replication?

Edson


#4rob stone
floriparob@gmail.com
In reply to: Stefan Keller (#1)
Re: Postgres as In-Memory Database?

On Sun, 2013-11-17 at 12:25 +0100, Stefan Keller wrote:

Not as being completely "in memory".
Back in the "good ol' days" of DMS II (written in Algol and run on
Burroughs mainframes) and Linc II (also Burroughs) it was possible to
define certain tables as memory resident. This was useful for low-volatility
data such as salutations, street types, county or state codes,
time zones, preferred languages, etc.
It saved disk I/O twice: firstly when building your drop-down lists, and
secondly when the entered data hit the server and was validated.
It would be good to have a similar feature in PostgreSQL.
If a table was altered by, say, inserting a new street type, then the
database engine has to refresh the cache. This is the only overhead.

Cheers,
Robert


#5Edson Richter
edsonrichter@hotmail.com
In reply to: rob stone (#4)
Re: Postgres as In-Memory Database?

On 17/11/2013 12:15, rob stone wrote:

For this purpose (building drop-down lists, salutations, street types,
county or state codes), I keep a permanent data cache on the app server side
(after all, it will be shared among all users - even in a multi-tenant
application). This avoids the network connection, and keeps database server
memory available for database operations (like reporting and transactions).
But I agree there are lots of gains in having an "in memory" option for
tables and so on. I believe PostgreSQL already does that automatically
(the most used tables are kept in the memory cache).

Edson.

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#6Stefan Keller
sfkeller@gmail.com
In reply to: Edson Richter (#5)
Re: Postgres as In-Memory Database?

Hi Edson

As Rob wrote: having a feature like the in-memory tables SQLite has [1]
would make application caches obsolete, and is interesting to discuss (but
that wasn't exactly what I asked above).

--Stefan

[1]: http://www.sqlite.org/inmemorydb.html
[2]: http://www.postgresql.org/docs/9.1/static/non-durability.html


#7Stefan Keller
sfkeller@gmail.com
In reply to: Edson Richter (#3)
Re: Postgres as In-Memory Database?

Hi Edson,

On 17/11/2013 Edson Richter <edsonrichter@hotmail.com> wrote:

One question: would you please expand your answer and explain how this
would adversely affect async replication?

Is this a question or a hint (or both) :-)? Of course almost all
non-durable settings [1] will delay replication.

I think I have to add that the pure speed of a read-mostly database is the
main scenario I have in mind.
Durability, high availability and scaling out are perhaps additional or
separate scenarios.

So, to come back to my question: I think that Postgres could be even faster,
by orders of magnitude, if the assumption of writing to slow secondary
storage (like disks) were removed (or replaced).

--Stefan

[1]: http://www.postgresql.org/docs/9.1/static/non-durability.html


#8Edson Richter
edsonrichter@hotmail.com
In reply to: Stefan Keller (#6)
Re: Postgres as In-Memory Database?

On 17/11/2013 19:26, Stefan Keller wrote:

Hi Edson

As Rob wrote: having a feature like the in-memory tables SQLite has
[1] would make application caches obsolete, and is interesting to discuss
(but that wasn't exactly what I asked above).

Hi, Stefan,

I don't think any feature you add to the database server would make app
server caches obsolete: app server caches simply have no lag at all:

1) They don't need a network connection to the database server
2) They don't need to materialize results (for instance, I have in mind a
Java or .NET app server holding hundreds of thousands of objects in memory).

IMHO, no matter how much you improve the database, app server caches
provide an additional level of speed that cannot be achieved by the database.

That said, I can still see huge improvements possible in the database server.
Having a strong in-memory mode of operation would bring substantial
improvements. For instance, if you have an in-memory database (tables,
indexes, etc.) for all sorts of queries, and just *commit* to disk, then you
will have unprecedented performance.
I would benefit from this architecture, since a typical customer database is
< 64 GB in size (after 2 or 3 years of data recording). So a database server
with 64 GB of memory would keep everything in memory, and just commit data
to disk.

In this case, committed data would be instantly available to queries
(because it is all in memory) while the log (the changes) is recorded on a
fast disk (an SSD, perhaps) and those changes are then made persistent,
written asynchronously to slow high-capacity disks (SCSI or SAS).

This would also allow a hybrid mode of operation (keeping as many data pages
as possible in memory, with a target of 50% or more in memory).

When the database server is started, it could use lazy loading (data is
loaded and kept in memory as it is used) or eager loading (for slower
startup but faster execution).

Maybe I'm just wondering too much, since I don't know the PostgreSQL
internals...
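For what it's worth, much of this split already exists in Postgres: a commit only waits for the WAL flush, while checkpoints push data pages out in the background. A hedged sketch of the 9.x-era postgresql.conf knobs for "fast log device, slow data disks" (values are illustrative; pg_xlog would be placed on the SSD via a symlink or initdb's --xlogdir option):

```ini
# WAL (the commit log) on the fast device; data files stay on slow disks
wal_buffers = 16MB
checkpoint_segments = 64             # let the WAL absorb write bursts
checkpoint_completion_target = 0.9   # trickle page writes across the interval
synchronous_commit = on              # a commit waits only for the WAL flush
```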

Regards,

Edson


#9Martijn van Oosterhout
kleptog@svana.org
In reply to: Stefan Keller (#7)
Re: Postgres as In-Memory Database?

On Sun, Nov 17, 2013 at 10:33:30PM +0100, Stefan Keller wrote:

So, to come back to my question: I think that Postgres could be even faster
by magnitudes, if the assumption of writing to slow secondary storage (like
disks) is removed (or replaced).

If your dataset fits in memory then the problem is trivial: any decent
programming language provides you with all the necessary tools to deal
with data purely in memory. There are also quite a lot of databases
that cover this area.

PostgreSQL excels in the area where your data is much larger than your
memory. This is a much more difficult problem, and I think one worth
focussing on. Pure in-memory databases are just not as interesting.

Have a nice day,
--
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/

He who writes carelessly confesses thereby at the very outset that he does
not attach much importance to his own thoughts.

-- Arthur Schopenhauer

#10Andreas Brandl
ml@3.141592654.de
In reply to: Edson Richter (#8)
Re: Postgres as In-Memory Database?

Edson,

On 17/11/2013 19:26, Stefan Keller wrote:

[...]

I'm not sure I fully understand your point. Isn't this the typical mode of operation, plus added cache warming?

Anyway, I just wanted to point you to [1], which gives a good overview of cache warming techniques.

Regards,
Andreas

[1]: http://raghavt.blogspot.fr/2012/04/caching-in-postgresql.html
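As a concrete sketch of one warming technique covered there: the pg_prewarm extension (shipped in contrib from PostgreSQL 9.4) loads a relation into the buffer cache on demand. The table and index names below are illustrative, not from the thread:

```sql
-- Load a relation into shared_buffers before the first queries arrive;
-- indexes are warmed separately from the heap
CREATE EXTENSION IF NOT EXISTS pg_prewarm;
SELECT pg_prewarm('planet_osm_nodes');        -- default 'buffer' mode
SELECT pg_prewarm('planet_osm_nodes_pkey');
```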


#11Edson Richter
edsonrichter@hotmail.com
In reply to: Andreas Brandl (#10)
Re: Postgres as In-Memory Database?

On 17/11/2013 20:46, Andreas Brandl wrote:

Anyways, just wanted to point you to [1] which gives a good overview of cache warming techniques.

[1] http://raghavt.blogspot.fr/2012/04/caching-in-postgresql.html

Wonderful, I never knew about it.
I'll read it ASAP.

Regards

Edson


#12Andreas Brandl
ml@3.141592654.de
In reply to: Stefan Keller (#1)
Re: Postgres as In-Memory Database?

Hi Stefan,

How can Postgres be used and configured as an In-Memory Database?

we've put the data directory on our build server directly on a ramdisk (e.g. /dev/shm) to improve build times.

Obviously you then don't care too much about durability, so you can switch off all related settings (as has already been pointed out). The only thing to do on a server reboot is to re-create a fresh data directory on the ramdisk.

So if you're able to start from scratch relatively cheaply (i.e. on a server reboot), don't care about durability/crash safety at all, and your database fits into RAM, that solution is easy to handle.

I've also tried having only a separate tablespace on a ramdisk, but abandoned the idea because Postgres seemed too surprised to see the tablespace empty after a reboot (all tables gone).

Overall the above solution works and improves our build times, but I think there are better ways to have in-memory/application caches than using Postgres.
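The setup described above can be sketched roughly as follows. This is a hedged sketch meant to run at boot; the paths are illustrative and it assumes initdb/pg_ctl are on the PATH:

```shell
# Throwaway PostgreSQL cluster on tmpfs - everything is lost on reboot.
PGDATA=/dev/shm/pgdata

if command -v initdb >/dev/null 2>&1; then
    rm -rf "$PGDATA"
    initdb -D "$PGDATA" >/dev/null
    # Durability is moot on a ramdisk, so switch the related settings off:
    cat >> "$PGDATA/postgresql.conf" <<'EOF'
fsync = off
synchronous_commit = off
full_page_writes = off
EOF
    # then start as usual, e.g.: pg_ctl -D "$PGDATA" -l "$PGDATA/log" start
else
    echo "PostgreSQL binaries not on PATH; nothing to do"
fi
```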

What is your use-case?

Regards
Andreas

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#13Stefan Keller
sfkeller@gmail.com
In reply to: Martijn van Oosterhout (#9)
Re: Postgres as In-Memory Database?

Hi Martijn

2013/11/17 Martijn van Oosterhout <kleptog@svana.org> wrote:

If your dataset fits in memory then the problem is trivial: any decent
programming language provides you with all the necessary tools to deal
with data purely in memory.

What about Atomicity, Concurrency and about SQL query language and the
extension mechanisms of Postgres? To me, that's not trivial.

There are also quite a lot of databases that cover this area.

Agreed. That's what partially triggered my question: notably Oracle
TimesTen, MS SQL Server 2014 (project Hekaton), the (distributed) "MySQL
Cluster", SAP HANA, and SQLite 3. To me this rather confirms that an
architecture and/or configuration for in-memory could be an issue for
Postgres too.

The current architecture of Postgres assumes that memory resources are
expensive and optimizes to avoid disk I/O. Having more memory available
affects database design, e.g. it can optimize for a working set that is
stored entirely in main memory.

--Stefan


#14Stefan Keller
sfkeller@gmail.com
In reply to: Andreas Brandl (#12)
Re: Postgres as In-Memory Database?

2013/11/18 Andreas Brandl <ml@3.141592654.de> wrote:

What is your use-case?

It's geospatial data from OpenStreetMap stored in a schema optimized for the
PostGIS extension (produced by osm2pgsql).

BTW: Having said (to Martijn) that using Postgres is probably more efficient
than programming an in-memory database in a decent language: OpenStreetMap
has a very, very large node table which is heavily used by other tables
(like ways) - and becomes rather slow in Postgres. Since it's of fixed
length, I'm looking at the file_fixed_length_record_fdw extension [1][2].

--Stefan

[1]: http://wiki.postgresql.org/wiki/Foreign_data_wrappers#file_fixed_length_record_fdw
[2]: https://github.com/adunstan/file_fixed_length_record_fdw


#15Gavin Flower
GavinFlower@archidevsys.co.nz
In reply to: Stefan Keller (#13)
Re: Postgres as In-Memory Database?

On 18/11/13 12:53, Stefan Keller wrote:

[...]

It would allow optimised indexes that store memory pointers to individual
records, rather than pointing to a block and then searching for the record -
as well as other optimisations that only make sense when data is known to be
in RAM (and RAM is plentiful). As big servers can already have a terabyte or
more of RAM, that will become more and more commonplace. I have 32 GB on my
development box.

Cheers,
Gavin

#16Edson Richter
edsonrichter@hotmail.com
In reply to: Gavin Flower (#15)
Re: Postgres as In-Memory Database?

On 17/11/2013 22:02, Gavin Flower wrote:

[...]

It would allow optimised indexes that store memory pointers to individual
records, rather than pointing to a block and then searching for the record -
as well as other optimisations that only make sense when data is known to be
in RAM (and RAM is plentiful).

Yes, those are the optimizations I was talking about: having the database
server store the transaction log on high-speed solid-state disks and
consider the commit done, while a background thread updates the data on
slower disks...

There is no reason to wait for an fsync on slow disks to guarantee
consistency... If the database server crashes, it just needs to "redo" the
logged transactions from the fast disk into the slower data storage and the
database server is ready to go (I think this has been the Sybase/MS SQL
strategy for years).

Also, consider having lazy loading (the current behaviour?) or eager loading
(perhaps; I just learned a bit about pg_warmcache).

And, of course, indexes that would resolve to memory areas rather than disk
pages when the data is in RAM - as you just mentioned.

Regards,

Edson

#17John R Pierce
pierce@hogranch.com
In reply to: Edson Richter (#16)
Re: Postgres as In-Memory Database?

On 11/17/2013 4:46 PM, Edson Richter wrote:

There is no reason to wait for fsync in slow disks to guarantee
consistency... If database server crashes, then it just need to "redo"
log transactions from fast disk into slower data storage and database
server is ready to go (I think this is Sybase/MS SQL strategy for years).

you need to fsync that slower disk before you can purge the older WAL or
redo log, whatever, from your fast storage. this fsync can, of course,
be quite a ways behind the current commit status.
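In stock PostgreSQL, the knobs that control how far that lagging fsync
of the data files may fall behind are the checkpoint settings; a sketch
for postgresql.conf (9.x-era parameter names, values illustrative):

```
checkpoint_segments = 256            # allow plenty of WAL between checkpoints
checkpoint_timeout = 30min           # checkpoint (fsync data files) rarely
checkpoint_completion_target = 0.9   # spread checkpoint writes over time
```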

--
john r pierce 37N 122W
somewhere on the middle of the left coast

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#18Jeff Janes
jeff.janes@gmail.com
In reply to: Stefan Keller (#7)
Re: Postgres as In-Memory Database?

On Sun, Nov 17, 2013 at 1:33 PM, Stefan Keller <sfkeller@gmail.com> wrote:

Hi Edson,

On 2013/11/17 Edson Richter <edsonrichter@hotmail.com> you wrote:

One question: would you please expand your answer and explain how this
would adversely affect async replication?

Is this a question or a hint (or both) :-)? Of course almost all
non-durable settings [1] will delay replication.

I think I have to add that pure speed of a read-mostly database is the
main scenario I have in mind.
Durability, high availability and scaling out are perhaps additional or
separate scenarios.

I think the main bottleneck you will run into is the client-server
architecture. PostgreSQL does not have an embedded mode, so every
interaction has to bounce data back and forth between processes.

So, to come back to my question: I think that Postgres could be even
faster by magnitudes, if the assumption of writing to slow secondary
storage (like disks) is removed (or replaced).

I rather doubt that. All the bottlenecks I know about for well cached
read-only workloads are around locking for in-memory concurrency
protection, and have little or nothing to do with secondary storage.

Cheers,

Jeff

#19Jeff Janes
jeff.janes@gmail.com
In reply to: Stefan Keller (#14)
Re: Postgres as In-Memory Database?

On Sun, Nov 17, 2013 at 4:02 PM, Stefan Keller <sfkeller@gmail.com> wrote:

2013/11/18 Andreas Brandl <ml@3.141592654.de> wrote:

What is your use-case?

It's geospatial data from OpenStreetMap stored in a schema optimized for
PostGIS extension (produced by osm2pgsql).

BTW: Having said (to Martijn) that using Postgres is probably more
efficient than programming an in-memory database in a decent language:
OpenStreetMap has a very, very large node table which is heavily used by
other tables (like ways) - and becomes rather slow in Postgres.

Do you know why it is slow? I'd give high odds that it would be a specific
implementation detail in the code that is suboptimal, or maybe a design
decision of PostGIS, rather than some high level architectural decision of
PostgreSQL.

Cheers,

Jeff

#20Stefan Keller
sfkeller@gmail.com
In reply to: Jeff Janes (#19)
Re: Postgres as In-Memory Database?

Hi Jeff and Martin

On 18 November 2013 17:44 Jeff Janes <jeff.janes@gmail.com> wrote:

I rather doubt that. All the bottlenecks I know about for well cached
read-only workloads are around locking for in-memory concurrency
protection, and have little or nothing to do with secondary storage.

Interesting point. But I think this is only partially the case - as
Stonebraker asserts [1]. While I don't see how to speed up locking (and
latching), AFAIK there is quite some room for enhancement in buffer pooling
(see also [2]). Especially in GIS environments there are heavy calculations
and random access operations - so the buffer pool will play a role.

To Martin: Stonebraker explicitly supports my hypothesis that in-memory
databases become prevalent in the future and that "elephants" will be
challenged if they don't adapt to new architectures, like in-memory and
column stores.

The specific use case here is a PostGIS query on OpenStreetMap data of
the whole world (see [3]).

On 2013/11/18 Jeff Janes <jeff.janes@gmail.com> wrote:

On Sun, Nov 17, 2013 at 4:02 PM, Stefan Keller <sfkeller@gmail.com> wrote:

BTW: Having said (to Martijn) that using Postgres is probably more
efficient than programming an in-memory database in a decent language:
OpenStreetMap has a very, very large node table which is heavily used by
other tables (like ways) - and becomes rather slow in Postgres.

Do you know why it is slow? I'd give high odds that it would be a
specific implementation detail in the code that is suboptimal, or maybe
a design decision of PostGIS, rather than some high level architectural
decision of PostgreSQL.

Referring to the application is something you can always say - but that
shouldn't prevent us from enhancing Postgres.
The PostGIS extension isn't involved in this use case. It's about
handling a very huge table with a bigint id and two numbers
representing lat/lon. As I said, an obvious solution is to access the
tuples as fixed-length records (which isn't a universal solution - but
exploits the fact that it's in-memory).
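A minimal sketch of the fixed-length record idea (illustrative Python,
not osm2pgsql's actual storage format): packing each node as
(id int64, lat float64, lon float64) gives 24-byte records, so record i
is found at byte offset i * 24 with no index lookup at all:

```python
import struct

# One node = int64 id + double lat + double lon = 24 bytes, little-endian.
RECORD = struct.Struct("<qdd")

# Illustrative node data (id, lat, lon).
nodes = [(1, 47.3769, 8.5417), (2, 52.5200, 13.4050), (3, 48.8566, 2.3522)]
buf = b"".join(RECORD.pack(*n) for n in nodes)

def get_node(buf, i):
    # O(1) positional access: record i starts at byte offset i * 24,
    # possible only because every record has the same fixed length.
    return RECORD.unpack_from(buf, i * RECORD.size)

print(get_node(buf, 1))  # (2, 52.52, 13.405)
```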

You can replicate this use case by trying to load the planet file into
Postgres using osm2pgsql (see [3]). The current record is about 20
hours(!), I think, with 32 GB RAM and SSDs.

--Stefan

[1] Michael Stonebraker: "The Traditional RDBMS Wisdom is All Wrong":
http://blog.jooq.org/2013/08/24/mit-prof-michael-stonebraker-the-traditional-rdbms-wisdom-is-all-wrong/
[2] Oracle Database In-Memory Option - A Preview: In-Memory Acceleration
for All Applications:
http://www.oracle.com/us/corporate/features/database-in-memory-option/index.html
[3] osm2pgsql benchmarks:
http://wiki.openstreetmap.org/wiki/Osm2pgsql/benchmarks


#21Andrew Sullivan
ajs@crankycanuck.ca
In reply to: Stefan Keller (#20)
#22Stefan Keller
sfkeller@gmail.com
In reply to: Andrew Sullivan (#21)
#23Andrew Dunstan
andrew@dunslane.net
In reply to: Stefan Keller (#14)
#24bricklen
bricklen@gmail.com
In reply to: Stefan Keller (#22)
#25Stefan Keller
sfkeller@gmail.com
In reply to: bricklen (#24)
#26Jeff Janes
jeff.janes@gmail.com
In reply to: Edson Richter (#16)
#27Edson Richter
edsonrichter@hotmail.com
In reply to: Jeff Janes (#26)
#28Jeff Janes
jeff.janes@gmail.com
In reply to: Edson Richter (#27)
#29Edson Richter
edsonrichter@hotmail.com
In reply to: Jeff Janes (#28)
#30Bruce Momjian
bruce@momjian.us
In reply to: Michael Paquier (#2)
#31Jeff Janes
jeff.janes@gmail.com
In reply to: Edson Richter (#29)
#32Stefan Keller
sfkeller@gmail.com
In reply to: Bruce Momjian (#30)
#33Jeff Janes
jeff.janes@gmail.com
In reply to: Stefan Keller (#20)
#34Stefan Keller
sfkeller@gmail.com
In reply to: Jeff Janes (#33)
#35Jeff Janes
jeff.janes@gmail.com
In reply to: Stefan Keller (#34)
#36Alban Hertroys
haramrae@gmail.com
In reply to: Jeff Janes (#35)
#37Stefan Keller
sfkeller@gmail.com
In reply to: Jeff Janes (#35)
#38Yeb Havinga
yebhavinga@gmail.com
In reply to: Jeff Janes (#35)
#39Stefan Keller
sfkeller@gmail.com
In reply to: Yeb Havinga (#38)
#40Florian Weimer
fweimer@redhat.com
In reply to: Stefan Keller (#39)
#41Stefan Keller
sfkeller@gmail.com
In reply to: Florian Weimer (#40)
#42Stefan Keller
sfkeller@gmail.com
In reply to: Stefan Keller (#39)
#43Hadi Moshayedi
hadi@citusdata.com
In reply to: Stefan Keller (#42)
#44Stefan Keller
sfkeller@gmail.com
In reply to: Hadi Moshayedi (#43)
#45Andrew Sullivan
ajs@crankycanuck.ca
In reply to: Stefan Keller (#44)
#46Stefan Keller
sfkeller@gmail.com
In reply to: Andrew Sullivan (#45)
#47Andrew Sullivan
ajs@crankycanuck.ca
In reply to: Stefan Keller (#46)