Linux max on shared buffers?
We are trying to throw a lot of memory at PostgreSQL to try to boost
performance. In an attempt to put our entire database into memory, I want to
allocate 2 to 3 GB out of 4 GB on a dual processor server running Red Hat
Linux 7.3 and PostgreSQL 7.2.1. We only expect 4 or 5 concurrent backends.
When I try to allocate 2 GB or more, I get the following error when I try to
start PostgreSQL (after setting kernel.shmall and kernel.shmmax
appropriately):
IpcMemoryCreate: shmat(id=163840) failed: Cannot allocate memory
I can safely allocate a little under 2 GB. Is this a Linux upper bound on
how much memory can be allocated to a single program? Is there another
kernel parameter besides kernel.shmall and kernel.shmmax that can be set to
allow more memory to be allocated?
How does one increase the SHMMAX? Does it require recompiling the kernel?
Terry Fielder
Network Engineer
Great Gulf Homes / Ashton Woods Homes
terry@greatgulfhomes.com
At 02:17 PM 7/10/2002, terry@greatgulfhomes.com wrote:
How does one increase the SHMMAX? Does it require recompiling the kernel?
To answer both questions:
Increase it with /etc/sysctl.conf entries such as: (Linux 2.2 and up)
kernel.shmall = 1073741824
kernel.shmmax = 1073741824
Or:
echo 1073741824 > /proc/sys/kernel/shmall
etc.
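One unit wrinkle worth knowing: kernel.shmmax is measured in bytes, but kernel.shmall is counted in pages (4 kB on i386), so setting both to the same large number simply makes shmall enormously generous rather than matching. A small sketch of the arithmetic; the helper name is made up for illustration:

```python
PAGE_SIZE = 4096  # i386 page size; kernel.shmall is counted in pages

def sysctl_lines(max_segment_bytes):
    """Return /etc/sysctl.conf lines sized for one shared memory segment.
    kernel.shmmax is in bytes; kernel.shmall is in PAGE_SIZE units."""
    return [
        "kernel.shmmax = %d" % max_segment_bytes,
        "kernel.shmall = %d" % (max_segment_bytes // PAGE_SIZE),
    ]

for line in sysctl_lines(1 << 30):  # a 1 GB segment
    print(line)
```

Setting shmall in bytes (as above with 1073741824) still works in practice because the resulting page count is far more than any segment will use.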
Linux on i386 has a problem with large memory: a 4 GB limit on addressable
memory per process (at least, without using bank switching). The kernel
usually reserves 1 or 2 GB of that for itself, leaving only 2 or 3 GB for the
process. Hence, it's unlikely you'd ever be able to map a larger shared
memory segment into a process anyway, so I figure the 2 GB limit is pretty
reasonable.
On my 8 GB system, I give only 256-512 MB to shared memory, and also set
effective_cache_size = 625000
telling PostgreSQL that there are probably 5 GB of data in the OS-level
cache. I'm not sure if that does anything, but since there are more like 6.5
or 7 GB in the OS cache, I figure it can't hurt.
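For reference, effective_cache_size is counted in PostgreSQL's 8 kB disk blocks, so the 625000 figure above works out to roughly 5 GB. A quick sanity check; the helper name is invented for the sketch:

```python
BLOCK_SIZE = 8192  # PostgreSQL block size; effective_cache_size is in blocks

def cache_pages(cache_bytes):
    """Convert an estimated OS cache size in bytes to effective_cache_size blocks."""
    return cache_bytes // BLOCK_SIZE

# 625000 blocks corresponds to about 5.1 GB of presumed OS cache:
print(625000 * BLOCK_SIZE)        # 5120000000 bytes
print(cache_pages(6 * 1024**3))   # blocks for a 6 GiB cache estimate
```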
Cheers,
Doug
By my understanding, there is a limit to that.
Since no more than 4 GB of memory can be addressed PER process on an ix86
system (you can have 16 of these 4 GB partitions with a Xeon machine), and
the kernel allocates 1 GB of memory for its own purposes (adjustable up or
down during kernel compilation), that leaves you with a maximum of 3 GB of
memory per process.
Of course, it is possible that the shared memory handler has a maximum of
2 GB; it is a "pretty number" ;)
What you should try first is whether you gain any performance going from
1 GB to 1.5 GB, and from 1.5 GB to 2 GB... If that doesn't improve
anything, why bother about the 3 GB?
On Wed, 10 Jul 2002, Martin Dillard wrote:
When I try to allocate 2 GB or more....
If I recall correctly, under normal circumstances a process under
Linux has an address space of only 2 GB. Therefore you will never
be able to allocate more memory than that. I think there's a patch
(maybe from SGI?) that lets you increase this to 3 GB, but at any
rate it's always going to be well under 4 GB, no matter what you
do, unless you move to a 64-bit processor.
But really, as discussed just in the last week on this list, you want to
allocate more like 10 MB or so to postgres' shared memory area. Then the
rest of your memory will be used as buffer cache and you will be happy.
If you want to know why, go back through the archives of this list.
cjs
--
Curt Sampson <cjs@cynic.net> +81 90 7737 2974 http://www.netbsd.org
Don't you know, in this new Dark Age, we're all light. --XTC
Curt Sampson wrote:
<SNIP>
But really, as discussed just in the last week on this list, you want to
allocate more like 10 MB or so to postgres' shared memory area. Then the
rest of your memory will be used as buffer cache and you will be happy.
Woh, 10MB is clearly too low. Remember, there is copying overhead of
moving data from the kernel to the PostgreSQL shared buffers.
--
Bruce Momjian | http://candle.pha.pa.us
pgman@candle.pha.pa.us | (610) 853-3000
+ If your life is a hard drive, | 830 Blythe Avenue
+ Christ can be your backup. | Drexel Hill, Pennsylvania 19026
On Wed, 10 Jul 2002, Bruce Momjian wrote:
Woh, 10MB is clearly too low. Remember, there is copying overhead of
moving data from the kernel to the PostgreSQL shared buffers.
Yes, but the cost of copying between a postgres buffer and an OS buffer
is much, much less than the cost of copying between an OS buffer and disk.
However, it all depends on your working set, doesn't it? So I can't make
a strong argument either way. What do you think is better? 20 MB? 100
MB? Do you allocate based on the number of connections, or a proportion
of the machine's memory, or something else? I was estimating based on
the number of connections.
cjs
Curt Sampson wrote:
On Wed, 10 Jul 2002, Bruce Momjian wrote:
<SNIP>
However, it all depends on your working set, doesn't it? So I can't make
a strong argument either way. What do you think is better? 20 MB? 100
MB? Do you allocate based on the number of connections, or a proportion
of the machine's memory, or something else? I was estimating based on
the number of connections.
If it is a dedicated machine, I would think some percentage of total RAM
would be good, perhaps 25%. If it isn't dedicated, the working set
becomes a major issue, yes.
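Bruce's 25%-of-RAM rule of thumb for a dedicated box can be turned into a shared_buffers figure (which PostgreSQL counts in 8 kB blocks). A hedged sketch, with an invented helper name:

```python
BLOCK_SIZE = 8192  # shared_buffers is counted in 8 kB blocks

def shared_buffers_for(ram_bytes, fraction=0.25):
    """Rule-of-thumb sizing for a dedicated machine: a fraction of total RAM,
    expressed as a block count suitable for the shared_buffers setting."""
    return int(ram_bytes * fraction) // BLOCK_SIZE

print(shared_buffers_for(4 * 1024**3))  # 131072 blocks = 1 GiB on a 4 GiB box
```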
What I have found, on a machine that is mostly serving a DBMS: if you
have a small shared buffer, you will see that the CPU is not fully
utilized because it is waiting on I/O much of the time. I keep adding
buffer space until the CPU utilization goes to ~100%. Adding more buffer
after this does little to speed up queries. I use the "top" display
under either Linux or Solaris.
This is purely empirical and based on observation, but it makes sense:
if the point of the buffer is to avoid disk I/O, then once disk I/O is
no longer the bottleneck, adding more buffers does little good. On a
typical 1-CPU PC, the CPU becomes the bottleneck after about 200 MB of
buffer.
=====
Chris Albertson
Home: 310-376-1029 chrisalbertson90278@yahoo.com
Cell: 310-990-7550
Office: 310-336-5189 Christopher.J.Albertson@aero.org
On Wed, 10 Jul 2002, Chris Albertson wrote:
What I have found, on a machine that is mostly serving a DBMS: if you
have a small shared buffer, you will see that the CPU is not fully
utilized because it is waiting on I/O much of the time.
What OS are you using? Unless it's really strange or really old, you
are probably just removing an operating system buffer for each shared
memory buffer you add. (But this is not good; two smaller buffers in
series are not as good as one big buffer.)
I keep adding buffer space until the CPU utilization goes to ~100%.
Adding more buffer after this does little to speed up queries. I use
the "top" display under either Linux or Solaris.
Um...I just don't quite feel up to explaining why I think this is
quite wrong...
cjs
What I have found, on a machine that is mostly serving a DBMS: if you
have a small shared buffer, you will see that the CPU is not fully
utilized because it is waiting on I/O much of the time.
Just to repeat what Curt said :-) : this doesn't make sense to me.
Whatever memory you give to Postgres for caching *could* be used by the
OS to cache. It doesn't seem that changing PG's buffer count should
make any real difference in real disk IO; the disk pages will get cached
one way or the other.
What OS are you using? Unless it's really strange or really old, you
are probably just removing an operating system buffer for each shared
memory buffer you add. (But this is not good; two smaller buffers in
series are not as good as one big buffer.)
When are PG's buffers *not* in series with the OS's? *Any* page that
postgres has read from disk has been in the OS cache. When reading a
page that's not in either cache, you get:
[Disk] => <read> => [OS cache] => <mem copy> => [PG cache]
And when its in only the OS cache, you get:
[OS cache] => <mem copy> => [PG cache]
The bigger your PG cache, the less often you have to ask the OS for a
page. That <mem copy> (plus syscall overhead) that happens between OS
cache and PG cache is expensive compared to not having to do it. Not
horribly so, I realize, but every little bit helps...
Unless I'm just completely missing something here, the bigger the PG
cache the better, within reason. Please don't come back with a remark
about other processes needing memory and possible swap storms, etc.; I
know to avoid that, and my approach would need to be *heavily* curbed on
machines that do things besides run Postgres. But as long as there is
no danger of starving the machine for memory, I don't see what the harm
is in giving more memory to Postgres.
<snip snip>
--Glen Parker
--- Curt Sampson <cjs@cynic.net> wrote:
On Wed, 10 Jul 2002, Chris Albertson wrote:
<SNIP>
I keep adding buffer space until the CPU utilization goes to ~100%.
Adding more buffer after this does little to speed up queries. I use
the "top" display under either Linux or Solaris.
Um...I just don't quite feel up to explaining why I think this is
quite wrong...
It is easy to explain why I am wrong: you have run timing tests and
have measurements that disagree with what I wrote. If you do have such
data, good; then we can find out the differences in the test methods.
I am not so interested in theory, but possibly the OS (Linux and
Solaris in my case) buffers and the PostgreSQL buffers are not
interchangeable, as they are managed using different algorithms and the
OS-level buffer is shared with other applications.
On Thu, 11 Jul 2002, Chris Albertson wrote:
<SNIP>
It is easy to explain why I am wrong: you have run timing tests and
have measurements that disagree with what I wrote.
You're measuring the wrong thing. The point is not to make your
CPU usage as high as possible; the point is to make your queries
as fast as possible.
I am not so interested in theory, but possibly the OS (Linux and
Solaris in my case) buffers and the PostgreSQL buffers are not
interchangeable, as they are managed using different algorithms and the
OS-level buffer is shared with other applications.
They are interchanged; if you use memory for buffering, your OS doesn't.
The OS buffers are always used.
Worse yet, it appears that many systems page shared memory anyway,
so your buffers may be swapped out to disk, which sort of negates the
point of them.
Anyway, maybe one day I'll have the time and energy to write up a
big long explanation of a data block's trip between disk and postgres
buffer, so that people will understand. There seems to be a lot of
confusion here.
cjs
On Thu, 11 Jul 2002, Glen Parker wrote:
When are PG's buffers *not* in series with the OS's?
Never. That's the problem.
[Disk] => <read> => [OS cache] => <mem copy> => [PG cache]
And when its in only the OS cache, you get:
[OS cache] => <mem copy> => [PG cache]
Right.
The bigger your PG cache, the less often you have to ask the OS for a
page. That <mem copy> (plus syscall overhead) that happens between OS
cache and PG cache is expensive compared to not having to do it. Not
horribly so, I realize, but every little bit helps...
Yes, but it's much, much cheaper than reading a page from disk.
Unless I'm just completely missing something here, the bigger the PG
cache the better, within reason.
Let's walk through an example. I have four pages total for caching.
Let's look at a read scenario based on two for postgres and two for the
OS, and one for postgres and three for the OS. Pn is a postgres buffer
and OSn is an OS buffer; the numbers below those show which disk blocks
are in which caches. We'll use simple FIFO replacement for both caches
(evict the oldest-loaded block; hits don't refresh recency) and read
the blocks in this order: 1 2 3 2 1 2 3.
OS1  OS2  P1  P2   Action
 -    -    -   -   Start with all caches empty.
 1    -    1   -   Read block 1 (replaces -/-). disk + memory copy.
 1    2    1   2   Read block 2 (replaces -/-). disk + memory copy.
 3    2    3   2   Read block 3 (replaces 1/1). disk + memory copy.
 3    2    3   2   Read block 2 (in cache). no copies.
 3    1    3   1   Read block 1 (replaces 2/2). disk + memory copy.
 2    1    2   1   Read block 2 (replaces 3/3). disk + memory copy.
 2    3    2   3   Read block 3 (replaces 1/1). disk + memory copy.
Now with postgres getting one buffer and the OS getting three:
OS1  OS2  OS3  P1   Action
 -    -    -    -   Start with all caches empty.
 1    -    -    1   Read block 1 (replaces -/-). disk + memory copy.
 1    2    -    2   Read block 2 (replaces -/1). disk + memory copy.
 1    2    3    3   Read block 3 (replaces -/2). disk + memory copy.
 1    2    3    2   Read block 2 (in cache, replaces 3). memory copy.
 1    2    3    1   Read block 1 (in cache, replaces 2). memory copy.
 1    2    3    2   Read block 2 (in cache, replaces 1). memory copy.
 1    2    3    3   Read block 3 (in cache, replaces 2). memory copy.
So with 2 and 2 buffers for the OS and postgres, we end up doing
six disk reads and six memory copies. With 3 and 1, we do three
disk reads and seven memory copies. This is why you do not want to
take buffers from the OS in order to give them to postgres.
So, you ask, why not devote almost all your memory to postgres cache,
and minimize the OS's caching? Answer: if you can do that without
causing any of your programs to swap, that's fine. But I doubt you can,
unless you have, say, perfect control over how many sorts will ever be
occurring at once, and so on.
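Curt's walkthrough is easy to reproduce in code. The tables in his example evict blocks in the order they were loaded (a hit does not refresh recency), so the sketch below uses that FIFO-style replacement; it is an illustration of the argument, not of PostgreSQL's actual buffer manager, and the function name is invented:

```python
def simulate(os_size, pg_size, accesses):
    """Count disk reads and kernel->PostgreSQL memory copies for a
    two-level cache.  Replacement evicts the oldest-loaded block (FIFO),
    matching the eviction order in the tables above."""
    os_cache, pg_cache = [], []   # oldest-loaded block first
    disk_reads = mem_copies = 0
    for block in accesses:
        if block in pg_cache:
            continue              # served from the PostgreSQL buffer: free
        if block not in os_cache:
            disk_reads += 1       # kernel must fetch the block from disk
            if len(os_cache) == os_size:
                os_cache.pop(0)
            os_cache.append(block)
        mem_copies += 1           # copy from OS cache into a PG buffer
        if len(pg_cache) == pg_size:
            pg_cache.pop(0)
        pg_cache.append(block)
    return disk_reads, mem_copies

workload = [1, 2, 3, 2, 1, 2, 3]
print(simulate(2, 2, workload))   # the 2+2 split: 6 disk reads, 6 copies
print(simulate(3, 1, workload))   # the 3+1 split: 3 disk reads, 7 copies
```

Trading one OS buffer for one PostgreSQL buffer halves the disk reads here at the cost of a single extra memory copy, which is the point of the example.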
cjs
On Thu, Jul 11, 2002 at 05:07:29PM +0900, Curt Sampson wrote:
Let's walk through an example. I have four pages total for caching.
Let's look at a read scenario based on two for postgres and two for the
OS, and one for postgres and three for the OS. Pn is a postgres buffer
and OSn is an OS buffer; the numbers below those show which disk blocks
are in which caches. We'll use an LRU algorithm for both caches and read
the blocks in this order: 1 2 3 2 1 2 3.
Hmm, what about OSes that swap shared memory to disk? Wouldn't that change
things somewhat? Probably more in favour of giving more memory to the OS.
The other possibility would be to use mmap instead. That way you avoid the
double buffering altogether. Do you have any ideas about that?
--
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/
There are 10 kinds of people in the world, those that can do binary
arithmetic and those that can't.
On Thu, 11 Jul 2002, Martijn van Oosterhout wrote:
Hmm, what about OS's that swap shared memory to disk. Wouldn't that change
things somewhat?
It just makes things worse. Paging a buffer to disk does the I/O you
were trying not to do.
Probably more in favour of giving more memory to the OS.
Yup.
The other possibility would be to use mmap instead. That way you avoid the
double buffering altogether. Do you have any ideas about that?
On this page
http://archives.postgresql.org/pgsql-hackers/2002-06/threads.php
there's a thread about "Buffer management", and I have some posts to that
about using mmap. Unfortunately, I can't point out the exact posts, because
the php that's generating these pages is probably missing an end table
tag, and so the pages render as completely blank on Linux Netscape 4.78.
cjs
On Thu, Jul 11, 2002 at 06:04:20PM +0900, Curt Sampson wrote:
On Thu, 11 Jul 2002, Martijn van Oosterhout wrote:
The other possibility would be to use mmap instead. That way you avoid the
double buffering altogether. Do you have any ideas about that?
<SNIP>
Yeah, I thought of the writeback issue also. I was thinking that you might
have to keep the current shared memory scheme for written pages but use mmap
for reading them in. Since the number of written pages is much smaller than the
number of read pages, you can avoid a lot of double buffering.
But that does increase complexity and possibly cause portability problems,
since that could make assumptions about how buffers, shared memory and mmaps
are shared.
Not totally straightforward. Needs more thought, obviously.
Martijn van Oosterhout wrote:
The other possibility would be to use mmap instead. That way you avoid the
double buffering altogether. Do you have any ideas about that?
I ran across a link last night on the R list to the TPIE (Transparent
Parallel I/O Environment) project at Duke University. It looks
interesting and somewhat related to this thread. It is C++ and GPL'd, but
there are links to some related papers.
Joe
Doug Fields wrote:
At 02:17 PM 7/10/2002, terry@greatgulfhomes.com wrote:
How does one increase the SHMMAX? Does it require recompiling the kernel?
To answer both questions:
Increase it with /etc/sysctl.conf entries such as: (Linux 2.2 and up)
kernel.shmall = 1073741824
kernel.shmmax = 1073741824
Hello,
here on Debian, those settings are done *after* PostgreSQL is started.
$ ls /etc/rc2.d/*{proc,post}* | sort
/etc/rc2.d/S20postgresql
/etc/rc2.d/S20procps.sh
No problem? Or should I file a bug with reportbug?
Thanks,
Knut Säbert
On Thu, 11 Jul 2002, Martijn van Oosterhout wrote:
Yeah, I thought of the writeback issue also. I was thinking that you
might have to keep the current shared memory scheme for written pages
but use mmap for reading them in. Since the number of written pages
is much smaller than the number of read pages, you can avoid a lot of
double buffering.
Actually, come to think of it, since we write the entire new page
to the WAL, we'd have to copy the page anyway. So this basically
solves the problem; you mark the page as having a "to-write" copy,
copy it, modify it, write the log file, and copy the new page back
to the mmap buffer. Not too difficult at all, really.
If we ever did stop writing the entire page to WAL, it's not too
difficult anyway, because we just have to actually do the write to
the page *after* the log entry is on stable storage, as I pointed
out, which means keeping a record of the change to be written. But
we have that record of course, because it's exactly what we just
wrote to the log. And we also, of course, have logic to take that
record and apply it to the table...
...and possibly cause portability
problems, since that could make assumptions about how buffers, shared
memory and mmaps are shared.
No, portability problems are very unlikely. We already use shared
memory and it works. Mmap has very well defined behaviour and, in
my experience, works very consistently across all modern systems.
cjs