Memory Leakage Problem
Hi,
I set up a database server using the following configuration.
Redhat 9.0
Postgresql 8.0.3
Then I set up a client workstation to access this database server with
the following configuration.
Redhat 9.0
unixODBC 2.2.11
psqlodbc-08.01.0101
and wrote a C++ program to run database queries.
In this program, it accesses the database server using simple and
complex (joining tables) SQL SELECT statements and retrieves the
matched rows. For each access, it connects to the database, runs the
query, and then disconnects.
I found that the memory of the database server was nearly used up
(2 GB RAM total). After I stopped the program, the used memory was not
freed.
Is there any configuration in postgresql.conf I should set? Currently,
I just set the following in postgresql.conf
listen_addresses = '*'
max_stack_depth = 8100 (running "ulimit -s" shows the kernel maximum
is 8192)
stats_row_level = true
And I run pg_autovacuum as a background job.
--
Kathy Lo
Kathy Lo <kathy.lo.ky@gmail.com> writes:
I found that the memory of the database server was nearly used up
(2 GB RAM total). After I stopped the program, the used memory was not
freed.
I see no particular reason to believe that you are describing an actual
memory leak. More likely, you are just seeing the kernel's normal
behavior of eating up unused memory for disk cache space.
Repeat after me: zero free memory is the normal and desirable condition
on Unix-like systems.
regards, tom lane
On Tue, 2005-12-06 at 03:22, Kathy Lo wrote:
Hi,
In this program, it accesses the database server using simple and
complex (joining tables) SQL SELECT statements and retrieves the
matched rows. For each access, it connects to the database, runs the
query, and then disconnects.

I found that the memory of the database server was nearly used up
(2 GB RAM total). After I stopped the program, the used memory was not
freed.
Ummmm. What exactly do you mean? Can we see the output of top and / or
free? I'm guessing that what Tom said is right, you're just seeing a
normal state of how unix does things.
If your output of free looks like this:
-bash-2.05b$ free
             total       used       free     shared    buffers     cached
Mem:       6096912    6069588      27324          0     260728    5547264
-/+ buffers/cache:     261596    5835316
Swap:      4192880      16320    4176560
Then that's normal.
That's the output of free on a machine with 6 gigs that runs a reporting
database. Note that while it shows almost ALL the memory as used, it is
being used by the kernel, which is a good thing. Note that 5547264 or
about 90% of memory is being used as kernel cache. That's a good thing.
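If you want to pull out just the numbers that matter (the second row),
something like this works:

free | awk '/buffers\/cache/ {print "really used:", $3, "kB  really free:", $4, "kB"}'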
Note you can also get yourself in trouble with top. It's not uncommon
for someone to see a bunch of postgres processes each eating up 50 or
more megs of ram, and panic and think that they're running out of
memory, when, in fact, 44 meg for each of those processes is shared, and
the real usage per backend is 6 megs or less.
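You can see how big that shared chunk actually is with ipcs; postgres
allocates it once at startup and every backend maps the same segment,
even though top and ps count it against each process:

# ipcs -m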
Definitely grab yourself a good Unix / Linux sysadmin guide. The "in a
Nutshell" books from O'Reilly are a good starting point.
Please keep replies on list, this may help others in the future, and
also, don't top post (i.e. put your responses after my responses...
Thanks)
On Tue, 2005-12-06 at 20:16, Kathy Lo wrote:
For a back-end database server running Postgresql 8.0.3, it's OK. But
this problem seriously affects the performance of my application
server.

I upgraded my application server from

Redhat 7.3
unixODBC 2.2.4
Postgresql 7.2.1 with ODBC driver

to

Redhat 9.0
unixODBC 2.2.11
Postgresql 8.0.3
psqlodbc-08.01.0101
pg_autovacuum running as a background job

Before upgrading, the application server ran perfectly. After the
upgrade, this problem appeared.

When the application server receives a request from a client, it
accesses the back-end database server using both simple and complex
queries. Then it creates a database locally to store the matched
rows for data processing. After some data processing, it returns
the result to the requesting client. When the client finishes browsing
the result, it drops the local database.
OK, there could be a lot of problems here. Are you actually doing
"create database ..." for each of these things? I'm not sure that's a
really good idea. Even create schema, which would be better, strikes me
as not the best way to handle this.
At the same time, this application server can serve many many clients
so the application server has many many local databases at the same
time.
Are you sure that you're better off with databases on your application
server? You might be better off with either running these temp dbs on
the backend server in the same cluster, or creating a cluster just for
these jobs that is somewhat more conservative in its memory usage. I
would lean towards doing this all on the backend server in one database
using multiple schemas.
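For instance, something like this (a hypothetical sketch; the schema,
table and host names are made up, and the same statements work just as
well over your ODBC connection):

psql -h backend -d appdb -c "CREATE SCHEMA session_42"
psql -h backend -d appdb -c "CREATE TABLE session_42.results AS
    SELECT o.* FROM orders o JOIN customers c ON o.cust_id = c.id"
psql -h backend -d appdb -c "DROP SCHEMA session_42 CASCADE"

Creating and dropping a schema is much lighter weight than creating and
dropping a whole database, which has to copy template1 every time.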
After running the application server for a few days, the memory of the
application server is nearly used up and it starts to use swap and, as
a result, the application server runs very, very slowly and the users
complain.
Could you provide us with your evidence that the memory is "used up?"
What is the problem, and what you perceive as the problem, may not be
the same thing. Is it the output of top / free, and if so, could we see
it, or whatever output is convincing you you're running out of memory?
I tested the application server without accessing the local database
(not storing matched rows). The testing program running on the
application server just retrieved rows from the back-end database
server and returned them to the requesting client directly. The memory
usage of the application server becomes normal and it can run for a
long time.
Again, what you think is normal and what is actually normal may not be
the same thing. Evidence. Please show us the output of top / free or
whatever it is that is showing this.
On 12/8/05, Scott Marlowe <smarlowe@g2switchworks.com> wrote:
Please keep replies on list, this may help others in the future, and
also, don't top post (i.e. put your responses after my responses...
Thanks)

On Tue, 2005-12-06 at 20:16, Kathy Lo wrote:

For a back-end database server running Postgresql 8.0.3, it's OK. But
this problem seriously affects the performance of my application
server.

I upgraded my application server from

Redhat 7.3
unixODBC 2.2.4
Postgresql 7.2.1 with ODBC driver

to

Redhat 9.0
unixODBC 2.2.11
Postgresql 8.0.3
psqlodbc-08.01.0101
pg_autovacuum running as a background job

Before upgrading, the application server ran perfectly. After the
upgrade, this problem appeared.

When the application server receives a request from a client, it
accesses the back-end database server using both simple and complex
queries. Then it creates a database locally to store the matched
rows for data processing. After some data processing, it returns
the result to the requesting client. When the client finishes browsing
the result, it drops the local database.

OK, there could be a lot of problems here. Are you actually doing
"create database ..." for each of these things? I'm not sure that's a
really good idea. Even create schema, which would be better, strikes me
as not the best way to handle this.
Actually, my program is written in C++, so I issue a "create database"
SQL statement to create the database. If that is not the best way,
please tell me another method of creating a database from a C++
program.
At the same time, this application server can serve many many clients
so the application server has many many local databases at the same
time.

Are you sure that you're better off with databases on your application
server? You might be better off with either running these temp dbs on
the backend server in the same cluster, or creating a cluster just for
these jobs that is somewhat more conservative in its memory usage. I
would lean towards doing this all on the backend server in one database
using multiple schemas.
Because the data are distributed across many back-end database servers
(physically, on different machines), I need the application server to
temporarily store the data retrieved from the different machines and
then do the data processing. And, for security reasons, users cannot
directly access the back-end database servers. So I use the database on
the application server to keep the results of the data processing.
After running the application server for a few days, the memory of the
application server is nearly used up and it starts to use swap and, as
a result, the application server runs very, very slowly and the users
complain.

Could you provide us with your evidence that the memory is "used up?"
What is the problem, and what you perceive as the problem, may not be
the same thing. Is it the output of top / free, and if so, could we see
it, or whatever output is convincing you you're running out of memory?
When a user complains that the system has become very slow, I use top
to view the memory statistics.
In top, I cannot find any single process that uses a lot of memory. I
just see that all the memory is used up and the swap space is nearly
used up too.
I say it is a problem because, before upgrading the application server,
there was no memory problem even after running the application server
for a month. After the upgrade, this problem appears after running it
for just one week. Why is there such a BIG difference between
postgresql 7.2.1 on Redhat 7.3 and postgresql 8.0.3 on Redhat 9.0? I
only upgraded the OS, postgresql, unixODBC and the postgresql ODBC
driver. The program I wrote IS THE SAME.
I tested the application server without accessing the local database
(not storing matched rows). The testing program running on the
application server just retrieved rows from the back-end database
server and returned them to the requesting client directly. The memory
usage of the application server becomes normal and it can run for a
long time.

Again, what you think is normal and what is actually normal may not be
the same thing. Evidence. Please show us the output of top / free or
whatever it is that is showing this.
After I received the user's complaint, I just used top to view the
memory statistics; I forgot to save the output. But I am running a test
to reproduce the problem, and after the test I will give you the output
of top/free.
I found this problem only after upgrading the application server.
--
Kathy Lo
On 12/8/05, Kathy Lo <kathy.lo.ky@gmail.com> wrote:
[snip]
When a user complains that the system has become very slow, I use top
to view the memory statistics.
In top, I cannot find any single process that uses a lot of memory. I
just see that all the memory is used up and the swap space is nearly
used up too.
Not to add fuel to the fire, but I'm seeing something similar to this
on my 4xOpteron with 32GB of RAM running Pg 8.1RC1 on Linux (kernel
2.6.12). I don't see this happening on a similar box with 16GB of RAM
running Pg 8.0.3. This is a lightly used box (until it goes into
production), so it's not "out of memory", but the memory usage is
climbing without any obvious culprit. To cut to the chase, here are
some numbers for everyone to digest:
total gnu ps resident size
# ps ax -o rss|perl -e '$x += $_ for (<>);print "$x\n";'
5810492
total gnu ps virtual size
# ps ax -o vsz|perl -e '$x += $_ for (<>);print "$x\n";'
10585400
total gnu ps "if all pages were dirtied and swapped" size
# ps ax -o size|perl -e '$x += $_ for (<>);print "$x\n";'
1970952
ipcs -m
# ipcs -m
------ Shared Memory Segments --------
key shmid owner perms bytes nattch status
0x0052e2c1 1802240 postgres 600 176054272 26
(that's the entire ipcs -m output)
and the odd man out, free
# free
total used free shared buffers cached
Mem: 32752268 22498448 10253820 0 329776 8289360
-/+ buffers/cache: 13879312 18872956
Swap: 31248712 136 31248576
I guess dstat is getting its info from the same source as free, because:
# dstat -m 1
------memory-usage-----
_used _buff _cach _free
13G 322M 8095M 9.8G
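To spell out the discrepancy: used minus buffers/cache is
22498448 - 329776 - 8289360 = 13879312 kB (about 13.2 GB), yet the
summed RSS above is only 5810492 kB (about 5.5 GB), and that sum
should, if anything, overstate process memory. Roughly 8 GB is
unaccounted for.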
Now, I'm not blaming Pg for the apparent discrepancy in calculated vs.
reported-by-free memory usage, but I only noticed this after upgrading
to 8.1. I'll collect any more info that anyone would like to see,
just let me know.
If anyone has any ideas on what is actually happening here I'd love to
hear them!
--
Mike Rylander
mrylander@gmail.com
GPLS -- PINES Development
Database Developer
http://open-ils.org
Mike Rylander <mrylander@gmail.com> writes:
To cut to the chase, here are
some numbers for everyone to digest:
total gnu ps resident size
# ps ax -o rss|perl -e '$x += $_ for (<>);print "$x\n";'
5810492
total gnu ps virtual size
# ps ax -o vsz|perl -e '$x += $_ for (<>);print "$x\n";'
10585400
total gnu ps "if all pages were dirtied and swapped" size
# ps ax -o size|perl -e '$x += $_ for (<>);print "$x\n";'
1970952
I wouldn't put any faith in those numbers at all, because you'll be
counting the PG shared memory multiple times.
On the Linux versions I've used lately, ps and top report a process'
memory size as including all its private memory, plus all the pages
of shared memory that it has touched since it started. So if you run
say a seqscan over a large table in a freshly-started backend, the
reported memory usage will ramp up from a couple meg to the size of
your shared_buffer arena plus a couple meg --- but in reality the
space used by the process is staying constant at a couple meg.
Now, multiply that effect by N backends doing this at once, and you'll
have a very skewed view of what's happening in your system.
I'd trust the totals reported by free and dstat a lot more than summing
per-process numbers from ps or top.
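To put numbers on it: the ipcs output shown earlier listed a single
176054272-byte segment with 26 processes attached, so a naive RSS sum
can overstate real usage by as much as 25 * 176 MB, i.e. over 4 GB,
from that one segment alone.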
Now, I'm not blaming Pg for the apparent discrepancy in calculated vs.
reported-by-free memory usage, but I only noticed this after upgrading
to 8.1.
I don't know of any reason to think that 8.1 would act differently from
older PG versions in this respect.
regards, tom lane
On 12/8/05, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Mike Rylander <mrylander@gmail.com> writes:
To cut to the chase, here are
some numbers for everyone to digest:
total gnu ps resident size
# ps ax -o rss|perl -e '$x += $_ for (<>);print "$x\n";'
5810492
total gnu ps virtual size
# ps ax -o vsz|perl -e '$x += $_ for (<>);print "$x\n";'
10585400
total gnu ps "if all pages were dirtied and swapped" size
# ps ax -o size|perl -e '$x += $_ for (<>);print "$x\n";'
1970952

I wouldn't put any faith in those numbers at all, because you'll be
counting the PG shared memory multiple times.

On the Linux versions I've used lately, ps and top report a process'
memory size as including all its private memory, plus all the pages
of shared memory that it has touched since it started. So if you run
say a seqscan over a large table in a freshly-started backend, the
reported memory usage will ramp up from a couple meg to the size of
your shared_buffer arena plus a couple meg --- but in reality the
space used by the process is staying constant at a couple meg.
Right, I can definitely see that happening. Some backends are upwards
of 200M, some are just a few megs since they haven't been touched yet.
Now, multiply that effect by N backends doing this at once, and you'll
have a very skewed view of what's happening in your system.
Absolutely ...
I'd trust the totals reported by free and dstat a lot more than summing
per-process numbers from ps or top.
And there's the part that's confusing me: the numbers for used memory
produced by free and dstat, after subtracting the buffers/cache
amounts, are /larger/ than those that ps and top report. (top says the
same thing as ps, on the whole.)
Now, I'm not blaming Pg for the apparent discrepancy in calculated vs.
reported-by-free memory usage, but I only noticed this after upgrading
to 8.1.

I don't know of any reason to think that 8.1 would act differently from
older PG versions in this respect.
Neither can I, which is why I don't blame it. ;) I'm just reporting
when/where I noticed the issue.
--
Mike Rylander
mrylander@gmail.com
GPLS -- PINES Development
Database Developer
http://open-ils.org
Mike Rylander wrote:
Right, I can definitely see that happening. Some backends are upwards
of 200M, some are just a few megs since they haven't been touched yet.

Now, multiply that effect by N backends doing this at once, and you'll
have a very skewed view of what's happening in your system.

Absolutely ...

I'd trust the totals reported by free and dstat a lot more than summing
per-process numbers from ps or top.

And there's the part that's confusing me: the numbers for used memory
produced by free and dstat, after subtracting the buffers/cache
amounts, are /larger/ than those that ps and top report. (top says the
same thing as ps, on the whole.)
I'm seeing the same thing on one of our 8.1 servers. Summing RSS from
`ps` or RES from `top` accounts for about 1 GB, but `free` says:
total used free shared buffers cached
Mem: 4060968 3870328 190640 0 14788 432048
-/+ buffers/cache: 3423492 637476
Swap: 2097144 175680 1921464
That's 3.4 GB/170 MB in RAM/swap, up from 2.7 GB/0 last Thursday, 2.2
GB/0 last Monday, or 1.9 GB after a reboot ten days ago. Stopping
Postgres brings down the number, but not all the way -- it drops to
about 2.7 GB, even though the next most memory-intensive process is
`ntpd` at 5 MB. (Before Postgres starts, there's less than 30 MB of
stuff running.) The only way I've found to get this box back to normal
is to reboot it.
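One thing I have not yet ruled out (just a diagnostic guess on my part)
is memory held by the kernel itself rather than by any process; on a
2.6 kernel the slab caches and page tables are visible in
/proc/meminfo:

# grep -i -e slab -e pagetables /proc/meminfo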
Now, I'm not blaming Pg for the apparent discrepancy in calculated vs.
reported-by-free memory usage, but I only noticed this after upgrading
to 8.1.

I don't know of any reason to think that 8.1 would act differently from
older PG versions in this respect.

Neither can I, which is why I don't blame it. ;) I'm just reporting
when/where I noticed the issue.
I can't offer any explanation for why this server is starting to swap --
where'd the memory go? -- but I know it started after upgrading to
PostgreSQL 8.1. I'm not saying it's something in the PostgreSQL code,
but this server definitely didn't do this in the months under 7.4.
Mike: is your system AMD64, by any chance? The above system is, as is
another similar story I heard.
--Will Glynn
Freedom Healthcare
On 12/12/05, Will Glynn <wglynn@freedomhealthcare.org> wrote:
[snip]

Mike: is your system AMD64, by any chance? The above system is, as is
another similar story I heard.
It sure is. Gentoo with kernel version 2.6.12, built for x86_64.
Looks like we have a contender for the common factor. :)
--
Mike Rylander
mrylander@gmail.com
GPLS -- PINES Development
Database Developer
http://open-ils.org
Mike Rylander <mrylander@gmail.com> writes:
On 12/12/05, Will Glynn <wglynn@freedomhealthcare.org> wrote:
Mike: is your system AMD64, by any chance? The above system is, as is
another similar story I heard.
It sure is. Gentoo with kernel version 2.6.12, built for x86_64.
Looks like we have a contender for the common factor. :)
Please tell me you're *not* running a production database on Gentoo.
regards, tom lane
It sure is. Gentoo with kernel version 2.6.12, built for x86_64.
Looks like we have a contender for the common factor. :)

Please tell me you're *not* running a production database on Gentoo.

regards, tom lane
You don't even want to know how many companies I know that are doing
this very thing and no, it was not my suggestion.
Joshua D. Drake
We're seeing memory problems on one of our postgres databases. We're
using 7.4.6, and I suspect the kernel version is a key factor with this
problem.
We have two servers: one running Redhat Linux with kernel 2.4.18-14smp
#1 SMP and the other Debian Linux with kernel 2.6.8.1-4-686-smp #1 SMP.
The second Debian server is a replicated slave using Slony.
We NEVER see any problems on the "older" Redhat (our master) DB, whereas
the Debian slave database requires slony and postgres to be stopped
every 2-3 weeks.
This server just consumes more and more memory until it goes swap crazy
and the load averages start jumping through the roof.
Stopping the two services restores the server to some sort of normality
- the load averages drop dramatically and remain low. But the memory is
only fully recovered by a server reboot.
Over time memory gets used up, until you get to the point where those
services require another stop and start.
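When it next starts bogging down, the swap traffic itself should show
up in the si/so columns of vmstat, e.g.:

# vmstat 5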
Just my 2 cents...
John
John Sidney-Woollett <johnsw@wardbrook.com> writes:
This server just consumes more and more memory until it goes swap crazy
and the load averages start jumping through the roof.
*What* is consuming memory, exactly --- which processes?
regards, tom lane
Sorry but I don't know how to determine that.
We stopped and started postgres yesterday so the server is behaving well
at the moment.
top shows
top - 07:51:48 up 34 days, 6 min, 1 user, load average: 0.00, 0.02, 0.00
Tasks: 85 total, 1 running, 84 sleeping, 0 stopped, 0 zombie
Cpu(s): 0.6% us, 0.2% sy, 0.0% ni, 99.1% id, 0.2% wa, 0.0% hi, 0.0% si
Mem: 1035612k total, 1030380k used, 5232k free, 48256k buffers
Swap: 497972k total, 122388k used, 375584k free, 32716k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
27852 postgres 16 0 17020 11m 14m S 1.0 1.2 18:00.34 postmaster
27821 postgres 15 0 16236 6120 14m S 0.3 0.6 1:30.68 postmaster
4367 root 16 0 2040 1036 1820 R 0.3 0.1 0:00.05 top
1 root 16 0 1492 148 1340 S 0.0 0.0 0:04.75 init
2 root RT 0 0 0 0 S 0.0 0.0 0:02.00 migration/0
3 root 34 19 0 0 0 S 0.0 0.0 0:00.01 ksoftirqd/0
4 root RT 0 0 0 0 S 0.0 0.0 0:04.78 migration/1
5 root 34 19 0 0 0 S 0.0 0.0 0:00.04 ksoftirqd/1
6 root RT 0 0 0 0 S 0.0 0.0 0:04.58 migration/2
7 root 34 19 0 0 0 S 0.0 0.0 0:00.00 ksoftirqd/2
8 root RT 0 0 0 0 S 0.0 0.0 0:21.28 migration/3
9 root 34 19 0 0 0 S 0.0 0.0 0:00.00 ksoftirqd/3
10 root 5 -10 0 0 0 S 0.0 0.0 0:00.14 events/0
11 root 5 -10 0 0 0 S 0.0 0.0 0:00.04 events/1
12 root 5 -10 0 0 0 S 0.0 0.0 0:00.01 events/2
13 root 5 -10 0 0 0 S 0.0 0.0 0:00.00 events/3
14 root 8 -10 0 0 0 S 0.0 0.0 0:00.00 khelper
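Note that even discounting buffers and cache, 1030380 - 48256 - 32716 =
949408k (about 927 MB) is in use, yet the biggest process listed is
only about 17 MB.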
This server only has postgres and slon running on it. There is also
postfix but it is only used to relay emails from the root account to
another server - it isn't really doing anything (I hope).
ps shows
UID PID PPID C STIME TIME CMD
root 1 0 0 Nov09 00:00:04 init [2]
root 2 1 0 Nov09 00:00:02 [migration/0]
root 3 1 0 Nov09 00:00:00 [ksoftirqd/0]
root 4 1 0 Nov09 00:00:04 [migration/1]
root 5 1 0 Nov09 00:00:00 [ksoftirqd/1]
root 6 1 0 Nov09 00:00:04 [migration/2]
root 7 1 0 Nov09 00:00:00 [ksoftirqd/2]
root 8 1 0 Nov09 00:00:21 [migration/3]
root 9 1 0 Nov09 00:00:00 [ksoftirqd/3]
root 10 1 0 Nov09 00:00:00 [events/0]
root 11 1 0 Nov09 00:00:00 [events/1]
root 12 1 0 Nov09 00:00:00 [events/2]
root 13 1 0 Nov09 00:00:00 [events/3]
root 14 11 0 Nov09 00:00:00 [khelper]
root 15 10 0 Nov09 00:00:00 [kacpid]
root 67 11 0 Nov09 00:17:10 [kblockd/0]
root 68 10 0 Nov09 00:00:52 [kblockd/1]
root 69 11 0 Nov09 00:00:07 [kblockd/2]
root 70 10 0 Nov09 00:00:09 [kblockd/3]
root 82 1 1 Nov09 09:08:14 [kswapd0]
root 83 11 0 Nov09 00:00:00 [aio/0]
root 84 10 0 Nov09 00:00:00 [aio/1]
root 85 11 0 Nov09 00:00:00 [aio/2]
root 86 10 0 Nov09 00:00:00 [aio/3]
root 222 1 0 Nov09 00:00:00 [kseriod]
root 245 1 0 Nov09 00:00:00 [scsi_eh_0]
root 278 1 0 Nov09 00:00:37 [kjournald]
root 359 1 0 Nov09 00:00:00 udevd
root 1226 1 0 Nov09 00:00:00 [kjournald]
root 1229 10 0 Nov09 00:00:16 [reiserfs/0]
root 1230 11 0 Nov09 00:00:08 [reiserfs/1]
root 1231 10 0 Nov09 00:00:00 [reiserfs/2]
root 1232 11 0 Nov09 00:00:00 [reiserfs/3]
root 1233 1 0 Nov09 00:00:00 [kjournald]
root 1234 1 0 Nov09 00:00:13 [kjournald]
root 1235 1 0 Nov09 00:00:24 [kjournald]
root 1583 1 0 Nov09 00:00:00 [pciehpd_event]
root 1598 1 0 Nov09 00:00:00 [shpchpd_event]
root 1669 1 0 Nov09 00:00:00 [khubd]
daemon 2461 1 0 Nov09 00:00:00 /sbin/portmap
root 2726 1 0 Nov09 00:00:10 /sbin/syslogd
root 2737 1 0 Nov09 00:00:00 /sbin/klogd
message 2768 1 0 Nov09 00:00:00 /usr/bin/dbus-daemon-1 --system
root 2802 1 0 Nov09 00:04:38 [nfsd]
root 2804 1 0 Nov09 00:03:32 [nfsd]
root 2803 1 0 Nov09 00:04:58 [nfsd]
root 2806 1 0 Nov09 00:04:40 [nfsd]
root 2807 1 0 Nov09 00:04:41 [nfsd]
root 2805 1 0 Nov09 00:03:51 [nfsd]
root 2808 1 0 Nov09 00:04:36 [nfsd]
root 2809 1 0 Nov09 00:03:20 [nfsd]
root 2811 1 0 Nov09 00:00:00 [lockd]
root 2812 1 0 Nov09 00:00:00 [rpciod]
root 2815 1 0 Nov09 00:00:00 /usr/sbin/rpc.mountd
root 2933 1 0 Nov09 00:00:17 /usr/lib/postfix/master
postfix 2938 2933 0 Nov09 00:00:11 qmgr -l -t fifo -u -c
root 2951 1 0 Nov09 00:00:09 /usr/sbin/sshd
root 2968 1 0 Nov09 00:00:00 /sbin/rpc.statd
root 2969 1 0 Nov09 00:01:41 /usr/sbin/xinetd -pidfile /var/r
root 2980 1 0 Nov09 00:00:07 /usr/sbin/ntpd -p /var/run/ntpd.
root 2991 1 0 Nov09 00:00:01 /sbin/mdadm -F -m root -s
daemon 3002 1 0 Nov09 00:00:00 /usr/sbin/atd
root 3013 1 0 Nov09 00:00:03 /usr/sbin/cron
root 3029 1 0 Nov09 00:00:00 /sbin/getty 38400 tty1
root 3031 1 0 Nov09 00:00:00 /sbin/getty 38400 tty2
root 3032 1 0 Nov09 00:00:00 /sbin/getty 38400 tty3
root 3033 1 0 Nov09 00:00:00 /sbin/getty 38400 tty4
root 3034 1 0 Nov09 00:00:00 /sbin/getty 38400 tty5
root 3035 1 0 Nov09 00:00:00 /sbin/getty 38400 tty6
postgres 27806 1 0 Dec12 00:00:00 /usr/local/pgsql/bin/postmaster
postgres 27809 27806 0 Dec12 00:00:00 postgres: stats buffer process
postgres 27810 27809 0 Dec12 00:00:00 postgres: stats collector proces
postgres 27821 27806 0 Dec12 00:01:30 postgres: postgres bp_live
postgres 27842 1 0 Dec12 00:00:00 /usr/local/pgsql/bin/slon -d 1 b
postgres 27844 27842 0 Dec12 00:00:00 /usr/local/pgsql/bin/slon -d 1 b
postgres 27847 27806 0 Dec12 00:00:50 postgres: postgres bp_live
postgres 27852 27806 1 Dec12 00:18:00 postgres: postgres bp_live
postgres 27853 27806 0 Dec12 00:00:33 postgres: postgres bp_live
postgres 27854 27806 0 Dec12 00:00:18 postgres: postgres bp_live
root 32735 10 0 05:35 00:00:00 [pdflush]
postfix 2894 2933 0 07:04 00:00:00 pickup -l -t fifo -u -c
root 3853 10 0 07:37 00:00:00 [pdflush]
All I know is that stopping postgres brings the server back to
normality. Stopping slon on its own is not enough.
John
On Mon, Dec 12, 2005 at 08:31:52PM -0800, Joshua D. Drake wrote:
It sure is. Gentoo with kernel version 2.6.12, built for x86_64.
Looks like we have a contender for the common factor. :)

Please tell me you're *not* running a production database on Gentoo.
regards, tom lane
You don't even want to know how many companies I know that are doing
this very thing and no, it was not my suggestion.
"Like the annoying teenager next door with a 90hp import sporting a 6
foot tall bolt-on wing, Gentoo users are proof that society is best
served by roving gangs of armed vigilantes, dishing out swift, cold
justice with baseball bats..."
http://funroll-loops.org/
John Sidney-Woollett <johnsw@wardbrook.com> writes:
Tom Lane wrote:
*What* is consuming memory, exactly --- which processes?
Sorry but I don't know how to determine that.
Try "ps auxw", or some other incantation if you prefer, so long as it
includes some statistics about process memory use. What you showed us
is certainly not helpful.
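For instance, with GNU ps:

ps auxw --sort=-rss | head -20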
regards, tom lane
Tom Lane said:
John Sidney-Woollett <johnsw@wardbrook.com> writes:
Tom Lane wrote:
*What* is consuming memory, exactly --- which processes?
Sorry but I don't know how to determine that.
Try "ps auxw", or some other incantation if you prefer, so long as it
includes some statistics about process memory use. What you showed us
is certainly not helpful.
At the moment not one process's VSZ is over 16Mb with the exception of one
of the slon processes which is at 66Mb.
I'll run this over the next few days and especially as the server starts
bogging down to see if it identifies the culprit.
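Something like this (hypothetical log path and interval) should catch
the culprit if it is an ordinary process:

while true; do
  date
  ps auxw --sort=-vsz | head -10
  free
  sleep 3600
done >> /tmp/memwatch.log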
Is it possible for memory to be consumed outside of a process's address
space? Or would a leak always show up as an ever-increasing VSZ amount?
Thanks
John
On Tue, 2005-12-13 at 09:13, Tom Lane wrote:
John Sidney-Woollett <johnsw@wardbrook.com> writes:
Tom Lane wrote:
*What* is consuming memory, exactly --- which processes?
Sorry but I don't know how to determine that.
Try "ps auxw", or some other incantation if you prefer, so long as it
includes some statistics about process memory use. What you showed us
is certainly not helpful.
Or run top and hit M while it's running, and it'll sort according to
what uses the most memory.