backend dies suddenly after a lot of error messages
Hi,
We have problems with backend processes that close the channel because of
palloc() failures. When an INSERT statement fails, the backend reports an
error (e.g. `Cannot insert a duplicate key into a unique index') and
allocates a few bytes more memory. The next SQL statement that fails
causes the backend to allocate more memory again, etc. until we have no
more virtual memory left. Is this a bug?
We are using postgres 6.4.2 on FreeBSD 2.2.8.
It can also be reproduced with psql:
toy=> create table mytable (i integer unique);
NOTICE: CREATE TABLE/UNIQUE will create implicit index mytable_i_key for
table mytable
CREATE
toy=> \q
~ $ # now do a lot of inserts that cause error messages:
~ $ while true; do echo "INSERT INTO mytable VALUES (1);"; done | psql toy
INSERT INTO mytable VALUES (1);
ERROR: Cannot insert a duplicate key into a unique index
...quite a lot of these messages
INSERT INTO mytable VALUES (1);
ERROR: Cannot insert a duplicate key into a unique index
INSERT INTO mytable VALUES (1);
pqReadData() -- backend closed the channel unexpectedly.
This probably means the backend terminated abnormally before or
while processing the request.
We have lost the connection to the backend, so further processing is
impossible. Terminating.
Hmm, why does the backend allocate more and more memory with each failed
INSERT ?
Any clues?
Thanks,
Mirko
A bug report on this was filled out against the 6.3 release as well.
Don't know the status of it, however :(
Mirko Kaffka wrote:
Hi,
We have problems with backend processes that close the channel because of
palloc() failures. When an INSERT statement fails, the backend reports an
error (e.g. `Cannot insert a duplicate key into a unique index') and
allocates a few bytes more memory. The next SQL statement that fails
causes the backend to allocate more memory again, etc. until we have no
more virtual memory left. Is this a bug?
We are using postgres 6.4.2 on FreeBSD 2.2.8.
It can also be reproduced with psql:
toy=> create table mytable (i integer unique);
NOTICE: CREATE TABLE/UNIQUE will create implicit index mytable_i_key for
table mytable
CREATE
toy=> \q
~ $ # now do a lot of inserts that cause error messages:
~ $ while true; do echo "INSERT INTO mytable VALUES (1);"; done | psql toy
INSERT INTO mytable VALUES (1);
ERROR: Cannot insert a duplicate key into a unique index
...quite a lot of these messages
INSERT INTO mytable VALUES (1);
ERROR: Cannot insert a duplicate key into a unique index
INSERT INTO mytable VALUES (1);
pqReadData() -- backend closed the channel unexpectedly.
This probably means the backend terminated abnormally before or
while processing the request.
We have lost the connection to the backend, so further processing is
impossible. Terminating.
Hmm, why does the backend allocate more and more memory with each failed
INSERT ?
Any clues?
Thanks,
Mirko
--
------------------------------------------------------------
Thomas Reinke Tel: (416) 460-7021
Director of Technology Fax: (416) 598-2319
E-Soft Inc. http://www.e-softinc.com
Mirko Kaffka <mirko@interface-business.de> writes:
We have problems with backend processes that close the channel because of
palloc() failures. When an INSERT statement fails, the backend reports an
error (e.g. `Cannot insert a duplicate key into a unique index') and
allocates a few bytes more memory. The next SQL statement that fails
causes the backend to allocate more memory again, etc. until we have no
more virtual memory left. Is this a bug?
Yeah, I'd say so --- all the memory used should get freed at transaction
end, but evidently it isn't happening.
We are using postgres 6.4.2 on FreeBSD 2.2.8.
I still see it with 6.5-current sources. Will take a look.
regards, tom lane
Mirko Kaffka <mirko@interface-business.de> writes:
We have problems with backend processes that close the channel because of
palloc() failures. When an INSERT statement fails, the backend reports an
error (e.g. `Cannot insert a duplicate key into a unique index') and
allocates a few bytes more memory. The next SQL statement that fails
causes the backend to allocate more memory again, etc. until we have no
more virtual memory left. Is this a bug?
Yeah, I'd say so --- all the memory used should get freed at transaction
end, but evidently it isn't happening.
We are using postgres 6.4.2 on FreeBSD 2.2.8.
I still see it with 6.5-current sources. Will take a look.
I remember having taken out some, but haven't found all the
places. I think there's still something in tcop where the
querytree list is malloc()'d.
Jan
--
#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me. #
#======================================== jwieck@debis.com (Jan Wieck) #
jwieck@debis.com (Jan Wieck) writes:
Yeah, I'd say so --- all the memory used should get freed at transaction
end, but evidently it isn't happening.
I remember having taken out some, but haven't found all the
places. I think there's still something in tcop where the
querytree list is malloc()'d.
I saw that yesterday --- for no particularly good reason, postgres.c
wants to deal with the query list as an array rather than a list;
it goes to great lengths to convert the lists it's given into an array,
which it has to be able to resize, etc etc. I was thinking of ripping
all that out and just using a palloc'd list. At the time I didn't have
any justification for it except code beautification, which isn't a good
enough reason to be changing code late in beta... but a memory leak
is...
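To make the contrast concrete, here is a simplified sketch (invented
QueryNode type and helper names, not the actual postgres.c code): the
current style copies the query list into a malloc'd array that must be
grown by hand and freed explicitly, while the palloc'd-list style just
walks what the parser already built.

#include <stdlib.h>

/* Invented types for illustration only -- not the backend's own structs. */
typedef struct QueryNode
{
    struct QueryNode *next;
    /* ... parse/plan fields ... */
} QueryNode;

/* Current style: copy the list into a resizable malloc'd array. */
static void
process_queries_as_array(QueryNode *querylist)
{
    int         size = 8;
    int         n = 0;
    QueryNode **arr = malloc(size * sizeof(QueryNode *));
    QueryNode  *q;

    for (q = querylist; q != NULL; q = q->next)
    {
        if (n == size)
        {
            size *= 2;                  /* grow by hand */
            arr = realloc(arr, size * sizeof(QueryNode *));
        }
        arr[n++] = q;
    }

    /* ... plan and execute arr[0..n-1] ... */

    free(arr);          /* never reached if an error longjmps out first */
}

/* Palloc'd-list style: just walk the list; transaction-end cleanup
 * releases it along with everything else allocated for the query. */
static void
process_queries_as_list(QueryNode *querylist)
{
    QueryNode  *q;

    for (q = querylist; q != NULL; q = q->next)
    {
        /* ... plan and execute q ... */
    }
}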
However, the leakage being complained of seems to be several kilobytes
per failed command, which is much more than that one malloc usage can
be blamed for. Any other thoughts? I was wondering if maybe a whole
palloc context somewhere is getting lost; not sure where to look though.
One thing I did find was that leakage occurs very early. You can feed
the system commands that will fail in parsing, like say
garbage;
and the memory usage still rises with each one.
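For anyone who wants to watch this happen, a minimal libpq driver along
the following lines works just as well as the psql loop shown earlier
(the dbname and iteration count are arbitrary; watch the backend's
process size with ps or top from another window):

#include <stdio.h>
#include <libpq-fe.h>

int
main(void)
{
    /* Assumed connection string -- adjust dbname/host/port as needed. */
    PGconn     *conn = PQconnectdb("dbname=toy");
    int         i;

    if (PQstatus(conn) != CONNECTION_OK)
    {
        fprintf(stderr, "connection failed: %s", PQerrorMessage(conn));
        PQfinish(conn);
        return 1;
    }

    /* Each of these fails in the parser; on an unpatched backend the
     * per-statement allocations are never released. */
    for (i = 0; i < 100000; i++)
    {
        PGresult   *res = PQexec(conn, "garbage;");

        PQclear(res);           /* frees only the client-side result */
    }

    PQfinish(conn);
    return 0;
}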
regards, tom lane
Yeah, I'd say so --- all the memory used should get freed at transaction
end, but evidently it isn't happening.
I still see it with 6.5-current sources. Will take a look.
Ah-ha, I think I see it: AtCommit_Memory releases memory in the blank
portal (by doing EndPortalAllocMode()). AtAbort_Memory forgets to do so.
Will commit this fix momentarily.
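The principle behind the fix (sketched below with an invented TxnArena
type, not the actual portal-memory code) is simply that the abort path
must release the transaction's allocations exactly as the commit path
does; before the fix, the abort path skipped that step, so memory from
every failed statement accumulated until the backend ran out of address
space.

#include <stdlib.h>

/* Invented illustration of transaction-scoped allocation -- not the
 * backend's portal/memory code. */
typedef struct ArenaChunk
{
    struct ArenaChunk *next;
    /* ... payload ... */
} ArenaChunk;

typedef struct TxnArena
{
    ArenaChunk *chunks;         /* everything allocated this transaction */
} TxnArena;

static void
arena_release(TxnArena *arena)
{
    ArenaChunk *c = arena->chunks;

    while (c != NULL)
    {
        ArenaChunk *next = c->next;

        free(c);
        c = next;
    }
    arena->chunks = NULL;
}

/* Commit path: analogous to AtCommit_Memory doing EndPortalAllocMode()
 * on the blank portal. */
static void
at_commit_memory(TxnArena *arena)
{
    arena_release(arena);
}

/* Abort path: the bug was the missing release here.  The fix makes it
 * mirror the commit path. */
static void
at_abort_memory(TxnArena *arena)
{
    arena_release(arena);
}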
I remember having taken out some, but haven't found all the
places. I think there's still something in tcop where the
querytree list is malloc()'d.
That is a relatively minor leak, compared to leaking *all* memory
allocated in the failed transaction, which is what it was doing until
now :-(. But I think I will fix it anyway ... the code is awfully
ugly, and it is still a leak.
regards, tom lane
At 08:33 PM 5/12/99 -0400, Tom Lane wrote:
That is a relatively minor leak, compared to leaking *all* memory
allocated in the failed transaction, which is what it was doing until
now :-(. But I think I will fix it anyway ... the code is awfully
ugly, and it is still a leak.
I'm a lurker, a compiler writer who has just begun using
Postgres as the database engine behind a bird population
tracking project I'm putting up on the web on my own
time, on a linux box running AOLServer and, for now at
least, postgres.
In my researching postgres vs. paying Oracle (which didn't
seem too bad until I learned about their extra fees for
web sites and multiple-CPU boxes) vs. mySql etc, the one
biggest complaint I've run across when talking to people
running web sites backed by Postgres has been that the
back end starts dying after weeks ... days ... hours
depending on the type of site.
On questioning folks, it seemed pretty clear that in
some of these cases significant memory leaking was
causing the system to run out of memory.
And last week I managed to generate long sequences
of SQL that would eat available memory in about
15 minutes. I've been lurking around a couple of
these postgres lists trying to figure out whether
or not it was a known problem before making noise
about it.
So, imagine my pleasure at seeing this short thread
on the problem and, even better, the solution!
Well, if not the (only) leak, at least one very,
very serious memory leak. Just how many kb were
being leaked for each failed transaction?
I think you may've just slammed a stake through the
heart of a very significant bug causing a lot of
people seemingly unexplainable flakey back-end
behavior...this fix alone may do a lot to erase
the impression some have that postgres is not
reliable enough to support any web site based
on a large database with lots of transactions.
- Don Baccus, Portland OR <dhogaza@pacifier.com>
Nature photos, on-line guides, and other goodies at
http://donb.photo.net
Don Baccus <dhogaza@pacifier.com> writes:
I think you may've just slammed a stake through the
heart of a very significant bug
Thanks for the compliment :-). You might actually be right;
this bug could go a long way towards explaining why some people
find Postgres very reliable and others don't. The first group's
apps don't tend to provoke any SQL errors, and/or don't try to
continue running with the same backend after an error.
And last week I managed to generate long sequences
of SQL that would eat available memory in about
15 minutes. I've been lurking around a couple of
these postgres lists trying to figure out whether
or not it was a known problem before making noise
about it.
We're aware of a number of memory-leak type problems, although
most of them are just temporary leakage situations (the memory
will eventually be freed, if you have enough memory to complete
the transaction...). I'm hoping that we can make a serious dent
in that class of problem for release 6.6.
I believe that all the Postgres developers have a bedrock commitment
to making the system as stable and bulletproof as we can. But it
takes time to root out the subtler bugs. I got lucky tonight ;-)
Well, if not the (only) leak, at least one very,
very serious memory leak. Just how many kb were
being leaked for each failed transaction?
I was measuring about 4K per cycle for a trivial parsing error,
like feeding "garbage;" to the backend repeatedly. It could be
a *lot* more depending on how much work got done before the error
was detected. Worst case you might lose megabytes...
regards, tom lane
At 09:39 PM 5/12/99 -0400, Tom Lane wrote:
Thanks for the compliment :-). You might actually be right;
this bug could go a long way towards explaining why some people
find Postgres very reliable and others don't. The first group's
apps don't tend to provoke any SQL errors, and/or don't try to
continue running with the same backend after an error.
AOLServer, in particular, will keep a backend alive
forever unless the site goes idle for (typically)
some minutes. In this way, no overhead for backend
start-up is suffered by a busy site. AOLServer manages
the threads associated with particular http connections,
while the (typically) tcl scripts servicing the connections
ask for, use, and release database handles (the tcl
interpreter runs inside the server). Each handle
is a connection to the db backend, and these connections
get passed around by the server to various threads as
they're released by tcl "ns_db releasehandle" calls.
So ... ANY permanent memory leak by the backend will tear things
down eventually. "Randomly", from the sysadmin's point of view.
Don't feel bad, I know of one very busy Oracle site
that kicks things down once every 24 hrs in the
dead of night for fear of cumulative leaks or, well,
any of a number of imaginable db problems :)
We're aware of a number of memory-leak type problems, although
most of them are just temporary leakage situations (the memory
will eventually be freed, if you have enough memory to complete
the transaction...).
Relatively harmless in the environment I'm describing...
I'm hoping that we can make a serious dent
in that class of problem for release 6.6.
Still worth getting rid of, though!
I believe that all the Postgres developers have a bedrock commitment
to making the system as stable and bulletproof as we can.
Yes, I've gathered that in my reading of this group over the
last three days, and in my reading of older posts.
And y'all have fixed that other horrible bug from the
web service POV: table-level locking. Ugh. I'd given up
on using postgres for my project until I learned that 6.5
doesn't suffer from this limitation.
I was measuring about 4K per cycle for a trivial parsing error,
like feeding "garbage;" to the backend repeatedly. It could be
a *lot* more depending on how much work got done before the error
was detected. Worst case you might lose megabytes...
Memory's cheap, but not THAT cheap :)
OK, I'll go back to lurking again. Keep up the good work,
folks.
- Don Baccus, Portland OR <dhogaza@pacifier.com>
Nature photos, on-line guides, and other goodies at
http://donb.photo.net
jwieck@debis.com (Jan Wieck) writes:
Yeah, I'd say so --- all the memory used should get freed at transaction
end, but evidently it isn't happening.
I remember having taken out some, but haven't found all the
places. I think there's still something in tcop where the
querytree list is malloc()'d.
I saw that yesterday --- for no particularly good reason, postgres.c
wants to deal with the query list as an array rather than a list;
it goes to great lengths to convert the lists it's given into an array,
which it has to be able to resize, etc etc. I was thinking of ripping
all that out and just using a palloc'd list. At the time I didn't have
any justification for it except code beautification, which isn't a good
enough reason to be changing code late in beta... but a memory leak
is...
I also thought the array usage was very strange, and I could not figure
out why they used it. I figured as I learned more about the backend, I
would understand their wisdom, but at this point, I think it was just
sloppy code.
However, the leakage being complained of seems to be several kilobytes
per failed command, which is much more than that one malloc usage can
be blamed for. Any other thoughts? I was wondering if maybe a whole
palloc context somewhere is getting lost; not sure where to look though.
One thing I did find was that leakage occurs very early. You can feed
the system commands that will fail in parsing, like say
garbage;
and the memory usage still rises with each one.
Gee, `garbage;' doesn't get very far into the parser, does it? It never
makes it out of the grammar. Could it be the 8k query buffer?
--
Bruce Momjian | http://www.op.net/~candle
maillist@candle.pha.pa.us | (610) 853-3000
+ If your life is a hard drive, | 830 Blythe Avenue
+ Christ can be your backup. | Drexel Hill, Pennsylvania 19026
I was measuring about 4K per cycle for a trivial parsing error,
like feeding "garbage;" to the backend repeatedly. It could be
a *lot* more depending on how much work got done before the error
was detected. Worst case you might lose megabytes...
The strange thing is that we don't usually hear about crash/leaks very
much. We just started hearing about it more in the past week or so.
--
Bruce Momjian | http://www.op.net/~candle
maillist@candle.pha.pa.us | (610) 853-3000
+ If your life is a hard drive, | 830 Blythe Avenue
+ Christ can be your backup. | Drexel Hill, Pennsylvania 19026
We have problems with backend processes that close the channel because of
palloc() failures. When an INSERT statement fails, the backend reports an
error (e.g. `Cannot insert a duplicate key into a unique index') and
allocates a few bytes more memory. The next SQL statement that fails
causes the backend to allocate more memory again, etc. until we have no
more virtual memory left. Is this a bug?
We are using postgres 6.4.2 on FreeBSD 2.2.8.
It can also be reproduced with psql:
toy=> create table mytable (i integer unique);
NOTICE: CREATE TABLE/UNIQUE will create implicit index mytable_i_key for
table mytable
CREATE
toy=> \q
~ $ # now do a lot of inserts that cause error messages:
~ $ while true; do echo "INSERT INTO mytable VALUES (1);"; done | psql toy
INSERT INTO mytable VALUES (1);
ERROR: Cannot insert a duplicate key into a unique index
...quite a lot of these messages
INSERT INTO mytable VALUES (1);
ERROR: Cannot insert a duplicate key into a unique index
INSERT INTO mytable VALUES (1);
pqReadData() -- backend closed the channel unexpectedly.
This probably means the backend terminated abnormally before or
while processing the request.
We have lost the connection to the backend, so further processing is
impossible. Terminating.
Hmm, why does the backend allocate more and more memory with each failed
INSERT ?
Any clues?
There was a bug in pre-6.5 versions that caused elog failures not to
release their memory. There is still a small leak for elogs, but it is
only a few bytes. You should find this is fixed in 6.5.
--
Bruce Momjian | http://www.op.net/~candle
maillist@candle.pha.pa.us | (610) 853-3000
+ If your life is a hard drive, | 830 Blythe Avenue
+ Christ can be your backup. | Drexel Hill, Pennsylvania 19026