Win32 max connections bug (causing crashes)

Started by Joshua D. Drake, over 19 years ago, 14 messages
#1Joshua D. Drake
jd@commandprompt.com

Hello,

I had a customer call in today. They are running Win2003 with 22 gigs of
RAM (that may be a mistype on their end; it may be 32 gigs of RAM).

They cranked up their postgresql max_connections to 500.

When PostgreSQL hits above 400, it dies and I don't mean a slow crawl
type death. A death where all connections close and the database does a
rollback and restart.

I was able to reproduce with a simple pgbench on my own win32 environment.

I wasn't able to go above 300 with mine.

Any thoughts?

Joshua D. Drake

--

=== The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
Providing the most comprehensive PostgreSQL solutions since 1997
http://www.commandprompt.com/

#2Joshua D. Drake
jd@commandprompt.com
In reply to: Joshua D. Drake (#1)
Re: Win32 max connections bug (causing crashes)

Joshua D. Drake wrote:

> Hello,
>
> I had a customer call in today. They are running Win2003 with 22 gigs of
> RAM (that may be a mistype on their end; it may be 32 gigs of RAM).
>
> They cranked up their postgresql max_connections to 500.
>
> When PostgreSQL hits above 400, it dies and I don't mean a slow crawl
> type death. A death where all connections close and the database does a
> rollback and restart.
>
> I was able to reproduce with a simple pgbench on my own win32 environment.
>
> I wasn't able to go above 300 with mine.

Further on this with Debug 5:

Client:

DEBUG: name: unnamed; blockState: DEFAULT; state: INPROGR,
xid/subid/cid: 19299/1/0, nestlvl: 1, children: <>
DEBUG: CommitTransaction
DEBUG: name: unnamed; blockState: STARTED; state: INPROGR,
xid/subid/cid: 19299/1/0, nestlvl: 1, children: <>
DEBUG: StartTransactionCommand
DEBUG: StartTransaction
DEBUG: name: unnamed; blockState: DEFAULT; state: INPROGR,
xid/subid/cid: 19300/1/0, nestlvl: 1, children: <>
DEBUG: ProcessUtility
DEBUG: CommitTransactionCommand
DEBUG: CommitTransaction
DEBUG: name: unnamed; blockState: STARTED; state: INPROGR,
xid/subid/cid: 19300/1/0, nestlvl: 1, children: <>
Connection to database 'bench' failed.
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
jd@scratch:/usr/local/pgsql/bin$

Server to follow in next message.

Joshua D. Drake

> Any thoughts?
>
> Joshua D. Drake


#3Joshua D. Drake
jd@commandprompt.com
In reply to: Joshua D. Drake (#2)
Re: Win32 max connections bug (causing crashes)

Server:

Event Type: Information
Event Source: PostgreSQL
Event Category: None
Event ID: 0
Date: 8/9/2006
Time: 8:36:52 PM
User: N/A
Computer: DAD
Description:
2006-08-09 20:36:52 DEBUG: InitPostgres

Event Type: Information
Event Source: PostgreSQL
Event Category: None
Event ID: 0
Date: 8/9/2006
Time: 8:36:52 PM
User: N/A
Computer: DAD
Description:
2006-08-09 20:36:52 DEBUG: StartTransaction

Event Type: Information
Event Source: PostgreSQL
Event Category: None
Event ID: 0
Date: 8/9/2006
Time: 8:36:52 PM
User: N/A
Computer: DAD
Description:
2006-08-09 20:36:52 DEBUG: name: unnamed; blockState: DEFAULT; state: INPROGR, xid/subid/cid: 19215/1/0, nestlvl: 1, children: <>


#4Merlin Moncure
mmoncure@gmail.com
In reply to: Joshua D. Drake (#3)
Re: Win32 max connections bug (causing crashes)

what version postgresql?

merlin

#5Merlin Moncure
mmoncure@gmail.com
In reply to: Merlin Moncure (#4)
Re: Win32 max connections bug (causing crashes)

I confirmed the problem on a fairly recent 8.2devel

merlin

On 8/10/06, Merlin Moncure <mmoncure@gmail.com> wrote:

> what version postgresql?
>
> merlin

#6William ZHANG
uniware@zedware.org
In reply to: Joshua D. Drake (#1)
Re: Win32 max connections bug (causing crashes)

Maybe this article can help:

Windows and the ClearCase process limit: Understanding the desktop heap
http://www-128.ibm.com/developerworks/rational/library/05/1220_marechal/

"Merlin Moncure" <mmoncure@gmail.com> wrote:

> I confirmed the problem on a fairly recent 8.2devel
>
> merlin
>
> On 8/10/06, Merlin Moncure <mmoncure@gmail.com> wrote:
> > what version postgresql?
> >
> > merlin


#7Tom Lane
tgl@sss.pgh.pa.us
In reply to: William ZHANG (#6)
Re: Win32 max connections bug (causing crashes)

"William ZHANG" <uniware@zedware.org> writes:

> Maybe this article can help:
> Windows and the ClearCase process limit: Understanding the desktop heap
> http://www-128.ibm.com/developerworks/rational/library/05/1220_marechal/

So the short answer is "get a real operating system"?

I'm not sure I believe that article though, since it claims that the
default maximum number of noninteractive processes is only 79.
I thought from what was said upthread that we could get up to a couple
hundred before seeing a problem.

regards, tom lane

#8Merlin Moncure
mmoncure@gmail.com
In reply to: William ZHANG (#6)
Re: Win32 max connections bug (causing crashes)

On 8/10/06, William ZHANG <uniware@zedware.org> wrote:

> Maybe this article can help:
>
> Windows and the ClearCase process limit: Understanding the desktop heap
> http://www-128.ibm.com/developerworks/rational/library/05/1220_marechal/

i doubled all my heap settings and was able to roughly double the -c
on pgbench from ~158 (stock) to ~330 (modified). so this is
definitely the problem.

windows. meh :)
merlin
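
For readers wondering what "doubling the heap settings" could look like: on Windows the desktop heap sizes (in KB) live in the `SharedSection=` field of the `Windows` value under `HKLM\SYSTEM\CurrentControlSet\Control\Session Manager\SubSystems`. The sketch below only rewrites such a string and never touches the registry. The sample value and the helper name `scale_shared_section` are assumptions for illustration; the thread does not show the exact values that were changed.

```python
import re


def scale_shared_section(windows_value, factor=2):
    """Return windows_value with every SharedSection size multiplied by factor."""
    def _scale(match):
        # The field is a comma-separated list of sizes in KB.
        sizes = [int(size) * factor for size in match.group(1).split(",")]
        return "SharedSection=" + ",".join(str(size) for size in sizes)

    return re.sub(r"SharedSection=(\d+(?:,\d+)*)", _scale, windows_value)


# Hypothetical excerpt of the registry string, not taken from the thread.
sample = "Windows=On SharedSection=1024,3072,512 SubSystemType=Windows"
print(scale_shared_section(sample))
```

The third number is the heap for non-interactive desktops, which is the one that matters when the server runs as a service. A change to this value normally needs a reboot to take effect.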

#9Merlin Moncure
mmoncure@gmail.com
In reply to: Tom Lane (#7)
Re: Win32 max connections bug (causing crashes)

On 8/10/06, Tom Lane <tgl@sss.pgh.pa.us> wrote:

> "William ZHANG" <uniware@zedware.org> writes:
> > Maybe this article can help:
> > Windows and the ClearCase process limit: Understanding the desktop heap
> > http://www-128.ibm.com/developerworks/rational/library/05/1220_marechal/
>
> So the short answer is "get a real operating system"?

changing a registry setting is not terrible in and of itself, akin to
manually manipulating procfs, but the behavior in a failure
condition is. other than that, no comment. personally all my servers
are running a mixture of gentoo and centos and i'm moving my desktop to
mac os x.

> I'm not sure I believe that article though, since it claims that the
> default maximum number of noninteractive processes is only 79.
> I thought from what was said upthread that we could get up to a couple
> hundred before seeing a problem.

that would depend on various factors, especially exactly how many
resources the ibm server software ate up for each connection. pg seems
to be leaner and meaner fwiw. anyways, i confirmed the fix.

merlin

#10Tom Lane
tgl@sss.pgh.pa.us
In reply to: Merlin Moncure (#9)
Re: Win32 max connections bug (causing crashes)

"Merlin Moncure" <mmoncure@gmail.com> writes:

> On 8/10/06, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > So the short answer is "get a real operating system"?
>
> changing a registry setting is not terrible in and of itself, akin to
> manually manipulating procfs, but the behavior in a failure
> condition is. other than that, no comment.

Right. Nothing wrong with having an upper limit on how many processes
you can run, but reaching the limit should result in "fork failed"
(or local equivalent), not crashes.

Actually ... have any of the win32 hackers tested our win32 code path
that's equivalent to Unix fork failure? Maybe this is just a
garden-variety bug in our own code.

regards, tom lane
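
Tom's point, that hitting a process limit should surface as an ordinary error rather than a crash, can be illustrated outside PostgreSQL. The sketch below is not PostgreSQL code. `try_spawn` is a hypothetical helper showing a parent that treats a failed child launch as a reportable condition and keeps running.

```python
import subprocess
import sys


def try_spawn(argv):
    """Start a child process; return the Popen object, or None if the OS refuses."""
    try:
        return subprocess.Popen(argv)
    except OSError as exc:
        # OSError covers resource exhaustion as well as a missing executable.
        print(f"could not start child: {exc}", file=sys.stderr)
        return None


# Launch a trivial child using the current interpreter and wait for it.
child = try_spawn([sys.executable, "-c", "pass"])
if child is not None:
    child.wait()
```

The caller decides what to do with `None` (reject the new connection, log, retry later); the parent itself stays up.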

#11Joshua D. Drake
jd@commandprompt.com
In reply to: Merlin Moncure (#4)
Re: Win32 max connections bug (causing crashes)

Merlin Moncure wrote:

> what version postgresql?

8.1.4

> merlin


#12Magnus Hagander
mha@sollentuna.net
In reply to: Joshua D. Drake (#1)
Re: Win32 max connections bug (causing crashes)

> Hello,
>
> I had a customer call in today. They are running Win2003 with 22 gigs
> of RAM (that may be a mistype on their end; it may be 32 gigs of
> RAM).
>
> They cranked up their postgresql max_connections to 500.
>
> When PostgreSQL hits above 400, it dies and I don't mean a slow
> crawl type death. A death where all connections close and the
> database does a rollback and restart.
>
> I was able to reproduce with a simple pgbench on my own win32
> environment.
>
> I wasn't able to go above 300 with mine.
>
> Any thoughts?

A followup question - does this happen both when the server is started
as a service and when it's started manually? Any difference in when it
dies?

//Magnus

#13Magnus Hagander
mha@sollentuna.net
In reply to: Merlin Moncure (#8)
Re: Win32 max connections bug (causing crashes)

> > Maybe this article can help:
> >
> > Windows and the ClearCase process limit: Understanding the desktop heap
> > http://www-128.ibm.com/developerworks/rational/library/05/1220_marechal/
>
> i doubled all my heap settings and was able to roughly double the -c
> on pgbench from ~158 (stock) to ~330 (modified). so this is
> definitely the problem.

If you try decreasing max_files_per_process to a significantly lower
value (say, try 100 instead of 1000), does the number of processes you
can run change noticeably?

(I don't have a box around ATM that I can try to reproduce on. Will try
to set up a VM for it soon.)

//Magnus

#14Merlin Moncure
mmoncure@gmail.com
In reply to: Magnus Hagander (#13)
Re: Win32 max connections bug (causing crashes)

On 8/18/06, Magnus Hagander <mha@sollentuna.net> wrote:

> > i doubled all my heap settings and was able to roughly double the -c
> > on pgbench from ~158 (stock) to ~330 (modified). so this is
> > definitely the problem.
>
> If you try decreasing max_files_per_process to a significantly lower
> value (say, try 100 instead of 1000), does the number of processes you
> can run change noticeably?
>
> (I don't have a box around ATM that I can try to reproduce on. Will try
> to set up a VM for it soon.)

per Magnus's request, I set max_files_per_process to 25 (the minimum)
and saw no appreciable gain in the number of connections required to
make it crash (I tested at 400). The first time I ran it I almost
hosed my machine...it was doing all kinds of irrational beeping and
all the windows were flickering and blinking. It did not do this
following the max_files_per_process reduction, although I have no desire to run
this test again on my development box ;)

merlin
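
The reproduction used throughout the thread (pgbench with an increasing `-c`) can be sketched as a small helper that builds the command lines. Nothing is executed here. The database name `bench` comes from the client output earlier in the thread; the client counts, the `-t` value and the helper name `pgbench_commands` are assumptions for illustration.

```python
def pgbench_commands(client_counts, dbname="bench", transactions=10):
    """Build one pgbench argv per client count (nothing is executed here)."""
    return [
        ["pgbench", "-c", str(clients), "-t", str(transactions), dbname]
        for clients in client_counts
    ]


# Step the number of concurrent clients up toward the reported failure range.
for argv in pgbench_commands(range(100, 501, 100)):
    print(" ".join(argv))
```

Each list can be passed to `subprocess.run` on a disposable test machine. `max_connections` on the server has to be at least as high as the largest client count, and given Merlin's experience above, a development box is probably not the place to try it.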