BUG #16264: Server closed the connection unexpectedly

Started by PG Bug reporting formabout 6 years ago4 messagesbugs
Jump to latest
#1PG Bug reporting form
noreply@postgresql.org

The following bug has been logged on the website:

Bug reference: 16264
Logged by: Robin Duquette
Email address: research@pyxidr.com
PostgreSQL version: 12.2
Operating system: macOS version 10.15.3
Description:

I just installed postgresql 12.2 (from 12.1) and I get many "server closed
the connection unexpectedly" when executing queries in pgAdmin or using
psycopg2 (in Python). I didn't get this issue with 12.1.

#2Tom Lane
tgl@sss.pgh.pa.us
In reply to: PG Bug reporting form (#1)
Re: BUG #16264: Server closed the connection unexpectedly

PG Bug reporting form <noreply@postgresql.org> writes:

I just installed postgresql 12.2 (from 12.1) and I get many "server closed
the connection unexpectedly" when executing queries in pgAdmin or using
psycopg2 (in Python). I didn't get this issue with 12.1.

This might well be a bug, but there's nothing we can do about it with
no details. There's some advice about submitting useful bug reports
here:

https://wiki.postgresql.org/wiki/Guide_to_reporting_problems

regards, tom lane

#3Robin Duquette
robin.duquette@pyxidr.com
In reply to: Tom Lane (#2)
Re: BUG #16264: Server closed the connection unexpectedly

Thanks Tom for your prompt response and my apologies for the lack of
details (indeed, I didn't give you much to help you!). These are what I can
tell you, let me know if you have any questions.

1. I'm running macOS 10.15.3
2. Everything was working fine with PostgreSQL version 12.1 and since
I have installed 12.2 on two machines (with identical os), many queries
induce server process to get terminated by signal 9 (according to log).
This is the case on the two machines that I'm running 12.2 Unfortunately,
the log doesn't say more than that (see below an extract).
3. I have used the installer certified by EnterpriseDB to
install PostgreSQL
4. I didn't make any changes to the postgresql.conf file
5. The issue shows up with pgAdmin 4 and psycopg2 (Python 3.7).

This an extract of the log file:

*2020-02-19 18:24:57.449 EST [175] LOG: server process (PID 3032) was
terminated by signal 9: Killed: 9*
*2020-02-19 18:24:57.449 EST [175] DETAIL: Failed process was running: set
timezone to 'Europe/Paris';*
* select*
* extract(year from fd.datetime_starting) as year,*
* rfd.market,*
* rfd.fundamental_subtype as prod_type,*
* sum(fd.value) as installed_capacity*
* from*
* edb.tbl_ref_fundamental_data rfd,*
* cwe_uk.tbl_hist_datetime_fundamental_data fd*
* where*
* rfd.id <http://rfd.id&gt; = fd.data_id and*
* rfd.value_type = 'Actual' and*
* rfd.fundamental_type = 'Installed capacity' and*
* rfd.commodity = 'Power' and*
* rfd.market in ('AT', 'BE', 'FR', 'DE', 'NL', 'CH', 'UK')
and*
* rfd.term = 'Annual' and*
* fd.datetime_starting::date >= '2000-01-01' and*
* fd.datetime_starting::date <= '2020-02-18'*
* group by*
* extract(year from fd.datetime_starting),*
* rfd.market,*
* rfd.fundamental_subtype*
* order by*
* extract(year from fd.datetime_starting),*
* rfd.market,*
* rfd.fundamental_subtype;*
*2020-02-19 18:24:57.449 EST [175] LOG: terminating any other active
server processes*
*2020-02-19 18:24:57.449 EST [2986] WARNING: terminating connection
because of crash of another server process*
*2020-02-19 18:24:57.449 EST [2986] DETAIL: The postmaster has commanded
this server process to roll back the current transaction and exit, because
another server process exited abnormally and possibly corrupted shared
memory.*

Best regards,
[image: Pyxidr]

ROBIN DUQUETTE
CEO
E: robin.duquette@pyxidr.com
C: +1 (514) 262-0418 <+1+(514)+262-0418>
[image: Pyxidr Website] <http://pyxidr.com/&gt; [image: Pyxidr Telegram]
<http://t.me/Pyxidr&gt; [image: Robin Duquette LinkedIn]
<https://www.linkedin.com/in/robinduquette/&gt; [image: Robin Duquette Medium]
<https://medium.com/@robinduquette&gt;

On Wed, Feb 19, 2020 at 11:26 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:

Show quoted text

PG Bug reporting form <noreply@postgresql.org> writes:

I just installed postgresql 12.2 (from 12.1) and I get many "server

closed

the connection unexpectedly" when executing queries in pgAdmin or using
psycopg2 (in Python). I didn't get this issue with 12.1.

This might well be a bug, but there's nothing we can do about it with
no details. There's some advice about submitting useful bug reports
here:

https://wiki.postgresql.org/wiki/Guide_to_reporting_problems

regards, tom lane

#4Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robin Duquette (#3)
Re: BUG #16264: Server closed the connection unexpectedly

Robin Duquette <robin.duquette@pyxidr.com> writes:

1. I'm running macOS 10.15.3
2. Everything was working fine with PostgreSQL version 12.1 and since
I have installed 12.2 on two machines (with identical os), many queries
induce server process to get terminated by signal 9 (according to log).
This is the case on the two machines that I'm running 12.2 Unfortunately,
the log doesn't say more than that (see below an extract).

Huh, interesting. Signal 9 (SIGKILL) is an externally-imposed process
termination, rather than an internal failure. We've seen one similar
report recently:

/messages/by-id/CEF2C288-13E6-4727-81D0-0775F40F313B@arcict.com

and as mentioned there, the most likely theory is that the backend process
is consuming an unreasonable amount of memory and the SIGKILL is coming
from a system-level out-of-memory defense mechanism. I hadn't thought
that macOS did that, but it looks like I'm finding out differently.

That does not get us a whole lot closer to identifying the cause, though.
It's certainly believable that we introduced some kind of memory leak
between 12.1 and 12.2, but that's not enough info to find it.

First things first though. Can you watch the system with "top" or
Activity Monitor and confirm or disprove that there's a memory
consumption issue before the SIGKILL? We ought to be sure about
that before we go spending a lot of time.

If that does seem to be the case, launching the postmaster under a
restrictive ulimit (maybe "ulimit -v 1000000" or so) could be a
second step. That ought to help reduce the problem from a SIGKILL
to a normal out-of-memory error, which not only would make things
a bit more stable for you, but it should allow the failing query
to dump a memory map to the postmaster's stderr, which would give
us a little more to go on about where the leak is.

In the end, though, I'm afraid we might have to ask you to produce
a reproducible test case of a query that consumes excessive memory.
These things can be very hard to identify without digging into it
with a debugger.

regards, tom lane