BUG #18599: server closed the connection unexpectedly

Started by PG Bug reporting form over 1 year ago, 8 messages, bugs
#1 PG Bug reporting form
noreply@postgresql.org

The following bug has been logged on the website:

Bug reference: 18599
Logged by: Karim Chaid
Email address: kchaid@hotmail.com
PostgreSQL version: 16.4
Operating system: rhel 8.9
Description:

I am getting the following message

server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.

When I run the following query.

select
    count(*)
from scdpq01la_raw.GLTRANS glt
inner join scdpq01la_raw.GLCONTROL glc
    on glc.company=glt.company
    and glc.fiscal_year=glt.fiscal_year
    and glc.acct_period=glt.acct_period
    and glc.r_system=glt.r_system
    and glc.je_type=glt.je_type
    and glc.control_group=glt.control_group
    and glc.je_sequence=glt.je_sequence
inner join scdpq01la_raw.GLSYSTEM gls
    on gls.company=glt.to_company
left outer join scdpq01la_raw.GLNAMES gln
    on gln.company=gls.company
    and gln.acct_unit=glt.acct_unit
left outer join scdpq01la_raw.GLMASTER glm
    on glm.company=glt.company
    and glm.account=glt.account
    and glm.sub_account=glt.sub_account
    and glm.acct_unit=glt.acct_unit
left outer join scdpq01la_raw.GLCHARTDTL gdt
    on gdt.chart_name=glm.chart_name
    and gdt.account=glm.account
    and gdt.sub_account=glm.sub_account
left outer join scdpq01la_raw.GLCHARTSUM gcs
    on gcs.chart_name=gdt.chart_name
    and gcs.sumry_acct_id=gdt.sumry_acct_id
left join scdpq01la_raw.L_HGLC hglc
    on hglc.l_index=glc.l_index
    and hglc.type='1'
    and 1=0
;

I have a dev environment and I can share config and reproduce it at will.

Regards

#2 David Rowley
dgrowleyml@gmail.com
In reply to: PG Bug reporting form (#1)
Re: BUG #18599: server closed the connection unexpectedly

On Wed, 4 Sept 2024 at 11:58, PG Bug reporting form
<noreply@postgresql.org> wrote:

I am getting the following message

server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.

When I run the following query.

Thanks for the report. The query text is not really enough to go on
unless it also comes with the schema and a way to recreate the crash.

I have a dev environment and I can share config and reproduce it at will.

Ideally, if you could use that to get a stack trace, include that here.
Alternatively, an even better option would be a self-contained series
of SQL statements that we can run and recreate the issue.

David
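
A minimal sketch of what a self-contained reproduction script could look like, assuming stock PostgreSQL 16. The repro schema, the two cut-down tables, their column types and the row counts are all hypothetical, since the real table definitions were not posted in this thread. Only the shape of the final join (including the "and 1=0" condition) is taken from the reported query.

```sql
-- Hypothetical, cut-down reproduction script.
-- Schema name, column types and row counts are assumptions, not the reporter's.
DROP SCHEMA IF EXISTS repro CASCADE;
CREATE SCHEMA repro;

CREATE TABLE repro.glcontrol (company integer, l_index text);
CREATE TABLE repro.l_hglc    (l_index text, type text);

-- Fill both tables with synthetic rows.
INSERT INTO repro.glcontrol
SELECT g % 10, g::text FROM generate_series(1, 100000) AS g;

INSERT INTO repro.l_hglc
SELECT g::text, '1' FROM generate_series(1, 100000) AS g;

ANALYZE repro.glcontrol;
ANALYZE repro.l_hglc;

-- Report this session's backend PID so a debugger can be attached
-- to the process before the failing statement is run.
SELECT pg_backend_pid();

-- Last join of the reported query, reduced to two tables.
SELECT count(*)
FROM repro.glcontrol glc
LEFT JOIN repro.l_hglc hglc
       ON hglc.l_index = glc.l_index
      AND hglc.type = '1'
      AND 1 = 0;
```

If a script of this kind still crashes the backend on the affected system, it can be posted as is. If it does not, the tables from the original query would need to be added back one at a time until the crash returns.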

#3 Thomas Munro
thomas.munro@gmail.com
In reply to: David Rowley (#2)
Re: BUG #18599: server closed the connection unexpectedly

On Wed, Sep 4, 2024 at 5:05 PM David Rowley <dgrowleyml@gmail.com> wrote:

Ideally, if you could use that to get a stack trace include that here.
Alternatively, an even better option would be a self-contained series
of SQL statements that we can run and recreate the issue.

If this is an ARM CPU (like AWS Graviton), and if setting "jit=off"
fixes it, then it could be the known LLVM relocation issue[1], for
which we have a candidate solution pending. I mention this wild guess
because we're seeing a lot of these reports and it'd be much easier to
check that than figure out stack traces etc.

[1]: /messages/by-id/CAO6_Xqr63qj=Sx7HY6ZiiQ6R_JbX+-p6sTPwDYwTWZjUmjsYBg@mail.gmail.com
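
The quick check suggested above could be run from the same psql session roughly as follows. This is only a sketch: it uses the standard jit setting and the version() function, and the failing query itself is left as a placeholder comment.

```sql
-- The version string includes the CPU architecture the server was built for
-- (for example x86_64 or aarch64), which shows whether this is an ARM machine.
SELECT version();

-- Show the current JIT setting, then disable JIT for this session only.
SHOW jit;
SET jit = off;

-- Re-run the failing query here and see whether the connection still drops.

-- Restore the configured default afterwards.
RESET jit;
```

SET only affects the current session, so the test does not change the behaviour of other connections.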

#4 Karim Chaid
kchaid@hotmail.com
In reply to: Thomas Munro (#3)
Re: BUG #18599: server closed the connection unexpectedly

This is a VM based on IBM hardware.
I can dig into the HW side if needed.
I will check the jit setting and see if the issue can be resolved.
As for the stack trace, there is no core dump, and from the search I did, it looks like pg_backtrace may be the way to go. I have downloaded the source code from GitHub but am still looking for the make command options for RHEL 8.9 and any dependencies.
Regards


#5 Karim Chaid
kchaid@hotmail.com
In reply to: Karim Chaid (#4)
RE: BUG #18599: server closed the connection unexpectedly
#6 Karim Chaid
kchaid@hotmail.com
In reply to: Karim Chaid (#4)
Re: BUG #18599: server closed the connection unexpectedly

One quick test recommended to me was to convert the columnar tables to heap.
After doing so, the query worked. It hung on the first attempt but worked fine on the second.

I would like to get this resolved for the columnar setup.

Regards
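
For reference, a sketch of how the table access methods in the affected schema could be inspected and how one table could be switched to heap. The thread does not say how the conversion was actually done, and some columnar extensions ship their own conversion functions, so the ALTER TABLE form below is an assumption. It relies on SET ACCESS METHOD, which is available from PostgreSQL 15 on.

```sql
-- List every ordinary table in the schema together with its access method.
SELECT c.relname, am.amname
FROM pg_class c
JOIN pg_am am ON am.oid = c.relam
WHERE c.relnamespace = 'scdpq01la_raw'::regnamespace
  AND c.relkind = 'r'
ORDER BY c.relname;

-- Rewrite one table using the built-in heap access method.
-- This rewrites the whole table and takes an exclusive lock while it runs.
ALTER TABLE scdpq01la_raw.gltrans SET ACCESS METHOD heap;
```

Converting the tables one at a time and re-running the query after each step would also show which table's storage triggers the crash.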


#7 Magnus Hagander
magnus@hagander.net
In reply to: Karim Chaid (#6)
Re: BUG #18599: server closed the connection unexpectedly

On Sat, Sep 7, 2024, 00:18 Karim Chaid <kchaid@hotmail.com> wrote:

One quick test recommended to me was to convert the columnar tables to
heap.
After doing so, the query worked. It hung on the first attempt but worked
fine on the second.

I would like to get this resolved for the columnar setup.

There's no such thing as columnar in PostgreSQL. You must be using either a
fork or some pretty complex extension. It seems pretty clear that the
problem is in that code and not in PostgreSQL, since it goes away when you
switch away from it. So you're probably better off reporting it to them.

/Magnus
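
One way to find out which extension or fork provides the columnar storage is to query the system catalogs. This is a sketch using standard catalogs only; what it returns depends entirely on the installation.

```sql
-- Extensions installed in the current database.
SELECT extname, extversion
FROM pg_extension
ORDER BY extname;

-- Registered access methods; amtype 't' marks table access methods,
-- 'i' marks index access methods. Stock PostgreSQL ships only "heap" for tables.
SELECT amname, amtype
FROM pg_am
ORDER BY amname;

-- Libraries loaded at server start, where such extensions often hook in.
SHOW shared_preload_libraries;
```

Any table access method other than heap in the second result comes from outside core PostgreSQL, and its name usually points to the project that should receive the bug report.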

#8 Karim Chaid
kchaid@hotmail.com
In reply to: Magnus Hagander (#7)
Re: BUG #18599: server closed the connection unexpectedly

Thank you all for helping with this issue.
Regards