Segfault when running postgres inside kubernetes with huge pages

Started by Siegfried Kiermayer | over 2 years ago | 4 messages | bugs

sicaine@gmail.com

over 2 years ago

Hi,

We are using the Zalando Postgres Operator, and I changed the huge pages
setting on our Kubernetes nodes from a previously unclear value to 1536
(unclear because I am fairly sure that, before changing it to 1536, I saw
an initial value of 1024 with 670 in use).

Postgres stopped working after I set it to 1536 and restarted the node.
I was scratching my head as to why, because I had seen huge pages
configured before and didn't connect the two at all.

I found core dumps and this is the output:

Core was generated by `/usr/lib/postgresql/14/bin/postgres -D /home/postgres/pgdata/pgroot/data --conf'.
Program terminated with signal SIGBUS, Bus error.

warning: Section `.reg-xstate/999' in core file too small.
#0 0x0000558ea5345148 in PGSharedMemoryCreate ()
(gdb) bt
#0 0x0000558ea5345148 in PGSharedMemoryCreate ()
#1 0x0000558ea53c157f in CreateSharedMemoryAndSemaphores ()
#2 0x0000558ea5357240 in PostmasterMain ()
#3 0x0000558ea506777a in main ()

This gave me the first indication that the problem is related to the
huge pages setting on the node itself.
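
For readers who want to check the same counters on a node, here is a minimal sketch in C, assuming a Linux host where `/proc/meminfo` exposes the `HugePages_Total` and `HugePages_Free` fields. The helper names `meminfo_value` and `print_hugepage_counters` are made up for this illustration and are not part of PostgreSQL or any library.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: find a line starting with "<key>:" in
 * /proc/meminfo-style text and return its numeric value, or -1 if the
 * key is missing or has no number after it. */
static long meminfo_value(const char *text, const char *key)
{
    size_t keylen = strlen(key);
    const char *line = text;

    while (line != NULL && *line != '\0') {
        if (strncmp(line, key, keylen) == 0 && line[keylen] == ':') {
            long value;

            if (sscanf(line + keylen + 1, "%ld", &value) == 1)
                return value;
            return -1;
        }
        /* Advance to the start of the next line, if there is one. */
        line = strchr(line, '\n');
        if (line != NULL)
            line++;
    }
    return -1;
}

/* Hypothetical helper: read /proc/meminfo (Linux only) and print the
 * huge page counters. Returns 0 on success, -1 if the file is unreadable. */
static int print_hugepage_counters(void)
{
    char buf[8192];
    size_t n;
    FILE *fp = fopen("/proc/meminfo", "r");

    if (fp == NULL)
        return -1;
    n = fread(buf, 1, sizeof(buf) - 1, fp);
    fclose(fp);
    buf[n] = '\0';

    long total = meminfo_value(buf, "HugePages_Total");
    long free_pages = meminfo_value(buf, "HugePages_Free");

    printf("huge pages: total=%ld free=%ld in use=%ld\n",
           total, free_pages, total - free_pages);
    return 0;
}
```

Calling `print_hugepage_counters()` before and after changing the node setting would show whether the pool size actually changed. Note that these are node-wide counters; any limit applied to an individual container is not visible here.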

I would go into more detail, but honestly I believe this might be easy
to find, and I also assume it shouldn't segfault but rather return an
error message indicating the issue.

I'm aware that huge pages and other ordinary features such as swap are
unusual inside Kubernetes, but FYI, Kubernetes 1.28 will have huge
pages support: https://kubernetes.io/docs/tasks/manage-hugepages/scheduling-hugepages/

Thanks,

Sigi

Tom Lane

tgl@sss.pgh.pa.us

over 2 years ago

In reply to: Siegfried Kiermayer (#1)

Re: Segfault when running postgres inside kubernetes with huge pages

Siegfried Kiermayer <sicaine@gmail.com> writes:

> we are using zalando postgres operator and i changed / set huge pages
> on kubernetes nodes from something undefined to 1536 (undefined
> because i was pretty sure before changing it to 1536 i saw an initial
> value of 1024 with 670 in use.

This previous discussion might hold the clue:

/messages/by-id/CAFpoUr1ggmGs8qpoKvYxNBO3h-T-n+MNh+JnLRYsYhHurVOiGQ@mail.gmail.com

> I would go into more detail but honestly I believe this might be easy
> to find and I also assume it shouldn't segfault but return an error
> message indicating the / a issue.

There is not that much we can do about operating system bugs.

regards, tom lane

Siegfried Kiermayer

sicaine@gmail.com

over 2 years ago

In reply to: Tom Lane (#2)

Re: Segfault when running postgres inside kubernetes with huge pages

Hi,

We run kernel 5.8, and the allocation happens basically at startup.

I would still expect Postgres to fail gracefully at this point?

Is 'throwing an error message' or checking the allocation a performance
issue? Is it in a generic hot path for allocation?

Tx,

Sigi



Andres Freund

andres@anarazel.de

over 2 years ago

In reply to: Siegfried Kiermayer (#3)

Re: Segfault when running postgres inside kubernetes with huge pages

Hi,

On 2023-11-08 16:03:53 +0100, Siegfried Kiermayer wrote:

> we do run kernel 5.8 and the allocation happens basically at start.
>
> I would still expect postgres to fail gracefully at this point?
>
> Is 'throwing an error message' / checking the allocation a performance
> issue? is it in a generic hotpath for allocation?

It's not like we're ignoring an error and just continuing - we're successfully
allocating the memory. Then the kernel sends SIGBUS when accessing the freshly
allocated memory.

We could try to install a SIGBUS handler and error out that way. But doing
that correctly and portably is not exactly trivial.

Greetings,

Andres Freund