Re: BUG #18471: Possible JIT memory leak resulting in signal 11: Segmentation fault on ARM

Started by Noname · almost 2 years ago · 2 messages · bugs
#1 Noname
joachim.haecker-becker@arcor.de

Hi Dmitry,

thanks for looking into this.

Maybe it is a combination of JIT and some other postgres config changes we
have in our environment?
I will try to reproduce with a blank config and only change the JIT
settings.

This is where the source is <> default:

name                                    setting             unit
autovacuum_analyze_scale_factor         0.03
autovacuum_max_workers                  6
autovacuum_naptime                      300                 s
autovacuum_vacuum_insert_scale_factor   0.05
autovacuum_vacuum_scale_factor          0.03
autovacuum_vacuum_threshold             1000
client_connection_check_interval        30000               ms
default_text_search_config              pg_catalog.english
dynamic_shared_memory_type              posix
effective_cache_size                    1048576             8kB
enable_partitionwise_aggregate          on
enable_partitionwise_join               on
hash_mem_multiplier                     1.5
jit                                     on
jit_above_cost                          1
jit_inline_above_cost                   1
jit_optimize_above_cost                 1
listen_addresses                        *
log_destination                         jsonlog
log_file_mode                           640
log_lock_waits                          on
log_rotation_size                       102400              kB
log_timezone                            Etc/UTC
logging_collector                       on
maintenance_work_mem                    1048576             kB
max_connections                         150
max_locks_per_transaction               1024
max_parallel_workers                    8
max_parallel_workers_per_gather         2
max_wal_size                            2048                MB
min_wal_size                            80                  MB
random_page_cost                        1
shared_buffers                          786432              8kB
TimeZone                                Etc/UTC
work_mem                                512000              kB

The docker container has a 6 GB shm_size.

Let me know if there is anything else I can provide to get this resolved.

From:    Dmitry Dolgov <9erthalion6@gmail.com>
Sent:    21.05.2024 18:08
To:      <joachim.haecker-becker@arcor.de>, <pgsql-bugs@lists.postgresql.org>
Subject: Re: BUG #18471: Possible JIT memory leak resulting in signal 11: Segmentation fault on ARM

> On Fri, May 17, 2024 at 01:13:06PM +0000, PG Bug reporting form wrote:
> The following bug has been logged on the website:
>
> Bug reference: 18471
> Logged by: Joachim Haecker-Becker
> Email address: joachim.haecker-becker@arcor.de
> PostgreSQL version: 16.3
> Operating system: Debian Bookworm
> Description:
>
> We have a reproducible way to force a postgres process to consume more and
> more RAM until it crashes on ARM.
> The same works on X86 without any issue.
> With jit=off it runs on ARM as well.
>
> We run into this situation in a real-life database with a lot of
> joins and aggregate functions.
> The following code is just a mock to reproduce a similar situation without
> needing access to our real data.
> This issue blocks us from upgrading our ARM-hosted databases to anything
> newer than 14.7.

I think it would be useful to know how much memory difference we are
talking about and, just to make everything clear, how exactly postgres
crashes (OOM kill, I assume)? It's important to differentiate between the
case "ARM with jit crashes, ARM without jit doesn't" and "ARM with jit
crashes, ARM without jit crashes with even more columns" (the same goes
for x86).

I've tried to reproduce it on an arm64 VM (16.3 built with llvm 17), and
although I could observe some difference in memory consumption between
JIT on and off, it wasn't huge (around 10% or so). Running it under
valgrind shows only complaints about memory allocated for bitcode
modules, which is expected -- as far as I recall postgres is somewhat
wasteful when it comes to allocating memory for those modules, even more
so for parallel workers. That is the case here, where there is a growing
number of parallel hash workers. This would not explain any difference
from x86 of course, but there might be a different baseline memory
consumption for different architectures.
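[Editor's note: the list of non-default settings above can be regenerated from any running server by querying the standard pg_settings view, whose source column records where each value came from. The following is a sketch, not part of the original thread; connection options (host, user, database) are omitted and must be supplied for a real server.]

```shell
# List all settings whose value was set explicitly, i.e. whose source
# is neither the built-in default nor an internal override.
# Requires psql on PATH and a reachable server; -X skips ~/.psqlrc.
psql -X -c "
  SELECT name, setting, unit
  FROM pg_settings
  WHERE source NOT IN ('default', 'override')
  ORDER BY name;
"
```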

#2 Clemens Eisserer
linuxhippy@gmail.com
In reply to: Noname (#1)
Re: BUG #18471: Possible JIT memory leak resulting in signal 11: Segmentation fault on ARM

Hi Joachim,

> We have a reproducible way to force a postgres process to consume more and
> more RAM until it crashes on ARM.
> The same works on X86 without any issue.
> With jit=off it runs on ARM as well.

I ran into a similar problem a few months ago running postgresql on debian
bookworm.
It was discussed here on the list and I've also filed a bug against
debian's postgresql package:
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1059476

The issue is caused by a bug in the very old llvm version (14) that
debian links postgresql against (even though the newer versions 15/16 are
also included in bookworm).
Despite being based on debian, ubuntu's postgresql links against llvm-15,
and there at least my crashes were not reproducible with the same
postgresql version + postgresql.conf.
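[Editor's note: one way to confirm which LLVM a given postgres build links against is to inspect the JIT provider library's dynamic dependencies. The sketch below is not from the original thread; the llvmjit.so path follows Debian's layout for PostgreSQL 16 and is an assumption, and the sample ldd line is illustrative rather than captured from a real system.]

```shell
# On the affected machine, check which libLLVM the JIT provider links
# against (adjust the PostgreSQL major version in the path as needed):
#
#   ldd /usr/lib/postgresql/16/lib/llvmjit.so | grep libLLVM
#
# Extracting just the LLVM major version from one such ldd line;
# the sample line below is illustrative, not real output:
sample='libLLVM-14.so.1 => /usr/lib/aarch64-linux-gnu/libLLVM-14.so.1'
echo "$sample" | grep -Eo 'libLLVM-[0-9]+' | head -n1   # -> libLLVM-14
```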

It would be great if you could also leave a comment at the debian bug
report, as they seem rather reluctant to change anything.

Best regards, Clemens