Properly handle OOM death?
I’m running a postgresql 13 database on an Ubuntu 20.04 VM that is a bit more memory constrained than I would like, such that every week or so the various processes running on the machine will align badly and the OOM killer will kick in, killing off postgresql, as per the following journalctl output:
Mar 12 04:04:23 novarupta systemd[1]: postgresql@13-main.service: A process of this unit has been killed by the OOM killer.
Mar 12 04:04:32 novarupta systemd[1]: postgresql@13-main.service: Failed with result 'oom-kill'.
Mar 12 04:04:32 novarupta systemd[1]: postgresql@13-main.service: Consumed 5d 17h 48min 24.509s CPU time.
And the service is no longer running.
When this happens, I go in and restart the postgresql service, and everything is happy again for the next week or two.
Obviously this is not a good situation. Which leads to two questions:
1) is there some tweaking I can do in the postgresql config itself to prevent the situation from occurring in the first place?
2) My first thought was to simply have systemd restart postgresql whenever it is killed like this, which is easy enough. Then I looked at the default unit file, and found these lines:
# prevent OOM killer from choosing the postmaster (individual backends will
# reset the score to 0)
OOMScoreAdjust=-900
# restarting automatically will prevent "pg_ctlcluster ... stop" from working,
# so we disable it here. Also, the postmaster will restart by itself on most
# problems anyway, so it is questionable if one wants to enable external
# automatic restarts.
#Restart=on-failure
Which seems to imply that the OOM killer should only be killing off individual backends, not the entire cluster to begin with - which should be fine. And also that adding the restart=on-failure option is probably not the greatest idea. Which makes me wonder what is really going on?
Thanks.
---
Israel Brewster
Software Engineer
Alaska Volcano Observatory
Geophysical Institute - UAF
2156 Koyukuk Drive
Fairbanks AK 99775-7320
Work: 907-474-5172
cell: 907-328-9145
On 3/13/23 10:21 AM, Israel Brewster wrote:
I’m running a postgresql 13 database on an Ubuntu 20.04 VM that is a bit
more memory constrained than I would like, such that every week or so
the various processes running on the machine will align badly and the
OOM killer will kick in, killing off postgresql, as per the following
journalctl output:
Mar 12 04:04:23 novarupta systemd[1]: postgresql@13-main.service: A process of this unit has been killed by the OOM killer.
Mar 12 04:04:32 novarupta systemd[1]: postgresql@13-main.service: Failed with result 'oom-kill'.
Mar 12 04:04:32 novarupta systemd[1]: postgresql@13-main.service: Consumed 5d 17h 48min 24.509s CPU time.
And the service is no longer running.
When this happens, I go in and restart the postgresql service, and everything is happy again for the next week or two.
Obviously this is not a good situation. Which leads to two questions:
1) is there some tweaking I can do in the postgresql config itself to prevent the situation from occurring in the first place?
2) My first thought was to simply have systemd restart postgresql whenever it is killed like this, which is easy enough. Then I looked at the default unit file, and found these lines:
# prevent OOM killer from choosing the postmaster (individual backends will
# reset the score to 0)
OOMScoreAdjust=-900
# restarting automatically will prevent "pg_ctlcluster ... stop" from working,
# so we disable it here. Also, the postmaster will restart by itself on most
# problems anyway, so it is questionable if one wants to enable external
# automatic restarts.
#Restart=on-failure
Which seems to imply that the OOM killer should only be killing off individual backends, not the entire cluster to begin with - which should be fine. And also that adding the restart=on-failure option is probably not the greatest idea. Which makes me wonder what is really going on?
You might want to read:
https://www.postgresql.org/docs/current/kernel-resources.html#LINUX-MEMORY-OVERCOMMIT
Thanks.
---
Israel Brewster
Software Engineer
Alaska Volcano Observatory
Geophysical Institute - UAF
2156 Koyukuk Drive
Fairbanks AK 99775-7320
Work: 907-474-5172
cell: 907-328-9145
--
Adrian Klaver
adrian.klaver@aklaver.com
On Mar 13, 2023, at 9:28 AM, Adrian Klaver <adrian.klaver@aklaver.com> wrote:
On 3/13/23 10:21 AM, Israel Brewster wrote:
I’m running a postgresql 13 database on an Ubuntu 20.04 VM that is a bit more memory constrained than I would like, such that every week or so the various processes running on the machine will align badly and the OOM killer will kick in, killing off postgresql, as per the following journalctl output:
Mar 12 04:04:23 novarupta systemd[1]: postgresql@13-main.service: A process of this unit has been killed by the OOM killer.
Mar 12 04:04:32 novarupta systemd[1]: postgresql@13-main.service: Failed with result 'oom-kill'.
Mar 12 04:04:32 novarupta systemd[1]: postgresql@13-main.service: Consumed 5d 17h 48min 24.509s CPU time.
And the service is no longer running.
When this happens, I go in and restart the postgresql service, and everything is happy again for the next week or two.
Obviously this is not a good situation. Which leads to two questions:
1) is there some tweaking I can do in the postgresql config itself to prevent the situation from occurring in the first place?
2) My first thought was to simply have systemd restart postgresql whenever it is killed like this, which is easy enough. Then I looked at the default unit file, and found these lines:
# prevent OOM killer from choosing the postmaster (individual backends will
# reset the score to 0)
OOMScoreAdjust=-900
# restarting automatically will prevent "pg_ctlcluster ... stop" from working,
# so we disable it here. Also, the postmaster will restart by itself on most
# problems anyway, so it is questionable if one wants to enable external
# automatic restarts.
#Restart=on-failure
Which seems to imply that the OOM killer should only be killing off individual backends, not the entire cluster to begin with - which should be fine. And also that adding the restart=on-failure option is probably not the greatest idea. Which makes me wonder what is really going on?
You might want to read:
https://www.postgresql.org/docs/current/kernel-resources.html#LINUX-MEMORY-OVERCOMMIT
Good information, thanks. One thing there confuses me though. It says:
Another approach, which can be used with or without altering vm.overcommit_memory, is to set the process-specific OOM score adjustment value for the postmaster process to -1000, thereby guaranteeing it will not be targeted by the OOM killer
Isn’t that exactly what the "OOMScoreAdjust=-900” line in the Unit file does though (except with a score of -900 rather than -1000)?
---
Israel Brewster
Software Engineer
Alaska Volcano Observatory
Geophysical Institute - UAF
2156 Koyukuk Drive
Fairbanks AK 99775-7320
Work: 907-474-5172
cell: 907-328-9145
Thanks.
---
Israel Brewster
Software Engineer
Alaska Volcano Observatory
Geophysical Institute - UAF
2156 Koyukuk Drive
Fairbanks AK 99775-7320
Work: 907-474-5172
cell: 907-328-9145
--
Adrian Klaver
adrian.klaver@aklaver.com
On 3/13/23 13:21, Israel Brewster wrote:
I’m running a postgresql 13 database on an Ubuntu 20.04 VM that is a bit
more memory constrained than I would like, such that every week or so
the various processes running on the machine will align badly and the
OOM killer will kick in, killing off postgresql, as per the following
journalctl output:
Mar 12 04:04:23 novarupta systemd[1]: postgresql@13-main.service: A process of this unit has been killed by the OOM killer.
Mar 12 04:04:32 novarupta systemd[1]: postgresql@13-main.service: Failed with result 'oom-kill'.
Mar 12 04:04:32 novarupta systemd[1]: postgresql@13-main.service: Consumed 5d 17h 48min 24.509s CPU time.
And the service is no longer running.
When this happens, I go in and restart the postgresql service, and everything is happy again for the next week or two.
Obviously this is not a good situation. Which leads to two questions:
1) is there some tweaking I can do in the postgresql config itself to prevent the situation from occurring in the first place?
2) My first thought was to simply have systemd restart postgresql whenever it is killed like this, which is easy enough. Then I looked at the default unit file, and found these lines:
# prevent OOM killer from choosing the postmaster (individual backends will
# reset the score to 0)
OOMScoreAdjust=-900
# restarting automatically will prevent "pg_ctlcluster ... stop" from working,
# so we disable it here. Also, the postmaster will restart by itself on most
# problems anyway, so it is questionable if one wants to enable external
# automatic restarts.
#Restart=on-failure
Which seems to imply that the OOM killer should only be killing off individual backends, not the entire cluster to begin with - which should be fine. And also that adding the restart=on-failure option is probably not the greatest idea. Which makes me wonder what is really going on?
First, are you running with a cgroup memory.limit set (e.g. in a container)?
Assuming no, see:
https://www.postgresql.org/docs/current/kernel-resources.html#LINUX-MEMORY-OVERCOMMIT
That will tell you:
1/ Turn off memory overcommit: "Although this setting will not prevent
the OOM killer from being invoked altogether, it will lower the chances
significantly and will therefore lead to more robust system behavior."
2/ set /proc/self/oom_score_adj to -1000 rather than -900
(OOMScoreAdjust=-1000): the value -1000 is important as it is a "magic"
value which prevents the process from being selected by the OOM killer
(see:
https://elixir.bootlin.com/linux/latest/source/include/uapi/linux/oom.h#L6)
whereas -900 just makes it less likely.
All that said, even if the individual backend gets killed, the
postmaster will still go into crash recovery. So while technically
postgres does not restart, the effect is much the same. So see #1 above
as your best protection.
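For reference, a minimal sketch of what applying both of these could look like, assuming the Debian/Ubuntu unit name used in this thread; the overcommit_ratio value and file name are illustrative, not recommendations:
8<-----------
# /etc/sysctl.d/90-overcommit.conf   (reload with: sysctl --system)
vm.overcommit_memory = 2
vm.overcommit_ratio = 80        # percent of RAM (plus swap) that may be committed

# drop-in created via: systemctl edit postgresql@13-main.service
[Service]
OOMScoreAdjust=-1000            # exempts the postmaster from OOM selection
8<-----------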
HTH,
Joe
--
Joe Conway
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
On Mar 13, 2023, at 9:36 AM, Joe Conway <mail@joeconway.com> wrote:
On 3/13/23 13:21, Israel Brewster wrote:
I’m running a postgresql 13 database on an Ubuntu 20.04 VM that is a bit more memory constrained than I would like, such that every week or so the various processes running on the machine will align badly and the OOM killer will kick in, killing off postgresql, as per the following journalctl output:
Mar 12 04:04:23 novarupta systemd[1]: postgresql@13-main.service: A process of this unit has been killed by the OOM killer.
Mar 12 04:04:32 novarupta systemd[1]: postgresql@13-main.service: Failed with result 'oom-kill'.
Mar 12 04:04:32 novarupta systemd[1]: postgresql@13-main.service: Consumed 5d 17h 48min 24.509s CPU time.
And the service is no longer running.
When this happens, I go in and restart the postgresql service, and everything is happy again for the next week or two.
Obviously this is not a good situation. Which leads to two questions:
1) is there some tweaking I can do in the postgresql config itself to prevent the situation from occurring in the first place?
2) My first thought was to simply have systemd restart postgresql whenever it is killed like this, which is easy enough. Then I looked at the default unit file, and found these lines:
# prevent OOM killer from choosing the postmaster (individual backends will
# reset the score to 0)
OOMScoreAdjust=-900
# restarting automatically will prevent "pg_ctlcluster ... stop" from working,
# so we disable it here. Also, the postmaster will restart by itself on most
# problems anyway, so it is questionable if one wants to enable external
# automatic restarts.
#Restart=on-failure
Which seems to imply that the OOM killer should only be killing off individual backends, not the entire cluster to begin with - which should be fine. And also that adding the restart=on-failure option is probably not the greatest idea. Which makes me wonder what is really going on?
First, are you running with a cgroup memory.limit set (e.g. in a container)?
Not sure, actually. I *think* I had it set up as a full VM though, not a container. I’ll have to double-check that.
Assuming no, see:
https://www.postgresql.org/docs/current/kernel-resources.html#LINUX-MEMORY-OVERCOMMIT
That will tell you:
1/ Turn off memory overcommit: "Although this setting will not prevent the OOM killer from being invoked altogether, it will lower the chances significantly and will therefore lead to more robust system behavior."
2/ set /proc/self/oom_score_adj to -1000 rather than -900 (OOMScoreAdjust=-1000): the value -1000 is important as it is a "magic" value which prevents the process from being selected by the OOM killer (see: https://elixir.bootlin.com/linux/latest/source/include/uapi/linux/oom.h#L6) whereas -900 just makes it less likely.
...and that answers the question I just sent about the above linked page 😄 Thanks!
All that said, even if the individual backend gets killed, the postmaster will still go into crash recovery. So while technically postgres does not restart, the effect is much the same. So see #1 above as your best protection.
Interesting. Makes sense though. Thanks!
---
Israel Brewster
Software Engineer
Alaska Volcano Observatory
Geophysical Institute - UAF
2156 Koyukuk Drive
Fairbanks AK 99775-7320
Work: 907-474-5172
cell: 907-328-9145
HTH,
Joe
--
Joe Conway
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
On 2023-03-13 09:21:18 -0800, Israel Brewster wrote:
I’m running a postgresql 13 database on an Ubuntu 20.04 VM that is a bit more
memory constrained than I would like, such that every week or so the various
processes running on the machine will align badly and the OOM killer will kick
in, killing off postgresql, as per the following journalctl output:
Mar 12 04:04:23 novarupta systemd[1]: postgresql@13-main.service: A process of this unit has been killed by the OOM killer.
Mar 12 04:04:32 novarupta systemd[1]: postgresql@13-main.service: Failed with result 'oom-kill'.
Mar 12 04:04:32 novarupta systemd[1]: postgresql@13-main.service: Consumed 5d 17h 48min 24.509s CPU time.
And the service is no longer running.
I might be misreading this, but it looks to me that systemd detects that
*some* process in the group was killed by the oom killer and stops the
service.
Can you check which process was actually killed? If it's not the
postmaster, setting OOMScoreAdjust is probably useless.
(I tried searching the web for the error messages and didn't find
anything useful)
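For example, the kernel's own OOM report names the victim process; an illustrative way to pull it from the journal (the date matches the incident quoted above):
8<-----------
# kernel messages only; the report includes a "Killed process <pid> (<name>)" line
journalctl -k --since "2023-03-12" | grep -iE 'out of memory|killed process'
8<-----------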
2) My first thought was to simply have systemd restart postgresql whenever it
is killed like this, which is easy enough. Then I looked at the default unit
file, and found these lines:
# prevent OOM killer from choosing the postmaster (individual backends will
# reset the score to 0)
OOMScoreAdjust=-900
# restarting automatically will prevent "pg_ctlcluster ... stop" from working,
# so we disable it here.
I never call pg_ctlcluster directly, so that probably wouldn't be a good
reason for me.
Also, the postmaster will restart by itself on most
# problems anyway, so it is questionable if one wants to enable external
# automatic restarts.
#Restart=on-failure
So I'd try this despite the comment.
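If one does want to try it, a drop-in override keeps the change out of the packaged unit file; a minimal sketch (the RestartSec value is just an example):
8<-----------
# systemctl edit postgresql@13-main.service
[Service]
Restart=on-failure
RestartSec=10s
8<-----------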
hp
--
_ | Peter J. Holzer | Story must make more sense than reality.
|_|_) | |
| | | hjp@hjp.at | -- Charles Stross, "Creative writing
__/ | http://www.hjp.at/ | challenge!"
On Mar 13, 2023, at 9:43 AM, Peter J. Holzer <hjp-pgsql@hjp.at> wrote:
On 2023-03-13 09:21:18 -0800, Israel Brewster wrote:
I’m running a postgresql 13 database on an Ubuntu 20.04 VM that is a bit more
memory constrained than I would like, such that every week or so the various
processes running on the machine will align badly and the OOM killer will kick
in, killing off postgresql, as per the following journalctl output:
Mar 12 04:04:23 novarupta systemd[1]: postgresql@13-main.service: A process of this unit has been killed by the OOM killer.
Mar 12 04:04:32 novarupta systemd[1]: postgresql@13-main.service: Failed with result 'oom-kill'.
Mar 12 04:04:32 novarupta systemd[1]: postgresql@13-main.service: Consumed 5d 17h 48min 24.509s CPU time.
And the service is no longer running.
I might be misreading this, but it looks to me that systemd detects that *some* process in the group was killed by the oom killer and stops the service.
Can you check which process was actually killed? If it's not the postmaster, setting OOMScoreAdjust is probably useless.
(I tried searching the web for the error messages and didn't find anything useful)
Your guess is as good as (if not better than) mine. I can find the PID of the killed process in the system log, but without knowing what the PID of postmaster and the child processes were prior to the kill, I’m not sure that helps much. Though for what it’s worth, I do note the following about all the kill logs:
1) They reference a “Memory cgroup out of memory”, which refers back to the opening comment on Joe Conway’s message - this would imply to me that I *AM* running with a cgroup memory.limit set. Not sure how that changes things?
2) All the entries contain the line "oom_score_adj:0”, which would seem to imply that the postmaster, with its -900 score is not being directly targeted by the OOM killer.
2) My first thought was to simply have systemd restart postgresql whenever it
is killed like this, which is easy enough. Then I looked at the default unit
file, and found these lines:
# prevent OOM killer from choosing the postmaster (individual backends will
# reset the score to 0)
OOMScoreAdjust=-900
# restarting automatically will prevent "pg_ctlcluster ... stop" from working,
# so we disable it here.
I never call pg_ctlcluster directly, so that probably wouldn't be a good reason for me.
Valid point, unless something under-the-hood needs to call it?
---
Israel Brewster
Software Engineer
Alaska Volcano Observatory
Geophysical Institute - UAF
2156 Koyukuk Drive
Fairbanks AK 99775-7320
Work: 907-474-5172
cell: 907-328-9145
Also, the postmaster will restart by itself on most
# problems anyway, so it is questionable if one wants to enable external
# automatic restarts.
#Restart=on-failure
So I'd try this despite the comment.
hp
--
_ | Peter J. Holzer | Story must make more sense than reality.
|_|_) | |
| | | hjp@hjp.at | -- Charles Stross, "Creative writing
__/ | http://www.hjp.at/ | challenge!"
On 3/13/23 13:55, Israel Brewster wrote:
1) They reference a “Memory cgroup out of memory”, which refers back
to the opening comment on Joe Conway’s message - this would imply to
me that I *AM* running with a cgroup memory.limit set. Not sure how
that changes things?
cgroup memory limit is enforced regardless of the actual host level
memory pressure. As an example, if your host VM has 128 GB of memory,
but your cgroup memory limit is 512MB, you will get an OOM kill when the
sum memory usage of all of your postgres processes (and anything else
sharing the same cgroup) exceeds 512 MB, even if the host VM has nothing
else going on consuming memory.
You can check if a memory limit is set by reading the corresponding virtual file, e.g.:
8<-------------------
# cat /sys/fs/cgroup/memory/system.slice/postgresql.service/memory.limit_in_bytes
9223372036854710272
8<-------------------
A few notes:
1/ The specific path to memory.limit_in_bytes might vary, but this
example is the default for the RHEL 8 postgresql 10 RPM.
2/ The value above, 9223372036854710272 basically means "no limit" has
been set.
3/ The example assumes cgroup v1. There are very few distros that
enable cgroup v2 by default, and generally I have not seen much cgroup
v2 usage in the wild (although I strongly recommend it), but if you are
using cgroup v2 the names have changed. You can check by doing:
8<--cgroupv2 enabled-----------------
# stat -fc %T /sys/fs/cgroup/
cgroup2fs
8<--cgroupv1 enabled-----------------
# stat -fc %T /sys/fs/cgroup/
tmpfs
8<-------------------
2) All the entries contain the line "oom_score_adj:0”, which would
seem to imply that the postmaster, with its -900 score is not being
directly targeted by the OOM killer.
Sounds correct
--
Joe Conway
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
On Mar 13, 2023, at 10:37 AM, Joe Conway <mail@joeconway.com> wrote:
On 3/13/23 13:55, Israel Brewster wrote:
1) They reference a “Memory cgroup out of memory”, which refers back
to the opening comment on Joe Conway’s message - this would imply to
me that I *AM* running with a cgroup memory.limit set. Not sure how
that changes things?
cgroup memory limit is enforced regardless of the actual host level memory pressure. As an example, if your host VM has 128 GB of memory, but your cgroup memory limit is 512MB, you will get an OOM kill when the sum memory usage of all of your postgres processes (and anything else sharing the same cgroup) exceeds 512 MB, even if the host VM has nothing else going on consuming memory.
You can check if a memory limit is set by reading the corresponding virtual file, e.g.:
8<-------------------
# cat /sys/fs/cgroup/memory/system.slice/postgresql.service/memory.limit_in_bytes
9223372036854710272
8<-------------------
A few notes:
1/ The specific path to memory.limit_in_bytes might vary, but this example is the default for the RHEL 8 postgresql 10 RPM.
Not finding that file specifically (this is probably too much info, but…):
root@novarupta:~# ls /sys/fs/cgroup/system.slice/
-.mount cgroup.threads dev-hugepages.mount memory.events.local memory.swap.events proc-diskstats.mount ssh.service system-postgresql.slice systemd-resolved.service
accounts-daemon.service cgroup.type dev-lxc-console.mount memory.high memory.swap.high proc-loadavg.mount sys-devices-system-cpu-online.mount systemd-initctl.socket systemd-sysctl.service
cgroup.controllers console-getty.service dev-lxc-tty1.mount memory.low memory.swap.max proc-meminfo.mount sys-devices-virtual-net.mount systemd-journal-flush.service systemd-sysusers.service
cgroup.events console-setup.service dev-lxc-tty2.mount memory.max networkd-dispatcher.service proc-stat.mount sys-fs-fuse-connections.mount systemd-journald-audit.socket systemd-tmpfiles-setup-dev.service
cgroup.freeze cpu.pressure dev-mqueue.mount memory.min pids.current proc-swaps.mount sys-kernel-debug.mount systemd-journald-dev-log.socket systemd-tmpfiles-setup.service
cgroup.max.depth cpu.stat dev-ptmx.mount memory.numa_stat pids.events proc-sys-kernel-random-boot_id.mount syslog.socket systemd-journald.service systemd-update-utmp.service
cgroup.max.descendants cron.service io.pressure memory.oom.group pids.max proc-sys-net.mount sysstat.service systemd-journald.socket systemd-user-sessions.service
cgroup.procs data.mount keyboard-setup.service memory.pressure pool.mount 'proc-sysrq\x2dtrigger.mount' 'system-container\x2dgetty.slice' systemd-logind.service ufw.service
cgroup.stat dbus.service memory.current memory.stat postfix.service proc-uptime.mount system-modprobe.slice systemd-networkd.service uuidd.socket
cgroup.subtree_control dbus.socket memory.events memory.swap.current proc-cpuinfo.mount rsyslog.service system-postfix.slice systemd-remount-fs.service
root@novarupta:~# ls /sys/fs/cgroup/system.slice/system-postgresql.slice/
cgroup.controllers cgroup.max.depth cgroup.stat cgroup.type io.pressure memory.events.local memory.max memory.oom.group memory.swap.current memory.swap.max pids.max
cgroup.events cgroup.max.descendants cgroup.subtree_control cpu.pressure memory.current memory.high memory.min memory.pressure memory.swap.events pids.current postgresql@13-main.service
cgroup.freeze cgroup.procs cgroup.threads cpu.stat memory.events memory.low memory.numa_stat memory.stat memory.swap.high pids.events
root@novarupta:~# ls /sys/fs/cgroup/system.slice/system-postgresql.slice/postgresql@13-main.service/
cgroup.controllers cgroup.max.depth cgroup.stat cgroup.type io.pressure memory.events.local memory.max memory.oom.group memory.swap.current memory.swap.max pids.max
cgroup.events cgroup.max.descendants cgroup.subtree_control cpu.pressure memory.current memory.high memory.min memory.pressure memory.swap.events pids.current
cgroup.freeze cgroup.procs cgroup.threads cpu.stat memory.events memory.low memory.numa_stat memory.stat memory.swap.high pids.events
2/ The value above, 9223372036854710272 basically means "no limit" has been set.
3/ The example assumes cgroup v1. There are very few distro's that enable cgroup v2 by default, and generally I have not seen much cgroup v2 usage in the wild (although I strongly recommend it), but if you are using cgroup v2 the names have changed. You can check by doing:
8<--cgroupv2 enabled-----------------
# stat -fc %T /sys/fs/cgroup/
cgroup2fs
8<--cgroupv1 enabled-----------------
# stat -fc %T /sys/fs/cgroup/
tmpfs
8<-------------------
Looks like V2:
root@novarupta:~# stat -fc %T /sys/fs/cgroup/
cgroup2fs
root@novarupta:~# lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 20.04.3 LTS
Release: 20.04
Codename: focal
---
Israel Brewster
Software Engineer
Alaska Volcano Observatory
Geophysical Institute - UAF
2156 Koyukuk Drive
Fairbanks AK 99775-7320
Work: 907-474-5172
cell: 907-328-9145
2) All the entries contain the line "oom_score_adj:0”, which would
seem to imply that the postmaster, with its -900 score is not being
directly targeted by the OOM killer.
Sounds correct
--
Joe Conway
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
On 3/13/23 14:50, Israel Brewster wrote:
Looks like V2:
root@novarupta:~# stat -fc %T /sys/fs/cgroup/
cgroup2fs
Interesting -- it does indeed look like you are using cgroup v2
So the file you want to look at in that case is:
8<-----------
cat /sys/fs/cgroup/system.slice/system-postgresql.slice/postgresql@14.service/memory.max
4294967296
cat /sys/fs/cgroup/system.slice/system-postgresql.slice/postgresql@14.service/memory.high
3221225472
8<-----------
If the value comes back as "max" it means no limit is set.
In this example (on my Linux Mint machine with a custom systemd unit
file) I have memory.max set to 4G and memory.high set to 3G.
The value of memory.max determines when the OOM killer will strike. The
value of memory.high will determine when the kernel goes into aggressive
memory reclaim (trying to avoid memory.max and thus an OOM kill).
The corresponding/relevant systemd unit file parameters are:
8<-----------
MemoryAccounting=yes
MemoryHigh=3G
MemoryMax=4G
8<-----------
There are other ways that memory.max may get set, but it seems most
likely that the systemd unit file is doing it (if it is in fact set).
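The values systemd itself is applying can also be read back from the unit rather than the cgroup files; an illustrative check using the unit name from earlier in this thread (MemoryMax=infinity / MemoryHigh=infinity would mean no limit is imposed by the unit):
8<-----------
systemctl show postgresql@13-main.service -p MemoryAccounting -p MemoryHigh -p MemoryMax
8<-----------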
--
Joe Conway
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
On Mar 13, 2023, at 11:10 AM, Joe Conway <mail@joeconway.com> wrote:
On 3/13/23 14:50, Israel Brewster wrote:
Looks like V2:
root@novarupta:~# stat -fc %T /sys/fs/cgroup/
cgroup2fs
Interesting -- it does indeed look like you are using cgroup v2
So the file you want to look at in that case is:
8<-----------
cat /sys/fs/cgroup/system.slice/system-postgresql.slice/postgresql@14.service/memory.max
4294967296
cat /sys/fs/cgroup/system.slice/system-postgresql.slice/postgresql@14.service/memory.high
3221225472
8<-----------
If the value comes back as "max" it means no limit is set.
This does, in fact, appear to be the case here:
root@novarupta:~# cat /sys/fs/cgroup/system.slice/system-postgresql.slice/postgresql@13-main.service/memory.max
max
root@novarupta:~# cat /sys/fs/cgroup/system.slice/system-postgresql.slice/postgresql@13-main.service/memory.high
max
root@novarupta:~#
which would presumably indicate that it’s a system level limit being exceeded, rather than a postgresql specific one? The syslog specifically says "Memory cgroup out of memory”, if that means something (this is my first exposure to cgroups, if you couldn’t tell).
---
Israel Brewster
Software Engineer
Alaska Volcano Observatory
Geophysical Institute - UAF
2156 Koyukuk Drive
Fairbanks AK 99775-7320
Work: 907-474-5172
cell: 907-328-9145
In this example (on my Linux Mint machine with a custom systemd unit file) I have memory.max set to 4G and memory.high set to 3G.
The value of memory.max determines when the OOM killer will strike. The value of memory.high will determine when the kernel goes into aggressive memory reclaim (trying to avoid memory.max and thus an OOM kill).
The corresponding/relevant systemd unit file parameters are:
8<-----------
MemoryAccounting=yes
MemoryHigh=3G
MemoryMax=4G
8<-----------
There are other ways that memory.max may get set, but it seems most likely that the systemd unit file is doing it (if it is in fact set).
--
Joe Conway
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
On Mon, Mar 13, 2023 at 1:21 PM Israel Brewster <ijbrewster@alaska.edu> wrote:
I’m running a postgresql 13 database on an Ubuntu 20.04 VM that is a bit more memory constrained than I would like, such that every week or so the various processes running on the machine will align badly and the OOM killer will kick in, killing off postgresql, as per the following journalctl output:
Mar 12 04:04:23 novarupta systemd[1]: postgresql@13-main.service: A process of this unit has been killed by the OOM killer.
Mar 12 04:04:32 novarupta systemd[1]: postgresql@13-main.service: Failed with result 'oom-kill'.
Mar 12 04:04:32 novarupta systemd[1]: postgresql@13-main.service: Consumed 5d 17h 48min 24.509s CPU time.
And the service is no longer running.
When this happens, I go in and restart the postgresql service, and everything is happy again for the next week or two.
Obviously this is not a good situation. Which leads to two questions:
1) is there some tweaking I can do in the postgresql config itself to prevent the situation from occurring in the first place?
2) My first thought was to simply have systemd restart postgresql whenever it is killed like this, which is easy enough. Then I looked at the default unit file, and found these lines:
# prevent OOM killer from choosing the postmaster (individual backends will
# reset the score to 0)
OOMScoreAdjust=-900
# restarting automatically will prevent "pg_ctlcluster ... stop" from working,
# so we disable it here. Also, the postmaster will restart by itself on most
# problems anyway, so it is questionable if one wants to enable external
# automatic restarts.
#Restart=on-failure
Which seems to imply that the OOM killer should only be killing off individual backends, not the entire cluster to begin with - which should be fine. And also that adding the restart=on-failure option is probably not the greatest idea. Which makes me wonder what is really going on?
Related, we (a FOSS project) used to have a Linux server with a LAMP
stack on GoDaddy. The machine provided a website and wiki. It was very
low-end. I think it had 512MB or 1 GB RAM and no swap file. And no way
to enable a swap file (part of an upsell). We paid about $2 a month
for it.
MySQL was killed several times a week. It corrupted the database on a
regular basis. We had to run the database repair tools daily. We
eventually switched to Ionos for hosting. We got a VM with more memory
and a swap file for about $5 a month. No more OOM kills.
If possible, you might want to add more memory (or a swap file) to the
machine. It will help sidestep the OOM problem.
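On a VM where you control the disk, adding a swap file is only a few commands; a minimal sketch (size and path are arbitrary, and this generally will not work inside an unprivileged container):
8<-----------
fallocate -l 2G /swapfile
chmod 600 /swapfile
mkswap /swapfile
swapon /swapfile
# make it persistent across reboots
echo '/swapfile none swap sw 0 0' >> /etc/fstab
8<-----------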
You can also add vm.overcommit_memory = 2 to stop Linux from
oversubscribing memory. The machine will act like a Solaris box rather
than a Linux box (which takes some getting used to). Also see
https://serverfault.com/questions/606185/how-does-vm-overcommit-memory-work
.
Jeff
On 3/13/23 15:18, Israel Brewster wrote:
root@novarupta:~# cat /sys/fs/cgroup/system.slice/system-postgresql.slice/postgresql@13-main.service/memory.max
max
root@novarupta:~# cat /sys/fs/cgroup/system.slice/system-postgresql.slice/postgresql@13-main.service/memory.high
max
root@novarupta:~#
which would presumably indicate that it’s a system level limit being
exceeded, rather than a postgresql specific one?
Yep
The syslog specifically says "Memory cgroup out of memory”, if that means
something (this is my first exposure to cgroups, if you couldn’t
tell).
I am not entirely sure, but without actually testing it I suspect that
since memory.max = high (that is, the limit is whatever the host has
available) the OOM kill is technically a cgroup OOM kill even though it
is effectively a host level memory pressure event.
Did you try setting "vm.overcommit_memory=2"?
--
Joe Conway
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
On 2023-03-13 09:55:50 -0800, Israel Brewster wrote:
On Mar 13, 2023, at 9:43 AM, Peter J. Holzer <hjp-pgsql@hjp.at> wrote:
On 2023-03-13 09:21:18 -0800, Israel Brewster wrote:
I’m running a postgresql 13 database on an Ubuntu 20.04 VM that is a bit more
memory constrained than I would like, such that every week or so the various
processes running on the machine will align badly and the OOM killer will kick
in, killing off postgresql, as per the following journalctl output:
Mar 12 04:04:23 novarupta systemd[1]: postgresql@13-main.service: A process of this unit has been killed by the OOM killer.
Mar 12 04:04:32 novarupta systemd[1]: postgresql@13-main.service: Failed with result 'oom-kill'.
Mar 12 04:04:32 novarupta systemd[1]: postgresql@13-main.service: Consumed 5d 17h 48min 24.509s CPU time.
And the service is no longer running.
I might be misreading this, but it looks to me that systemd detects that *some* process in the group was killed by the oom killer and stops the service.
Can you check which process was actually killed? If it's not the postmaster, setting OOMScoreAdjust is probably useless.
(I tried searching the web for the error messages and didn't find anything useful)
Your guess is as good as (if not better than) mine. I can find the PID of the killed process in the system log, but without knowing what the PID of postmaster and the child processes were prior to the kill, I’m not sure that helps much.
The syslog should contain a list of all tasks prior to the kill. For
example, I just provoked an OOM kill on my laptop and the syslog
contains (among lots of others) these lines:
Mar 13 21:00:36 trintignant kernel: [112024.084117] [ 2721] 126 2721 54563 2042 163840 555 -900 postgres
Mar 13 21:00:36 trintignant kernel: [112024.084123] [ 2873] 126 2873 18211 85 114688 594 0 postgres
Mar 13 21:00:36 trintignant kernel: [112024.084128] [ 2941] 126 2941 54592 1231 147456 565 0 postgres
Mar 13 21:00:36 trintignant kernel: [112024.084134] [ 2942] 126 2942 54563 535 143360 550 0 postgres
Mar 13 21:00:36 trintignant kernel: [112024.084139] [ 2943] 126 2943 54563 1243 139264 548 0 postgres
Mar 13 21:00:36 trintignant kernel: [112024.084145] [ 2944] 126 2944 54798 561 147456 545 0 postgres
Mar 13 21:00:36 trintignant kernel: [112024.084150] [ 2945] 126 2945 54563 215 131072 551 0 postgres
Mar 13 21:00:36 trintignant kernel: [112024.084156] [ 2956] 126 2956 18718 506 122880 553 0 postgres
Mar 13 21:00:36 trintignant kernel: [112024.084161] [ 2957] 126 2957 54672 269 139264 546 0 postgres
That's less helpful than it could be since all the postgres processes
are just listed as "postgres" without arguments. However, it is very
likely that the first one is actually the postmaster, because it has the
lowest pid (and the other pids follow closely) and it has an OOM score
of -900 as set in the systemd service file.
So I could compare the PID of the killed process with this list (in my
case the killed process wasn't one of them but a test program which just
allocates lots of memory).
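To pull the whole kernel OOM report (task table included) out of the journal around the time of the kill, something along these lines works; the timestamps are illustrative, and the line carrying adj -900 should be the postmaster:
8<-----------
journalctl -k --since "2023-03-12 04:00" --until "2023-03-12 04:10"
8<-----------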
hp
--
_ | Peter J. Holzer | Story must make more sense than reality.
|_|_) | |
| | | hjp@hjp.at | -- Charles Stross, "Creative writing
__/ | http://www.hjp.at/ | challenge!"
On Mar 13, 2023, at 11:42 AM, Joe Conway <mail@joeconway.com> wrote:
On 3/13/23 15:18, Israel Brewster wrote:
The syslog specifically says "Memory cgroup out of memory”, if that means
something (this is my first exposure to cgroups, if you couldn’t
tell).
I am not entirely sure, but without actually testing it I suspect that since memory.max = high (that is, the limit is whatever the host has available) the OOM kill is technically a cgroup OOM kill even though it is effectively a host level memory pressure event.
That would make sense.
Did you try setting "vm.overcommit_memory=2"?
Yeah:
root@novarupta:~# sysctl -w vm.overcommit_memory=2
sysctl: setting key "vm.overcommit_memory", ignoring: Read-only file system
I’m thinking I wound up with a container rather than a full VM after all - and as such, the best solution may be to migrate to a full VM with some swap space available to avoid the issue in the first place. I’ll have to get in touch with the sys admin for that though.
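One quick way to check whether the machine is a container or a full VM (systemd-detect-virt ships with stock Ubuntu 20.04):
8<-----------
systemd-detect-virt               # e.g. "lxc" for a container, "kvm"/"vmware" for a VM, "none" on bare metal
systemd-detect-virt --container   # reports only container virtualization, or "none"
8<-----------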
---
Israel Brewster
Software Engineer
Alaska Volcano Observatory
Geophysical Institute - UAF
2156 Koyukuk Drive
Fairbanks AK 99775-7320
Work: 907-474-5172
cell: 907-328-9145
--
Joe Conway
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
On 3/13/23 16:18, Israel Brewster wrote:
On Mar 13, 2023, at 11:42 AM, Joe Conway <mail@joeconway.com> wrote:
I am not entirely sure, but without actually testing it I suspect
that since memory.max = high (that is, the limit is whatever the
host has available) the OOM kill is technically a cgroup OOM kill
even though it is effectively a host level memory pressure event.
Sorry, actually meant "memory.max = max" here
Did you try setting "vm.overcommit_memory=2"?
root@novarupta:~# sysctl -w vm.overcommit_memory=2
sysctl: setting key "vm.overcommit_memory", ignoring: Read-only file system
I’m thinking I wound up with a container rather than a full VM after
all - and as such, the best solution may be to migrate to a full VM
with some swap space available to avoid the issue in the first place.
I’ll have to get in touch with the sys admin for that though.
Hmm, well big +1 for having swap turned on, but I recommend setting
"vm.overcommit_memory=2" even so.
--
Joe Conway
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
On Mar 13, 2023, at 12:16 PM, Peter J. Holzer <hjp-pgsql@hjp.at> wrote:
On 2023-03-13 09:55:50 -0800, Israel Brewster wrote:
On Mar 13, 2023, at 9:43 AM, Peter J. Holzer <hjp-pgsql@hjp.at> wrote:
The syslog should contain a list of all tasks prior to the kill. For
example, I just provoked an OOM kill on my laptop and the syslog
contains (among lots of others) these lines:
Mar 13 21:00:36 trintignant kernel: [112024.084117] [ 2721] 126 2721 54563 2042 163840 555 -900 postgres
Mar 13 21:00:36 trintignant kernel: [112024.084123] [ 2873] 126 2873 18211 85 114688 594 0 postgres
Mar 13 21:00:36 trintignant kernel: [112024.084128] [ 2941] 126 2941 54592 1231 147456 565 0 postgres
Mar 13 21:00:36 trintignant kernel: [112024.084134] [ 2942] 126 2942 54563 535 143360 550 0 postgres
Mar 13 21:00:36 trintignant kernel: [112024.084139] [ 2943] 126 2943 54563 1243 139264 548 0 postgres
Mar 13 21:00:36 trintignant kernel: [112024.084145] [ 2944] 126 2944 54798 561 147456 545 0 postgres
Mar 13 21:00:36 trintignant kernel: [112024.084150] [ 2945] 126 2945 54563 215 131072 551 0 postgres
Mar 13 21:00:36 trintignant kernel: [112024.084156] [ 2956] 126 2956 18718 506 122880 553 0 postgres
Mar 13 21:00:36 trintignant kernel: [112024.084161] [ 2957] 126 2957 54672 269 139264 546 0 postgres
That's less helpful than it could be since all the postgres processes are just listed as "postgres" without arguments. However, it is very likely that the first one is actually the postmaster, because it has the lowest pid (and the other pids follow closely) and it has an OOM score of -900 as set in the systemd service file.
So I could compare the PID of the killed process with this list (in my case the killed process wasn't one of them but a test program which just allocates lots of memory).
Oh, interesting. I had just grepped for ‘Killed process’, so I didn’t see those preceding lines 😛 Looking at that, I see two things:
1) The entries in my syslog all refer to an R process, not a postgresql process at all
2) The ‘Killed process’ entry *does* actually have the process name in it - it’s just since the process name was “R”, I wasn’t making the connection 😄
hp
--
_ | Peter J. Holzer | Story must make more sense than reality.
|_|_) | |
| | | hjp@hjp.at | -- Charles Stross, "Creative writing
__/ | http://www.hjp.at/ | challenge!"
On Mar 13, 2023, at 12:25 PM, Joe Conway <mail@joeconway.com> wrote:
On 3/13/23 16:18, Israel Brewster wrote:
Did you try setting "vm.overcommit_memory=2"?
root@novarupta:~# sysctl -w vm.overcommit_memory=2
sysctl: setting key "vm.overcommit_memory", ignoring: Read-only file system
I’m thinking I wound up with a container rather than a full VM after all - and as such, the best solution may be to migrate to a full VM with some swap space available to avoid the issue in the first place. I’ll have to get in touch with the sys admin for that though.
Hmm, well big +1 for having swap turned on, but I recommend setting "vm.overcommit_memory=2" even so.
Makes sense. Presumably with a full VM I won’t get the “Read-only file system” error when trying to do so.
Thanks!
---
Israel Brewster
Software Engineer
Alaska Volcano Observatory
Geophysical Institute - UAF
2156 Koyukuk Drive
Fairbanks AK 99775-7320
Work: 907-474-5172
cell: 907-328-9145
--
Joe Conway
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
On 13.03.23 21:25, Joe Conway wrote:
Hmm, well big +1 for having swap turned on, but I recommend setting
"vm.overcommit_memory=2" even so.
I've snipped out the context here, since my advice is very unspecific:
do use swap only as a safety net. Once your system starts swapping
performance goes down the toilet.
*t
On Sat, Mar 18, 2023 at 6:02 PM Tomas Pospisek <tpo2@sourcepole.ch> wrote:
On 13.03.23 21:25, Joe Conway wrote:
Hmm, well big +1 for having swap turned on, but I recommend setting
"vm.overcommit_memory=2" even so.I've snipped out the context here, since my advice is very unspecific:
do use swap only as a safety net. Once your system starts swapping
performance goes down the toilet.
To use swap as a safety net, set swappiness to a low value, like 2. A value of 2 keeps most anonymous memory in RAM and reduces (but does not eliminate) swapping to disk.
I have a bunch of old ARM dev boards that are resource constrained.
They use SDcards, which have a limited lifetime based on writes. I
give the boards a 1 GB swap file to avoid OOM kills when running the
compiler on C++ programs. And I configure them with a swappiness of 2
to reduce swapping.
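A minimal sketch of setting that persistently (the file name is arbitrary):
8<-----------
sysctl -w vm.swappiness=2                                       # takes effect immediately
echo 'vm.swappiness = 2' > /etc/sysctl.d/90-swappiness.conf     # survives reboots
8<-----------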
Jeff