PostgreSQL & latest Mac OS Sonoma, a possible bug / configuration issue

Started by Arnd Baranowski, about 2 years ago, 7 messages, bugs
#1 Arnd Baranowski
baranowski@oculeus.com

Hello,

I have a 16-inch MacBook Pro with an M3 Max in the base configuration (48 GB RAM, 1 TB HD). The operating system is the latest macOS Sonoma, 14.3. I use the latest Postgres 14, installed via brew in the standard configuration. From time to time (every second or third week) I reboot the Mac, and for several weeks now (the last 4 reboots at least) I have lost database data on reboot: the database falls back to a state from several days before the reboot. This affects both structure and data added in the meantime. I cover this with backups, and it looks as if data is kept in memory rather than written to disk. Apart from this, the Mac and Postgres run fine.

Regards

Arnd Baranowski

#2 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Arnd Baranowski (#1)
Re: PostgreSQL & latest Mac OS Sonoma, a possible bug / configuration issue

Hmm, what have you got the fsync and wal_sync_method GUCs set to?
What was the last macOS version that was stable for you?

regards, tom lane

#3 Arnd Baranowski
baranowski@oculeus.com
In reply to: Tom Lane (#2)
Re: PostgreSQL & latest Mac OS Sonoma, a possible bug / configuration issue

Correction: fsync is "on" and wal_sync_method is set to "open_datasync".

#4 Arnd Baranowski
baranowski@oculeus.com
In reply to: Tom Lane (#2)
Re: PostgreSQL & latest Mac OS Sonoma, a possible bug / configuration issue

The last stable macOS version seemed to be 14.1, but I am not sure, and it might have been a coincidence. Recently Brew forced me to upgrade Postgres, which moved from 14.7 to 14.10. This was about the same time my problem started.

#5 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Arnd Baranowski (#3)
Re: PostgreSQL & latest Mac OS Sonoma, a possible bug / configuration issue

Arnd Baranowski <baranowski@oculeus.com> writes:

Correction: fsync is "on" and wal_sync_method is set to "open_datasync".

That's what they should be.

I tried to reproduce this by selecting "Restart..." immediately after
creating/populating a table on my own MacBook running Sonoma 14.3.
After the reboot, the table was there with the expected contents.
Now, this test doesn't actually prove a heck of a lot about PG's
crash recovery, because I see in the postmaster log

2024-02-05 21:00:30.322 EST [1148] LOG: database system was shut down at 2024-02-05 20:58:46 EST
2024-02-05 21:00:30.327 EST [1144] LOG: database system is ready to accept connections

which indicates that Postgres had time to perform a clean shutdown
before the system rebooted. (That is the expected scenario for an
OS reboot, assuming that the kernel delivers us SIGTERM as it's
required to do by POSIX and then gives us enough time to nail the
windows shut, which it's not required to do.)
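As a rough illustration of the log check described above, here is a minimal Python sketch that classifies the first startup message in a server log. The function name `classify_startup` is made up for this example; the message fragments it matches are the clean-shutdown line quoted above and the "was interrupted" / "was not properly shut down" lines that PostgreSQL logs when it has to run crash recovery. It assumes the log is already available as a list of lines.

```python
import re

# Clean shutdown: "database system was shut down at <timestamp>"
CLEAN = re.compile(r"database system was shut down at ")
# Crash recovery needed: "was interrupted; last known up at ..." or
# "was not properly shut down; automatic recovery in progress"
UNCLEAN = re.compile(r"database system was (interrupted|not properly shut down)")


def classify_startup(log_lines):
    """Return 'clean', 'unclean', or 'unknown' for the first startup message found."""
    for line in log_lines:
        if UNCLEAN.search(line):
            return "unclean"
        if CLEAN.search(line):
            return "clean"
    return "unknown"


# The two lines quoted in the message above.
sample = [
    "2024-02-05 21:00:30.322 EST [1148] LOG: database system was shut down at 2024-02-05 20:58:46 EST",
    "2024-02-05 21:00:30.327 EST [1144] LOG: database system is ready to accept connections",
]
print(classify_startup(sample))  # prints "clean"
```

An "unclean" result after a reboot would suggest that the server did not get the chance to shut down cleanly.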

The facts as you've presented them indicate that (1) checkpoints
weren't working, (2) we didn't get SIGTERM at system shutdown, *and*
(3) WAL wasn't written out to disk as it's supposed to be. It's
a bit hard to credit that so many things are broken and nobody has
noticed. I'm inclined to wonder if something is wrong with your
disk drive.

It would be interesting to know what appears in the first few lines
of your postmaster log after a data-losing restart. Also, try
running with log_checkpoints = on for a while, and see if there are
log entries claiming successful checkpoint completion.
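A small Python sketch of that checkpoint check, assuming log_checkpoints is on and the server log has been read into a list of lines. With that setting, the server logs messages beginning "checkpoint starting:" and "checkpoint complete:". The helper name `checkpoint_summary` and the two sample lines are invented for illustration and are not taken from a real log.

```python
def checkpoint_summary(log_lines):
    """Count checkpoint start and completion messages in server log lines."""
    started = sum("checkpoint starting:" in line for line in log_lines)
    completed = sum("checkpoint complete:" in line for line in log_lines)
    return {"started": started, "completed": completed}


# Hypothetical, abbreviated log lines (assumed for this example only).
sample = [
    "2024-02-06 09:00:00.000 EST [1200] LOG: checkpoint starting: time",
    "2024-02-06 09:00:02.500 EST [1200] LOG: checkpoint complete: wrote 42 buffers (0.3%); ...",
]
print(checkpoint_summary(sample))
```

If checkpoints start but never complete, or no checkpoint lines appear at all over several days of activity, that would fit point (1) above.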

A different line of thought is that maybe the corruption is happening
because you have two postmasters started in the same data directory.
We have interlocks that are supposed to defend against that, but it'd
be a lot easier to credit that those aren't working than that all the
rest of this stuff broke.
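One way to start looking into the two-postmasters theory is to inspect the postmaster.pid lock file in the data directory, whose first line is the postmaster's PID and whose second line is the data directory path. The Python sketch below is only a starting point under those assumptions: the function names are made up, the data directory path has to be supplied by the user, and a live PID alone does not prove that only one postmaster is using the directory.

```python
import os
from pathlib import Path


def read_lock_file(data_dir):
    """Return (pid, recorded_data_dir) from postmaster.pid, or None if absent."""
    lock = Path(data_dir) / "postmaster.pid"
    if not lock.exists():
        return None
    lines = lock.read_text().splitlines()
    pid = int(lines[0])                            # line 1: postmaster PID
    recorded = lines[1] if len(lines) > 1 else ""  # line 2: data directory
    return pid, recorded


def pid_is_alive(pid):
    """True if a process with this PID exists (POSIX signal-0 probe)."""
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True
```

Comparing the recorded PID and path with the postgres processes listed by `ps` would show whether a second server is running against the same directory, or whether the running server is using a different directory than expected.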

regards, tom lane

#6 Arnd Baranowski
baranowski@oculeus.com
In reply to: Tom Lane (#5)
Re: PostgreSQL & latest Mac OS Sonoma, a possible bug / configuration issue

Hi Tom,

Thanks for the feedback and insights. I will follow your advice, observe, and report back if I find something that could explain this behavior.

Regards

Arnd

#7 Arnd Baranowski
baranowski@oculeus.com
In reply to: Tom Lane (#5)
Re: PostgreSQL & latest Mac OS Sonoma, a possible bug / configuration issue

Hi Tom,

I completely removed Postgres and Brew from my Mac, reinstalled everything from scratch, and moved to PostgreSQL 16. The issue is gone. It looks like a broken PostgreSQL 14 installation caused the problem.

Regards

Arnd
