postgres on a non-journaling filesystem
Hello,
I'm Maayan, and I'm on a DBA team that uses PostgreSQL.
I saw this in the documentation on WAL:
https://www.postgresql.org/docs/10/wal-intro.html
The tip box says it's better not to use a journaling filesystem, and I
wanted to ask how that works.
Can't we get corruption that we can't recover from?
I mean, what if postgres is in the middle of a write to the WAL and there is a
crash, and the write didn't finish?
I'm assuming it will detect this when we start postgres and report that
it was rolled back, am I right?
And how does it work at the data level? If part of an 8k block is written
but not all of it, and then there is a crash, how does postgres deal with it?
Thanks in advance
On 23/01/2019 01:03, maayan mordehai wrote:
> I mean, what if postgres is in the middle of a write to the WAL and there is a
> crash, and the write didn't finish?
> I'm assuming it will detect this when we start postgres and report that
> it was rolled back, am I right?
Yep, any half-written transactions will be rolled back.
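A minimal sketch of how recovery can tell a half-written WAL record from a complete one. PostgreSQL's WAL records carry a CRC, but everything else here is an assumption made for illustration: the record layout (length plus CRC-32 header), the helper names `append_record` and `replay`, and the use of `zlib.crc32` are hypothetical and do not match PostgreSQL's on-disk format.

```python
import struct
import zlib

# Toy record header: payload length and CRC-32 of the payload (hypothetical layout).
HEADER = struct.Struct("<II")

def append_record(log, payload):
    """Append one checksummed record to the in-memory log."""
    log += HEADER.pack(len(payload), zlib.crc32(payload)) + payload

def replay(log):
    """Return every intact record, stopping at the first torn or corrupt one."""
    records = []
    pos = 0
    while pos + HEADER.size <= len(log):
        length, crc = HEADER.unpack_from(log, pos)
        start = pos + HEADER.size
        payload = bytes(log[start:start + length])
        # A short payload or a checksum mismatch marks the end of usable WAL.
        if length == 0 or len(payload) < length or zlib.crc32(payload) != crc:
            break
        records.append(payload)
        pos = start + length
    return records

log = bytearray()
append_record(log, b"INSERT row 1")
append_record(log, b"COMMIT tx 100")
append_record(log, b"INSERT row 2")

# Simulate a crash: the last record only partly reached disk.
torn = bytes(log[:-5])
print(replay(torn))
```

In this toy model, replay stops before the torn record, so the change it described is never applied. A transaction whose commit record never made it into the log intact is simply treated as not committed.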
> And how does it work at the data level? If part of an 8k block is written
> but not all of it, and then there is a crash, how does postgres deal with it?
The first time a block is modified after a checkpoint, a copy of the
block is written to the WAL. At crash recovery, the block is restored
from the WAL. This mechanism is called "full page writes".
The WAL works just like the journal in a journaling filesystem. That's
why it's not necessary to have journaling at the filesystem level.
- Heikki
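The full-page-write mechanism described above can be sketched as a toy model. This is only an illustration under stated assumptions: the `ToyDatabase` class, its method names, and the record tuples are hypothetical, and the model ignores most of what a real system does (LSNs, buffer management, WAL flushing order).

```python
PAGE_SIZE = 8192

class ToyDatabase:
    def __init__(self):
        self.disk = {}       # page number -> page bytes as stored on disk
        self.cache = {}      # page number -> latest in-memory version
        self.wal = []        # records written since the last checkpoint
        self.imaged = set()  # pages that already have a full image in the WAL

    def checkpoint(self):
        # Flush all cached pages, then start logging from scratch.
        self.disk.update(self.cache)
        self.wal.clear()
        self.imaged.clear()

    def modify(self, page_no, offset, data):
        current = self.cache.get(page_no, self.disk.get(page_no, bytes(PAGE_SIZE)))
        page = bytearray(current)
        page[offset:offset + len(data)] = data
        self.cache[page_no] = bytes(page)
        if page_no not in self.imaged:
            # First change since the checkpoint: log a copy of the whole page.
            self.wal.append(("full_page", page_no, bytes(page)))
            self.imaged.add(page_no)
        else:
            # Later changes: log only what changed.
            self.wal.append(("delta", page_no, offset, data))

    def torn_write(self, page_no, bytes_written):
        # Crash midway through writing a page: new prefix, old suffix.
        new = self.cache[page_no]
        old = self.disk.get(page_no, bytes(PAGE_SIZE))
        self.disk[page_no] = new[:bytes_written] + old[bytes_written:]

    def recover(self):
        self.cache = {}
        for record in self.wal:
            if record[0] == "full_page":
                _, page_no, image = record
                self.disk[page_no] = image  # overwrite whatever is on disk
            else:
                _, page_no, offset, data = record
                page = bytearray(self.disk[page_no])
                page[offset:offset + len(data)] = data
                self.disk[page_no] = bytes(page)

db = ToyDatabase()
db.modify(0, 0, b"A" * 100)
db.checkpoint()
db.modify(0, 0, b"B" * 100)     # first change after the checkpoint: full image
db.modify(0, 8000, b"C" * 100)  # second change: delta only
expected = db.cache[0]
db.torn_write(0, 4096)          # only the first half of the page reaches disk
db.recover()
print(db.disk[0] == expected)
```

Because recovery starts from the full image rather than from the possibly torn page on disk, it does not matter which part of the 8k block made it to storage before the crash.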
Thank you!!
On 2019-01-23 14:20:52 +0200, Heikki Linnakangas wrote:
> The WAL works just like the journal in a journaling filesystem. That's why
> it's not necessary to have journaling at the filesystem level.
But note that not having journaling at the FS level often makes OS startup
after a crash *painfully* slow, because fsck or similar will be run. And
that check is often necessary for the filesystem's internal consistency.
Note that even with journaling enabled, most filesystems by default don't
journal data, so you can get those partial writes anyway.
Greetings,
Andres Freund
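To check whether a filesystem journals data or only metadata, one option on Linux is to look at the `data=` mount option that ext3 and ext4 accept (`journal`, `ordered`, or `writeback`). A minimal sketch, assuming text in the `/proc/mounts` format; the helper names and the sample lines are hypothetical, and a kernel may omit the option from the listing when the default is in effect, so `None` here means "not listed", not "no journaling".

```python
def data_mode(mount_options):
    """Return the value of a data= mount option, or None if it is not listed."""
    for option in mount_options.split(","):
        if option.startswith("data="):
            return option[len("data="):]
    return None

def ext_data_modes(mounts_text):
    """Map mount point -> data= mode for ext3/ext4 lines in /proc/mounts format."""
    modes = {}
    for line in mounts_text.splitlines():
        fields = line.split()
        # Fields: device, mount point, filesystem type, options, dump, pass.
        if len(fields) >= 4 and fields[2] in ("ext3", "ext4"):
            modes[fields[1]] = data_mode(fields[3])
    return modes

# Made-up sample input; on a real system, read the text from /proc/mounts.
sample = (
    "/dev/sda1 / ext4 rw,relatime,data=ordered 0 0\n"
    "/dev/sdb1 /pgdata ext4 rw,noatime,data=writeback 0 0\n"
    "tmpfs /run tmpfs rw,nosuid 0 0\n"
)
print(ext_data_modes(sample))
```

Only `data=journal` writes file contents through the journal. In the other two modes the journal protects filesystem metadata only, which is why partial page writes remain possible.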