ZFS filesystem - supported?
Hi,
Given an upcoming server upgrade, I'm contemplating moving away from XFS to ZFS (specifically the ZoL flavour via Debian 11). BTRFS seems to be falling away (e.g. with Red Hat deprecating it etc.), hence my preference for ZFS.
However, somewhere in the back of my mind I seem to have a recollection of reading about what could be described as a "strong encouragement" to stick with more traditional options such as ext4 or xfs.
A brief search of the docs for "xfs" didn't come up with anything, hence the question here.
Thanks!
Laura
On 10/23/21 07:29, Laura Smith wrote:
Hi Laura,
May I ask why you would like to change file systems? Probably because of
the snapshot capability? However, ZFS performance leaves much to be
desired. Please see the following article:
https://www.phoronix.com/scan.php?page=article&item=ubuntu1910-ext4-zfs&num=1
This is relatively new, from 2019. On page 3 there are tests with
SQLite, Cassandra and RocksDB. Ext4 is much faster in all of them.
Finally, there is another article about relational databases and ZFS:
https://blog.docbert.org/oracle-on-zfs/
In other words, I would test very thoroughly because your performance is
likely to suffer. As for the supported part, that's not a problem.
Postgres supports all modern file systems. It uses POSIX system calls to
manipulate, read and write files. Furthermore, if you need snapshots,
disk arrays like NetApp, Hitachi or EMC can always provide that.
Regards
--
Mladen Gogala
Database Consultant
Tel: (347) 321-1217
https://dbwhisperer.wordpress.com
On Saturday, October 23rd, 2021 at 14:03, Mladen Gogala <gogala.mladen@gmail.com> wrote:
Hi Mladen,
Yes indeed, snapshots are the primary reason, closely followed by zfs send/receive.
I'm no stranger to using LVM snapshots with ext4/xfs but it requires a custom shell script to manage the whole process around backups. I feel the whole thing could well be a lot cleaner with zfs.
Thank you for the links, I will take a look.
Laura
On 10/23/21 09:37, Laura Smith wrote:
Yes, ZFS is extremely convenient. It's a volume manager and a file
system, all rolled into one, with some additional convenient tools.
However, performance is a major concern. If your application is OLTP,
ZFS might be a tad too slow for your performance requirements. On the
other hand, snapshots can save you a lot of time with backups,
especially if you have some commercial backup capable of multiple
readers. The only way to find out is to test. The ideal tool for
testing is pgio.
For those who do not know, Kevin Closson was the technical architect who
built both Exadata and EMC XtremIO. He is now a principal engineer
at Amazon RDS. This part is intended only for those who would tell
him that "Oracle has it and it is not good enough" if he ever decided to
post here.
--
Mladen Gogala
Database Consultant
Tel: (347) 321-1217
https://dbwhisperer.wordpress.com
On 2021-10-24 06:48, Mladen Gogala wrote:
Interesting subject... I'm working on a migration from PG 9.2 to PG 14
and was wondering which file system I should use. Looking at this
thread, it looks like I should keep using ext4.
I don't know where you have your database deployed, but in my case it is
on AWS EC2 instances. The way I handle backups is at the block storage
level, by performing EBS snapshots.
This has proven to work very well for me. I had to restore a few backups
already and it always worked. The bad part is that I need to stop the
database before performing the snapshot, for data integrity, so that
means I have a hot-standby server just for these snapshots.
Lucas
On 10/23/21 23:12, Lucas wrote:
I don't know where you have your database deployed, but in my case is
in AWS EC2 instances. The way I handle backups is at the block storage
level, performing EBS snapshots.
Yes, Amazon uses SAN equipment that supports snapshots.
This has proven to work very well for me. I had to restore a few
backups already and it always worked. The bad part is that I need to
stop the database before performing the Snapshot, for data integrity,
so that means that I have a hot-standby server only for these snapshots.
Lucas
Actually, you don't need to stop the database. You need to execute
pg_start_backup() before taking a snapshot and then pg_stop_backup()
when the snapshot is done. You will need to recover the database when
you finish the restore, but you will not lose any data. I know that
pg_start_backup() and pg_stop_backup() are deprecated, but since
PostgreSQL doesn't have any API for storage or file system snapshots,
that's the only thing that can help you use storage snapshots as
backups. To my knowledge, the only database that does have an API for
storage snapshots is DB2. The API is called "Advanced Copy Services" or
ACS. It's documented here:
https://www.ibm.com/docs/en/db2/11.1?topic=recovery-db2-advanced-copy-services-acs
For Postgres, the old start/stop backup functions should be sufficient.
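To make the ordering concrete, here is a minimal sketch of that procedure for a ZFS dataset. It is hypothetical: the dataset name, the label format and the psql connection details are placeholders, and (as discussed elsewhere in the thread) WAL archiving is still required on top of this for a restorable backup. Commands are only echoed unless DRY_RUN=0.

```shell
# Hypothetical sketch of the snapshot sequence described above, using
# the deprecated-but-functional pg_start_backup()/pg_stop_backup()
# calls. Dataset name, label and psql invocation are placeholders.
run() {
    # Echo commands instead of executing them unless DRY_RUN=0.
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

snapshot_backup() {
    dataset=$1                               # e.g. tank/pgdata
    label="pg-$(date +%Y%m%d%H%M%S)"
    run psql -c "SELECT pg_start_backup('$label', true)"
    run zfs snapshot "$dataset@$label"
    run psql -c "SELECT pg_stop_backup()"
    # Note: WAL generated between the two calls must also be archived;
    # the snapshot alone is not a complete, restorable backup.
}

snapshot_backup tank/pgdata                  # dry run: prints the commands
```

Setting DRY_RUN=0 would actually run psql and zfs; the dry-run default exists only so the ordering of the calls can be inspected without a live cluster.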
--
Mladen Gogala
Database Consultant
Tel: (347) 321-1217
https://dbwhisperer.wordpress.com
On Saturday, October 23rd, 2021 at 18:48, Mladen Gogala <gogala.mladen@gmail.com> wrote:
Thank you Mladen for your very useful food for thought.
I think my plan going forward will be to stick to the old XFS+LVM setup and (maybe) when I have some more time on my hands fire up a secondary instance with ZFS and do some experimentation with pgio.
Thanks again!
On Sat, 2021-10-23 at 11:29 +0000, Laura Smith wrote:
ZFS is probably reliable, so you can use it with PostgreSQL.
However, I have seen reports of performance tests that were not favorable for ZFS.
So you should test if the performance is good enough for your use case.
Yours,
Laurenz Albe
--
Cybertec | https://www.cybertec-postgresql.com
Greetings,
* Mladen Gogala (gogala.mladen@gmail.com) wrote:
No, it's not- you must also be sure to archive any WAL that's generated
between the pg_start_backup and pg_stop_backup, and then be sure to
add into the snapshot the appropriate signal files or recovery.conf,
depending on PG version, to indicate that you're restoring from a backup
and make sure that the WAL is made available via restore_command.
Just doing start/stop backup is *not* enough and you run the risk of
having an invalid backup or corruption when you restore.
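For concreteness, the wiring described above looks roughly like this on PostgreSQL 12 or later (a sketch; the archive directory path is a placeholder, and the archive_command follows the example given in the PostgreSQL documentation):

```
# postgresql.conf on the primary -- archive WAL as it is generated:
archive_mode = on
archive_command = 'test ! -f /mnt/wal_archive/%f && cp %p /mnt/wal_archive/%f'

# At restore time: create an empty file named recovery.signal in the
# restored data directory, and point restore_command at the archive:
restore_command = 'cp /mnt/wal_archive/%f %p'
```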
If the entire system is on a single volume then you could possibly just
take a snapshot of it (without any start/stop backup stuff) but it's
very risky to do that and then try to do PITR with it because we don't
know where consistency is reached in such a case (we *must* play all the
way through to the end of the WAL which existed at the time of the
snapshot in order to reach consistency).
In the end though, really, it's much, much, much better to use a proper
backup and archiving tool that's written specifically for PG than to try
and roll your own, using snapshots or not.
Thanks,
Stephen
On Mon, Oct 25, 2021 at 10:18 AM Laurenz Albe <laurenz.albe@cybertec.at> wrote:
It very much depends on lots of factors.
On the whole, ZFS on spinning disks is going to have some performance
rough corners. And it is a lot harder to reason about a lot of things,
including capacity and performance, when you are doing copy-on-write on
both the db and FS level and have compression in the picture. And there
are other areas of complexity, such as how you handle partial page writes.
On the whole I think for small dbs it might perform well enough. On large
or high-velocity dbs I think you will have more problems than expected.
Having worked with PostgreSQL on ZFS, I wouldn't recommend it as a
general tool.
Best Wishes,
Chris Travers
--
Best Wishes,
Chris Travers
Efficito: Hosted Accounting and ERP. Robust and Flexible. No vendor
lock-in.
http://www.efficito.com/learn_more
On 10/25/21 13:13, Stephen Frost wrote:
Stephen, thank you for correcting me. You are, of course, right. I
erroneously thought that backing up the WAL was implied, because I always
back it up. And yes, that needs to be made clear.
Regards
--
Mladen Gogala
Database Consultant
Tel: (347) 321-1217
https://dbwhisperer.wordpress.com
On 10/25/2021 10:13 AM, Stephen Frost wrote:
What about BTRFS, since it's the successor of ZFS?
--
E-BLOKOS
On 26/10/2021, at 6:13 AM, Stephen Frost <sfrost@snowman.net> wrote:
When I create a snapshot, the script gets the latest WAL file applied from [1] and adds that information to the Snapshot Tags in AWS. I then use that information in the future when restoring the snapshot. The script will read the tag and it will download 50 WAL files before that and all the required WAL files after it.
The WAL files are being backed up to S3.
I had to restore the database to a PITR state many times, and it always worked very well.
I also create slaves using the snapshot method. So, I don't mind having to stop/start the database for the snapshot process, as it's proven to work fine for the last 5 years.
Lucas
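As an aside, WAL segment names are fixed-width hexadecimal, so the "50 files before" lookback can be computed with plain arithmetic. A sketch (the function name is made up for illustration; it assumes default 16 MB segments and the modern naming scheme, where the last 8 hex digits of a name run from 00000000 to 000000FF before the middle field increments):

```shell
# Given a 24-character WAL segment name (8 hex digits each for
# timeline, log and segment), print the name N segments earlier.
wal_minus() {
    name=$1; n=$2
    tli=$(printf '%s' "$name" | cut -c1-8)
    log=$(printf '%s' "$name" | cut -c9-16)
    seg=$(printf '%s' "$name" | cut -c17-24)
    # Combine log and segment into one number, subtract, re-split.
    comb=$(( 0x$log * 256 + 0x$seg - n ))
    printf '%s%08X%08X\n' "$tli" $(( comb / 256 )) $(( comb % 256 ))
}

wal_minus 000000010000000A00000000 1   # -> 0000000100000009000000FF
wal_minus 00000001000000020000003C 50  # -> 00000001000000020000000A
```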
On 10/25/21 15:43, E-BLOKOS wrote:
what about BTRFS since it's the successor of ZFS?
BTRFS is NOT the successor to ZFS. It never was. It was a completely new
file system developed by Oracle Corp. For some reason, Oracle seems to
have lost interest in it. Red Hat has deprecated it and, in all likelihood,
BTRFS will go the way of Solaris and SPARC chips: ride into the glorious
history of computer science. However, BTRFS has never been widely
used, not even among Fedora users like me. BTRFS has suffered from
problems with corruption and performance. This is probably not the place
to discuss the inner workings of snapshots, but it is worth knowing that
snapshots drastically increase the I/O rate on the file system - for
every snapshot. That's where the slowness comes from.
--
Mladen Gogala
Database Consultant
Tel: (347) 321-1217
https://dbwhisperer.wordpress.com
On 10/25/21 1:40 PM, Mladen Gogala wrote:
This is probably not the place
to discuss the inner workings of snapshots, but it is worth knowing that
snapshots drastically increase the IO rate on the file system - for
every snapshot. That's where the slowness comes from.
I have recent anecdotal experience of this. I experimented with using
Btrfs for a 32 TB backup system that has five 8 TB spinning disks.
There's an average of 8 MBps of writes scattered around the disks, which
isn't super high, obviously.
The results were vaguely acceptable until I created a snapshot of it, at
which point it became completely unusable. Even having one snapshot
present caused hundreds of btrfs-related kernel threads to thrash in the
"D" state almost constantly, and it never stopped doing that even when
left for many hours.
I then experimented with adding a bcache layer on top of Btrfs to see if
it would help. I added a 2 TB SSD using bcache, partitioned as 1900 GB
read cache and 100 GB write cache. It made very little difference and
was still unusable as soon as a snapshot was taken.
I did play with the various btrfs and bcache tuning knobs quite a bit
and couldn't improve it.
Since that test was a failure, I then decided to try the same setup with
OpenZFS on a lark, with the same set of disks in a "raidz" array, with
the 2 TB SSD as an l2arc read cache (no write cache). It easily handles
the same load, even with 72 hourly snapshots present, with the default
settings. I'm actually quite impressed with it.
I'm sure that the RAID, snapshots and copy-on-write reduce the maximum
performance considerably, compared to ext4. But on the other hand, it
did provide the performance I expected to be possible given the setup.
Btrfs *definitely* didn't; I was surprised at how badly it performed.
--
Robert L Mathews, Tiger Technologies, http://www.tigertech.net/
In my opinion, ext4 will solve any and all problems without requiring a
very deep understanding of file system architecture. In short, I would
stick with ext4 unless you have a good reason not to. Maybe there is one.
I have done this a long time and never thought twice about which file
system should support my servers.
On Mon, Oct 25, 2021, 6:01 PM Robert L Mathews <lists@tigertech.com> wrote:
We have some users of our software who have had a good experience with
PostgreSQL on ZFS/ZoL. Two features which have proved useful are the
native encryption (less fiddly than LUKS) and compression. Interestingly,
many of our users are stuck with quite old and slow disks. Using
compression (even together with encryption) on the slow disks gives quite a
significant performance boost, trading CPU for disk bandwidth. Also, they
often don't have infinite access to more disk, so the storage efficiency is
welcomed.
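For reference, the combination described above (compression plus native encryption) is set per dataset at creation time. A hypothetical provisioning fragment ("tank" and the recordsize tuning are assumptions, not from this thread; native encryption needs OpenZFS 0.8 or later):

```
# Hypothetical dataset for a PostgreSQL data directory.
# recordsize=8k matches PostgreSQL's 8 kB block size (a common tuning).
zfs create \
    -o compression=lz4 \
    -o encryption=on \
    -o keyformat=passphrase \
    -o recordsize=8k \
    tank/pgdata
```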
We are interested in the snapshots but a little wary of potential data
integrity issues.
We have a disturbance in our database structure (usually nightly) where
large tables are dropped and recreated. Snapshots of a gradually
increasing database probably work very well, but I think these massive
deletions probably make the snapshots quite heavy. They also create a
challenge for incremental backups, replication etc., but that is another
(not quite unrelated) issue.
Regards
Bob
On Tue, 26 Oct 2021 at 01:18, Benedict Holland <benedict.m.holland@gmail.com> wrote:
On Tuesday, October 26th, 2021 at 01:18, Benedict Holland <benedict.m.holland@gmail.com> wrote:
Curious, when it comes to "traditional" filesystems, why ext4 and not XFS? AFAIK the legacy issues associated with XFS are long gone?
On 10/26/2021 2:35 AM, Laura Smith wrote:
XFS is indeed, for me, the most stable and performant file system for
PostgreSQL today. EXT4 was good too, but less performant.
--
E-BLOKOS
Greetings,
* Lucas (root@sud0.nz) wrote:
When I create a snapshot, the script gets the latest WAL file applied from [1] and adds that information to the Snapshot Tags in AWS. I then use that information in the future when restoring the snapshot. The script will read the tag and it will download 50 WAL files before that and all the required WAL files after it.
The WAL files are being backed up to S3. I had to restore the database to a PITR state many times, and it always worked very well.
I also create slaves using the snapshot method. So, I don’t mind having to stop/start the Database for the snapshot process, as it’s proven to work fine for the last 5 years.
I have to say that the process used here isn't terribly clear to me (you
cleanly shut down the database ... and also copy the WAL files?), so I
don't really want to comment on if it's actually correct or not because
I can't say one way or the other if it is or isn't.
I do want to again stress that I don't recommend writing your own tools
for doing backup/restore/PITR, and I would caution people against
trying to use this approach you've suggested. Also, being able to tell
when such a process *doesn't* work is non-trivial (look at how long it
took us to discover the issues around fsync...), so saying that it seems
to have worked for a long time for you isn't really enough to make me
feel comfortable with it.
Thanks,
Stephen