ZFS filesystem - supported ?

Started by Laura Smith, over 4 years ago, 37 messages, general
#1Laura Smith
n5d9xq3ti233xiyif2vp@protonmail.ch

Hi,

Given an upcoming server upgrade, I'm contemplating moving away from XFS to ZFS (specifically the ZoL flavour via Debian 11). BTRFS seems to be falling away (e.g. with Red Hat deprecating it etc.), hence my preference for ZFS.

However, somewhere in the back of my mind I seem to have a recollection of reading about what could be described as a "strong encouragement" to stick with more traditional options such as ext4 or xfs.

A brief search of the docs for "xfs" didn't come up with anything, hence the question here.

Thanks !

Laura

#2Mladen Gogala
gogala.mladen@gmail.com
In reply to: Laura Smith (#1)
Re: ZFS filesystem - supported ?

Hi Laura,

May I ask why you would like to change file systems? Probably because of
the snapshot capability? However, ZFS performance leaves much to be
desired. Please see the following article:

https://www.phoronix.com/scan.php?page=article&item=ubuntu1910-ext4-zfs&num=1

This is relatively recent, from 2019. On page 3 there are tests with
SQLite, Cassandra and RocksDB. Ext4 is much faster in all of them.
Finally, there is another article about relational databases and ZFS:

https://blog.docbert.org/oracle-on-zfs/

In other words, I would test very thoroughly, because your performance is
likely to suffer. As for the supported part, that's not a problem.
Postgres supports all modern file systems. It uses POSIX system calls to
manipulate, read and write files. Furthermore, if you need snapshots,
disk arrays like NetApp, Hitachi or EMC can always provide that.
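
To make the point about POSIX calls concrete, here is a minimal sketch in Python of the kind of operations a database issues against any file system: open, write, fsync and a positional read. The function name and the 8 kB payload are illustrative assumptions, not anything taken from the PostgreSQL source.

```python
import os
import tempfile


def durable_write_and_read(path: str, payload: bytes) -> bytes:
    """Write payload, force it to stable storage, then read it back."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)   # open(2)
    try:
        os.write(fd, payload)                 # write(2)
        os.fsync(fd)                          # fsync(2): flush to the device
        return os.pread(fd, len(payload), 0)  # pread(2) from offset 0
    finally:
        os.close(fd)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        page = b"\x00" * 8192  # hypothetical 8 kB page
        data = durable_write_and_read(os.path.join(tmp, "page.bin"), page)
        print(len(data))
```

Any file system that honours these calls correctly, fsync in particular, can hold a data directory; how fast it does so is a separate question.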

Regards

--
Mladen Gogala
Database Consultant
Tel: (347) 321-1217
https://dbwhisperer.wordpress.com

#3Laura Smith
n5d9xq3ti233xiyif2vp@protonmail.ch
In reply to: Mladen Gogala (#2)
Re: ZFS filesystem - supported ?

Hi Mladen,

Yes indeed, snapshots are the primary reason, closely followed by zfs send/receive.

I'm no stranger to using LVM snapshots with ext4/xfs but it requires a custom shell script to manage the whole process around backups. I feel the whole thing could well be a lot cleaner with zfs.

Thank you for the links, I will take a look.

Laura

#4Mladen Gogala
gogala.mladen@gmail.com
In reply to: Laura Smith (#3)
Re: ZFS filesystem - supported ?

Yes, ZFS is extremely convenient. It's a volume manager and a file
system, all rolled into one, with some additional convenient tools.
However, performance is a major concern. If your application is OLTP,
ZFS might be a tad too slow for your performance requirements. On the
other hand, snapshots can save you a lot of time with backups,
especially if you have some commercial backup tool capable of multiple
readers. The only way to find out is to test. The
ideal tool for testing is pgio:

https://kevinclosson.net/2019/09/21/announcing-pgio-the-slob-method-for-postgresql-is-released-under-apache-2-0-and-available-at-github/

For those who do not know, Kevin Closson was the technical architect who
built both Exadata and EMC XtremIO. He is now the principal engineer
of Amazon RDS. This part is intended only for those who would tell
him that "Oracle has it" is not good enough if he ever decided to post here.

#5Lucas
root@sud0.nz
In reply to: Mladen Gogala (#4)
Re: ZFS filesystem - supported ?

Interesting subject... I'm working on a migration from PG 9.2 to PG 14
and was wondering which file system I should use. Looking at this
thread, it looks like I should keep using ext4.

I don't know where you have your database deployed, but in my case it is on
AWS EC2 instances. The way I handle backups is at the block storage
level, performing EBS snapshots.

This has proven to work very well for me. I have had to restore a few backups
already and it always worked. The bad part is that I need to stop the
database before performing the snapshot, for data integrity, so that
means that I have a hot-standby server only for these snapshots.

Lucas

#6Mladen Gogala
gogala.mladen@gmail.com
In reply to: Lucas (#5)
Re: ZFS filesystem - supported ?

On 10/23/21 23:12, Lucas wrote:

I don't know where you have your database deployed, but in my case is
in AWS EC2 instances. The way I handle backups is at the block storage
level, performing EBS snapshots.

Yes, Amazon uses SAN equipment that supports snapshots.

This has proven to work very well for me. I had to restore a few
backups already and it always worked. The bad part is that I need to
stop the database before performing the Snapshot, for data integrity,
so that means that I have a hot-standby server only for these snapshots.
Lucas

Actually, you don't need to stop the database. You need to execute
pg_start_backup() before taking a snapshot and then pg_stop_backup()
when the snapshot is done. You will need to recover the database when
you finish the restore, but you will not lose any data. I know that
pg_start_backup() and pg_stop_backup() are deprecated, but since
PostgreSQL doesn't have any API for storage or file system snapshots,
that's the only thing that can help you use storage snapshots as
backups. To my knowledge, the only database that does have an API for
storage snapshots is DB2. The API is called "Advanced Copy Services" or
ACS. It's documented here:

https://www.ibm.com/docs/en/db2/11.1?topic=recovery-db2-advanced-copy-services-acs

For Postgres, the old start/stop backup functions should be sufficient.

#7Laura Smith
n5d9xq3ti233xiyif2vp@protonmail.ch
In reply to: Mladen Gogala (#4)
Re: ZFS filesystem - supported ?

Thank you Mladen for your very useful food for thought.

I think my plan going forward will be to stick to the old XFS+LVM setup and (maybe) when I have some more time on my hands fire up a secondary instance with ZFS and do some experimentation with pgio.

Thanks again !

#8Laurenz Albe
laurenz.albe@cybertec.at
In reply to: Laura Smith (#1)
Re: ZFS filesystem - supported ?

On Sat, 2021-10-23 at 11:29 +0000, Laura Smith wrote:

Given an upcoming server upgrade, I'm contemplating moving away from XFS to ZFS
(specifically the ZoL flavour via Debian 11).
BTRFS seems to be falling away (e.g. with Redhat deprecating it etc.), hence my preference for ZFS.

However, somewhere in the back of my mind I seem to have a recollection of reading
about what could be described as a "strong encouragement" to stick with more traditional options such as ext4 or xfs.

ZFS is probably reliable, so you can use it with PostgreSQL.

However, I have seen reports of performance tests that were not favorable for ZFS.
So you should test if the performance is good enough for your use case.

Yours,
Laurenz Albe
--
Cybertec | https://www.cybertec-postgresql.com

#9Stephen Frost
sfrost@snowman.net
In reply to: Mladen Gogala (#6)
Re: ZFS filesystem - supported ?

Greetings,

* Mladen Gogala (gogala.mladen@gmail.com) wrote:

On 10/23/21 23:12, Lucas wrote:

This has proven to work very well for me. I had to restore a few backups
already and it always worked. The bad part is that I need to stop the
database before performing the Snapshot, for data integrity, so that means
that I have a hot-standby server only for these snapshots.
Lucas

Actually, you don't need to stop the database. You need to execute
pg_start_backup() before taking a snapshot and then pg_stop_backup() when
the snapshot is done. You will need to recover the database when you finish
the restore, but you will not lose any data. I know that pg_start_backup()
and pg_stop_backup() are deprecated, but since PostgreSQL doesn't have any
API for storage or file system snapshots, that's the only thing that can
help you use storage snapshots as backups. To my knowledge, the only database
that does have an API for storage snapshots is DB2. The API is called "Advanced
Copy Services" or ACS. It's documented here:

https://www.ibm.com/docs/en/db2/11.1?topic=recovery-db2-advanced-copy-services-acs

For Postgres, the old start/stop backup functions should be sufficient.

No, it's not. You must also be sure to archive any WAL that's generated
between the pg_start_backup and pg_stop_backup calls, and then be sure to
add into the snapshot the appropriate signal files or recovery.conf,
depending on PG version, to indicate that you're restoring from a backup,
and make sure that the WAL is made available via restore_command.

Just doing start/stop backup is *not* enough, and you run the risk of
having an invalid backup or corruption when you restore.

If the entire system is on a single volume then you could possibly just
take a snapshot of it (without any start/stop backup stuff) but it's
very risky to do that and then try to do PITR with it because we don't
know where consistency is reached in such a case (we *must* play all the
way through to the end of the WAL which existed at the time of the
snapshot in order to reach consistency).

In the end though, really, it's much, much, much better to use a proper
backup and archiving tool that's written specifically for PG than to try
and roll your own, using snapshots or not.
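
To illustrate the ordering described above, here is a rough Python sketch that only builds the list of steps and does not run anything. It assumes PostgreSQL 9.6 or later with the non-exclusive backup API; the `Step` class, the function name and the angle-bracket placeholders are hypothetical. The function rename in PostgreSQL 15 and the move from recovery.conf to recovery.signal in PostgreSQL 12 are the only version details encoded here, and a purpose-built backup tool remains the better option.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Step:
    kind: str    # "sql", "shell" or "restore"
    action: str


def snapshot_backup_plan(pg_major: int, label: str = "snap") -> List[Step]:
    """Return the ordered steps for a snapshot-based backup (sketch only)."""
    if pg_major >= 15:
        # PostgreSQL 15 renamed the backup functions.
        start_sql = f"SELECT pg_backup_start('{label}')"
        stop_sql = "SELECT * FROM pg_backup_stop()"
    else:
        # Non-exclusive mode (third argument false), PostgreSQL 9.6 to 14.
        start_sql = f"SELECT pg_start_backup('{label}', false, false)"
        stop_sql = "SELECT * FROM pg_stop_backup(false)"
    # PostgreSQL 12 replaced recovery.conf with a recovery.signal file.
    marker = "recovery.signal" if pg_major >= 12 else "recovery.conf"
    return [
        Step("sql", start_sql),                    # keep this session open
        Step("shell", "<take storage snapshot>"),  # placeholder, tool-specific
        # Same session as the start; the returned label file contents must be
        # saved as backup_label in the restored data directory.
        Step("sql", stop_sql),
        Step("shell", "<verify WAL up to the stop point is archived>"),
        Step("restore", f"create {marker} and set restore_command"),
    ]


if __name__ == "__main__":
    for step in snapshot_backup_plan(14):
        print(f"{step.kind:8} {step.action}")
```

The point of the list is the order: the snapshot sits strictly between the start and stop calls, and the restore is only valid once the archived WAL and the recovery marker are in place.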

Thanks,

Stephen

#10Chris Travers
chris.travers@gmail.com
In reply to: Laurenz Albe (#8)
Re: ZFS filesystem - supported ?

On Mon, Oct 25, 2021 at 10:18 AM Laurenz Albe <laurenz.albe@cybertec.at> wrote:

ZFS is probably reliable, so you can use it with PostgreSQL.

However, I have seen reports of performance tests that were not favorable
for ZFS.
So you should test if the performance is good enough for your use case.

It very much depends on lots of factors.

On the whole, ZFS on spinning disks is going to have some performance
rough corners. And it is a lot harder to reason about a lot of things,
including capacity and performance, when you are doing copy-on-write at both
the db and FS level and have compression in the picture. And there are
other areas of complexity, such as how you handle partial page writes.

On the whole I think for small dbs it might perform well enough. On large
or high velocity dbs I think you will have more problems than expected.

Having worked with PostgreSQL on ZFS, I wouldn't recommend it as a
general-purpose choice.

Best Wishes,
Chris Travers

Efficito: Hosted Accounting and ERP. Robust and Flexible. No vendor
lock-in.
http://www.efficito.com/learn_more

#11Mladen Gogala
gogala.mladen@gmail.com
In reply to: Stephen Frost (#9)
Re: ZFS filesystem - supported ?

Stephen, thank you for correcting me. You, of course, are right. I had
erroneously assumed that backing up the WAL was implied, because I always
back it up. And yes, that needs to be made clear.

Regards

#12Support
admin@e-blokos.com
In reply to: Stephen Frost (#9)
Re: ZFS filesystem - supported ?

what about BTRFS since it's the successor of ZFS?

--
E-BLOKOS

#13Lucas
root@sud0.nz
In reply to: Stephen Frost (#9)
Re: ZFS filesystem - supported ?

When I create a snapshot, the script gets the latest WAL file applied from [1] and adds that information to the snapshot tags in AWS. I then use that information when restoring the snapshot: the script reads the tag and downloads the 50 WAL files before that point, plus all the required WAL files after it.
The WAL files are backed up to S3.
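
As a rough illustration of the "50 WAL files before" step, the names of the preceding segments can be computed from the tagged file name. This is a sketch in Python, not the actual script: the function name is hypothetical, and it assumes the default 16 MB segment size (256 segments per log ID), PostgreSQL 9.3 or later naming (older releases skipped the segment ending in FF), and no timeline switch within the range.

```python
from typing import List


def wal_segments_before(wal_name: str, count: int,
                        segs_per_id: int = 256) -> List[str]:
    """Return the names of up to `count` WAL segments preceding `wal_name`."""
    if len(wal_name) != 24:
        raise ValueError("expected a 24-character WAL segment name")
    timeline = int(wal_name[0:8], 16)    # timeline ID
    log_id = int(wal_name[8:16], 16)     # high part of the segment number
    seg = int(wal_name[16:24], 16)       # low part of the segment number
    segno = log_id * segs_per_id + seg   # absolute segment number
    first = max(segno - count, 0)
    return [
        f"{timeline:08X}{n // segs_per_id:08X}{n % segs_per_id:08X}"
        for n in range(first, segno)
    ]


if __name__ == "__main__":
    for name in wal_segments_before("000000010000000200000003", 5):
        print(name)
```

The returned names are in ascending order, so they can be fetched from the archive in the same order recovery will request them.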

I had to restore the database to a PITR state many times, and it always worked very well.

I also create slaves using the snapshot method. So, I don’t mind having to stop/start the Database for the snapshot process, as it’s proven to work fine for the last 5 years.

Lucas

#14Mladen Gogala
gogala.mladen@gmail.com
In reply to: Support (#12)
Re: ZFS filesystem - supported ?

On 10/25/21 15:43, E-BLOKOS wrote:

what about BTRFS since it's the successor of ZFS?

BTRFS is NOT the successor to ZFS. It never was. It was a completely new
file system developed by Oracle Corp. For some reason, Oracle seems to
have lost interest in it. Red Hat has deprecated it and, in all likelihood,
BTRFS will go the way of Solaris and SPARC chips: ride into the glorious
history of computer science. However, BTRFS has never been widely
used, not even among Fedora users like me. BTRFS was suffering from
problems with corruption and performance. This is probably not the place
to discuss the inner workings of snapshots, but it is worth knowing that
snapshots drastically increase the IO rate on the file system - for
every snapshot. That's where the slowness comes from.

#15Robert L Mathews
lists@tigertech.com
In reply to: Mladen Gogala (#14)
Re: ZFS filesystem - supported ?

On 10/25/21 1:40 PM, Mladen Gogala wrote:

This is probably not the place
to discuss the inner workings of snapshots, but it is worth knowing that
snapshots drastically increase the IO rate on the file system - for
every snapshot. That's where the slowness comes from.

I have recent anecdotal experience of this. I experimented with using
Btrfs for a 32 TB backup system that has five 8 TB spinning disks.
There's an average of 8 MB/s of writes scattered around the disks, which
isn't super high, obviously.

The results were vaguely acceptable until I created a snapshot of it, at
which point it became completely unusable. Even having one snapshot
present caused hundreds of btrfs-related kernel threads to thrash in the
"D" state almost constantly, and it never stopped doing that even when
left for many hours.

I then experimented with adding a bcache layer on top of Btrfs to see if
it would help. I added a 2 TB SSD using bcache, partitioned as 1900 GB
read cache and 100 GB write cache. It made very little difference and
was still unusable as soon as a snapshot was taken.

I did play with the various btrfs and bcache tuning knobs quite a bit
and couldn't improve it.

Since that test was a failure, I then decided to try the same setup with
OpenZFS on a lark, with the same set of disks in a "raidz" array, with
the 2 TB SSD as an l2arc read cache (no write cache). It easily handles
the same load, even with 72 hourly snapshots present, with the default
settings. I'm actually quite impressed with it.

I'm sure that the RAID, snapshots and copy-on-write reduce the maximum
performance considerably, compared to ext4. But on the other hand, it
did provide the performance I expected to be possible given the setup.
Btrfs *definitely* didn't; I was surprised at how badly it performed.

--
Robert L Mathews, Tiger Technologies, http://www.tigertech.net/

#16Benedict Holland
benedict.m.holland@gmail.com
In reply to: Robert L Mathews (#15)
Re: ZFS filesystem - supported ?

In my opinion, ext4 will solve any and all problems without requiring a very deep
understanding of file system architecture. In short, I would stick with
ext4 unless you have a good reason not to. Maybe there is one. I have done
this for a long time and never thought twice about which file system should
support my servers.

#17Bob Jolliffe
bobjolliffe@gmail.com
In reply to: Benedict Holland (#16)
Re: ZFS filesystem - supported ?

We have some users of our software who have had a good experience with
postgresql on zfs/zol. Two features which have proved useful are the
native encryption (less fiddly than luks) and compression. Interestingly,
many of our users are stuck with quite old and slow disks. Using
compression (even together with encryption) on the slow disks gives quite a
significant performance boost, trading cpu for disk bandwidth. Also, they
often don't have infinite access to more disk, so the storage efficiency is
welcomed.
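
The CPU-for-bandwidth trade can be put into a back-of-the-envelope model. The Python sketch below is a simplification with made-up numbers, not a measurement: it assumes logical read throughput is the raw disk rate multiplied by the compression ratio, capped by how fast the CPU can decompress. The function name and all three parameters are illustrative.

```python
def effective_read_mb_s(disk_mb_s: float, compression_ratio: float,
                        decompress_mb_s: float) -> float:
    """Logical MB/s delivered to the database under a simple two-limit model."""
    if disk_mb_s <= 0 or compression_ratio <= 0 or decompress_mb_s <= 0:
        raise ValueError("all inputs must be positive")
    # Each physical MB read from disk expands to `compression_ratio` logical MB,
    # but the CPU cannot hand over data faster than it can decompress it.
    return min(disk_mb_s * compression_ratio, decompress_mb_s)


if __name__ == "__main__":
    # Hypothetical slow disk at 100 MB/s, data compressing 2:1.
    print(effective_read_mb_s(100, 2.0, 1500))  # disk-bound
    print(effective_read_mb_s(100, 2.0, 150))   # CPU-bound
```

On slow disks the first limit dominates, which is consistent with the boost described above; on fast storage the CPU limit is reached sooner and the gain shrinks.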

We are interested in the snapshots but a little wary of potential data
integrity issues.

We have a disturbance in our database structure (usually nightly) where
large tables are dropped and recreated. Snapshots of a gradually
growing database probably work very well. I think these massive
deletions probably make the snapshots quite heavy. They also create a
challenge for incremental backups, replication etc., but that is another (not
quite unrelated) issue.

Regards
Bob

On Tue, 26 Oct 2021 at 01:18, Benedict Holland <benedict.m.holland@gmail.com>
wrote:

Show quoted text

In my opinion, ext4 will solve any and all problems without a very deep
understanding of file system architecture. In short, i would stick with
ext4 unless you have a good reason not to. Maybe there is one. I have done
this a long time and never thought twice about which file system should
support my servers.

On Mon, Oct 25, 2021, 6:01 PM Robert L Mathews <lists@tigertech.com>
wrote:

On 10/25/21 1:40 PM, Mladen Gogala wrote:

This is probably not the place
to discuss the inner workings of snapshots, but it is worth knowing

that

snapshots drastically increase the IO rate on the file system - for
every snapshot. That's where the slowness comes from.

I have recent anecdotal experience of this. I experiment with using
Btrfs for a 32 TB backup system that has five 8 TB spinning disks.
There's an average of 8 MBps of writes scattered around the disks, which
isn't super high, obviously.

The results were vaguely acceptable until I created a snapshot of it, at
which point it became completely unusable. Even having one snapshot
present caused hundreds of btrfs-related kernel threads to thrash in the
"D" state almost constantly, and it never stopped doing that even when
left for many hours.

I then experimented with adding a bcache layer on top of Btrfs to see if
it would help. I added a 2 TB SSD using bcache, partitioned as 1900 GB
read cache and 100 GB write cache. It made very little difference and
was still unusable as soon as a snapshot was taken.

I did play with the various btrfs and bcache tuning knobs quite a bit
and couldn't improve it.

Since that test was a failure, I then decided to try the same setup with
OpenZFS on a lark, with the same set of disks in a "raidz" array, with
the 2 TB SSD as an l2arc read cache (no write cache). It easily handles
the same load, even with 72 hourly snapshots present, with the default
settings. I'm actually quite impressed with it.
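
For anyone wanting to reproduce the "72 hourly snapshots" part, the retention logic can be sketched roughly as below. This is not the script used on this system: the `hourly-` naming scheme, the `tank/backup` dataset name and both function names are assumptions made for illustration. The sketch only prints `zfs destroy` commands and never runs them.

```python
from __future__ import annotations

from datetime import datetime, timedelta

PREFIX = "hourly-"      # hypothetical naming scheme, not from the thread
FMT = "%Y%m%d-%H%M"     # sorts lexicographically in time order


def snapshot_name(dataset: str, when: datetime) -> str:
    """Build a snapshot name such as tank/backup@hourly-20211025-0000."""
    return f"{dataset}@{PREFIX}{when.strftime(FMT)}"


def snapshots_to_destroy(names: list[str], keep: int = 72) -> list[str]:
    """Return hourly snapshots beyond the newest `keep`, oldest first.

    Assumes all names belong to one dataset, so sorting by name is
    the same as sorting by time.
    """
    if keep < 1:
        raise ValueError("keep must be at least 1")
    hourly = sorted(n for n in names
                    if n.split("@", 1)[-1].startswith(PREFIX))
    return hourly[:-keep]


if __name__ == "__main__":
    start = datetime(2021, 10, 25)
    existing = [snapshot_name("tank/backup", start + timedelta(hours=h))
                for h in range(75)]
    for name in snapshots_to_destroy(existing):
        print("zfs destroy", name)   # printed only, not executed
```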

I'm sure that the RAID, snapshots and copy-on-write reduce the maximum
performance considerably, compared to ext4. But on the other hand, it
did provide the performance I expected to be possible given the setup.
Btrfs *definitely* didn't; I was surprised at how badly it performed.

--
Robert L Mathews, Tiger Technologies, http://www.tigertech.net/

#18Laura Smith
n5d9xq3ti233xiyif2vp@protonmail.ch
In reply to: Benedict Holland (#16)
Re: ZFS filesystem - supported ?

On Tuesday, October 26th, 2021 at 01:18, Benedict Holland <benedict.m.holland@gmail.com> wrote:

In my opinion, ext4 will solve any and all problems without a very deep understanding of file system architecture. In short, I would stick with ext4 unless you have a good reason not to. Maybe there is one. I have done this a long time and never thought twice about which file system should support my servers.

Curious, when it comes to "traditional" filesystems, why ext4 and not xfs ? AFAIK the legacy issues associated with xfs are long gone ?

#19Support
admin@e-blokos.com
In reply to: Laura Smith (#18)
Re: ZFS filesystem - supported ?

On 10/26/2021 2:35 AM, Laura Smith wrote:

On Tuesday, October 26th, 2021 at 01:18, Benedict Holland <benedict.m.holland@gmail.com> wrote:

In my opinion, ext4 will solve any and all problems without a very deep understanding of file system architecture. In short, I would stick with ext4 unless you have a good reason not to. Maybe there is one. I have done this a long time and never thought twice about which file system should support my servers.

Curious, when it comes to "traditional" filesystems, why ext4 and not xfs ? AFAIK the legacy issues associated with xfs are long gone ?

XFS is indeed, for me, the most stable and performant for PostgreSQL
today. ext4 was good too, but less performant.

--
E-BLOKOS

#20Stephen Frost
sfrost@snowman.net
In reply to: Lucas (#13)
Re: ZFS filesystem - supported ?

Greetings,

* Lucas (root@sud0.nz) wrote:

On 26/10/2021, at 6:13 AM, Stephen Frost <sfrost@snowman.net> wrote:

* Mladen Gogala (gogala.mladen@gmail.com) wrote:

On 10/23/21 23:12, Lucas wrote:

This has proven to work very well for me. I had to restore a few backups
already and it always worked. The bad part is that I need to stop the
database before performing the Snapshot, for data integrity, so that means
that I have a hot-standby server only for these snapshots.
Lucas

Actually, you don't need to stop the database. You need to execute
pg_start_backup() before taking a snapshot and then pg_stop_backup() when
the snapshot is done. You will need to recover the database when you finish
the restore but you will not lose any data. I know that pg_start_backup()
and pg_stop_backup() are deprecated but since PostgreSQL doesn't have any
API for storage or file system snapshots, that's the only thing that can
help you use storage snapshots as backups. To my knowledge, the only database
that does have an API for storage snapshots is DB2. The API is called "Advanced
Copy Services" or ACS. It's documented here:

https://www.ibm.com/docs/en/db2/11.1?topic=recovery-db2-advanced-copy-services-acs

For Postgres, the old start/stop backup functions should be sufficient.

No, it's not: you must also be sure to archive any WAL that's generated
between the pg_start_backup and pg_stop_backup, and then be sure to
add to the snapshot the appropriate signal files or recovery.conf,
depending on PG version, to indicate that you're restoring from a backup,
and make sure that the WAL is made available via restore_command.

Just doing start/stop backup is *not* enough and you run the risk of
having an invalid backup or corruption when you restore.
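
To make the "signal files or recovery.conf, depending on PG version" part concrete, here is a minimal sketch of the restore-side marker only. It assumes postgresql.conf lives in the data directory (on Debian it usually does not), and the function name prepare_restore and the example restore_command are hypothetical. It does nothing about archiving WAL or about the backup label, so it is an illustration of one step and not a backup tool.

```python
from pathlib import Path


def prepare_restore(data_dir: Path, pg_major: int, restore_command: str) -> Path:
    """Mark a restored data directory so PostgreSQL starts archive recovery.

    PostgreSQL 12 and later: an empty recovery.signal file, with
    restore_command set in the server configuration.
    Earlier versions: restore_command goes into recovery.conf.
    Returns the path of the marker file that was written.
    """
    # Single quotes inside a configuration value are escaped by doubling.
    escaped = restore_command.replace("'", "''")
    setting = f"restore_command = '{escaped}'\n"

    if pg_major >= 12:
        # Assumption: postgresql.conf is inside the data directory.
        with open(data_dir / "postgresql.conf", "a") as conf:
            conf.write(setting)
        marker = data_dir / "recovery.signal"
        marker.touch()
    else:
        marker = data_dir / "recovery.conf"
        marker.write_text(setting)
    return marker


if __name__ == "__main__":
    import tempfile

    with tempfile.TemporaryDirectory() as tmp:
        # Hypothetical restore_command copying segments from a local archive.
        created = prepare_restore(Path(tmp), 14, "cp /wal_archive/%f %p")
        print(created.name)
```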

If the entire system is on a single volume then you could possibly just
take a snapshot of it (without any start/stop backup stuff) but it's
very risky to do that and then try to do PITR with it because we don't
know where consistency is reached in such a case (we *must* play all the
way through to the end of the WAL which existed at the time of the
snapshot in order to reach consistency).

In the end though, really, it's much, much, much better to use a proper
backup and archiving tool that's written specifically for PG than to try
and roll your own, using snapshots or not.

When I create a snapshot, the script gets the latest WAL file applied from [1] and adds that information to the Snapshot Tags in AWS. I then use that information in the future when restoring the snapshot. The script will read the tag and download the 50 WAL files before that one and all the required WAL files after it.
The WAL files are being backed up to S3.
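
The "50 WAL files before" step can be sketched as follows. This is not the actual script: the function name wal_window is made up, and the sketch assumes the default 16 MB WAL segment size and that all files are on the same timeline (a timeline switch inside the window is not handled). A WAL file name is 24 hex digits: 8 for the timeline, 8 for the log id and 8 for the segment within that log.

```python
from __future__ import annotations


def wal_window(latest: str, before: int = 50,
               seg_size: int = 16 * 1024 * 1024) -> list[str]:
    """Names of the `before` segments preceding `latest`, plus `latest` itself.

    Assumes one timeline and a fixed segment size (16 MB by default).
    """
    if len(latest) != 24:
        raise ValueError("WAL file names are 24 hex digits")
    timeline = int(latest[:8], 16)
    log_id = int(latest[8:16], 16)
    seg = int(latest[16:], 16)

    per_log = 0x100000000 // seg_size      # 256 segments per log id at 16 MB
    segno = log_id * per_log + seg         # absolute segment number
    first = max(segno - before, 0)         # never go below the first segment

    return [f"{timeline:08X}{n // per_log:08X}{n % per_log:08X}"
            for n in range(first, segno + 1)]


if __name__ == "__main__":
    for name in wal_window("000000010000000100000000", before=2):
        print(name)
```

The list can then be fed to whatever downloads the files from S3.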

I had to restore the database to a PITR state many times, and it always worked very well.

I also create slaves using the snapshot method. So, I don’t mind having to stop/start the Database for the snapshot process, as it’s proven to work fine for the last 5 years.

I have to say that the process used here isn't terribly clear to me (you
cleanly shut down the database ... and also copy the WAL files?), so I
don't really want to comment on if it's actually correct or not because
I can't say one way or the other if it is or isn't.

I do want to again stress that I don't recommend writing your own tools
for doing backup/restore/PITR and I would caution people against
trying to use the approach you've suggested. Also, being able to tell
when such a process *doesn't* work is non-trivial (look at how long it
took us to discover the issues around fsync..), so saying that it seems
to have worked for a long time for you isn't really enough to make me
feel comfortable with it.

Thanks,

Stephen

#21Bruce Momjian
bruce@momjian.us
In reply to: Chris Travers (#10)
#22Lucas
root@sud0.nz
In reply to: Stephen Frost (#20)
#23Mladen Gogala
gogala.mladen@gmail.com
In reply to: Laura Smith (#18)
#24Support
admin@e-blokos.com
In reply to: Mladen Gogala (#23)
#25Mladen Gogala
gogala.mladen@gmail.com
In reply to: Support (#24)
#26Imre Samu
pella.samu@gmail.com
In reply to: Mladen Gogala (#23)
#27Mladen Gogala
gogala.mladen@gmail.com
In reply to: Imre Samu (#26)
#28Benedict Holland
benedict.m.holland@gmail.com
In reply to: Mladen Gogala (#27)
#29Ron
ronljohnsonjr@gmail.com
In reply to: Mladen Gogala (#27)
#30Laurenz Albe
laurenz.albe@cybertec.at
In reply to: Lucas (#22)
#31Stephen Frost
sfrost@snowman.net
In reply to: Lucas (#22)
#32Ron
ronljohnsonjr@gmail.com
In reply to: Stephen Frost (#31)
#33Lucas
root@sud0.nz
In reply to: Stephen Frost (#31)
#34Stephen Frost
sfrost@snowman.net
In reply to: Lucas (#33)
#35Lucas
root@sud0.nz
In reply to: Stephen Frost (#34)
#36Mladen Gogala
gogala.mladen@gmail.com
In reply to: Stephen Frost (#34)
#37Benedict Holland
benedict.m.holland@gmail.com
In reply to: Mladen Gogala (#36)