Error:could not extend file " with FileFallocate(): No space left on device

Started by Pecsök Ján over 1 year ago | 11 messages | general
#1 Pecsök Ján
jan.pecsok@profinit.eu

Dear community,

After upgrading Postgres from version 13.5 to 16.2, we are experiencing the following error:
could not extend file "pg_tblspc/16401/PG_16_202307071/17820/3968302971" with FileFallocate(): No space left on device

We cannot easily reproduce the problem. It happens at random, every 1-2 weeks of intensive query computation.
Were there any changes in space allocation between Postgres 13.5 and Postgres 16.2?

The database is 91TB in size and has 27TB more space available.

#2 Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Pecsök Ján (#1)
Re: Error:could not extend file " with FileFallocate(): No space left on device

Hello Ján,

On 2024-Sep-10, Pecsök Ján wrote:

> After upgrading Postgres from version 13.5 to 16.2, we are experiencing the following error:
> could not extend file "pg_tblspc/16401/PG_16_202307071/17820/3968302971" with FileFallocate(): No space left on device
>
> We cannot easily reproduce the problem. It happens at random, every 1-2 weeks of intensive query computation.
> Were there any changes in space allocation between Postgres 13.5 and Postgres 16.2?
>
> The database is 91TB in size and has 27TB more space available.

Yes, there were some changes in that area. I have a report from
somebody running EPAS 16 which has a problem that looks pretty much the
same as yours -- and the code is essentially identical. I gave them the
attached patch, hoping that it would shed some light ... but so far,
we've been unable to capture any useful intel.

I'm going to propose this patch for the next set of minors, but that's
in November, so if you're in a hurry and want to risk rebuilding
Postgres and see if you get any better error messages with it than with
the original, here it is. (Note that the patch doesn't change any
behavior; it just reports more detail when a problem occurs.)

I'm CCing Thomas Munro and Andres Freund, who authored the new code.

--
Álvaro Herrera 48°01'N 7°57'E — https://www.EnterpriseDB.com/
"Sweat is the best cure for a sick thought" (Bardia)

Attachments:

0001-Add-some-debugging-around-mdzeroextend.patch (text/x-diff; charset=utf-8; +15 -5)

#3 Thomas Munro
thomas.munro@gmail.com
In reply to: Alvaro Herrera (#2)
Re: Error:could not extend file " with FileFallocate(): No space left on device

On Wed, Sep 11, 2024 at 9:56 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:

> On 2024-Sep-10, Pecsök Ján wrote:
>
> > After upgrading Postgres from version 13.5 to 16.2, we are experiencing the following error:
> > could not extend file "pg_tblspc/16401/PG_16_202307071/17820/3968302971" with FileFallocate(): No space left on device
> >
> > We cannot easily reproduce the problem. It happens at random, every 1-2 weeks of intensive query computation.
> > Were there any changes in space allocation between Postgres 13.5 and Postgres 16.2?
> >
> > The database is 91TB in size and has 27TB more space available.
>
> Yes, there were some changes in that area. I have a report from
> somebody running EPAS 16 which has a problem that looks pretty much the
> same as yours -- and the code is essentially identical. I gave them the
> attached patch, hoping that it would shed some light ... but so far,
> we've been unable to capture any useful intel.

What kernel version and file system are these running on (both cases)?

#4 Pecsök Ján
jan.pecsok@profinit.eu
In reply to: Thomas Munro (#3)
RE: Error:could not extend file " with FileFallocate(): No space left on device

In our case:
Kernel: Linux version 4.18.0-513.18.1.el8_9.ppc64le (mockbuild@ppc-hv-13.build.eng.rdu2.redhat.com) (gcc version 8.5.0 20210514 (Red Hat 8.5.0-20) (GCC)) #1 SMP Thu Feb 1 02:52:53 EST 2024
File system type: xfs

#5 Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Pecsök Ján (#4)
Re: Error:could not extend file " with FileFallocate(): No space left on device

On 2024-Sep-11, Pecsök Ján wrote:

> In our case:
> Kernel: Linux version 4.18.0-513.18.1.el8_9.ppc64le (mockbuild@ppc-hv-13.build.eng.rdu2.redhat.com) (gcc version 8.5.0 20210514 (Red Hat 8.5.0-20) (GCC)) #1 SMP Thu Feb 1 02:52:53 EST 2024
> File system type: xfs

Can you please share the output of xfs_info for the filesystem(s) used?

Apparently, it's possible for allocation groups to be suboptimally laid
out in a way that leads to ENOSPC with space still available.

--
Álvaro Herrera Breisgau, Deutschland — https://www.EnterpriseDB.com/
"To think that the spectre we see is illusory does not strip it of its dread,
it only adds the new terror of madness" (Perelandra, C.S. Lewis)

#6 Pecsök Ján
jan.pecsok@profinit.eu
In reply to: Alvaro Herrera (#5)
RE: Error:could not extend file " with FileFallocate(): No space left on device

Output of xfs_info:
[]# xfs_info /data/aisgamp1/pgdata_system
meta-data=/dev/mapper/dataamp1vg-lv_aisgamp1_pgsys isize=512    agcount=118, agsize=134217720 blks
         =                       sectsz=512   attr=2, projid32bit=1
         =                       crc=1        finobt=1, sparse=1, rmapbt=0
         =                       reflink=1    bigtime=0 inobtcount=0
data     =                       bsize=8192   blocks=15703474176, imaxpct=1
         =                       sunit=8      swidth=32 blks
naming   =version 2              bsize=8192   ascii-ci=0, ftype=1
log      =internal log           bsize=8192   blocks=260864, version=2
         =                       sectsz=512   sunit=8 blks, lazy-count=1
realtime =none                   extsz=8192   blocks=0, rtextents=0

It is also interesting that there are over 1 million files in /data/aisgamp1/pgdata_system/aisgamp1/PG_16_202307071/17820/:

# ll /data/aisgamp1/pgdata_system/aisgamp1/PG_16_202307071/17820/ | wc -l
1129340

# df -h /data/aisgamp1/pgdata_system/aisgamp1/PG_16_202307071/17820 /data/aisgamp1/pgdata_system/temp/PG_16_202307071/17820
Filesystem                                Size  Used Avail Use% Mounted on
/dev/mapper/dataamp1vg-lv_aisgamp1_pgsys  117T   91T   27T  78% /data/aisgamp1/pgdata_system
/dev/mapper/dataamp1vg-lv_aisgamp1_pgsys  117T   91T   27T  78% /data/aisgamp1/pgdata_system
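
For reference, the geometry reported by xfs_info above can be cross-checked with a little arithmetic. The sketch below uses only the numbers from that output (bsize, blocks, agcount, agsize); the constant and function names are illustrative, and it assumes that every allocation group except the last one is exactly agsize blocks long.

```python
# Numbers copied from the xfs_info output above.
BSIZE = 8192             # bytes per filesystem block
BLOCKS = 15703474176     # total data blocks
AGCOUNT = 118            # number of allocation groups
AGSIZE = 134217720       # blocks per (full) allocation group

TIB = 1024 ** 4
MIB = 1024 ** 2


def fs_geometry(bsize, blocks, agcount, agsize):
    """Return (total bytes, bytes per full AG, bytes in the last AG)."""
    total = blocks * bsize
    full_ag = agsize * bsize
    # Assumption: all groups but the last are full-sized.
    last_ag = (blocks - (agcount - 1) * agsize) * bsize
    return total, full_ag, last_ag


total, full_ag, last_ag = fs_geometry(BSIZE, BLOCKS, AGCOUNT, AGSIZE)
print(f"total size:   {total / TIB:.2f} TiB")
print(f"full AG size: {full_ag / TIB:.4f} TiB")
print(f"last AG size: {last_ag / MIB:.2f} MiB")
```

With these inputs the total comes to 117 TiB, which matches the df output, and the last allocation group works out to only 936 blocks (about 7.3 MiB). Whether that layout has anything to do with the ENOSPC errors is not established in this thread.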

#7 Thomas Munro
thomas.munro@gmail.com
In reply to: Alvaro Herrera (#5)
Re: Error:could not extend file " with FileFallocate(): No space left on device

On Thu, Sep 12, 2024 at 12:39 AM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:

> On 2024-Sep-11, Pecsök Ján wrote:
> > In our case:
> > Kernel: Linux version 4.18.0-513.18.1.el8_9.ppc64le (mockbuild@ppc-hv-13.build.eng.rdu2.redhat.com) (gcc version 8.5.0 20210514 (Red Hat 8.5.0-20) (GCC)) #1 SMP Thu Feb 1 02:52:53 EST 2024
> > File system type: xfs
>
> Can you please share the output of xfs_info for the filesystem(s) used?
>
> Apparently, it's possible for allocation groups to be suboptimally laid
> out in a way that leads to ENOSPC with space still available.

Hmm, I have no clue about that, though I do remember reports of
spurious ENOSPC errors from xfs many years ago on some other database
I was around, maybe in the era of that kernel or a bit older.

Actually I was already wondering if we need to add a tunable to
control the heuristic that redirects to posix_fallocate():

/messages/by-id/CAMazQQfp+3f8tD_Q23rCR=O+Jj4jouSRVigbD8OmrTOfHV+8gA@mail.gmail.com

There's no confirmation that writing zeros would be a useful
workaround here, though. Two things changed in 16: the fallocate()
path was invented, but also we started extending by more than one
block at a time, which might take the pwritev() path or the
fallocate() path, for bulk insertion via COPY. That btrfs user would
prefer pwritev() always IIRC, but if some version of xfs is allergic to
this pattern I don't know if it's the size or the system call that's
triggering it...

Is COPY used here?

And just for curiosity (I don't see any particular connection or what
to do about it either way in the short term), are we talking about
really big tables with lots of 1GB files named N.1, N.2, N.3, ...,
or millions of smaller tables? I kinda wonder if xfs (and any
file system really) would really prefer us to use large files instead
(patches exist for this), and when many-terabyte clusters start
working with huge numbers of segments, we reach fun new kinds of
internal resource exhaustion, or something like that....

. o O { I particularly dislike our habit of synthesising fake ENOSPC
errors in a few code paths... grumble }
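
To make the two extension paths mentioned above concrete, here is a minimal Python sketch (not PostgreSQL code) that extends an empty file either by reserving space with posix_fallocate() or by writing zero-filled blocks with a single pwritev() call. The function names and the 16-block size are illustrative assumptions; it needs a platform that provides both `os.posix_fallocate` and `os.pwritev`, such as Linux.

```python
import os
import tempfile

BLCKSZ = 8192  # PostgreSQL's default block size


def extend_with_fallocate(fd, old_size, nblocks):
    """Reserve space for nblocks new blocks without writing any data."""
    os.posix_fallocate(fd, old_size, nblocks * BLCKSZ)


def extend_with_zeros(fd, old_size, nblocks):
    """Extend the file by writing zero-filled blocks in one vectored write."""
    zero_block = bytes(BLCKSZ)
    written = os.pwritev(fd, [zero_block] * nblocks, old_size)
    if written != nblocks * BLCKSZ:
        raise OSError("short write while extending file")


def demo(nblocks=16):
    """Extend two empty temp files, one per method, and report their sizes."""
    sizes = {}
    for name, extend in (("fallocate", extend_with_fallocate),
                         ("zeros", extend_with_zeros)):
        with tempfile.TemporaryFile() as f:
            extend(f.fileno(), 0, nblocks)
            sizes[name] = os.fstat(f.fileno()).st_size
    return sizes


print(demo())
```

Both calls leave the file at the same apparent size; the difference is in how the filesystem is asked to allocate the space, which is the part under suspicion here.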

#8 Thomas Munro
thomas.munro@gmail.com
In reply to: Thomas Munro (#7)
Re: Error:could not extend file " with FileFallocate(): No space left on device

I don't understand what ENOSPC has to do with the file descriptor
limits, but this person reported:

# touch test
touch: cannot touch ‘test’: No space left on device

https://serverfault.com/questions/746032/rsync-and-scp-failing-with-no-space-left-on-xfs-device

... with plenty of free space, and it went away with ulimit -Hn and
-Sn changes. Huh? Could this have failed in FileAccess() when trying
to re-open a vfd?
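
As a quick way to see the limits that `ulimit -Sn` and `ulimit -Hn` refer to, the following Python sketch reads the open-file limits of the current process. It only reports the limits of the process that runs it; for a running PostgreSQL server on Linux the values in /proc/<pid>/limits are the ones that matter. The helper names are illustrative.

```python
import resource


def nofile_limits():
    """Return the (soft, hard) limit on open file descriptors."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def fmt(limit):
    """Render a limit value the way ulimit would."""
    return "unlimited" if limit == resource.RLIM_INFINITY else str(limit)


soft, hard = nofile_limits()
print(f"soft (ulimit -Sn): {fmt(soft)}")
print(f"hard (ulimit -Hn): {fmt(hard)}")
```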

#9 Pecsök Ján
jan.pecsok@profinit.eu
In reply to: Thomas Munro (#7)
RE: Error:could not extend file " with FileFallocate(): No space left on device

The link you provided mentions that data is not being compressed on a
PostgreSQL 16 server. Does that mean that PostgreSQL 16 uses much more space while computing queries?
If so, that could be our problem: our queries sometimes use several TB of disk space for computation, and if disk usage during queries has increased considerably, 27TB may sometimes not be enough.

I have 2 questions:

Is there any workaround so that Postgres won't use FileFallocate? Maybe some Linux setting that prevents Postgres from using it?
The change was introduced in Postgres 16; does that mean that Postgres 15.8 should have the old behaviour?

We don't use COPY in our queries.

#10 Thomas Munro
thomas.munro@gmail.com
In reply to: Pecsök Ján (#9)
Re: Error:could not extend file " with FileFallocate(): No space left on device

On Thu, Sep 12, 2024 at 8:54 PM Pecsök Ján <jan.pecsok@profinit.eu> wrote:

> The link you provided mentions that data is not being compressed on a
> PostgreSQL 16 server. Does that mean that PostgreSQL 16 uses much more space while computing queries?
> If so, that could be our problem: our queries sometimes use several TB of disk space for computation, and if disk usage during queries has increased considerably, 27TB may sometimes not be enough.

The kind of compression discussed there is a btrfs feature. Xfs
doesn't have compression.

> I have 2 questions:
>
> Is there any workaround so that Postgres won't use FileFallocate? Maybe some Linux setting that prevents Postgres from using it?

Not currently. I was thinking of proposing to introduce a setting and
back-patching it into 16, because it's a sort of regression for btrfs
users (and a hard one to foresee). It is not at all clear what
exactly is happening on this xfs system, but something else...

> The change was introduced in Postgres 16; does that mean that Postgres 15.8 should have the old behaviour?

Yes.

> We don't use COPY in our queries.

OK so it's presumably due to having lots of concurrent DML operations
(most likely INSERT, could also be UPDATE) that need to extend the
relation. I'm not sure of the exact behaviour of the heuristics
off the top of my head (but basically it's driven by waitcount[1])...
perhaps if you had only 7 concurrent DML operations and not 8+, it
would be less likely to take the fallocate path, something like
that... That "8" is the threshold I was thinking of turning into a
GUC, perhaps in the November minor release, but it's not exactly clear
that posix_fallocate() is really the problem. (I see that there have
been bugs in xfs's posix_fallocate() space accounting, but the one
that I found was about redundant posix_fallocate() over a region that
is already allocated, which PostgreSQL doesn't do.)

However it is far from clear what is actually going wrong here.
Although it seems to imply a pretty weird/bogus use of ENOSPC by the
kernel, that link I posted seems to be hinting that something a bit
different is going on. It may be clutching at straws, but you might
try increasing those ulimits. I'm not sure how to try to reproduce it
in lab conditions since it's apparently pretty hard to hit, based on
your 1-2 week MTBF on what sounds like a massive and busy system.
Hmm...

[1]: https://github.com/postgres/postgres/commit/00d1e02be24987180115e371abaeb84738257ae2
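
The threshold discussed above can be pictured with a small sketch. This is not PostgreSQL source: the function name, the return values, and the exact comparison (strictly greater than 8) are assumptions made for illustration, based only on the "8" mentioned in this message. The idea is that larger extension requests, which become more likely when more backends are waiting to extend the relation, are routed to posix_fallocate(), while smaller ones are written out as zeros.

```python
FALLOCATE_THRESHOLD = 8  # the "8" mentioned above; hypothetical constant name


def choose_extend_method(numblocks, threshold=FALLOCATE_THRESHOLD):
    """Pick how to zero-extend a relation by numblocks blocks.

    Assumption: requests larger than the threshold use posix_fallocate(),
    smaller ones are written out as zero-filled blocks.
    """
    if numblocks < 1:
        raise ValueError("numblocks must be positive")
    return "posix_fallocate" if numblocks > threshold else "write_zeros"


for n in (1, 7, 9, 64):
    print(n, choose_extend_method(n))
```

Turning the threshold into a setting, as proposed above, would amount to making the `threshold` argument configurable.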

#11 Aleksandr Fedorov
a.fedorov@postgrespro.ru
In reply to: Pecsök Ján (#1)
Re: Error:could not extend file " with FileFallocate(): No space left on device

Dear community,

Based on the analysis of logs collected from several incidents under OEL
8.10 / 9.3, the most likely cause is local exhaustion of free space in
an allocation group in the XFS filesystem.

Further investigation revealed that a similar issue is documented in the
Red Hat knowledge base (https://access.redhat.com/solutions/7129010),
describing ENOSPC errors from the fallocate() function in XFS
filesystems during PostgreSQL backup operations.
Red Hat references the commit
https://github.com/torvalds/linux/commit/6773da870ab89123d1b513da63ed59e32a29cb77
and believes that this kernel fix may address the PostgreSQL issue.

After analyzing the change set from this commit, we identified the
following combination of conditions that can trigger the ENOSPC error:
1. Presence of delayed allocations (committed but not yet written to disk).
2. Insufficient free space in the allocation group to cover all pending
delayed allocations.

A subsequent search of the PostgreSQL community knowledge base led to
the message
/messages/by-id/50A117B6.5030300@optionshouse.com.

Important points to highlight from this message:
1. Since kernel versions 2.6.x, XFS has implemented dynamic speculative
preallocation.
2. The term "dynamic" means the preallocation size is regulated by
internal heuristics.
3. These heuristics are based on file access patterns and history.
4. Additional space allocated during preallocation is intended to
prevent file fragmentation.
5. When a file extends, its data is written into extents that may be
distributed across one or more allocation groups.
6. Delayed allocation writes allow merging multiple allocations into
preallocated space before writing to disk, reducing the number of
extents and thus file fragmentation.
7. The logic for tracking additional space retains it as long as there
are in-memory references to the file — for example, in an actively
running PostgreSQL database.
8. The XFS filesystem itself considers this space as used.
9. The actual file size may exceed the 1GB limit (not to be confused
with apparent size).

This is confirmed by information collected with the `du -h` command,
which shows "actual" (allocated) file sizes and helps to detect files
larger than 1GB at the time the command runs (some were as large as 2GB,
although the maximum segment size is 1GB).
There may have been more such files, but after the replica crash the file
descriptors were released, causing the "actual" size to return to normal.
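
The difference between the apparent size and the "actual" size that `du -h` reports can also be checked programmatically. The following Python sketch is an illustration only: the function names and the demo file are hypothetical, and it assumes Linux, where `st_blocks` is counted in 512-byte units. It lists files in a directory whose allocated size exceeds the 1GB segment limit.

```python
import os
import tempfile

SEGMENT_LIMIT = 1 << 30  # 1GB, PostgreSQL's default segment size


def file_sizes(path):
    """Return (apparent_size, allocated_size) in bytes for path."""
    st = os.stat(path)
    # Assumption: st_blocks is in 512-byte units, as on Linux.
    return st.st_size, st.st_blocks * 512


def oversized_files(directory, limit=SEGMENT_LIMIT):
    """List (name, apparent, allocated) for files allocated beyond limit."""
    found = []
    with os.scandir(directory) as entries:
        for entry in entries:
            if not entry.is_file(follow_symlinks=False):
                continue
            apparent, allocated = file_sizes(entry.path)
            if allocated > limit:
                found.append((entry.name, apparent, allocated))
    return sorted(found)


# Small demonstration on a throwaway directory.
with tempfile.TemporaryDirectory() as d:
    demo_path = os.path.join(d, "segment")
    with open(demo_path, "wb") as f:
        f.write(bytes(8192))
    print(file_sizes(demo_path))
    print(oversized_files(d))
```

Pointed at a tablespace directory while the server is running, `oversized_files()` would flag the same kind of files described above, namely segments whose allocated size is larger than 1GB because of speculative preallocation.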

The dynamic allocator can be disabled by specifying the `allocsize`
mount option when mounting the XFS filesystem.

We would like to share additional observations to help resolve the issue.

We were able to reproduce the original problem in two ways: directly on
a PostgreSQL replica, and using a C program.

The first method is a test script (please see the attached
README_test_pg.md) that uses the mount option `allocsize=$((1*1024*1024))`
when mounting the disk where PGDATA is located.
The pgbench_accounts table is generated using the pgbench tool, and
multiple copies of this table are created and populated in parallel.
During the process of filling these small tables (each table is no
larger than 25 MB upon script completion), numerous delayed
preallocation events occur, consuming free disk space.
The subsequent parallel INSERT statements then cause replica crashes
because there is no contiguous free space left on the disk to extend the
file of the large table.

Here is an example of the free space available on the mount points after
the replica crashed with an ENOSPC error (pgdata_main belongs to the
primary server and pgdata_repl to the replica):
Filesystem     Type  Size  Used Avail Use% Mounted on
/dev/loop0     xfs   4.0G  4.0G   74M  99% /pgdata_main
/dev/loop3     xfs   4.0G  3.8G  280M  94% /pgdata_repl

You may observe that when the issue is reproduced and the replica
crashes, the available disk space on the replica side appears larger
than on the primary side.
However, the ENOSPC error in the logs indicates that disk space was
exhausted, and this is indeed accurate: after the crash, all file
descriptors were released, and the space previously preallocated to files
was reclaimed by the filesystem. Monitoring file sizes with "du -h"
right before the crash and some time after it shows that file sizes
decrease from 26 MB to 25 MB.

The issue does not occur when using the minimum possible value for the
allocsize parameter, `allocsize=$((4*1024))`.
Testing various values of allocsize under a specific workload on
PostgreSQL with synchronous physical replication shows:
+----------------------+----------------------+---------------------------------------------------------------------+
| allocsize setting    | Thread model         | Result                                                              |
+----------------------+----------------------+---------------------------------------------------------------------+
| 1M                   | single thread        | No issues observed                                                  |
+----------------------+----------------------+---------------------------------------------------------------------+
| 1M                   | multiple threads     | Replica failed: "could not extend file ... No space left on device" |
+----------------------+----------------------+---------------------------------------------------------------------+
| 1GB                  | multiple threads     | Primary failed: "could not extend file ... No space left on device" |
+----------------------+----------------------+---------------------------------------------------------------------+
| 4KB                  | multiple threads     | No failure occurred                                                 |
+----------------------+----------------------+---------------------------------------------------------------------+

The second method is a C program (please see the attached
README_test_c.md) that reproduces the ENOSPC error on kernel version
5.15.0-101.103.2.1.el9uek.x86_64.
The program first attempts to write 748 KB to a file and then to allocate
an additional 16 KB using posix_fallocate().
If posix_fallocate() fails, it displays a corresponding message and
retries the operation.
The second attempt succeeds, indicating that space was available.
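
The attached C program is not reproduced in this message. As a rough illustration of the steps just described, here is a Python sketch of the same sequence: write 748 KB, request 16 KB more with posix_fallocate(), and retry once if ENOSPC is returned. The function name, the file name, and the retry count are illustrative assumptions; on a healthy filesystem the first attempt is expected to succeed, so this sketch alone will not trigger the error.

```python
import errno
import os
import tempfile

WRITE_BYTES = 748 * 1024   # initial write, as described above
EXTEND_BYTES = 16 * 1024   # extra space requested via posix_fallocate()


def write_then_fallocate(path, max_attempts=2):
    """Write WRITE_BYTES, then try to allocate EXTEND_BYTES more.

    Returns the number of posix_fallocate() attempts that were needed.
    """
    fd = os.open(path, os.O_RDWR | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        data = bytes(WRITE_BYTES)
        written = 0
        while written < len(data):
            written += os.write(fd, data[written:])
        for attempt in range(1, max_attempts + 1):
            try:
                os.posix_fallocate(fd, WRITE_BYTES, EXTEND_BYTES)
                return attempt
            except OSError as exc:
                # Retry only on ENOSPC, and only while attempts remain.
                if exc.errno != errno.ENOSPC or attempt == max_attempts:
                    raise
                print(f"attempt {attempt}: posix_fallocate() failed "
                      "with ENOSPC, retrying")
    finally:
        os.close(fd)


with tempfile.TemporaryDirectory() as d:
    target = os.path.join(d, "testfile")
    attempts = write_then_fallocate(target)
    print(attempts, os.path.getsize(target))
```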

However, the program does not fully reproduce the potential PostgreSQL
scenario. The key differences are:
1. The program uses a single process with a single thread, whereas real
systems involve one process with multiple threads or multiple processes
operating on files.
2. The program uses a fixed buffer size for the mounted filesystem's
journal, whereas in production environments the buffer size is dynamic
(allocated based on historical space usage, i.e., workload-dependent).
3. The issue does not occur when there are multiple allocation groups
that are completely empty.

In our practice, we identified two viable approaches:

1. As a permanent solution: Upgrade the UEK kernel.
   Note that the fix has not been backported to all UEK versions:
   - It is not present in UEK7 (5.15.x).
   - It is present in UEK8 (6.12.x, available starting with OL 9.5)
     from kernel version 6.12.0-0.20.20 onwards.
2. As a temporary solution: Use the allocsize parameter to disable
   dynamic speculative preallocation.
   However, since this does not fix the root cause, failures may still
   occur.

Attachments:

README_test_pg.md (text/markdown; charset=UTF-8)
README_test_с.md (text/markdown; charset=UTF-8)