Orphan files filling root partition after crash

Started by Dimitrios Apostolouabout 2 years ago10 messagesgeneral

Jump to latest

Dimitrios Apostolou

jimis@gmx.net

about 2 years ago

Hello list,

yesterday I was doing:

ALTER TABLE x ADD UNIQUE ... , ADD PRIMARY KEY ...;

The table is almost a billion rows long but lies in its own TABLESPACE
that has plenty of space. But apparently the ALTER TABLE command is
writing a lot to the default tablespace (not the temp_tablespace, that is
already moved to a different partition).

That quickly filled up the 50GB free space in my root partition:

20:18:04.222 UTC [94144] PANIC: could not write to file "pg_wal/xlogtemp.94144": No space left on device
[...]
20:19:11.578 UTC [94140] LOG: WAL writer process (PID 94144) was terminated by signal 6: Aborted
20:19:11.578 UTC [94140] LOG: terminating any other active server processes

After postgresql crashed and restarted, the disk space in the root
partition was still not freed! I believe this is because of "orphaned
files" as discussed in mailing list thread [1]/messages/by-id/CAN-RpxDBA7HbTsJPq4t4VznmRFJkssP2SNEMuG=oNJ+=sxLQew@mail.gmail.com.

[1]: /messages/by-id/CAN-RpxDBA7HbTsJPq4t4VznmRFJkssP2SNEMuG=oNJ+=sxLQew@mail.gmail.com

I ended up doing some risky actions to remediate the problem: Find the
filenames that have no identically named "oid" in pg_class, and delete
(move to backup) the biggest ones while the database is stopped.
Fortunately the database started up fine after that!

So what is the moral of the story? How to guard against this?

Why did the database write so much to the default tablespace, even when
both the table and the temp tablespace are elsewhere? Also should one
always keep the default tablespace away from the wal partition? (I guess
it would have helped to avoid the crash, but the ALTER TABLE command would
have still run out of space, so I'm not sure if the orphan files would
have been avoided)?

Needless to say, I would have hoped the database cleaned-up after itself
even after an uncontrolled crash, or that it provided some tools for the
job. (I tried VACUUM FULL on the table, but the orphaned files did not go
away).

My postgresql version is 16.2 installed on Ubuntu.

Thank you,
Dimitris

Sergey Fukanchik

fukanchik@gmail.com

about 2 years ago

In reply to: Dimitrios Apostolou (#1)

Re: Orphan files filling root partition after crash

Hi Dimitrios,
Do you have wal archiving enabled?
$PGDATA/pg_wal/ is a bit different from tablespaces (including
"default" one). It stores transaction journal.
Instructions are here:
https://www.postgresql.org/docs/current/continuous-archiving.html#BACKUP-ARCHIVING-WAL
Some more info here:
https://www.postgresql.org/docs/current/wal-intro.html and here
https://www.postgresql.org/docs/current/wal-configuration.html
---
Sergey

On Wed, 28 Feb 2024 at 14:18, Dimitrios Apostolou <jimis@gmx.net> wrote:

Hello list,

yesterday I was doing:

ALTER TABLE x ADD UNIQUE ... , ADD PRIMARY KEY ...;

The table is almost a billion rows long but lies in its own TABLESPACE
that has plenty of space. But apparently the ALTER TABLE command is
writing a lot to the default tablespace (not the temp_tablespace, that is
already moved to a different partition).

That quickly filled up the 50GB free space in my root partition:

20:18:04.222 UTC [94144] PANIC: could not write to file "pg_wal/xlogtemp.94144": No space left on device
[...]
20:19:11.578 UTC [94140] LOG: WAL writer process (PID 94144) was terminated by signal 6: Aborted
20:19:11.578 UTC [94140] LOG: terminating any other active server processes

After postgresql crashed and restarted, the disk space in the root
partition was still not freed! I believe this is because of "orphaned
files" as discussed in mailing list thread [1].

[1] /messages/by-id/CAN-RpxDBA7HbTsJPq4t4VznmRFJkssP2SNEMuG=oNJ+=sxLQew@mail.gmail.com

I ended up doing some risky actions to remediate the problem: Find the
filenames that have no identically named "oid" in pg_class, and delete
(move to backup) the biggest ones while the database is stopped.
Fortunately the database started up fine after that!

So what is the moral of the story? How to guard against this?

Why did the database write so much to the default tablespace, even when
both the table and the temp tablespace are elsewhere? Also should one
always keep the default tablespace away from the wal partition? (I guess
it would have helped to avoid the crash, but the ALTER TABLE command would
have still run out of space, so I'm not sure if the orphan files would
have been avoided)?

Needless to say, I would have hoped the database cleaned-up after itself
even after an uncontrolled crash, or that it provided some tools for the
job. (I tried VACUUM FULL on the table, but the orphaned files did not go
away).

My postgresql version is 16.2 installed on Ubuntu.

Thank you,
Dimitris

--
Sergey

Dimitrios Apostolou

jimis@gmx.net

about 2 years ago

In reply to: Sergey Fukanchik (#2)

Re: Orphan files filling root partition after crash

Hi Sergey,

On Wed, 28 Feb 2024, Sergey Fukanchik wrote:

Hi Dimitrios,
Do you have wal archiving enabled?
$PGDATA/pg_wal/ is a bit different from tablespaces (including
"default" one). It stores transaction journal.

I don't think I have WAL archiving enabled. Here are the relevant WAL
options in my config:

wal_compression = zstd
max_wal_size = 8GB
min_wal_size = 1GB
track_wal_io_timing = on
wal_writer_flush_after = 8MB

The issue happens because the WAL directory is by default in the same
filesystem with the default tablespace (root partition for Ubuntu). So
when the default tablespace filled up because of my ALTER TABLE operation,
there was no space for WAL either.

Thanks,
Dimitris

Laurenz Albe

laurenz.albe@cybertec.at

about 2 years ago

In reply to: Dimitrios Apostolou (#1)

Re: Orphan files filling root partition after crash

On Wed, 2024-02-28 at 12:16 +0100, Dimitrios Apostolou wrote:

yesterday I was doing:

ALTER TABLE x ADD UNIQUE ... , ADD PRIMARY KEY ...;

The table is almost a billion rows long but lies in its own TABLESPACE
that has plenty of space. But apparently the ALTER TABLE command is
writing a lot to the default tablespace (not the temp_tablespace, that is
already moved to a different partition).

That quickly filled up the 50GB free space in my root partition:

20:18:04.222 UTC [94144] PANIC: could not write to file "pg_wal/xlogtemp.94144": No space left on device
[...]
20:19:11.578 UTC [94140] LOG: WAL writer process (PID 94144) was terminated by signal 6: Aborted
20:19:11.578 UTC [94140] LOG: terminating any other active server processes

After postgresql crashed and restarted, the disk space in the root
partition was still not freed! I believe this is because of "orphaned
files" as discussed in mailing list thread [1].

[1] /messages/by-id/CAN-RpxDBA7HbTsJPq4t4VznmRFJkssP2SNEMuG=oNJ+=sxLQew@mail.gmail.com

I ended up doing some risky actions to remediate the problem: Find the
filenames that have no identically named "oid" in pg_class, and delete
(move to backup) the biggest ones while the database is stopped.
Fortunately the database started up fine after that!

Lucky you. It should have been "relfilenode" rather than "oid",
and some catalog tables don't have their files listed in the catalog,
because they are needed *before* the database can access tables.

So what is the moral of the story? How to guard against this?

Monitor disk usage ...

The root of the problem is that you created the index in the default
tablespace. You should have

ALTER TABLE x ADD UNIQUE ... USING INDEX TABLESPACE bigtblspc;

Needless to say, I would have hoped the database cleaned-up after itself
even after an uncontrolled crash, or that it provided some tools for the
job. (I tried VACUUM FULL on the table, but the orphaned files did not go
away).

That is not so simple... Also, it would slow down crash recovery.

But I agree that it would be nice to have a tool that reports or
cleans up orphaned files.

Yours,
Laurenz Albe

Dimitrios Apostolou

jimis@gmx.net

about 2 years ago

In reply to: Laurenz Albe (#4)

Re: Orphan files filling root partition after crash

Thanks for the feedback Laurenz,

On Wed, 28 Feb 2024, Laurenz Albe wrote:

On Wed, 2024-02-28 at 12:16 +0100, Dimitrios Apostolou wrote:

I ended up doing some risky actions to remediate the problem: Find the
filenames that have no identically named "oid" in pg_class, and delete
(move to backup) the biggest ones while the database is stopped.
Fortunately the database started up fine after that!

Lucky you. It should have been "relfilenode" rather than "oid",
and some catalog tables don't have their files listed in the catalog,
because they are needed *before* the database can access tables.

I actually double checked that the filenames don't appear anywhere in
SELECT * FROM pg_class
and that the files were multi-GB in size including all the
1GB-pieces. But luck was definitely a part of the equation, I didn't know
that the files might be accessed before tables (at db startup?) or that
"relfilenode" would be more appropriate. Why is that, where can I read
more? I see that many (but not all) rows in pg_class have oid=relfilenode
but for many rows relfilenode=0 which is meaningless as filename.

So what is the moral of the story? How to guard against this?

Monitor disk usage ...

It happened *fast*. And it was quite a big suprise coming
from "just" a disk-full situation.

A couple of suggestions; wouldn't it make sense:

+ for the index to be written by default to the table's tablespace?

+ for postgres to refuse to write non-wal files, if it's on
the same device as the WAL and less than max_wal_size bytes are free?

The root of the problem is that you created the index in the default
tablespace. You should have

ALTER TABLE x ADD UNIQUE ... USING INDEX TABLESPACE bigtblspc;

Thank you, was reading the docs but didn't realize this
syntax is valid. I thought it was only for CREATE/ALTER INDEX.

Show quoted text

Needless to say, I would have hoped the database cleaned-up after itself
even after an uncontrolled crash, or that it provided some tools for the
job. (I tried VACUUM FULL on the table, but the orphaned files did not go
away).

That is not so simple... Also, it would slow down crash recovery.

But I agree that it would be nice to have a tool that reports or
cleans up orphaned files.

Yours,
Laurenz Albe

Adrian Klaver

adrian.klaver@aklaver.com

about 2 years ago

In reply to: Dimitrios Apostolou (#5)

Re: Orphan files filling root partition after crash

On 2/28/24 11:30, Dimitrios Apostolou wrote:

Thanks for the feedback Laurenz,

On Wed, 28 Feb 2024, Laurenz Albe wrote:

On Wed, 2024-02-28 at 12:16 +0100, Dimitrios Apostolou wrote:

I ended up doing some risky actions to remediate the problem: Find the
filenames that have no identically named "oid" in pg_class, and delete
(move to backup) the biggest ones while the database is stopped.
Fortunately the database started up fine after that!

Lucky you. It should have been "relfilenode" rather than "oid",
and some catalog tables don't have their files listed in the catalog,
because they are needed *before* the database can access tables.

I actually double checked that the filenames don't appear anywhere in
SELECT * FROM pg_class
and that the files were multi-GB in size including all the
1GB-pieces. But luck was definitely a part of the equation, I didn't know
that the files might be accessed before tables (at db startup?) or that
"relfilenode" would be more appropriate. Why is that, where can I read

https://www.postgresql.org/docs/current/storage-file-layout.html

Caution

Note that while a table's filenode often matches its OID, this is not
necessarily the case; some operations, like TRUNCATE, REINDEX, CLUSTER
and some forms of ALTER TABLE, can change the filenode while preserving
the OID. Avoid assuming that filenode and table OID are the same. Also,
for certain system catalogs including pg_class itself,
pg_class.relfilenode contains zero. The actual filenode number of these
catalogs is stored in a lower-level data structure, and can be obtained
using the pg_relation_filenode() function.

--
Adrian Klaver
adrian.klaver@aklaver.com

Laurenz Albe

laurenz.albe@cybertec.at

about 2 years ago

In reply to: Dimitrios Apostolou (#5)

Re: Orphan files filling root partition after crash

On Wed, 2024-02-28 at 20:30 +0100, Dimitrios Apostolou wrote:

Lucky you. It should have been "relfilenode" rather than "oid",

and some catalog tables don't have their files listed in the catalog,
because they are needed *before* the database can access tables.

I actually double checked that the filenames don't appear anywhere in
SELECT * FROM pg_class
and that the files were multi-GB in size including all the
1GB-pieces. But luck was definitely a part of the equation, I didn't know
that the files might be accessed before tables (at db startup?) or that
"relfilenode" would be more appropriate. Why is that, where can I read
more? I see that many (but not all) rows in pg_class have oid=relfilenode
but for many rows relfilenode=0 which is meaningless as filename.

If you are curious about such things, start reading the source.
The object ID is immutable, and initially the filenode is the save,
but it changes whenever the table is rewritten (TRUNCATE, ALTER TABLE,
VACUUM (FULL), ...).

Yours,
Laurenz Albe

Peter J. Holzer

hjp-pgsql@hjp.at

about 2 years ago

In reply to: Dimitrios Apostolou (#5)

Re: Orphan files filling root partition after crash

On 2024-02-28 20:30:35 +0100, Dimitrios Apostolou wrote:

On Wed, 28 Feb 2024, Laurenz Albe wrote:

On Wed, 2024-02-28 at 12:16 +0100, Dimitrios Apostolou wrote:

So what is the moral of the story? How to guard against this?

Monitor disk usage ...

It happened *fast*. And it was quite a big suprise coming
from "just" a disk-full situation.

Been there.

To prevent this in the future I wrote a small script to monitor disk
space (on multiple hosts and multiple file systems) every few seconds
and invoke another script (which just terminates all database
connections - a bit drastic but effective) if free space runs low:
https://github.com/hjp/platzangst

Laurenz Albe

laurenz.albe@cybertec.at

about 2 years ago

In reply to: Peter J. Holzer (#8)

Re: Orphan files filling root partition after crash

On Mon, 2024-03-04 at 00:04 +0100, Peter J. Holzer wrote:

On 2024-02-28 20:30:35 +0100, Dimitrios Apostolou wrote:

On Wed, 28 Feb 2024, Laurenz Albe wrote:

On Wed, 2024-02-28 at 12:16 +0100, Dimitrios Apostolou wrote:

So what is the moral of the story? How to guard against this?

Monitor disk usage ...

It happened *fast*. And it was quite a big suprise coming
from "just" a disk-full situation.

Been there.

To prevent this in the future I wrote a small script to monitor disk
space (on multiple hosts and multiple file systems) every few seconds
and invoke another script (which just terminates all database
connections - a bit drastic but effective) if free space runs low:
https://github.com/hjp/platzangst

And I maintain that "temp_file_limit" is the best solution.

Yours,
Laurenz Albe

#10

赵宇鹏(宇彭)

zhaoyupeng.zyp@alibaba-inc.com

about 2 years ago

In reply to: Dimitrios Apostolou (#1)

回复：Orphan files filling root partition after crash

Hello, it looks like I've run into the same issue as you. I exhausted the disk
space while executing DDL operations, and then after crash recovery, I found
there were orphaned files.
I believe the reason is that due to the lack of space, some of the WAL logs were
not persisted, such as the abort-type WAL logs. During the WAL replay phase,
the absence of abort-type WAL logs resulted in missing the corresponding unlink
operations.
I can replicate a similar scenario. For example, 16394 is an orphaned file that
was generated by the DDL but was not rolled back.
1. CREATE TABLE test(id int);
2. INSERT INTO test SELECT generate_series(1, 100000000);
3. ALTER TABLE test ALTER COLUMN id TYPE bigint;
sudo chmod 000 pg_wal
4. sudo chmod 777 pg_wal
pg_waldump xxx
5. pg_ctl -D /data start
6. ll -h /data/base/5
1.0G Mar 26 11:47 16387.1
1.0G Mar 26 11:47 16387.2
385M Mar 26 13:38 16387.3
888K Mar 26 13:38 16387_fsm
112K Mar 26 11:51 16387_vm
1.0G Mar 26 13:47 16394
179M Mar 26 13:47 16394.1
320K Mar 26 13:47 16394_fsm
postgres=# select pg_size_pretty(pg_database_size('postgres'));
pg_size_pretty
----------------
4668 MB
(1 row)
postgres=# select pg_size_pretty(pg_total_relation_size('test'));
pg_size_pretty
----------------
3458 MB
(1 row)
postgres=# select count(*) from pg_class where pg_relation_filenode(oid) = 16394;
count
-------
0
(1 row)
------------------------------------------------------------------
发件人：Dimitrios Apostolou <jimis@gmx.net>
发送时间：2024年3月26日(星期二) 11:15
收件人："pgsql-general"<pgsql-general@lists.postgresql.org>
主　题：Orphan files filling root partition after crash
Hello list,
yesterday I was doing:
ALTER TABLE x ADD UNIQUE ... , ADD PRIMARY KEY ...;
The table is almost a billion rows long but lies in its own TABLESPACE
that has plenty of space. But apparently the ALTER TABLE command is
writing a lot to the default tablespace (not the temp_tablespace, that is
already moved to a different partition).
That quickly filled up the 50GB free space in my root partition:
20:18:04.222 UTC [94144] PANIC: could not write to file "pg_wal/xlogtemp.94144": No space left on device
[...]
20:19:11.578 UTC [94140] LOG: WAL writer process (PID 94144) was terminated by signal 6: Aborted
20:19:11.578 UTC [94140] LOG: terminating any other active server processes
After postgresql crashed and restarted, the disk space in the root
partition was still not freed! I believe this is because of "orphaned
files" as discussed in mailing list thread [1]/messages/by-id/CAN-RpxDBA7HbTsJPq4t4VznmRFJkssP2SNEMuG=oNJ+=sxLQew@mail.gmail.com </messages/by-id/CAN-RpxDBA7HbTsJPq4t4VznmRFJkssP2SNEMuG=oNJ+=sxLQew@mail.gmail.com > I ended up doing some risky actions to remediate the problem: Find the filenames that have no identically named "oid" in pg_class, and delete (move to backup) the biggest ones while the database is stopped. Fortunately the database started up fine after that! So what is the moral of the story? How to guard against this? Why did the database write so much to the default tablespace, even when both the table and the temp tablespace are elsewhere? Also should one always keep the default tablespace away from the wal partition? (I guess it would have helped to avoid the crash, but the ALTER TABLE command would have still run out of space, so I'm not sure if the orphan files would have been avoided)? Needless to say, I would have hoped the database cleaned-up after itself even after an uncontrolled crash, or that it provided some tools for the job. (I tried VACUUM FULL on the table, but the orphaned files did not go away). My postgresql version is 16.2 installed on Ubuntu. Thank you, Dimitris.
[1]: /messages/by-id/CAN-RpxDBA7HbTsJPq4t4VznmRFJkssP2SNEMuG=oNJ+=sxLQew@mail.gmail.com </messages/by-id/CAN-RpxDBA7HbTsJPq4t4VznmRFJkssP2SNEMuG=oNJ+=sxLQew@mail.gmail.com > I ended up doing some risky actions to remediate the problem: Find the filenames that have no identically named "oid" in pg_class, and delete (move to backup) the biggest ones while the database is stopped. Fortunately the database started up fine after that! So what is the moral of the story? How to guard against this? Why did the database write so much to the default tablespace, even when both the table and the temp tablespace are elsewhere? Also should one always keep the default tablespace away from the wal partition? (I guess it would have helped to avoid the crash, but the ALTER TABLE command would have still run out of space, so I'm not sure if the orphan files would have been avoided)? Needless to say, I would have hoped the database cleaned-up after itself even after an uncontrolled crash, or that it provided some tools for the job. (I tried VACUUM FULL on the table, but the orphaned files did not go away). My postgresql version is 16.2 installed on Ubuntu. Thank you, Dimitris
I ended up doing some risky actions to remediate the problem: Find the
filenames that have no identically named "oid" in pg_class, and delete
(move to backup) the biggest ones while the database is stopped.
Fortunately the database started up fine after that!
So what is the moral of the story? How to guard against this?
Why did the database write so much to the default tablespace, even when
both the table and the temp tablespace are elsewhere? Also should one
always keep the default tablespace away from the wal partition? (I guess
it would have helped to avoid the crash, but the ALTER TABLE command would
have still run out of space, so I'm not sure if the orphan files would
have been avoided)?
Needless to say, I would have hoped the database cleaned-up after itself
even after an uncontrolled crash, or that it provided some tools for the
job. (I tried VACUUM FULL on the table, but the orphaned files did not go
away).
My postgresql version is 16.2 installed on Ubuntu.
Thank you,
Dimitris