Unused files in the database directory after crashed VACUUM FULL

Started by Hannes Erven · about 7 years ago · 6 messages · general
#1 Hannes Erven
hannes@erven.at

Hi,

I've just had a "VACUUM FULL <table>" crash due to 100% disk usage.
Clearly my fault, I was expecting the new table to be small enough.

After freeing up space, restarting the cluster and issuing another
VACUUM FULL, I noticed that the cluster was way bigger than it should be.
In the base/<db>/ folder, there were many files sharing one particular
filenode number that pg_filenode_relation() could not map to a
relation. As that number was just a bit smaller than the
pg_relation_filepath() of the table I was working on, I guess these were
the remains of the failed VACUUM FULL operation?
I removed those files and a VACUUM ANALYZE of the whole database went fine.

So... is this the expected behaviour? Is there a safe procedure to
check for and clean up "unnecessary" files in the cluster directory?

Version:
('PostgreSQL 10.3 (Debian 10.3-1.pgdg90+1) on x86_64-pc-linux-gnu,
compiled by gcc (Debian 6.3.0-18+deb9u1) 6.3.0 20170516, 64-bit')

Table definition:
(
id serial PRIMARY KEY,
data bytea,
parent_id integer NOT NULL
)
The "data" column values are large enough to be TOASTed.

Thanks & best regards,

-hannes

#2 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Hannes Erven (#1)
Re: Unused files in the database directory after crashed VACUUM FULL

Hannes Erven <hannes@erven.at> writes:

I've just had a "VACUUM FULL <table>" crash due to 100% disk usage.
Clearly my fault, I was expecting the new table to be small enough.

What do you mean by "crash" exactly? A normal transactional failure
should've cleaned up orphaned files. I suppose if the kernel decided to
kill -9 the vacuum process, that wouldn't happen --- but that's not
the usual response to out-of-disk-space.

After freeing up space, restarting the cluster and issuing another
VACUUM FULL, I noticed that the cluster was way bigger than it should be.
In the base/<db>/ folder, there were many files sharing one particular
filenode number that pg_filenode_relation() could not map to a
relation. As that number was just a bit smaller than the
pg_relation_filepath() of the table I was working on, I guess these were
the remains of the failed VACUUM FULL operation?
I removed those files and a VACUUM ANALYZE of the whole database went fine.

So... is this the expected behaviour? Is there a safe procedure to
check for and clean up "unnecessary" files in the cluster directory?

You just described it --- verify with pg_filenode_relation that the
file doesn't correspond to any pg_class entry, then manually delete.

(There is a crash-recovery phase that does something similar, but
I think it's only designed to clean up temporary files.)
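[Editor's note: for a single suspect file, the verification described above might look like the following sketch. The filenode 612718 is a placeholder taken from later in this thread; the query must be run while connected to the affected database, and 0 denotes the default tablespace.]

```sql
-- Does filenode 612718 belong to any relation in the current database?
-- A NULL result means no pg_class entry maps to this file, so it is a
-- candidate for manual removal from base/<db-oid>/.
SELECT pg_filenode_relation(0, 612718);
```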

regards, tom lane

#3 Andrew Gierth
andrew@tao11.riddles.org.uk
In reply to: Tom Lane (#2)
Re: Unused files in the database directory after crashed VACUUM FULL

"Tom" == Tom Lane <tgl@sss.pgh.pa.us> writes:

Hannes Erven <hannes@erven.at> writes:

I've just had a "VACUUM FULL <table>" crash due to 100% disk usage.
Clearly my fault, I was expecting the new table to be small enough.

Tom> What do you mean by "crash" exactly? A normal transactional
Tom> failure should've cleaned up orphaned files. I suppose if the
Tom> kernel decided to kill -9 the vacuum process, that wouldn't happen
Tom> --- but that's not the usual response to out-of-disk-space.

The failure mode of this kind we see reported most often is getting a
PANIC failure from inability to extend the WAL.

--
Andrew (irc:RhodiumToad)

#4 Hannes Erven
hannes@erven.at
In reply to: Tom Lane (#2)
Re: Unused files in the database directory after crashed VACUUM FULL

Hi again,

On 2019-02-10 at 16:41, Tom Lane wrote:

Hannes Erven <hannes@erven.at> writes:

I've just had a "VACUUM FULL <table>" crash due to 100% disk usage.
Clearly my fault, I was expecting the new table to be small enough.

What do you mean by "crash" exactly? A normal transactional failure
should've cleaned up orphaned files. I suppose if the kernel decided to
kill -9 the vacuum process, that wouldn't happen --- but that's not
the usual response to out-of-disk-space.

Here's the exact log (the pgadmin process was running the VACUUM FULL):

2019-02-09 23:44:53.516 CET [20341] @/ <> PANIC: could not write to
file "pg_wal/xlogtemp.20341": No space left on device
2019-02-09 23:44:53.516 CET [19181] @/ <> LOG: WAL writer process (PID
20341) was terminated by signal 6: Aborted
2019-02-09 23:44:53.516 CET [19181] @/ <> LOG: terminating any other
active server processes
2019-02-09 23:44:53.516 CET [20816] postgres@IP/db <pgAdmin 4 - DB:db>
WARNING: terminating connection because of crash of another server
process
2019-02-09 23:44:53.516 CET [20816] postgres@IP/db <pgAdmin 4 - DB:db>
DETAIL: The postmaster has commanded this server process to roll
back the current transaction and exit, because another server process
exited abnormally and possibly corrupted shared memory.
2019-02-09 23:44:53.516 CET [20816] postgres@IP/db <pgAdmin 4 - DB:db>
HINT: In a moment you should be able to reconnect to the database
and repeat your command.
2019-02-09 23:44:53.597 CET [19181] @/ <> LOG: all server processes
terminated; reinitializing
2019-02-09 23:44:53.679 CET [11686] @/ <> LOG: database system was
interrupted; last known up at 2019-02-09 23:44:40 CET
2019-02-09 23:44:58.761 CET [11686] @/ <> LOG: database system was not
properly shut down; automatic recovery in progress
2019-02-09 23:44:58.807 CET [11686] @/ <> LOG: redo starts at 1DA/B3400798
2019-02-09 23:44:58.811 CET [11686] @/ <> FATAL: could not extend file
"base/16400/612716_fsm": No space left on device
2019-02-09 23:44:58.811 CET [11686] @/ <> HINT: Check free disk space.
2019-02-09 23:44:58.811 CET [11686] @/ <> CONTEXT: WAL redo at
1DA/B3406310 for Heap/INSERT: off 4

After freeing space and restarting:

2019-02-10 06:28:25.622 CET [14344] @/ <> LOG: database system was
interrupted while in recovery at 2019-02-09 23:44:58 CET
2019-02-10 06:28:25.622 CET [14344] @/ <> HINT: This probably means
that some data is corrupted and you will have to use the last backup for
recovery.
2019-02-10 06:28:26.762 CET [14344] @/ <> LOG: database system was not
properly shut down; automatic recovery in progress
2019-02-10 06:28:26.831 CET [14344] @/ <> LOG: redo starts at 1DA/B3400798
2019-02-10 06:28:27.624 CET [14344] @/ <> LOG: redo done at 1DA/D5FFF9D8

Is there a safe procedure to
check for and clean up "unnecessary" files in the cluster directory?

You just described it --- verify with pg_filenode_relation that the
file doesn't correspond to any pg_class entry, then manually delete.

OK, I see. I came up with this:

with d0 as (
    -- OID of the current database, to build the base/<oid>/ path
    select oid from pg_database where datname = current_database()
), d1 as (
    -- every file in the database directory, split into filenode + suffix
    select pg_ls_dir as fn,
           regexp_match(pg_ls_dir, '^([0-9]+)(.*)$') as match
    from d0, pg_ls_dir('base/'||d0.oid)
    order by 1
), d2 as (
    select fn, match[1] as base, match[2] as ext
    from d1
    where fn NOT ILIKE 'pg_%'
), d3 as (
    -- NULL relation = no pg_class entry maps to this filenode
    select d.*, pg_filenode_relation(0, base::oid) as relation
    from d2 d
)
select fn, pg_size_pretty((pg_stat_file('base/'||d0.oid||'/'||fn)).size)
from d0, d3
where relation is null;

fn | pg_size_pretty
------------+----------------
612713 | 25 MB
612718 | 1024 MB
612718.1 | 1024 MB
612718.2 | 180 MB
t11_621351 | 96 kB
t11_621354 | 0 bytes
t11_621356 | 8192 bytes
t11_621357 | 160 kB
t11_621360 | 0 bytes
t11_621362 | 8192 bytes

So there's still 612713 and 612718* left to clean up; judging from the
numbers and timestamps, they probably belong to other VACUUM FULLs I
manually canceled in the pgadmin UI during my experiments.

Is this interesting enough to further investigate? If yes, I'll be happy
to work on a test case.
In any case, I've now learned how to check for orphaned files when a
database cluster's on-disk size is way off from the sum of
pg_total_relation_size() over all relations, plus WAL...

Thanks again & best regards,

-hannes

#5 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Hannes Erven (#4)
Re: Unused files in the database directory after crashed VACUUM FULL

Hannes Erven <hannes@erven.at> writes:

On 2019-02-10 at 16:41, Tom Lane wrote:

What do you mean by "crash" exactly?

Here's the exact log (the pgadmin process was running the VACUUM FULL):
2019-02-09 23:44:53.516 CET [20341] @/ <> PANIC: could not write to
file "pg_wal/xlogtemp.20341": No space left on device

Ah, so Andrew was correct: we panicked due to lack of WAL space, and
that explains why the vacuuming process didn't have an opportunity
to delete the files belonging to the uncommitted new relation.

If you're concerned about this sort of thing, you might consider putting
the WAL directory on a separate filesystem from the main database files.

Is there a safe procedure to
check for and clean up "unnecessary" files in the cluster directory?

You just described it --- verify with pg_filenode_relation that the
file doesn't correspond to any pg_class entry, then manually delete.

OK, I see. I came up with this:

I think this isn't accounting for files in non-default
tablespaces, but maybe that's not a problem for your usage.
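[Editor's note: a tablespace-aware variant would also have to walk pg_tblspc/, where files live under pg_tblspc/<tablespace-oid>/<version-dir>/<db-oid>/. A rough, untested sketch of the idea follows; the version directory name (e.g. PG_10_201707211) is release-specific, so it is enumerated with pg_ls_dir rather than hard-coded, and non-numeric filenames simply yield NULL from the regexp.]

```sql
-- List files in each non-default tablespace's per-database directory and
-- check them against pg_class via pg_filenode_relation(tablespace, filenode).
SELECT spc.oid AS tablespace, f AS file,
       pg_filenode_relation(spc.oid,
           (regexp_match(f, '^([0-9]+)'))[1]::oid) AS relation
FROM pg_tablespace spc,
     pg_ls_dir('pg_tblspc/' || spc.oid) AS ver,
     pg_ls_dir('pg_tblspc/' || spc.oid || '/' || ver || '/' ||
               (SELECT oid FROM pg_database
                WHERE datname = current_database())) AS f
WHERE spc.spcname NOT IN ('pg_default', 'pg_global');
```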

Is this interesting enough to further investigate?

It's a pretty well-understood dynamic, I believe. Perhaps we should
try harder to recover cleanly, but I don't know of anyone putting
effort into the case.

regards, tom lane

#6 Thomas Munro
thomas.munro@gmail.com
In reply to: Tom Lane (#5)
Re: Unused files in the database directory after crashed VACUUM FULL

On Mon, Feb 11, 2019 at 10:21 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:

Ah, so Andrew was correct: we panicked due to lack of WAL space, and
that explains why the vacuuming process didn't have an opportunity
to delete the files belonging to the uncommitted new relation.

It's a pretty well-understood dynamic, I believe. Perhaps we should
try harder to recover cleanly, but I don't know of anyone putting
effort into the case.

FTR I am working on a PG13 patch that records relfilenodes of
uncommitted transactions in undo logs, so it can unlink them reliably,
even if you crash (at the cost of introducing a WAL flush before
creating files). I haven't specifically studied the VACUUM FULL case
yet, but in principle this is exactly what my project aims to fix.
It's mostly intended as example code to demonstrate the undo log
machinery (defining undo record types, registering undo log action
functions that are invoked during rollback, rolling back aborted but
not-yet-rolled-back transactions at startup, ...) without having to
understand the whole zheap sandwich at once, but it's also a solution
to an age-old problem. More on that soon.

--
Thomas Munro
http://www.enterprisedb.com