Are all unlogged tables in any case truncated after a server-crash?

Started by Nonameover 4 years ago7 messagesgeneral
Jump to latest
#1Noname
sch8el@posteo.de

Hi everyone,

every few weeks I use Postgres ability, to import huge data sets very
fast by means of "unlogged tables". The bulk load (consisting of plenty
"copy"- & DML-Stmts) and the spatial index creation afterwards, takes
about 5 hours on a proper server  (pg12.7 & PostGIS-Extension). After
that all unlogged tables remain completely unchanged (no
DML-/DDL-Statements). Hence all of my huge unlogged, "static" tables get
never "unclean" and should not be truncated after a server crash.

BTW, if I set all unlogged tables to logged after bulk load, it takes
additional 1.5 hours, mainly because of re-indexing, I suppose. I assume
that a restart of the database after a server crash takes another 1.5
hours (reading from WAL) until the database is up and running.

Therefore I am seeking a strategy, to not tagging those tables as
"unclean" and not truncating all unlogged tables on server restart.

Cheers and regards.

#2David G. Johnston
david.g.johnston@gmail.com
In reply to: Noname (#1)
Re: Are all unlogged tables in any case truncated after a server-crash?

On Thu, Nov 11, 2021 at 11:39 AM <sch8el@posteo.de> wrote:

After
that all unlogged tables remain completely unchanged (no
DML-/DDL-Statements). Hence all of my huge unlogged, "static" tables get
never "unclean" and should not be truncated after a server crash.

The server cannot make this assumption so it truncates unlogged relations
upon an unclean shutdown/crash because it has no WAL with which to ensure a
proper restoration.

BTW, if I set all unlogged tables to logged after bulk load, it takes

additional 1.5 hours, mainly because of re-indexing, I suppose.

More likely it is writing the entire table, and all of its indexes, to WAL.

I assume

that a restart of the database after a server crash takes another 1.5
hours (reading from WAL) until the database is up and running.

That would be incorrect. See "CHECKPOINT".

Therefore I am seeking a strategy, to not tagging those tables as
"unclean" and not truncating all unlogged tables on server restart.

There is no middle ground that I am aware of. Either the contents of the
table are in WAL ,or they are not. If not, they can be lost upon an
unclean shutdown. For manually initiated shutdowns you do have the option
to do so cleanly.

This topic (unlogged optimizations) does draw quite a bit of attention
every year but so far the problem of proving to the system that the
physical file on disk is a truly accurate representation of the post-crash
relation is yet unsolved.

David J.

#3Laurenz Albe
laurenz.albe@cybertec.at
In reply to: Noname (#1)
Re: Are all unlogged tables in any case truncated after a server-crash?

On Thu, 2021-11-11 at 18:39 +0000, sch8el@posteo.de wrote:

every few weeks I use Postgres ability, to import huge data sets very
fast by means of "unlogged tables". The bulk load (consisting of plenty
"copy"- & DML-Stmts) and the spatial index creation afterwards, takes
about 5 hours on a proper server  (pg12.7 & PostGIS-Extension). After
that all unlogged tables remain completely unchanged (no
DML-/DDL-Statements). Hence all of my huge unlogged, "static" tables get
never "unclean" and should not be truncated after a server crash.

There is no way to achieve that.

But you could keep the "huge data sets" around and load them again if
your server happens to crash (which doesn't happen often, I hope).

Yours,
Laurenz Albe
--
Cybertec | https://www.cybertec-postgresql.com

#4Noname
sch8el@posteo.de
In reply to: David G. Johnston (#2)
Re: Are all unlogged tables in any case truncated after a server-crash?

Hi David,

thx for your comments and your advice on reading docs on "checkpoint".

Of course consistency is most important to any DBMS, and if in doubt
about that, truncate data rows and restore from WAL.
But in this case, where data is never modified after bulk load, I
thought there might be an undocumented feature or workaround, like ...
  - option to set the datafiles of those tables in read-only mode and
record this in the metadata
  - on server-recovery spare these unlogged tables and indexes from
truncating all data rows

Its truly a "nice to have"-thing, but I have learned now, that there is
not feature like that.

Mart

Am 11.11.2021 um 22:10 schrieb David G. Johnston:

Show quoted text

On Thu, Nov 11, 2021 at 11:39 AM <sch8el@posteo.de> wrote:

After
that all unlogged tables remain completely unchanged (no
DML-/DDL-Statements). Hence all of my huge unlogged, "static"
tables get
never "unclean" and should not be truncated after a server crash.

The server cannot make this assumption so it truncates unlogged
relations upon an unclean shutdown/crash because it has no WAL with
which to ensure a proper restoration.

BTW, if I set all unlogged tables to logged after bulk load, it takes
additional 1.5 hours, mainly because of re-indexing, I suppose.

More likely it is writing the entire table, and all of its indexes, to
WAL.

I assume
that a restart of the database after a server crash takes another 1.5
hours (reading from WAL) until the database is up and running.

That would be incorrect.  See "CHECKPOINT".

Therefore I am seeking a strategy, to not tagging those tables as
"unclean" and not truncating all unlogged tables on server restart.

There is no middle ground that I am aware of.  Either the contents of
the table are in WAL ,or they are not.  If not, they can be lost upon
an unclean shutdown.  For manually initiated shutdowns you do have the
option to do so cleanly.

This topic (unlogged optimizations) does draw quite a bit of attention
every year but so far the problem of proving to the system that the
physical file on disk is a truly accurate representation of the
post-crash relation is yet unsolved.

David J.

#5Noname
sch8el@posteo.de
In reply to: Laurenz Albe (#3)
Re: Are all unlogged tables in any case truncated after a server-crash?

Am 12.11.2021 um 08:41 schrieb Laurenz Albe:

On Thu, 2021-11-11 at 18:39 +0000, sch8el@posteo.de wrote:

every few weeks I use Postgres ability, to import huge data sets very
fast by means of "unlogged tables". The bulk load (consisting of plenty
"copy"- & DML-Stmts) and the spatial index creation afterwards, takes
about 5 hours on a proper server  (pg12.7 & PostGIS-Extension). After
that all unlogged tables remain completely unchanged (no
DML-/DDL-Statements). Hence all of my huge unlogged, "static" tables get
never "unclean" and should not be truncated after a server crash.

There is no way to achieve that.

But you could keep the "huge data sets" around and load them again if
your server happens to crash (which doesn't happen often, I hope).

Thx Laurenz for yr reply! Yes, that's what we did after server crashes
(~ 2/yr on different locations).
But the system is at least 5 hours offline plus the time until the admin
manually re-starts the bulk loads. On my system, I have 6 databases
configured like this. For all I have to redo the bulk loads.
I hoped there was a 'switch' on crash-recovery, to avoid truncating the
datafiles of these unlogged tables, which are definitely in a perfect
condition.

Mart

Show quoted text

Yours,
Laurenz Albe

#6Michael Lewis
mlewis@entrata.com
In reply to: Noname (#5)
Re: Are all unlogged tables in any case truncated after a server-using

Why keep them as unlogged tables? If data is static, can you spare the disk
space to gradually copy data from existing unlogged table to new copy that
is logged, and then have brief exclusive lock to drop unlogged and rename
new one?

#7Michael Lewis
mlewis@entrata.com
In reply to: Noname (#5)
Re: Are all unlogged tables in any case truncated after a server-crash?

Curious... why keep the table as unlogged if it is static? If you can spare
the disk space, perhaps just create a regular table with same definition,
gradually copy the data to spread the impact on WAL, and when complete,
just drop the old table and rename the new one.