Please help me understand unlogged tables

Started by Alexander Stoddardabout 8 years ago3 messagesgeneral
Jump to latest
#1Alexander Stoddard
alexander.stoddard@gmail.com

If a table is set to unlogged is it inherently non-durable? That, is any
crash or unsafe shutdown _must_ result in truncation upon recovery?

I can imagine a table that is bulk loaded in a warehousing scenario and
then sitting statically could be safe, but maybe the question becomes how
could the system know it is unchanged if it isn't logged...

Would a correct solution be to bulk load into an unlogged table and then
(fairly cheaply?) set the table back to being logged?

If I my mental model is correct the table would be liable to being lost to
a crash during loading but once set back to logged it is durable. Is that
correct?

Is it the case that setting a large table to logged status after loading it
unlogged is a reasonably cheap operation?

Thank you,
Alex Stoddard

#2Jeff Janes
jeff.janes@gmail.com
In reply to: Alexander Stoddard (#1)
Re: Please help me understand unlogged tables

On Wed, Jan 31, 2018 at 8:52 AM, Alexander Stoddard <
alexander.stoddard@gmail.com> wrote:

If a table is set to unlogged is it inherently non-durable? That, is any
crash or unsafe shutdown _must_ result in truncation upon recovery?

Yes.

I can imagine a table that is bulk loaded in a warehousing scenario and
then sitting statically could be safe, but maybe the question becomes how
could the system know it is unchanged if it isn't logged...

Right, that is the problem. It would be nice to solve it, but there are no
immediate plans that I know of to implement that. It is hard, because
during recovery the system can't query system catalog tables to get
information out of them. That is why unlogged tables are indicate by the
existence of certain files in the filesystem, so the system can know they
are unlogged without querying the system catalogs to find out.

Would a correct solution be to bulk load into an unlogged table and then
(fairly cheaply?) set the table back to being logged?

Unfortunately, it is not cheap for large tables.

I have some large datasets which I could always regenerate if needed, which
I don't want blowing out my wal archive. I've resigned myself to just
reloading them after a crash (or after using pg_basebackup to reclone
production to make a fresh test/dev server). If the ETL step from the raw
source is expensive, then I \copy the table out to a file, and then \copy
it back in after a crash, to save on the ET time.

Cheers,

Jeff

#3Michael Paquier
michael@paquier.xyz
In reply to: Alexander Stoddard (#1)
Re: Please help me understand unlogged tables

On Wed, Jan 31, 2018 at 10:52:38AM -0600, Alexander Stoddard wrote:

If a table is set to unlogged is it inherently non-durable? That, is any
crash or unsafe shutdown _must_ result in truncation upon recovery?

Yes, they are designed like that. Upon recovery all unlogged tables are
re-initialized. What happens internally is that their init fork files
gets copied to become the one in use.

I can imagine a table that is bulk loaded in a warehousing scenario and
then sitting statically could be safe, but maybe the question becomes how
could the system know it is unchanged if it isn't logged...

Would a correct solution be to bulk load into an unlogged table and then
(fairly cheaply?) set the table back to being logged?

Note that switching a table from unlogged to logged using ALTER TABLE
generates a large amount of WAL, as you need to make this table's data
consistent it is necessary to copy a full image of it.

If I my mental model is correct the table would be liable to being lost to
a crash during loading but once set back to logged it is durable. Is that
correct?

Yes, ALTER TABLE makes that possible but be careful about the WAL
generated because of the table rewrite.

Is it the case that setting a large table to logged status after loading it
unlogged is a reasonably cheap operation?

If you load a logged table the cost would be the same as switching it
from unlogged to logged. So what matters is how much operation is
generated between the time you created the table, like random writes on
it. For an initial load on a very large table, you could reduce
wal_level temporarily to minimal, and make the WAL generated less
painful. For a one-time load this can matter.
--
Michael