Survey on backing up unlogged tables: help us with PostgreSQL development!
Folks,
Please help us resolve a discussion on -hackers.
PostgreSQL 9.1 is likely to have, as a feature, the ability to create
tables which are "unlogged", meaning that they are not added to the
transaction log, and will be truncated (emptied) on database restart.
Such tables are intended for highly volatile, but not very valuable,
data, such as session statues, application logs, etc.
The question is, how would you, as a DBA, expect pg_dump backups to
treat unlogged tables? Backing them up by default has the potential to
both cause performance drag on the unlogged table and make your backups
take longer unless you remember to omit them. Not backing them up by
default has the drawback that if you forget --include-unlogged switch,
and shut the database down, any unlogged data is gone. How would you
*expect* unlogged tables to behave?
Survey is here:
https://spreadsheets.google.com/ccc?key=0AoeuP3g2YZsFdDFnT2VKNC1FQ0pQNmJGS2dWMTNYMEE&hl=en&authkey=CISbwuYD
--
-- Josh Berkus
PostgreSQL Experts Inc.
http://www.pgexperts.com
PostgreSQL 9.1 is likely to have, as a feature, the ability to create
tables which are "unlogged", meaning that they are not added to the
transaction log, and will be truncated (emptied) on database restart.
Such tables are intended for highly volatile, but not very valuable,
data, such as session statues, application logs, etc.The question is, how would you, as a DBA, expect pg_dump backups to
treat unlogged tables? Backing them up by default has the potential to
both cause performance drag on the unlogged table and make your backups
take longer unless you remember to omit them. Not backing them up by
default has the drawback that if you forget --include-unlogged switch,
and shut the database down, any unlogged data is gone. How would you
*expect* unlogged tables to behave?
ALTER DATABASE ... SET PG_DUMP_INCLUDE_UNLOGGED TO ON/OFF
with default OFF.
That way I can think about it once per database *before* I am in
the situation when I regret forgetting.
(pg_dump would still support --include-unlogged, defaulting to the
database default)
Karsten
--
Neu: GMX De-Mail - Einfach wie E-Mail, sicher wie ein Brief!
Jetzt De-Mail-Adresse reservieren: http://portal.gmx.net/de/go/demail
On Tue, Nov 16, 2010 at 5:46 PM, Josh Berkus <josh@agliodbs.com> wrote:
Survey is here:
This is a link to a read-only spreadsheet for me.
Derrick
On 11/16/2010 03:24 PM, Karsten Hilbert wrote:
PostgreSQL 9.1 is likely to have, as a feature, the ability to create
tables which are "unlogged", meaning that they are not added to the
transaction log, and will be truncated (emptied) on database restart.
Such tables are intended for highly volatile, but not very valuable,
data, such as session statues, application logs, etc.
I have been following loosely this discussion on HACKERS, but seem to
have missed the part about truncating such tables on server restart.
I have an immediate use for unlogged tables (application logs), but
having them truncate after even a clean server restart would be a show
stopper. I keep log data for 2 months, and never back it up. Having it
disappear after a system melt down is acceptable, but not after a clean
restart. That would be utterly ridiculous!
As to the topic of the thread, I think pg_dump needs to dump unlogged
tables by default.
-Glen
On Nov 16, 2010, at 3:46 PM, Josh Berkus wrote:
...and will be truncated (emptied) on database restart.
I think that's key. Anything that won't survive a database restart, I sure don't expect to survive backup & restore.
--
Scott Ribe
scott_ribe@elevated-dev.com
http://www.elevated-dev.com/
(303) 722-0567 voice
This is a link to a read-only spreadsheet for me.
You're correct. Darn those Google unreadable links!
https://spreadsheets.google.com/viewform?formkey=dDFnT2VKNC1FQ0pQNmJGS2dWMTNYMEE6MQ
That should work.
--
-- Josh Berkus
PostgreSQL Experts Inc.
http://www.pgexperts.com
On Tue, Nov 16, 2010 at 7:21 PM, Glen Parker <glenebob@nwlink.com> wrote:
On 11/16/2010 03:24 PM, Karsten Hilbert wrote:
PostgreSQL 9.1 is likely to have, as a feature, the ability to create
tables which are "unlogged", meaning that they are not added to the
transaction log, and will be truncated (emptied) on database restart.
Such tables are intended for highly volatile, but not very valuable,
data, such as session statues, application logs, etc.I have been following loosely this discussion on HACKERS, but seem to have
missed the part about truncating such tables on server restart.I have an immediate use for unlogged tables (application logs), but having
them truncate after even a clean server restart would be a show stopper. I
keep log data for 2 months, and never back it up. Having it disappear after
a system melt down is acceptable, but not after a clean restart. That would
be utterly ridiculous!
+1 -- Is there a technical reason to do a TRUNCATE on restart? I'd feel
better if I could just have unlogged tables that survive unless something
like a power-outage etc... I'm in the exact same boat here, lots of big
logging tables that need to survive reboot, but are frustrating when it
comes to WAL generation.
As to the topic of the thread, I think pg_dump needs to dump unlogged
tables by default.-1 I disagree. I'm fine with having the loaded weapon pointed at my foot.
--Scott
Show quoted text
-Glen
--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
Scott Mead <scott@scottrmead.com> writes:
+1 -- Is there a technical reason to do a TRUNCATE on restart? I'd feel
better if I could just have unlogged tables that survive unless something
like a power-outage etc... I'm in the exact same boat here, lots of big
logging tables that need to survive reboot, but are frustrating when it
comes to WAL generation.
Keep in mind that these tables are *not* going to survive any type of
backend crash. Maybe my perceptions are colored because I deal with
Postgres bugs all the time, but I think of backend crashes as pretty
common, certainly much more common than an OS-level crash. I'm afraid
you may be expecting unlogged tables to be significantly more robust
than they really will be.
regards, tom lane
On 11/16/2010 04:46 PM, Josh Berkus wrote:
PostgreSQL 9.1 is likely to have, as a feature, the ability to create
tables which are "unlogged", meaning that they are not added to the
transaction log, and will be truncated (emptied) on database restart.
Such tables are intended for highly volatile, but not very valuable,
data, such as session statues, application logs, etc.
With the current patches, the data survives a restart just fine.
I'd like to vote for:
safe restart = save data
bad crashy restart = drop date
-Andy
On 11/16/2010 05:15 PM, Tom Lane wrote:
Keep in mind that these tables are *not* going to survive any type of
backend crash. Maybe my perceptions are colored because I deal with
Postgres bugs all the time, but I think of backend crashes as pretty
common, certainly much more common than an OS-level crash. I'm afraid
you may be expecting unlogged tables to be significantly more robust
than they really will be.
But an individual backend crash != server restart, unless that's changed
since 8.1 (yes, I'm still stuck on 8.1 :( )... So if I, for example,
kill -9 a backend that's busy updating a nonlogged table, the table
could be corrupted, but it wouldn't be truncated (and could cause
trouble) for possibly weeks until the postmaster is restarted.
Conversely, even if no backend crash occurs whatsoever, all the
nonlogged tables would be truncated after an orderly postmaster restart.
Just doesn't make sense to me.
-Glen
With the current patches, the data survives a restart just fine.
Per -hackers, that's not guarenteed.
--
-- Josh Berkus
PostgreSQL Experts Inc.
http://www.pgexperts.com
On 11/16/2010 07:33 PM, Josh Berkus wrote:
With the current patches, the data survives a restart just fine.
Per -hackers, that's not guarenteed.
Ah, I just read the thread on -hackers. And yea, my system had 24 hours to write/flush/etc before I'd restarted it moments ago as a test. I have NOT tested a bunch of writes and then quickly restarting PG. I CAN report that given 24 hours, your data will survive a restart :-)
-Andy
On Tue, Nov 16, 2010 at 5:23 PM, Scott Ribe <scott_ribe@elevated-dev.com> wrote:
On Nov 16, 2010, at 3:46 PM, Josh Berkus wrote:
...and will be truncated (emptied) on database restart.
I think that's key. Anything that won't survive a database restart, I sure don't expect to survive backup & restore.
I'd vote for backing up the schema of an unlogged table so it's there
on a restore.
On Tue, Nov 16, 2010 at 8:15 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Scott Mead <scott@scottrmead.com> writes:
+1 -- Is there a technical reason to do a TRUNCATE on restart? I'd feel
better if I could just have unlogged tables that survive unless something
like a power-outage etc... I'm in the exact same boat here, lots of big
logging tables that need to survive reboot, but are frustrating when it
comes to WAL generation.Keep in mind that these tables are *not* going to survive any type of
backend crash.
Not surviving a crash is fine. IMHO, if we'd lose data in myisam files,
I'm happy to lose them on pg nologging tables. I just want it to survive a
stop / start operation. The benefits (think of multi-host syslog
consolidation with FTS <drools> ) on these tables FAR outweigh the
off-chance that a crash will cause me some heartache.
Maybe my perceptions are colored because I deal with
Postgres bugs all the time, but I think of backend crashes as pretty
common, certainly much more common than an OS-level crash. I'm afraid
you may be expecting unlogged tables to be significantly more robust
than they really will be.
Bugs? What bugs :)
Honestly, I've only had a couple of *Prod* crashes (knocks on wood), but
the need to restart occurs every now and then.
--Scott
Show quoted text
regards, tom lane
On 11/16/2010 05:33 PM, Josh Berkus wrote:
With the current patches, the data survives a restart just fine.
Per -hackers, that's not guarenteed.
Which is fine. If you choose to set a table to nonlogged, that implies
that you accept the risk of corrupted data, or that you don't "get it",
in which case . It should not however, imply that you want it all
thrown out every so often for no good reason.
If you do wish to have the data tossed out for no good reason every so
often, then there ought to be a separate attribute to control that. I'm
really having trouble seeing how such behavior would be desirable enough
to ever have the server do it for you, on its terms rather than yours.
-Glen
If you do wish to have the data tossed out for no good reason every so
often, then there ought to be a separate attribute to control that. I'm
really having trouble seeing how such behavior would be desirable enough
to ever have the server do it for you, on its terms rather than yours.
I don't quite follow you. The purpose of unlogged tables is for data
which is disposable in the event of downtime; the classic example is the
a user_session_status table. In the event of a restart, all user
sessions are going to be invalid anyway.
--
-- Josh Berkus
PostgreSQL Experts Inc.
http://www.pgexperts.com
I'd vote for backing up the schema of an unlogged table so it's there
on a restore.
The schema is always there. What may or may not be there is the data.
--
-- Josh Berkus
PostgreSQL Experts Inc.
http://www.pgexperts.com
On 11/16/2010 07:55 PM, Josh Berkus wrote:
If you do wish to have the data tossed out for no good reason every so
often, then there ought to be a separate attribute to control that. I'm
really having trouble seeing how such behavior would be desirable enough
to ever have the server do it for you, on its terms rather than yours.I don't quite follow you. The purpose of unlogged tables is for data
which is disposable in the event of downtime; the classic example is the
a user_session_status table. In the event of a restart, all user
sessions are going to be invalid anyway.
Why? If you dont blow away the sessions table, everything should be fine.
-Andy
On 11/16/2010 05:55 PM, Josh Berkus wrote:
If you do wish to have the data tossed out for no good reason every so
often, then there ought to be a separate attribute to control that. I'm
really having trouble seeing how such behavior would be desirable enough
to ever have the server do it for you, on its terms rather than yours.I don't quite follow you. The purpose of unlogged tables is for data
which is disposable in the event of downtime; the classic example is the
a user_session_status table. In the event of a restart, all user
sessions are going to be invalid anyway.
As was already mentioned, application logs. Unlogged tables would be
perfect for that, provided they don't go *poof* every now and then for
no good reason. Nobody's going to be too heart broken if a handful of
log records go missing, or get garbled, after a server crash or power
outage. Delete 'em all after every restart though, and that's a problem.
-Glen
Man, the number of misunderstandings in this thread is staggering.
Let me try to explain what the proposed feature will and will not do.
1. The system catalog entries for all tables will be wal-logged.
So schema (DDL) will survive a crash. There wouldn't be any way
to make it not do that, because we can't wal-log only some updates
to a particular table, and that includes the catalogs in particular.
2. What's proposed as the new feature is that specific non-system
tables can be marked as unlogged, meaning that WAL entries won't
be made for changes in those tables' contents (nor their indexes'
contents). So we can't guarantee that the contents of such tables
will be correct or consistent after a crash. The proposed feature
deals with this by forcibly truncating all such tables after a crash,
thus ensuring that they're consistent though not populated. So the
possible use-cases for such tables are limited to where (a) you can
repopulate the tables on demand, or (b) you don't really care about
losing data on a crash.
3. There's a lot of wishful thinking here about what constitutes a
crash. A backend crash *is* a crash, even if the postmaster keeps
going. Data that had been in shared buffers doesn't get written out
in such a scenario (and if we tried, it might be corrupt anyway). So
unlogged tables would be corrupt and in need of truncation after such an
event. Obviously, the same goes for an OS-level crash or power failure.
4. The last bit of discussion on -hackers concerned what to do in
the case where the server got shut down cleanly. If it was shut
down cleanly, then any data for unlogged tables would have been
written out from shared buffers ... but did the data make it to disk?
There's no easy way to know that. In the event of an OS crash or
power failure shortly after server shutdown, it's possible that
the unlogged tables would be corrupt. So Robert's initial proposal
includes truncating unlogged tables at any database startup, even
if the previous shutdown was clean. Some (including me) are arguing
that that is unnecessarily strict; but you do have to realize that
you're taking some risk with data validity if it doesn't do that.
The bottom line here is that you really can only use the feature
for data that you're willing to accept losing on no notice.
Allowing the data to persist across clean shutdowns would probably
improve usability a bit, but it's not changing that fundamental fact.
regards, tom lane