Survey on backing up unlogged tables: help us with PostgreSQL development!

Started by Josh Berkusover 15 years ago53 messagesgeneral
Jump to latest
#1Josh Berkus
josh@agliodbs.com

Folks,

Please help us resolve a discussion on -hackers.

PostgreSQL 9.1 is likely to have, as a feature, the ability to create
tables which are "unlogged", meaning that they are not added to the
transaction log, and will be truncated (emptied) on database restart.
Such tables are intended for highly volatile, but not very valuable,
data, such as session statues, application logs, etc.

The question is, how would you, as a DBA, expect pg_dump backups to
treat unlogged tables? Backing them up by default has the potential to
both cause performance drag on the unlogged table and make your backups
take longer unless you remember to omit them. Not backing them up by
default has the drawback that if you forget --include-unlogged switch,
and shut the database down, any unlogged data is gone. How would you
*expect* unlogged tables to behave?

Survey is here:
https://spreadsheets.google.com/ccc?key=0AoeuP3g2YZsFdDFnT2VKNC1FQ0pQNmJGS2dWMTNYMEE&hl=en&authkey=CISbwuYD

--
-- Josh Berkus
PostgreSQL Experts Inc.
http://www.pgexperts.com

#2Karsten Hilbert
Karsten.Hilbert@gmx.net
In reply to: Josh Berkus (#1)
Re: Survey on backing up unlogged tables: help us with PostgreSQL development!

PostgreSQL 9.1 is likely to have, as a feature, the ability to create
tables which are "unlogged", meaning that they are not added to the
transaction log, and will be truncated (emptied) on database restart.
Such tables are intended for highly volatile, but not very valuable,
data, such as session statues, application logs, etc.

The question is, how would you, as a DBA, expect pg_dump backups to
treat unlogged tables? Backing them up by default has the potential to
both cause performance drag on the unlogged table and make your backups
take longer unless you remember to omit them. Not backing them up by
default has the drawback that if you forget --include-unlogged switch,
and shut the database down, any unlogged data is gone. How would you
*expect* unlogged tables to behave?

ALTER DATABASE ... SET PG_DUMP_INCLUDE_UNLOGGED TO ON/OFF

with default OFF.

That way I can think about it once per database *before* I am in
the situation when I regret forgetting.

(pg_dump would still support --include-unlogged, defaulting to the
database default)

Karsten
--
Neu: GMX De-Mail - Einfach wie E-Mail, sicher wie ein Brief!
Jetzt De-Mail-Adresse reservieren: http://portal.gmx.net/de/go/demail

#3Derrick Rice
derrick.rice@gmail.com
In reply to: Josh Berkus (#1)
Re: Survey on backing up unlogged tables: help us with PostgreSQL development!

On Tue, Nov 16, 2010 at 5:46 PM, Josh Berkus <josh@agliodbs.com> wrote:

Survey is here:

https://spreadsheets.google.com/ccc?key=0AoeuP3g2YZsFdDFnT2VKNC1FQ0pQNmJGS2dWMTNYMEE&amp;hl=en&amp;authkey=CISbwuYD

This is a link to a read-only spreadsheet for me.

Derrick

#4Glen Parker
glenebob@nwlink.com
In reply to: Karsten Hilbert (#2)
Re: Survey on backing up unlogged tables: help us with PostgreSQL development!

On 11/16/2010 03:24 PM, Karsten Hilbert wrote:

PostgreSQL 9.1 is likely to have, as a feature, the ability to create
tables which are "unlogged", meaning that they are not added to the
transaction log, and will be truncated (emptied) on database restart.
Such tables are intended for highly volatile, but not very valuable,
data, such as session statues, application logs, etc.

I have been following loosely this discussion on HACKERS, but seem to
have missed the part about truncating such tables on server restart.

I have an immediate use for unlogged tables (application logs), but
having them truncate after even a clean server restart would be a show
stopper. I keep log data for 2 months, and never back it up. Having it
disappear after a system melt down is acceptable, but not after a clean
restart. That would be utterly ridiculous!

As to the topic of the thread, I think pg_dump needs to dump unlogged
tables by default.

-Glen

#5Scott Ribe
scott_ribe@elevated-dev.com
In reply to: Josh Berkus (#1)
Re: Survey on backing up unlogged tables: help us with PostgreSQL development!

On Nov 16, 2010, at 3:46 PM, Josh Berkus wrote:

...and will be truncated (emptied) on database restart.

I think that's key. Anything that won't survive a database restart, I sure don't expect to survive backup & restore.

--
Scott Ribe
scott_ribe@elevated-dev.com
http://www.elevated-dev.com/
(303) 722-0567 voice

#6Josh Berkus
josh@agliodbs.com
In reply to: Derrick Rice (#3)
Re: Survey on backing up unlogged tables: help us with PostgreSQL development!

This is a link to a read-only spreadsheet for me.

You're correct. Darn those Google unreadable links!

https://spreadsheets.google.com/viewform?formkey=dDFnT2VKNC1FQ0pQNmJGS2dWMTNYMEE6MQ

That should work.

--
-- Josh Berkus
PostgreSQL Experts Inc.
http://www.pgexperts.com

#7Scott Mead
scott@scottrmead.com
In reply to: Glen Parker (#4)
Re: Survey on backing up unlogged tables: help us with PostgreSQL development!

On Tue, Nov 16, 2010 at 7:21 PM, Glen Parker <glenebob@nwlink.com> wrote:

On 11/16/2010 03:24 PM, Karsten Hilbert wrote:

PostgreSQL 9.1 is likely to have, as a feature, the ability to create

tables which are "unlogged", meaning that they are not added to the
transaction log, and will be truncated (emptied) on database restart.
Such tables are intended for highly volatile, but not very valuable,
data, such as session statues, application logs, etc.

I have been following loosely this discussion on HACKERS, but seem to have
missed the part about truncating such tables on server restart.

I have an immediate use for unlogged tables (application logs), but having
them truncate after even a clean server restart would be a show stopper. I
keep log data for 2 months, and never back it up. Having it disappear after
a system melt down is acceptable, but not after a clean restart. That would
be utterly ridiculous!

+1 -- Is there a technical reason to do a TRUNCATE on restart? I'd feel
better if I could just have unlogged tables that survive unless something
like a power-outage etc... I'm in the exact same boat here, lots of big
logging tables that need to survive reboot, but are frustrating when it
comes to WAL generation.

As to the topic of the thread, I think pg_dump needs to dump unlogged
tables by default.

-1 I disagree. I'm fine with having the loaded weapon pointed at my foot.

--Scott

Show quoted text

-Glen

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#8Tom Lane
tgl@sss.pgh.pa.us
In reply to: Scott Mead (#7)
Re: Survey on backing up unlogged tables: help us with PostgreSQL development!

Scott Mead <scott@scottrmead.com> writes:

+1 -- Is there a technical reason to do a TRUNCATE on restart? I'd feel
better if I could just have unlogged tables that survive unless something
like a power-outage etc... I'm in the exact same boat here, lots of big
logging tables that need to survive reboot, but are frustrating when it
comes to WAL generation.

Keep in mind that these tables are *not* going to survive any type of
backend crash. Maybe my perceptions are colored because I deal with
Postgres bugs all the time, but I think of backend crashes as pretty
common, certainly much more common than an OS-level crash. I'm afraid
you may be expecting unlogged tables to be significantly more robust
than they really will be.

regards, tom lane

#9Andy Colson
andy@squeakycode.net
In reply to: Josh Berkus (#1)
Re: Survey on backing up unlogged tables: help us with PostgreSQL development!

On 11/16/2010 04:46 PM, Josh Berkus wrote:

PostgreSQL 9.1 is likely to have, as a feature, the ability to create
tables which are "unlogged", meaning that they are not added to the
transaction log, and will be truncated (emptied) on database restart.
Such tables are intended for highly volatile, but not very valuable,
data, such as session statues, application logs, etc.

With the current patches, the data survives a restart just fine.

I'd like to vote for:
safe restart = save data
bad crashy restart = drop date

-Andy

#10Glen Parker
glenebob@nwlink.com
In reply to: Tom Lane (#8)
Re: Survey on backing up unlogged tables: help us with PostgreSQL development!

On 11/16/2010 05:15 PM, Tom Lane wrote:

Keep in mind that these tables are *not* going to survive any type of
backend crash. Maybe my perceptions are colored because I deal with
Postgres bugs all the time, but I think of backend crashes as pretty
common, certainly much more common than an OS-level crash. I'm afraid
you may be expecting unlogged tables to be significantly more robust
than they really will be.

But an individual backend crash != server restart, unless that's changed
since 8.1 (yes, I'm still stuck on 8.1 :( )... So if I, for example,
kill -9 a backend that's busy updating a nonlogged table, the table
could be corrupted, but it wouldn't be truncated (and could cause
trouble) for possibly weeks until the postmaster is restarted.
Conversely, even if no backend crash occurs whatsoever, all the
nonlogged tables would be truncated after an orderly postmaster restart.

Just doesn't make sense to me.

-Glen

#11Josh Berkus
josh@agliodbs.com
In reply to: Andy Colson (#9)
Re: Survey on backing up unlogged tables: help us with PostgreSQL development!

With the current patches, the data survives a restart just fine.

Per -hackers, that's not guarenteed.

--
-- Josh Berkus
PostgreSQL Experts Inc.
http://www.pgexperts.com

#12Andy Colson
andy@squeakycode.net
In reply to: Josh Berkus (#11)
Re: Survey on backing up unlogged tables: help us with PostgreSQL development!

On 11/16/2010 07:33 PM, Josh Berkus wrote:

With the current patches, the data survives a restart just fine.

Per -hackers, that's not guarenteed.

Ah, I just read the thread on -hackers. And yea, my system had 24 hours to write/flush/etc before I'd restarted it moments ago as a test. I have NOT tested a bunch of writes and then quickly restarting PG. I CAN report that given 24 hours, your data will survive a restart :-)

-Andy

#13Scott Marlowe
scott.marlowe@gmail.com
In reply to: Scott Ribe (#5)
Re: Survey on backing up unlogged tables: help us with PostgreSQL development!

On Tue, Nov 16, 2010 at 5:23 PM, Scott Ribe <scott_ribe@elevated-dev.com> wrote:

On Nov 16, 2010, at 3:46 PM, Josh Berkus wrote:

 ...and will be truncated (emptied) on database restart.

I think that's key. Anything that won't survive a database restart, I sure don't expect to survive backup & restore.

I'd vote for backing up the schema of an unlogged table so it's there
on a restore.

#14Scott Mead
scott@scottrmead.com
In reply to: Tom Lane (#8)
Re: Survey on backing up unlogged tables: help us with PostgreSQL development!

On Tue, Nov 16, 2010 at 8:15 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Scott Mead <scott@scottrmead.com> writes:

+1 -- Is there a technical reason to do a TRUNCATE on restart? I'd feel
better if I could just have unlogged tables that survive unless something
like a power-outage etc... I'm in the exact same boat here, lots of big
logging tables that need to survive reboot, but are frustrating when it
comes to WAL generation.

Keep in mind that these tables are *not* going to survive any type of
backend crash.

Not surviving a crash is fine. IMHO, if we'd lose data in myisam files,
I'm happy to lose them on pg nologging tables. I just want it to survive a
stop / start operation. The benefits (think of multi-host syslog
consolidation with FTS <drools> ) on these tables FAR outweigh the
off-chance that a crash will cause me some heartache.

Maybe my perceptions are colored because I deal with
Postgres bugs all the time, but I think of backend crashes as pretty
common, certainly much more common than an OS-level crash. I'm afraid
you may be expecting unlogged tables to be significantly more robust
than they really will be.

Bugs? What bugs :)

Honestly, I've only had a couple of *Prod* crashes (knocks on wood), but
the need to restart occurs every now and then.

--Scott

Show quoted text

regards, tom lane

#15Glen Parker
glenebob@nwlink.com
In reply to: Josh Berkus (#11)
Re: Survey on backing up unlogged tables: help us with PostgreSQL development!

On 11/16/2010 05:33 PM, Josh Berkus wrote:

With the current patches, the data survives a restart just fine.

Per -hackers, that's not guarenteed.

Which is fine. If you choose to set a table to nonlogged, that implies
that you accept the risk of corrupted data, or that you don't "get it",
in which case . It should not however, imply that you want it all
thrown out every so often for no good reason.

If you do wish to have the data tossed out for no good reason every so
often, then there ought to be a separate attribute to control that. I'm
really having trouble seeing how such behavior would be desirable enough
to ever have the server do it for you, on its terms rather than yours.

-Glen

#16Josh Berkus
josh@agliodbs.com
In reply to: Glen Parker (#15)
Re: Survey on backing up unlogged tables: help us with PostgreSQL development!

If you do wish to have the data tossed out for no good reason every so
often, then there ought to be a separate attribute to control that. I'm
really having trouble seeing how such behavior would be desirable enough
to ever have the server do it for you, on its terms rather than yours.

I don't quite follow you. The purpose of unlogged tables is for data
which is disposable in the event of downtime; the classic example is the
a user_session_status table. In the event of a restart, all user
sessions are going to be invalid anyway.

--
-- Josh Berkus
PostgreSQL Experts Inc.
http://www.pgexperts.com

#17Josh Berkus
josh@agliodbs.com
In reply to: Scott Marlowe (#13)
Re: Survey on backing up unlogged tables: help us with PostgreSQL development!

I'd vote for backing up the schema of an unlogged table so it's there
on a restore.

The schema is always there. What may or may not be there is the data.

--
-- Josh Berkus
PostgreSQL Experts Inc.
http://www.pgexperts.com

#18Andy Colson
andy@squeakycode.net
In reply to: Josh Berkus (#16)
Re: Survey on backing up unlogged tables: help us with PostgreSQL development!

On 11/16/2010 07:55 PM, Josh Berkus wrote:

If you do wish to have the data tossed out for no good reason every so
often, then there ought to be a separate attribute to control that. I'm
really having trouble seeing how such behavior would be desirable enough
to ever have the server do it for you, on its terms rather than yours.

I don't quite follow you. The purpose of unlogged tables is for data
which is disposable in the event of downtime; the classic example is the
a user_session_status table. In the event of a restart, all user
sessions are going to be invalid anyway.

Why? If you dont blow away the sessions table, everything should be fine.

-Andy

#19Glen Parker
glenebob@nwlink.com
In reply to: Josh Berkus (#16)
Re: Survey on backing up unlogged tables: help us with PostgreSQL development!

On 11/16/2010 05:55 PM, Josh Berkus wrote:

If you do wish to have the data tossed out for no good reason every so
often, then there ought to be a separate attribute to control that. I'm
really having trouble seeing how such behavior would be desirable enough
to ever have the server do it for you, on its terms rather than yours.

I don't quite follow you. The purpose of unlogged tables is for data
which is disposable in the event of downtime; the classic example is the
a user_session_status table. In the event of a restart, all user
sessions are going to be invalid anyway.

As was already mentioned, application logs. Unlogged tables would be
perfect for that, provided they don't go *poof* every now and then for
no good reason. Nobody's going to be too heart broken if a handful of
log records go missing, or get garbled, after a server crash or power
outage. Delete 'em all after every restart though, and that's a problem.

-Glen

#20Tom Lane
tgl@sss.pgh.pa.us
In reply to: Scott Mead (#14)
Re: Survey on backing up unlogged tables: help us with PostgreSQL development!

Man, the number of misunderstandings in this thread is staggering.
Let me try to explain what the proposed feature will and will not do.

1. The system catalog entries for all tables will be wal-logged.
So schema (DDL) will survive a crash. There wouldn't be any way
to make it not do that, because we can't wal-log only some updates
to a particular table, and that includes the catalogs in particular.

2. What's proposed as the new feature is that specific non-system
tables can be marked as unlogged, meaning that WAL entries won't
be made for changes in those tables' contents (nor their indexes'
contents). So we can't guarantee that the contents of such tables
will be correct or consistent after a crash. The proposed feature
deals with this by forcibly truncating all such tables after a crash,
thus ensuring that they're consistent though not populated. So the
possible use-cases for such tables are limited to where (a) you can
repopulate the tables on demand, or (b) you don't really care about
losing data on a crash.

3. There's a lot of wishful thinking here about what constitutes a
crash. A backend crash *is* a crash, even if the postmaster keeps
going. Data that had been in shared buffers doesn't get written out
in such a scenario (and if we tried, it might be corrupt anyway). So
unlogged tables would be corrupt and in need of truncation after such an
event. Obviously, the same goes for an OS-level crash or power failure.

4. The last bit of discussion on -hackers concerned what to do in
the case where the server got shut down cleanly. If it was shut
down cleanly, then any data for unlogged tables would have been
written out from shared buffers ... but did the data make it to disk?
There's no easy way to know that. In the event of an OS crash or
power failure shortly after server shutdown, it's possible that
the unlogged tables would be corrupt. So Robert's initial proposal
includes truncating unlogged tables at any database startup, even
if the previous shutdown was clean. Some (including me) are arguing
that that is unnecessarily strict; but you do have to realize that
you're taking some risk with data validity if it doesn't do that.

The bottom line here is that you really can only use the feature
for data that you're willing to accept losing on no notice.
Allowing the data to persist across clean shutdowns would probably
improve usability a bit, but it's not changing that fundamental fact.

regards, tom lane

#21Karsten Hilbert
Karsten.Hilbert@gmx.net
In reply to: Tom Lane (#20)
#22Yeb Havinga
yebhavinga@gmail.com
In reply to: Josh Berkus (#16)
#23Derrick Rice
derrick.rice@gmail.com
In reply to: Yeb Havinga (#22)
#24Scott Mead
scott@scottrmead.com
In reply to: Tom Lane (#20)
#25Scott Mead
scott@scottrmead.com
In reply to: Scott Mead (#24)
#26Tom Lane
tgl@sss.pgh.pa.us
In reply to: Derrick Rice (#23)
#27Steve Crawford
scrawford@pinpointresearch.com
In reply to: Tom Lane (#20)
#28Ivan Voras
ivoras@freebsd.org
In reply to: Scott Ribe (#5)
#29Ivan Voras
ivoras@freebsd.org
In reply to: Josh Berkus (#16)
#30A.M.
agentm@themactionfaction.com
In reply to: Ivan Voras (#29)
#31Tom Lane
tgl@sss.pgh.pa.us
In reply to: Steve Crawford (#27)
#32Josh Berkus
josh@agliodbs.com
In reply to: Glen Parker (#19)
#33Glen Parker
glenebob@nwlink.com
In reply to: Tom Lane (#20)
#34Scott Mead
scott@scottrmead.com
In reply to: Josh Berkus (#32)
#35Marc Mamin
M.Mamin@intershop.de
In reply to: Glen Parker (#15)
#36Marc Mamin
M.Mamin@intershop.de
In reply to: Marc Mamin (#35)
#37Tom Lane
tgl@sss.pgh.pa.us
In reply to: Marc Mamin (#36)
#38Marc Mamin
M.Mamin@intershop.de
In reply to: Tom Lane (#37)
#39Ivan Voras
ivoras@freebsd.org
In reply to: A.M. (#30)
#40Merlin Moncure
mmoncure@gmail.com
In reply to: Josh Berkus (#1)
#41Alban Hertroys
dalroi@solfertje.student.utwente.nl
In reply to: Josh Berkus (#1)
#42Karsten Hilbert
Karsten.Hilbert@gmx.net
In reply to: Alban Hertroys (#41)
#43Karsten Hilbert
Karsten.Hilbert@gmx.net
In reply to: Karsten Hilbert (#42)
#44Jayadevan M
Jayadevan.Maymala@ibsplc.com
In reply to: Alban Hertroys (#41)
#45Andy Colson
andy@squeakycode.net
In reply to: Jayadevan M (#44)
#46Josh Berkus
josh@agliodbs.com
In reply to: Alban Hertroys (#41)
#47Jayadevan M
Jayadevan.Maymala@ibsplc.com
In reply to: Andy Colson (#45)
#48Alban Hertroys
dalroi@solfertje.student.utwente.nl
In reply to: Jayadevan M (#47)
#49Marc Mamin
M.Mamin@intershop.de
In reply to: Alban Hertroys (#48)
#50Jayadevan M
Jayadevan.Maymala@ibsplc.com
In reply to: Alban Hertroys (#48)
#51Ron Mayer
rm_pg@cheapcomplexdevices.com
In reply to: Glen Parker (#19)
#52Ron Mayer
rm_pg@cheapcomplexdevices.com
In reply to: Josh Berkus (#11)
#53Josh Berkus
josh@agliodbs.com
In reply to: Ron Mayer (#52)