unlogged tables
I have played around a little more, and I think I found a problem.
If given enough time, an unlogged table makes it to disk, and a restart won't clear the data. If I insert a bunch of stuff, commit, and quickly restart PG, the table is cleared. If I let it sit for a while, it stays.
Based on that, I have a pgbench_accounts table (unlogged) that after a restart has data in it.
andy=# select aid, bid, abalance from pgbench_accounts where abalance = 3305;
aid | bid | abalance
---------+-----+----------
3790226 | 38 | 3305
274130 | 3 | 3305
2169892 | 22 | 3305
705321 | 8 | 3305
4463145 | 45 | 3305
I dropped the index and added a new one, then restarted PG. Now it seems the index is empty/unusable.
andy=# select aid, bid, abalance from pgbench_accounts where aid = 3790226;
aid | bid | abalance
-----+-----+----------
(0 rows)
andy=# select pg_indexes_size('pgbench_accounts');
pg_indexes_size
-----------------
16384
Let's recreate it:
andy=# drop index bob;
DROP INDEX
Time: 13.829 ms
andy=# create index bob on pgbench_accounts(aid, bid);
CREATE INDEX
Time: 17215.859 ms
andy=# select aid, bid, abalance from pgbench_accounts where aid = 3790226;
aid | bid | abalance
---------+-----+----------
3790226 | 38 | 3305
(1 row)
Time: 0.712 ms
andy=# select pg_indexes_size('pgbench_accounts');
pg_indexes_size
-----------------
179716096
I also did kill -9 on all the postgres* processes while they were busy inserting records, to try to corrupt the database, but could not seem to. Setting fsync off also did not give me errors, but I assume that's because I was using unlogged tables, and since they were all getting cleared anyway, I never saw any.
With fsync off and normal tables, I got bad looking things in my logs and vacuum:
LOG: unexpected pageaddr 1/AB1D6000 in log file 1, segment 187, offset 1925120
WARNING: relation "access" page 28184 is uninitialized --- fixing
etc...
And last, I tried to update my git repo and see if the patches still work. They do not.
There was much discussion on the syntax:
create unlogged table vs create temp xxx table vs something else.
There was much discussion on how persistent the tables should be. And some on backups.
At this point, though, I find myself at an end, not sure what else to do until the dust settles.
Oh, also, I wanted to add:
There is \h help: +1
but I can find no way of determining the "tempness"/"unloggedness" of a table via \d*
The only way I found was to "pg_dump -s"
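(For what it's worth, if the patch records persistence in pg_class — as the relpersistence column eventually did in the committed version — the "tempness"/"unloggedness" could also be read straight from the catalog. Column name here is an assumption about the patched catalog, not something I verified against this patch:)

```sql
-- Hedged sketch: read table persistence from the catalog, assuming the
-- patch adds a relpersistence column ('p' permanent, 'u' unlogged, 't' temp).
SELECT relname, relpersistence
FROM pg_class
WHERE relname = 'pgbench_accounts';
```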
I will attempt to link this to the website, and mark it as returned to author.
-Andy
On Tue, Nov 30, 2010 at 10:36 PM, Andy Colson <andy@squeakycode.net> wrote:
Based on that, I have a pgbench_accounts table (unlogged) that after a restart has data in it.
andy=# select aid, bid, abalance from pgbench_accounts where abalance = 3305;
aid | bid | abalance
---------+-----+----------
3790226 | 38 | 3305
274130 | 3 | 3305
2169892 | 22 | 3305
705321 | 8 | 3305
4463145 | 45 | 3305
I dropped the index and added a new one, then restarted PG. Now it seems the index is empty/unusable.
andy=# select aid, bid, abalance from pgbench_accounts where aid = 3790226;
aid | bid | abalance
-----+-----+----------
(0 rows)
andy=# select pg_indexes_size('pgbench_accounts');
pg_indexes_size
-----------------
16384
Let's recreate it:
andy=# drop index bob;
DROP INDEX
Time: 13.829 ms
andy=# create index bob on pgbench_accounts(aid, bid);
CREATE INDEX
Time: 17215.859 ms
andy=# select aid, bid, abalance from pgbench_accounts where aid = 3790226;
aid | bid | abalance
---------+-----+----------
3790226 | 38 | 3305
(1 row)
Time: 0.712 ms
andy=# select pg_indexes_size('pgbench_accounts');
pg_indexes_size
-----------------
179716096
This appears as though you've somehow gotten a normal table connected
to an unlogged index. That certainly sounds like a bug, but there's
not enough details here to figure out what series of steps I should
perform to recreate the problem.
AND last, I tried to update my git repo and see if the patches still work.
They do not.
Updated patches attached.
Oh, also, I wanted to add:
There is \h help: +1
but I can find no way of determining the "tempness"/"unloggedness" of a
table via \d*
It's clearly displayed in the \d output.
Unlogged Table "public.test"
Column | Type | Modifiers
--------+---------+-----------
a | integer | not null
Indexes:
"test_pkey" PRIMARY KEY, btree (a)
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On 11/30/2010 10:27 PM, Robert Haas wrote:
This appears as though you've somehow gotten a normal table connected
to an unlogged index. That certainly sounds like a bug, but there's
not enough details here to figure out what series of steps I should
perform to recreate the problem.
There is \h help: +1
but I can find no way of determining the "tempness"/"unloggedness" of a
table via \d*
It's clearly displayed in the \d output.
Unlogged Table "public.test"
Column | Type | Modifiers
--------+---------+-----------
a | integer | not null
Indexes:
"test_pkey" PRIMARY KEY, btree (a)
Jeez... Were it a snake it'd a bit me!
Ok. I blew away my database and programs, re-gitted, re-patched (they work), re-compiled (ok), and re-ran initdb.
I have these non-standard settings:
shared_buffers = 512MB
work_mem = 5MB
checkpoint_segments = 7
1st) I can recreate some warning messages from vacuum:
WARNING: relation "ulone" page 0 is uninitialized --- fixing
WARNING: relation "pg_toast_16433" page 0 is uninitialized --- fixing
You create an unlogged table, fill it, restart pg (and it clears the table), then fill it again, and vacuum complains. Here is a log:
andy=# drop table ulone;
DROP TABLE
Time: 40.532 ms
andy=# create unlogged table ulone(id serial, a integer, b integer, c text);
NOTICE: CREATE TABLE will create implicit sequence "ulone_id_seq" for serial column "ulone.id"
CREATE TABLE
Time: 151.968 ms
andy=# insert into ulone(a, b, c) select x, 1, 'bbbbbbbbbbb' from generate_series(1, 10000000) x;
INSERT 0 10000000
Time: 80401.505 ms
andy=# \q
$ vacuumdb -az
vacuumdb: vacuuming database "andy"
vacuumdb: vacuuming database "postgres"
vacuumdb: vacuuming database "template1"
$ sudo /etc/rc.d/postgresql stop
Stopping PostgreSQL: No directory, logging in with HOME=/
$ sudo /etc/rc.d/postgresql start
Starting PostgreSQL:
$ psql
Timing is on.
psql (9.1devel)
Type "help" for help.
andy=# select count(*) from ulone;
count
-------
0
(1 row)
Time: 1.164 ms
andy=# insert into ulone(a, b, c) select x, 1, 'bbbbbbbbbbb' from generate_series(1, 10000000) x;
INSERT 0 10000000
Time: 75312.753 ms
andy=# \q
$ vacuumdb -az
vacuumdb: vacuuming database "andy"
WARNING: relation "ulone" page 0 is uninitialized --- fixing
WARNING: relation "pg_toast_16478" page 0 is uninitialized --- fixing
vacuumdb: vacuuming database "postgres"
vacuumdb: vacuuming database "template1"
2nd) I can get the data to stick around after restart. Though not reliably. In general:
Create and fill a table, vacuum it (not sure if it's important; I do it because that's what I'd done in my pgbench testing where I noticed the data stuck around), wait an hour (I usually left it for 12-24 hours, but recreated it with as little as half an hour), then restart pg. Sometimes the data is there... sometimes not.
I also filled my table with more data than memory would hold so it would spill to disk, again, because it recreates my pgbench setup.
I'm still working on finding the exact steps, but I wanted to get you #1 above.
-Andy
Ok, forget the time thing. It has nothing to do with it. (Which everyone already assumed, I imagine.)
It's truncate.
Create an unlogged table, fill it, truncate it, fill it again, restart PG, and the data will still be there.
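Spelled out as a script (table and column names are made up; the restart happens at the comment):

```sql
-- Repro sketch for the truncate bug. Run up to the comment, restart the
-- server, then run the final SELECT.
CREATE UNLOGGED TABLE ul_demo (id int, payload text);
INSERT INTO ul_demo SELECT g, 'x' FROM generate_series(1, 1000) g;
TRUNCATE ul_demo;                 -- rewrites the relfilenode
INSERT INTO ul_demo SELECT g, 'x' FROM generate_series(1, 1000) g;
-- ... restart PostgreSQL here ...
SELECT count(*) FROM ul_demo;     -- should be 0, but the rows survive
```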
-Andy
Excerpts from Andy Colson's message of vie dic 03 00:37:17 -0300 2010:
Ok, forget the time thing. It has nothing to do with it. (Which everyone already assumed, I imagine.)
It's truncate.
Create an unlogged table, fill it, truncate it, fill it again, restart PG, and the data will still be there.
Hmm, presumably the table rewrite thing in truncate is not preserving
the unlogged state (perhaps it's the swap-relfilenode business). Does
CLUSTER have a similar effect? What about VACUUM FULL? If so you know
where the bug is.
--
Álvaro Herrera <alvherre@commandprompt.com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
On Thu, Dec 2, 2010 at 10:53 PM, Alvaro Herrera
<alvherre@commandprompt.com> wrote:
Excerpts from Andy Colson's message of vie dic 03 00:37:17 -0300 2010:
Ok, forget the time thing. It has nothing to do with it. (Which everyone already assumed, I imagine.)
It's truncate.
Create an unlogged table, fill it, truncate it, fill it again, restart PG, and the data will still be there.
Hmm, presumably the table rewrite thing in truncate is not preserving
the unlogged state (perhaps it's the swap-relfilenode business).
Oh ho. Right. Yeah, that case is not handled. Woopsie.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Tue, Nov 30, 2010 at 10:36 PM, Andy Colson <andy@squeakycode.net> wrote:
[ review ]
Currently, if you create an unlogged table, restart PG, and vacuum the
table, you'll get this:
rhaas=# vacuum unlogged;
WARNING: relation "unlogged" page 0 is uninitialized --- fixing
VACUUM
The reason this happens is because the init fork of an unlogged heap
consists of a single empty page, rather than a totally empty file. I
needed to WAL-log the creation of the init fork, and there's currently
no way to WAL-log the creation of an empty file other than the main
relation fork. I figured a file with one empty page would be just as
good as a totally empty file, and that way I could piggyback on
XLOG_HEAP_NEWPAGE, which will automatically create the relation fork
if it's not already there. However, as the above warning message
demonstrates, this was a bit too clever.
One possible fix is to change the XLOG_SMGR_CREATE record to carry a
fork number. Does that seem reasonable, or would anyone like to
recommend another approach?
I'm also going to go through and change all instances of the word
"unlogged" to "volatile", per previous discussion. If this seems like
a bad idea to anyone, please object now rather than afterwards.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes:
I'm also going to go through and change all instances of the word
"unlogged" to "volatile", per previous discussion. If this seems like
a bad idea to anyone, please object now rather than afterwards.
Hm... I thought there had been discussion of a couple of different
flavors of table volatility. Is it really a good idea to commandeer
the word "volatile" for this particular one?
regards, tom lane
On Tue, Dec 7, 2010 at 1:17 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Robert Haas <robertmhaas@gmail.com> writes:
I'm also going to go through and change all instances of the word
"unlogged" to "volatile", per previous discussion. If this seems like
a bad idea to anyone, please object now rather than afterwards.
Hm... I thought there had been discussion of a couple of different
flavors of table volatility. Is it really a good idea to commandeer
the word "volatile" for this particular one?
So far I've come up with the following possible behaviors we could
theoretically implement:
1. Any crash or shutdown truncates the table.
2. Any crash truncates the table, but a clean shutdown does not.
3. A crash truncates the table only if it's been written since the
last checkpoint; a clean shutdown does not truncate it.
The main argument for doing #1 rather than #2 is that we'd rather not
have to include unlogged table data in checkpoints. Andres Freund
made the argument that we could avoid that anyway, though, by just
doing an fsync() on every unlogged table file in the cluster at
shutdown time. If that's acceptable, then ISTM there's no benefit to
implementing #1 and we should just go with #2. If it's not
acceptable, then we have to think about whether and how to have both
of those behaviors.
#3 seems like a lot of work relative to #1 and #2 for a pretty
marginal increase in durability.
Thoughts?
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes:
On Tue, Dec 7, 2010 at 1:17 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Hm... I thought there had been discussion of a couple of different
flavors of table volatility. Is it really a good idea to commandeer
the word "volatile" for this particular one?
So far I've come up with the following possible behaviors we could
theoretically implement:
1. Any crash or shutdown truncates the table.
2. Any crash truncates the table, but a clean shutdown does not.
3. A crash truncates the table only if it's been written since the
last checkpoint; a clean shutdown does not truncate it.
The main argument for doing #1 rather than #2 is that we'd rather not
have to include unlogged table data in checkpoints. Andres Freund
made the argument that we could avoid that anyway, though, by just
doing an fsync() on every unlogged table file in the cluster at
shutdown time. If that's acceptable, then ISTM there's no benefit to
implementing #1 and we should just go with #2. If it's not
acceptable, then we have to think about whether and how to have both
of those behaviors.
#3 seems like a lot of work relative to #1 and #2 for a pretty
marginal increase in durability.
OK. I agree that #3 adds a lot of complexity for not much of anything.
If you've got data that's static enough that #3 adds a useful amount
of safety, then you might as well be keeping it in a regular table.
I think a more relevant question is how complicated it'll be to issue
those fsyncs --- do you have a concrete implementation in mind?
regards, tom lane
On Tue, Dec 7, 2010 at 3:44 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Robert Haas <robertmhaas@gmail.com> writes:
On Tue, Dec 7, 2010 at 1:17 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Hm... I thought there had been discussion of a couple of different
flavors of table volatility. Is it really a good idea to commandeer
the word "volatile" for this particular one?
So far I've come up with the following possible behaviors we could
theoretically implement:
1. Any crash or shutdown truncates the table.
2. Any crash truncates the table, but a clean shutdown does not.
3. A crash truncates the table only if it's been written since the
last checkpoint; a clean shutdown does not truncate it.
The main argument for doing #1 rather than #2 is that we'd rather not
have to include unlogged table data in checkpoints. Andres Freund
made the argument that we could avoid that anyway, though, by just
doing an fsync() on every unlogged table file in the cluster at
shutdown time. If that's acceptable, then ISTM there's no benefit to
implementing #1 and we should just go with #2. If it's not
acceptable, then we have to think about whether and how to have both
of those behaviors.
#3 seems like a lot of work relative to #1 and #2 for a pretty
marginal increase in durability.
OK. I agree that #3 adds a lot of complexity for not much of anything.
If you've got data that's static enough that #3 adds a useful amount
of safety, then you might as well be keeping it in a regular table.
I think a more relevant question is how complicated it'll be to issue
those fsyncs --- do you have a concrete implementation in mind?
It can reuse most of the infrastructure we use for re-initializing
everything after a crash or unclean shutdown. We just iterate over
every tablespace/dbspace directory and look for files with _init forks.
If we find any then we open the main fork files and fsync() each one.
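Roughly like this (a sketch of the scan described above, not the actual patch; the function name is made up, the echo stands in for the server's real fsync() of each file, and the path layout follows PostgreSQL's on-disk convention of a bare relfilenode plus _fsm/_vm/_init suffixes):

```shell
# Walk a data directory, find relations that have an "_init" fork,
# and report the sibling forks that would need flushing at clean shutdown.
flush_unlogged_forks() {
    # $1 is the data directory (PGDATA)
    find "$1/base" -type f -name '*_init' 2>/dev/null |
    while read -r init; do
        rel=${init%_init}                    # e.g. .../16384_init -> .../16384
        for fork in "$rel" "${rel}_fsm" "${rel}_vm"; do
            [ -f "$fork" ] && echo "would fsync: $fork"
        done
    done
}
```

Run against a real cluster it would visit only relations carrying an init fork, which is exactly the set of unlogged tables.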
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes:
On Tue, Dec 7, 2010 at 3:44 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
I think a more relevant question is how complicated it'll be to issue
those fsyncs --- do you have a concrete implementation in mind?
It can reuse most of the infrastructure we use for re-initializing
everything after a crash or unclean shutdown. We just iterate over
every tablespace/dbspace directory and look for files with _init forks.
If we find any then we open the main fork files and fsync() each one.
I assume you meant "all the other fork files", but OK. Still, couldn't
that be rather expensive in a large DB?
regards, tom lane
2010/12/7 Robert Haas <robertmhaas@gmail.com>:
On Tue, Dec 7, 2010 at 3:44 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Robert Haas <robertmhaas@gmail.com> writes:
On Tue, Dec 7, 2010 at 1:17 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Hm... I thought there had been discussion of a couple of different
flavors of table volatility. Is it really a good idea to commandeer
the word "volatile" for this particular one?
So far I've come up with the following possible behaviors we could
theoretically implement:
1. Any crash or shutdown truncates the table.
2. Any crash truncates the table, but a clean shutdown does not.
3. A crash truncates the table only if it's been written since the
last checkpoint; a clean shutdown does not truncate it.
The main argument for doing #1 rather than #2 is that we'd rather not
have to include unlogged table data in checkpoints. Andres Freund
made the argument that we could avoid that anyway, though, by just
doing an fsync() on every unlogged table file in the cluster at
shutdown time. If that's acceptable, then ISTM there's no benefit to
implementing #1 and we should just go with #2. If it's not
acceptable, then we have to think about whether and how to have both
of those behaviors.
#3 seems like a lot of work relative to #1 and #2 for a pretty
marginal increase in durability.
OK. I agree that #3 adds a lot of complexity for not much of anything.
If you've got data that's static enough that #3 adds a useful amount
of safety, then you might as well be keeping it in a regular table.
I think a more relevant question is how complicated it'll be to issue
those fsyncs --- do you have a concrete implementation in mind?
It can reuse most of the infrastructure we use for re-initializing
everything after a crash or unclean shutdown. We just iterate over
every tablespace/dbspace directory and look for files with _init forks.
If we find any then we open the main fork files and fsync() each one.
It might make sense to document this behavior: a 'simple' restart
might be way longer than before. I would probably issue a sync(1)
before restarting the server in such a situation (if the
unlogged/volatile tables are large).
--
Cédric Villemain 2ndQuadrant
http://2ndQuadrant.fr/ PostgreSQL : Expertise, Formation et Support
On Tue, Dec 7, 2010 at 5:31 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Robert Haas <robertmhaas@gmail.com> writes:
On Tue, Dec 7, 2010 at 3:44 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
I think a more relevant question is how complicated it'll be to issue
those fsyncs --- do you have a concrete implementation in mind?
It can reuse most of the infrastructure we use for re-initializing
everything after a crash or unclean shutdown. We just iterate over
every tablespace/dbspace directory and look for files with _init forks.
If we find any then we open the main fork files and fsync() each one.
I assume you meant "all the other fork files", but OK.
Oh, good point.
Still, couldn't
that be rather expensive in a large DB?
Well, that's why I asked whether it would be acceptable to take that
approach. I'm guessing the overhead isn't too horrible. If you
didn't want to take this approach but did want to survive a clean
shutdown, you would need to fsync everything written since the last
checkpoint. The amount of additional stuff that needs to be written
here is just whatever you failed to write out during previous
checkpoints, which is probably not a ton.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
A very useful feature for unlogged tables would be the ability to
switch them back to normal tables -- this way you could do bulk
loading into an unlogged table and then turn it into a regular table
using just fsync(), bypassing all the WAL-logging overhead. It seems
this could even be implemented in pg_restore itself.
Which brings me to:
On Tue, Dec 7, 2010 at 20:44, Robert Haas <robertmhaas@gmail.com> wrote:
2. Any crash truncates the table, but a clean shutdown does not.
Seems that syncing on a clean shutdown could use the same
infrastructure as the above functionality.
Have you thought about switching unlogged tables back to logged? Are
there any significant obstacles?
Regards,
Marti
On Tue, 2010-12-07 at 13:17 -0500, Tom Lane wrote:
Robert Haas <robertmhaas@gmail.com> writes:
I'm also going to go through and change all instances of the word
"unlogged" to "volatile", per previous discussion. If this seems like
a bad idea to anyone, please object now rather than afterwards.
Hm... I thought there had been discussion of a couple of different
flavors of table volatility. Is it really a good idea to commandeer
the word "volatile" for this particular one?
Note that DB2 uses the table modifier VOLATILE to indicate a table that
has a widely fluctuating table size, for example a queue table. It's
used as a declarative optimizer hint. So the term has many possible
meanings.
Prefer UNLOGGED or similar descriptive term.
--
Simon Riggs http://www.2ndQuadrant.com/books/
PostgreSQL Development, 24x7 Support, Training and Services
On Wed, Dec 8, 2010 at 9:52 AM, Marti Raudsepp <marti@juffo.org> wrote:
Have you thought about switching unlogged tables back to logged? Are
there any significant obstacles?
I think it can be done, and I think it's useful, but I didn't want to
tackle it for version one, because it's not trivial.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Wed, Dec 8, 2010 at 10:19 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
On Tue, 2010-12-07 at 13:17 -0500, Tom Lane wrote:
Robert Haas <robertmhaas@gmail.com> writes:
I'm also going to go through and change all instances of the word
"unlogged" to "volatile", per previous discussion. If this seems like
a bad idea to anyone, please object now rather than afterwards.
Hm... I thought there had been discussion of a couple of different
flavors of table volatility. Is it really a good idea to commandeer
the word "volatile" for this particular one?
Note that DB2 uses the table modifier VOLATILE to indicate a table that
has a widely fluctuating table size, for example a queue table. It's
used as a declarative optimizer hint. So the term has many possible
meanings.
Prefer UNLOGGED or similar descriptive term.
Hrm. The previous consensus seemed to be in favor of trying to
describe the behavior (your contents might disappear) rather than the
implementation (we don't WAL-log those contents). However, the fact
that DB2 uses that word to mean something entirely different is
certainly a bit awkward, so maybe we should reconsider. Or maybe not.
I'm not sure. Anyone else want to weigh in here?
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> wrote:
Simon Riggs <simon@2ndquadrant.com> wrote:
Note that DB2 uses the table modifier VOLATILE to indicate a
table that has a widely fluctuating table size, for example a
queue table.
the fact that DB2 uses that word to mean something entirely
different is certainly a bit awkward
It would be especially awkward should someone port their DB2
database to PostgreSQL without noticing the semantic difference, and
then find their data missing.
so maybe we should reconsider.
+1 for choosing terminology without known conflicts with other
significant products.
-Kevin
"Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes:
Robert Haas <robertmhaas@gmail.com> wrote:
Simon Riggs <simon@2ndquadrant.com> wrote:
Note that DB2 uses the table modifier VOLATILE to indicate a
table that has a widely fluctuating table size, for example a
queue table.
the fact that DB2 uses that word to mean something entirely
different is certainly a bit awkward
It would be especially awkward should someone port their DB2
database to PostgreSQL without noticing the semantic difference, and
then find their data missing.
Not to mention that DB2 syntax tends to appear in the standard a few
years later.
so maybe we should reconsider.
+1 for choosing terminology without known conflicts with other
significant products.
Yeah. Given this info I'm strongly inclined to stick with UNLOGGED.
regards, tom lane