switch UNLOGGED to LOGGED

Started by Leonardo Francalanciabout 15 years ago44 messageshackers
Jump to latest
#1Leonardo Francalanci
m_lists@yahoo.it

Hi,

I read the discussion at

http://archives.postgresql.org/pgsql-hackers/2011-01/msg00315.php

From what I can understand, going from/to unlogged to/from logged in
the wal_level == minimal case is not too complicated.

Suppose I try to write a patch that allows

ALTER TABLE tablename SET LOGGED (or UNLOGGED)
(proper sql wording to be discussed...)

only in the wal_level == minimal case: would it be accepted as a
"first step"? Or rejected because it doesn't allow it in the other
cases?

From what I got in the discussion, handling the other wal_level cases
can be very complicated (example: the issues in case "we *crash*
without writing an abort record").

Leonardo

#2Robert Haas
robertmhaas@gmail.com
In reply to: Leonardo Francalanci (#1)
Re: switch UNLOGGED to LOGGED

On Fri, Apr 8, 2011 at 6:01 AM, Leonardo Francalanci <m_lists@yahoo.it> wrote:

I read the discussion at

http://archives.postgresql.org/pgsql-hackers/2011-01/msg00315.php

From what I can understand, going from/to unlogged to/from logged in
the wal_level == minimal case is not too complicated.

Suppose I try to write a patch that allows

ALTER TABLE tablename SET LOGGED (or UNLOGGED)
(proper sql wording to be discussed...)

only in the wal_level == minimal case: would it be accepted as a
"first step"? Or rejected because it doesn't allow it in the other
cases?

I'm pretty sure we wouldn't accept a patch for a feature that would
only work with wal_level=minimal, but it might be a useful starting
point for someone else to keep hacking on.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#3Leonardo Francalanci
m_lists@yahoo.it
In reply to: Robert Haas (#2)
Re: switch UNLOGGED to LOGGED

I'm pretty sure we wouldn't accept a patch for a feature that would
only work with wal_level=minimal, but it might be a useful starting
point for someone else to keep hacking on.

I understand.

Reading your post at
http://archives.postgresql.org/pgsql-hackers/2011-01/msg00315.php
I thought I got the part:

"what happens if we *crash* without writing an abort record? It
seems like that could leave a stray file around on a standby,
because the current code only cleans things up on the standby
at the start of recovery"

But re-reading it, I don't understand: what's the difference in creating
a new "regular" table and crashing before emitting the abort record,
and converting an unlogged table to logged and crashing before
emitting the abort record? How do the standby servers handle a
"CREATE TABLE" followed by a ROLLBACK if the master crashes
before writing the abort record? I thought that too would "leave a
stray file around on a standby".

Leonardo

#4Robert Haas
robertmhaas@gmail.com
In reply to: Leonardo Francalanci (#3)
Re: switch UNLOGGED to LOGGED

On Sat, Apr 9, 2011 at 3:29 AM, Leonardo Francalanci <m_lists@yahoo.it> wrote:

I'm pretty sure we wouldn't accept a patch for a  feature that would
only work with wal_level=minimal, but it might be a useful  starting
point for someone else to keep hacking on.

I understand.

Reading your post at
http://archives.postgresql.org/pgsql-hackers/2011-01/msg00315.php
I thought I got the part:

"what happens if we *crash* without writing an abort record?  It
seems  like that could leave a stray file around on a standby,
because the  current code only cleans things up on the standby
at the start of  recovery"

But re-reading it, I don't understand: what's the difference in creating
a new "regular" table and crashing before emitting the abort record,
and converting an unlogged table to logged and crashing before
emitting the abort record? How do the standby servers handle a
"CREATE TABLE" followed by a ROLLBACK if the master crashes
before writing the abort record? I thought that too would "leave a
stray file around on a standby".

I've been thinking about the same thing. And AFAICS, your analysis is
correct, though there may be some angle to it I'm not seeing.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#5Leonardo Francalanci
m_lists@yahoo.it
In reply to: Robert Haas (#4)
Re: switch UNLOGGED to LOGGED

But re-reading it, I don't understand: what's the difference in creating
a new "regular" table and crashing before emitting the abort record,
and converting an unlogged table to logged and crashing before
emitting the abort record? How do the standby servers handle a
"CREATE TABLE" followed by a ROLLBACK if the master crashes
before writing the abort record? I thought that too would "leave a
stray file around on a standby".

I've been thinking about the same thing. And AFAICS, your analysis is
correct, though there may be some angle to it I'm not seeing.

Anyone else? I would like to know if what I'm trying to do is, in fact,
possible... otherwise starting with thewal_level=minimal case first
will be wasted effort in case the other cases can't be integrated
somehow...

Leonardo

#6Noah Misch
noah@leadboat.com
In reply to: Leonardo Francalanci (#5)
Re: switch UNLOGGED to LOGGED

On Mon, Apr 11, 2011 at 11:41:18AM +0100, Leonardo Francalanci wrote:

But re-reading it, I don't understand: what's the difference in creating
a new "regular" table and crashing before emitting the abort record,
and converting an unlogged table to logged and crashing before
emitting the abort record? How do the standby servers handle a
"CREATE TABLE" followed by a ROLLBACK if the master crashes
before writing the abort record? I thought that too would "leave a
stray file around on a standby".

I've been thinking about the same thing. And AFAICS, your analysis is
correct, though there may be some angle to it I'm not seeing.

Anyone else? I would like to know if what I'm trying to do is, in fact,
possible... otherwise starting with thewal_level=minimal case first
will be wasted effort in case the other cases can't be integrated
somehow...

If the master crashes while a transaction that used CREATE TABLE is unfinished,
both the master and the standby will indefinitely retain identical, stray (not
referenced by pg_class) files. The catalogs do reference the relfilenode of
each unlogged relation; currently, that relfilenode never exists on a standby
while that standby is accepting connections. By the time the startup process
releases the AccessExclusiveLock acquired by the proposed UNLOGGED -> normal
conversion process, that relfilenode needs to be either fully copied or unlinked
all over again. (Alternately, find some other way to make sure queries don't
read the half-copied file.) In effect, the problem is that the relfilenode is
*not* stray, so its final state does need to be well-defined.

nm

#7Robert Haas
robertmhaas@gmail.com
In reply to: Noah Misch (#6)
Re: switch UNLOGGED to LOGGED

On Mon, Apr 11, 2011 at 10:29 AM, Noah Misch <noah@leadboat.com> wrote:

On Mon, Apr 11, 2011 at 11:41:18AM +0100, Leonardo Francalanci wrote:

But re-reading  it, I don't understand: what's the difference in creating
a new  "regular" table and crashing before emitting the abort record,
and  converting an unlogged table to logged and crashing before
emitting the  abort record? How do the standby servers handle a
"CREATE TABLE"  followed by a ROLLBACK if the master crashes
before writing the abort  record? I thought that too would "leave a
stray file around on a  standby".

I've been thinking about the same thing.  And AFAICS, your  analysis is
correct, though there may be some angle to it I'm not  seeing.

Anyone else? I would like to know if what I'm trying to do is, in fact,
possible... otherwise starting with thewal_level=minimal case first
will be wasted effort in case the other cases can't be integrated
somehow...

If the master crashes while a transaction that used CREATE TABLE is unfinished,
both the master and the standby will indefinitely retain identical, stray (not
referenced by pg_class) files.  The catalogs do reference the relfilenode of
each unlogged relation; currently, that relfilenode never exists on a standby
while that standby is accepting connections.  By the time the startup process
releases the AccessExclusiveLock acquired by the proposed UNLOGGED -> normal
conversion process, that relfilenode needs to be either fully copied or unlinked
all over again.  (Alternately, find some other way to make sure queries don't
read the half-copied file.)  In effect, the problem is that the relfilenode is
*not* stray, so its final state does need to be well-defined.

Oh, right.

Maybe we should just put in a rule that a server in Hot Standby mode
won't ever try to read from an unlogged table (right now we count on
the fact that there will be nothing to read). If we crash before
copying the whole file, it won't matter, because the catalogs won't
have been updated, so we'll refuse to look at it anyway. And we have
to reinitialize on entering normal running anyway, so we can clean it
up then.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#8Leonardo Francalanci
m_lists@yahoo.it
In reply to: Robert Haas (#7)
Re: switch UNLOGGED to LOGGED

If the master crashes while a transaction that used CREATE TABLE is

unfinished,

both the master and the standby will indefinitely retain identical, stray

(not

referenced by pg_class) files. The catalogs do reference the relfilenode

of

each unlogged relation; currently, that relfilenode never exists on a

standby

while that standby is accepting connections. By the time the startup

process

releases the AccessExclusiveLock acquired by the proposed UNLOGGED ->

normal

conversion process, that relfilenode needs to be either fully copied or

unlinked

all over again. (Alternately, find some other way to make sure queries

don't

read the half-copied file.) In effect, the problem is that the relfilenode

is

*not* stray, so its final state does need to be well-defined.

Oh, right.

Maybe we should just put in a rule that a server in Hot Standby mode
won't ever try to read from an unlogged table (right now we count on
the fact that there will be nothing to read). If we crash before
copying the whole file, it won't matter, because the catalogs won't
have been updated, so we'll refuse to look at it anyway. And we have
to reinitialize on entering normal running anyway, so we can clean it
up then.

Ok then... I'll try to code the "easy" version first (the wal_level=minimal
case)
and then we'll see....

Leonardo

#9Leonardo Francalanci
m_lists@yahoo.it
In reply to: Leonardo Francalanci (#8)
Re: switch UNLOGGED to LOGGED

I think I coded a very basic version of the UNLOGGED to LOGGED patch
(only wal_level=minimal case for the moment).

To remove the INIT fork, I changed somehow PendingRelDelete to have
a flag "bool onlyInitFork" so that the delete would remove only the INIT
fork at commit.

Everything "works" (note the quotes...) except in the case of not-clean
shutdown ("-m immediate" to pg_ctl stop). The reason is that the replay
code doesn't have any idea that it has to remove only the INIT fork: it will
remove ALL forks; so at the end of the redo procedure (at startup, after
a "pg_ctl -m immediate stop") the table doesn't have any forks at all.

See xact_redo_commit:

/* Make sure files supposed to be dropped are dropped */
for (i = 0; i < xlrec->nrels; i++)
{
[...]
for (fork = 0; fork <= MAX_FORKNUM; fork++)
{
if (smgrexists(srel, fork))
{
XLogDropRelation(xlrec->xnodes[i], fork);
smgrdounlink(srel, fork, true);
}
}
smgrclose(srel);
}
[...]

Should I change xl_xact_commit in order to have, instead of:

/* Array of RelFileNode(s) to drop at commit */
RelFileNode xnodes[1]; /* VARIABLE LENGTH ARRAY */

an array of structures such as:

{
RelFileNode relfilenode;
bool onlyInitFork;
}

???

Otherwise I don't know how to tell the redo commit code to delete only
the INIT fork, instead of all the relation forks...
(maybe I'm doing all wrong: I'm open to any kind of suggestion here...)

Leonardo

#10Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Leonardo Francalanci (#9)
Re: switch UNLOGGED to LOGGED

Excerpts from Leonardo Francalanci's message of lun abr 18 09:36:13 -0300 2011:

I think I coded a very basic version of the UNLOGGED to LOGGED patch
(only wal_level=minimal case for the moment).

To remove the INIT fork, I changed somehow PendingRelDelete to have
a flag "bool onlyInitFork" so that the delete would remove only the INIT
fork at commit.

Everything "works" (note the quotes...) except in the case of not-clean
shutdown ("-m immediate" to pg_ctl stop). The reason is that the replay
code doesn't have any idea that it has to remove only the INIT fork: it will
remove ALL forks; so at the end of the redo procedure (at startup, after
a "pg_ctl -m immediate stop") the table doesn't have any forks at all.

Maybe you should change xl_act_commit to have a separate list of rels to
drop the init fork for (instead of mixing those with the list of files to
drop as a whole).

--
Álvaro Herrera <alvherre@commandprompt.com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

#11Leonardo Francalanci
m_lists@yahoo.it
In reply to: Alvaro Herrera (#10)
Re: switch UNLOGGED to LOGGED

Maybe you should change xl_act_commit to have a separate list of rels to
drop the init fork for (instead of mixing those with the list of files to
drop as a whole).

I tried to follow your suggestion, thank you very much.

Here's a first attempt at the patch.

I "tested" it with:

create table forei (v integer primary key);
insert into forei select * from generate_series(1,10000);
create unlogged table pun (c integer primary key, constraint
con foreign key (c) references forei(v));
insert into pun select * from generate_series(1,10000);
alter table pun set logged;

then shutdown the master with "immediate":

bin/pg_ctl -D data -m immediate stop
bin/pg_ctl -D data start

and "pun" still has data:

select * from pun where c=100;

Question/comments:

1) it's a very-first-stage patch; I would need to know if something is
*very* wrong before cleaning it.
2) there are some things I implemented using a logic like "let's see how it
worked 10 lines above, and I'll do the same". For example, the 2PC stuff
is totally "copied" from the other places, I have no idea if the code makes
sense at all (how can I test it?)
3) Should we have a "cascade" option? I don't know if I have to handle
inherited tables and other dependent objects
4) During the check for dependencies problems, I stop as soon as I find an
error; would it be enough?

Leonardo

Attachments:

unl2log_20110422.patchapplication/octet-stream; name=unl2log_20110422.patchDownload+386-30
#12Robert Haas
robertmhaas@gmail.com
In reply to: Leonardo Francalanci (#11)
Re: switch UNLOGGED to LOGGED

On Fri, Apr 22, 2011 at 4:13 AM, Leonardo Francalanci <m_lists@yahoo.it> wrote:

Maybe you should change  xl_act_commit to have a separate list of rels to
drop the init fork for  (instead of mixing those with the list of files to
drop as a  whole).

I tried to follow your suggestion, thank you very much.

I have to admit I don't like this approach very much. I can't see
adding 4 bytes to every commit record for this feature.

3) Should we have a "cascade" option? I don't know if I have to handle
inherited tables and other dependent objects

Look at the way ALTER TABLE [ONLY] works for other action types, and copy it.

4) During the check for dependencies problems, I stop as soon as I find an
error; would it be enough?

It's a bit awkwardly phrased the way you have it. I would suggest
something like:

ERROR: constraints on permanent tables may reference only permanent tables
HINT: constraint %s

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#13Robert Haas
robertmhaas@gmail.com
In reply to: Robert Haas (#12)
Re: switch UNLOGGED to LOGGED

On Fri, May 6, 2011 at 10:25 PM, Robert Haas <robertmhaas@gmail.com> wrote:

ERROR:  constraints on permanent tables may reference only permanent tables
HINT:  constraint %s

Argh, hit send too soon.

HINT: constraint %s references table %s

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#14Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Haas (#13)
Re: switch UNLOGGED to LOGGED

Robert Haas <robertmhaas@gmail.com> writes:

On Fri, May 6, 2011 at 10:25 PM, Robert Haas <robertmhaas@gmail.com> wrote:

ERROR: �constraints on permanent tables may reference only permanent tables
HINT: �constraint %s

Argh, hit send too soon.

HINT: constraint %s references table %s

<nitpick>
That looks like a DETAIL, not a HINT. Also see message style guide
about how that should be a complete sentence.
</nitpick>

regards, tom lane

#15Leonardo Francalanci
m_lists@yahoo.it
In reply to: Robert Haas (#12)
Re: switch UNLOGGED to LOGGED

On Fri, Apr 22, 2011 at 4:13 AM, Leonardo Francalanci <m_lists@yahoo.it>

wrote:

Maybe you should change xl_act_commit to have a separate list of rels to
drop the init fork for (instead of mixing those with the list of files

to

drop as a whole).

I tried to follow your suggestion, thank you very much.

I have to admit I don't like this approach very much. I can't see
adding 4 bytes to every commit record for this feature.

I understand.

What if, in xl_xact_commit, instead of

RelFileNode xnodes

I use another struct for xnodes, something like:

{
RelFileNode xnode;
bool onlyInitFork;
}

That would increase the commit record size only when there are
RelFileNode(s) to drop at commit. So, instead of 4 bytes in
every commit, there are "wasted" bytes when the commit record
contains deleted permanent relations (that should happen much
less). I'm open to suggestions here...

3) Should we have a "cascade" option? I don't know if I have to handle
inherited tables and other dependent objects

Look at the way ALTER TABLE [ONLY] works for other action types, and copy it.

Ok

Thank you very much

Leonardo

#16Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Robert Haas (#12)
Re: switch UNLOGGED to LOGGED

Excerpts from Robert Haas's message of vie may 06 23:25:09 -0300 2011:

On Fri, Apr 22, 2011 at 4:13 AM, Leonardo Francalanci <m_lists@yahoo.it> wrote:

Maybe you should change  xl_act_commit to have a separate list of rels to
drop the init fork for  (instead of mixing those with the list of files to
drop as a  whole).

I tried to follow your suggestion, thank you very much.

I have to admit I don't like this approach very much. I can't see
adding 4 bytes to every commit record for this feature.

Hmm, yeah. Maybe we can add a "flags" int8 somewhere in that struct and
set a bit in it if nrels, nsubxacts, nmsgs and respective arrays are present.
That would save some int's that are already in there.

--
Álvaro Herrera <alvherre@commandprompt.com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

#17Robert Haas
robertmhaas@gmail.com
In reply to: Alvaro Herrera (#16)
Re: switch UNLOGGED to LOGGED

On Mon, May 9, 2011 at 12:51 PM, Alvaro Herrera
<alvherre@commandprompt.com> wrote:

Excerpts from Robert Haas's message of vie may 06 23:25:09 -0300 2011:

On Fri, Apr 22, 2011 at 4:13 AM, Leonardo Francalanci <m_lists@yahoo.it> wrote:

Maybe you should change  xl_act_commit to have a separate list of rels to
drop the init fork for  (instead of mixing those with the list of files to
drop as a  whole).

I tried to follow your suggestion, thank you very much.

I have to admit I don't like this approach very much.  I can't see
adding 4 bytes to every commit record for this feature.

Hmm, yeah.  Maybe we can add a "flags" int8 somewhere in that struct and
set a bit in it if nrels, nsubxacts, nmsgs and respective arrays are present.
That would save some int's that are already in there.

Yes, that seems like a very appealing approach. There is plenty of
bit-space available in xinfo, and we could reserve a bit each for
nrels, nsubxacts, and nmsgs, with set meaning that an integer count of
that item is present and clear meaning that the count is omitted from
the structure (and zero). This will probably require a bit of tricky
code reorganization so I think it should be done separately from the
main patch. With that done, then it's not a big deal for the main
patch to add in one more array that will normally get omitted. And in
the process, we can save 12 bytes on every commit record in the common
case, which is quite appealing: I don't expect a huge performance
gain, but a penny saved is a penny earned.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#18Leonardo Francalanci
m_lists@yahoo.it
In reply to: Robert Haas (#17)
Re: switch UNLOGGED to LOGGED

Yes, that seems like a very appealing approach. There is plenty of
bit-space available in xinfo, and we could reserve a bit each for
nrels, nsubxacts, and nmsgs, with set meaning that an integer count of
that item is present and clear meaning that the count is omitted from
the structure (and zero). This will probably require a bit of tricky
code reorganization so I think it should be done separately from the
main patch.

Ok, I'll try and send a patch with this change only.
BTW xinfo is 32 bit long, but I think only 2 bits are used right now?
I think I can make it a 8 bits, and add another 8 bits for nrels,
nsubxacts, and nmsgs and the new thing. That should save
another 2 bytes, while leaving space for extention. Or we can make
it a 8 bits only, but only 2 bits would be left "empty" for future
extentions; I don't know if we care about it...

Leonardo

#19Robert Haas
robertmhaas@gmail.com
In reply to: Leonardo Francalanci (#18)
Re: switch UNLOGGED to LOGGED

On Tue, May 10, 2011 at 3:35 AM, Leonardo Francalanci <m_lists@yahoo.it> wrote:

Yes, that seems like a very appealing approach.   There is plenty of
bit-space available in xinfo, and we could reserve a bit  each for
nrels, nsubxacts, and nmsgs, with set meaning that an integer count  of
that item is present and clear meaning that the count is omitted  from
the structure (and zero).  This will probably require a bit of  tricky
code reorganization so I think it should be done separately from  the
main patch.

Ok, I'll try and send a patch with this change only.
BTW  xinfo  is 32 bit long, but I think only 2 bits are used right now?
I think I can make it a 8 bits, and add another 8 bits for nrels,
nsubxacts, and nmsgs and the new thing. That should save
another 2 bytes, while leaving space for extention. Or we can make
it a 8 bits only, but only 2 bits would be left "empty" for future
extentions; I don't know if we care about it...

I don't think making xinfo shorter will save anything, because
whatever follows it is going to be a 4-byte quantity and therefore
4-byte aligned.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#20Leonardo Francalanci
m_lists@yahoo.it
In reply to: Robert Haas (#19)
Re: switch UNLOGGED to LOGGED

I don't think making xinfo shorter will save anything, because
whatever follows it is going to be a 4-byte quantity and therefore
4-byte aligned.

ups, didn't notice it.

I'll split xinfo into:

uint16 xinfo;
uint16 presentFlags;

I guess it helps with the reading? I mean, instead
of having a single uint32?

#21Robert Haas
robertmhaas@gmail.com
In reply to: Leonardo Francalanci (#20)
#22Leonardo Francalanci
m_lists@yahoo.it
In reply to: Noah Misch (#6)
#23Noah Misch
noah@leadboat.com
In reply to: Leonardo Francalanci (#22)
#24Leonardo Francalanci
m_lists@yahoo.it
In reply to: Noah Misch (#23)
#25Noah Misch
noah@leadboat.com
In reply to: Leonardo Francalanci (#24)
#26Leonardo Francalanci
m_lists@yahoo.it
In reply to: Noah Misch (#25)
#27Noah Misch
noah@leadboat.com
In reply to: Leonardo Francalanci (#26)
#28Robert Haas
robertmhaas@gmail.com
In reply to: Noah Misch (#27)
#29Noah Misch
noah@leadboat.com
In reply to: Robert Haas (#28)
#30Robert Haas
robertmhaas@gmail.com
In reply to: Noah Misch (#29)
#31Noah Misch
noah@leadboat.com
In reply to: Robert Haas (#30)
#32Leonardo Francalanci
m_lists@yahoo.it
In reply to: Robert Haas (#30)
#33Leonardo Francalanci
m_lists@yahoo.it
In reply to: Leonardo Francalanci (#32)
#34Noah Misch
noah@leadboat.com
In reply to: Leonardo Francalanci (#32)
#35Leonardo Francalanci
m_lists@yahoo.it
In reply to: Noah Misch (#34)
#36Noah Misch
noah@leadboat.com
In reply to: Leonardo Francalanci (#35)
#37Robert Haas
robertmhaas@gmail.com
In reply to: Noah Misch (#36)
#38Noah Misch
noah@leadboat.com
In reply to: Robert Haas (#37)
#39Leonardo Francalanci
m_lists@yahoo.it
In reply to: Robert Haas (#37)
#40Robert Haas
robertmhaas@gmail.com
In reply to: Noah Misch (#38)
#41Leonardo Francalanci
m_lists@yahoo.it
In reply to: Robert Haas (#40)
#42Robert Haas
robertmhaas@gmail.com
In reply to: Leonardo Francalanci (#41)
#43Leonardo Francalanci
m_lists@yahoo.it
In reply to: Robert Haas (#42)
#44Robert Haas
robertmhaas@gmail.com
In reply to: Leonardo Francalanci (#43)