Point in time recovery: recreating relation files

Started by Marc Munro, almost 24 years ago. 11 messages.
#1 Marc Munro
marc@bloodnok.com

The current WAL recovery implementation does not recover newly created
objects such as tables. My suggested patch is:

When XLogOpenRelation fails to open the relation file, if errno is
ENOENT (no such file or directory) we should attempt to recreate the file
using smgrcreate.
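
A minimal sketch of the idea above, using plain POSIX calls rather than
the backend's storage manager. The function name open_or_recreate is
hypothetical and only illustrates the ENOENT check; it is not the code
of the proposed patch.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical stand-in for the proposed behaviour: open a relation
 * file and, only when it is missing (ENOENT), create it empty.
 * Returns a file descriptor, or -1 with errno set.
 */
int
open_or_recreate(const char *path)
{
    int fd;

    assert(path != NULL);

    fd = open(path, O_RDWR);
    if (fd < 0 && errno == ENOENT)
    {
        /* O_EXCL: never clobber a file that appeared in the meantime */
        fd = open(path, O_RDWR | O_CREAT | O_EXCL, 0600);
    }
    return fd;
}
```

Any failure other than ENOENT is passed back to the caller unchanged,
which matches the claim that an existing file is never touched.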

This seems to work fine for tables, indexes and sequences but can anyone
see any potential problems? I have not tried this with TOAST tables;
are these handled any differently?

Is it reasonable to assume that recreating the file in this way is
safe? It seems OK to me as we only recreate the file if it does not
already exist, so we are not in danger of making a bad situation worse.

If no-one tells me this is a bad idea, I will submit a patch.

--
Marc marc@bloodnok.com

#2 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Marc Munro (#1)
Re: Point in time recovery: recreating relation files

Marc Munro <marc@bloodnok.com> writes:

The current WAL recovery implementation does not recover newly created
objects such as tables. My suggested patch is:

When XLogOpenRelation fails to open the relation file, if errno is
ENOENT (no such file or directory) we should attempt to recreate the file
using smgrcreate.

No, that's wrong. The missing ingredient is that the WAL log should
explicitly log table creations. (And also table drops.) If you look
you will find some comments showing the places where code is missing.

If you try to do it as you suggest above, then you will erroneously
recreate files that have been dropped.
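
The difference can be sketched with a toy in-memory model. Everything
here (the record types, WalRec, replay, the exists[] array) is
hypothetical and exists only for illustration; it is not the real WAL
record format.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_REL 8

/* Hypothetical record kinds for this toy model */
typedef enum { REC_CREATE, REC_UPDATE, REC_DROP } RecType;

typedef struct
{
    RecType type;
    int     relfilenode;
} WalRec;

/*
 * Replay records against exists[] (true = file present on disk).
 * recreate_on_miss models the ENOENT proposal: an update to a missing
 * file silently creates it.  Otherwise only REC_CREATE creates files,
 * and updates to missing files are skipped.
 */
void
replay(const WalRec *recs, size_t n, bool *exists, bool recreate_on_miss)
{
    for (size_t i = 0; i < n; i++)
    {
        int r = recs[i].relfilenode;

        assert(r >= 0 && r < MAX_REL);
        switch (recs[i].type)
        {
            case REC_CREATE:
                exists[r] = true;
                break;
            case REC_DROP:
                exists[r] = false;
                break;
            case REC_UPDATE:
                if (!exists[r] && recreate_on_miss)
                    exists[r] = true;   /* resurrects a dropped file */
                break;
        }
    }
}
```

In this model, if a table was dropped before the crash and the log
holds only an older update record for it, replaying with
recreate_on_miss leaves the file present again. With explicit create
and drop records the file ends up absent, as it should.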

regards, tom lane

#3 Marc Munro
marc@bloodnok.com
In reply to: Tom Lane (#2)
Re: Point in time recovery: recreating relation files

On Wed, 2002-02-27 at 19:44, Tom Lane wrote:

No, that's wrong. The missing ingredient is that the WAL log should
explicitly log table creations. (And also table drops.) If you look
you will find some comments showing the places where code is missing.

If you try to do it as you suggest above, then you will erroneously
recreate files that have been dropped.

OK, that makes sense. I will take another look. Thanks.

--
Marc marc@bloodnok.com

#4 Tatsuo Ishii
t-ishii@sra.co.jp
In reply to: Tom Lane (#2)
Re: Point in time recovery: recreating relation files

No, that's wrong. The missing ingredient is that the WAL log should
explicitly log table creations. (And also table drops.) If you look
you will find some comments showing the places where code is missing.

I'm wondering where we could record the LSN when creating or dropping
tables.

If you try to do it as you suggest above, then you will erroneously
recreate files that have been dropped.

Yes, but I think we need to compare the log's LSN and the table's LSN.
--
Tatsuo Ishii

#5 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tatsuo Ishii (#4)
Re: Point in time recovery: recreating relation files

Tatsuo Ishii <t-ishii@sra.co.jp> writes:

No, that's wrong. The missing ingredient is that the WAL log should
explicitly log table creations. (And also table drops.) If you look
you will find some comments showing the places where code is missing.

I'm wondering where we could record the LSN when creating or dropping
tables.

Um, why would that matter?

regards, tom lane

#6 Tatsuo Ishii
t-ishii@sra.co.jp
In reply to: Tom Lane (#5)
Re: Point in time recovery: recreating relation files

Tatsuo Ishii <t-ishii@sra.co.jp> writes:

No, that's wrong. The missing ingredient is that the WAL log should
explicitly log table creations. (And also table drops.) If you look
you will find some comments showing the places where code is missing.

I'm wondering where we could record the LSN when creating or dropping
tables.

Um, why would that matter?

As I understand it, to prevent redoing a record two or more times during
recovery, we need to compare the LSN in the object against the LSN
in the WAL log.
--
Tatsuo Ishii

#7 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tatsuo Ishii (#6)
Re: Point in time recovery: recreating relation files

Tatsuo Ishii <t-ishii@sra.co.jp> writes:

I'm wondering where we could record the LSN when creating or dropping
tables.

Um, why would that matter?

As I understand it, to prevent redoing a record two or more times during
recovery, we need to compare the LSN in the object against the LSN
in the WAL log.

But undo/redo checking on file creation or deletion is trivial: either
the kernel has the file or it doesn't. We do not need any other check
AFAICS.
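
To illustrate why no LSN is needed here, a minimal POSIX sketch of
redo for file creation and deletion. The names redo_create_file and
redo_drop_file are hypothetical; the point is only that each operation
can be repeated safely because the file's existence is the state.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Redo a file creation.  If the file is already there (EEXIST), the
 * record has effectively been applied, so that counts as success.
 * Returns 0 on success, -1 on any other error.
 */
int
redo_create_file(const char *path)
{
    int fd;

    assert(path != NULL);

    fd = open(path, O_RDWR | O_CREAT | O_EXCL, 0600);
    if (fd >= 0)
    {
        close(fd);
        return 0;
    }
    return (errno == EEXIST) ? 0 : -1;
}

/*
 * Redo a file deletion.  If the file is already gone (ENOENT), there
 * is nothing left to do, so that also counts as success.
 */
int
redo_drop_file(const char *path)
{
    assert(path != NULL);

    if (unlink(path) == 0)
        return 0;
    return (errno == ENOENT) ? 0 : -1;
}
```

Replaying either record a second time is a no-op, which is the sense
in which the check is trivial.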

regards, tom lane

#8 Tatsuo Ishii
t-ishii@sra.co.jp
In reply to: Tom Lane (#7)
Re: Point in time recovery: recreating relation files

As I understand it, to prevent redoing a record two or more times during
recovery, we need to compare the LSN in the object against the LSN
in the WAL log.

But undo/redo checking on file creation or deletion is trivial: either
the kernel has the file or it doesn't. We do not need any other check
AFAICS.

Are you saying that the table creation log record would contain a
relfilenode? I'm not sure the relfilenode is the same before and after
recovery if we consider point in time recovery.
--
Tatsuo Ishii

#9 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tatsuo Ishii (#8)
Re: Point in time recovery: recreating relation files

Tatsuo Ishii <t-ishii@sra.co.jp> writes:

But undo/redo checking on file creation or deletion is trivial: either
the kernel has the file or it doesn't. We do not need any other check
AFAICS.

Are you saying that the table creation log record would contain a
relfilenode?

Sure. What else would it contain?

I'm not sure the relfilenode is the same before and after
recovery if we consider point in time recovery.

Considering that all the WAL entries concerning updates to the table
will name it by relfilenode, we'd better be prepared to ensure that
the relfilenode doesn't change over recovery.
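
As an illustration, a creation record that carries the relfilenode
lets redo derive the same file path every time. RelCreateRec and
rel_path are hypothetical names, and the "base/<database oid>/<relfilenode>"
layout is a simplified view of the data directory, so treat this as a
sketch under those assumptions.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical payload of a "relation created" log record */
typedef struct
{
    unsigned int db_oid;        /* database the relation lives in */
    unsigned int relfilenode;   /* file name used by all later records */
} RelCreateRec;

/*
 * Build the relative path of the relation file named by the record.
 * Returns the path length, or -1 if buf is too small.
 */
int
rel_path(const RelCreateRec *rec, char *buf, size_t buflen)
{
    int n;

    assert(rec != NULL && buf != NULL);

    n = snprintf(buf, buflen, "base/%u/%u", rec->db_oid, rec->relfilenode);
    return (n >= 0 && (size_t) n < buflen) ? n : -1;
}
```

Because later update records refer to the same relfilenode, they
resolve to the same path that the creation record produced.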

regards, tom lane

#10 Marc Munro
marc@bloodnok.com
In reply to: Tom Lane (#9)
Re: Point in time recovery: recreating relation files

Could someone explain to this poor newbie (who is hoping to implement
this) exactly what the issue is here? Like Tom, I could originally see
no reason to worry about the LSN for file creation but I am very
concerned that I have failed to grasp Tatsuo's concerns.

Is there some reason why the relfilenode might change either during or
as a result of recovery? Unless I have missed the point again, during
recovery we must recreate files with exactly the same path, name and
relfilenode as they would have originally been created, and in the same
order relative to the creation of the relation. I see no scope for
anything to be different.

On Wed, 2002-03-06 at 21:29, Tom Lane wrote:

Tatsuo Ishii <t-ishii@sra.co.jp> writes:

But undo/redo checking on file creation or deletion is trivial: either
the kernel has the file or it doesn't. We do not need any other check
AFAICS.

Are you saying that the table creation log record would contain a
relfilenode?

Sure. What else would it contain?

I'm not sure the relfilenode is the same before and after
recovery if we consider point in time recovery.

Considering that all the WAL entries concerning updates to the table
will name it by relfilenode, we'd better be prepared to ensure that
the relfilenode doesn't change over recovery.

regards, tom lane

--
Marc marc@bloodnok.com

#11 Tatsuo Ishii
t-ishii@sra.co.jp
In reply to: Marc Munro (#10)
Re: Point in time recovery: recreating relation files

Could someone explain to this poor newbie (who is hoping to implement
this) exactly what the issue is here? Like Tom, I could originally see
no reason to worry about the LSN for file creation but I am very
concerned that I have failed to grasp Tatsuo's concerns.

Is there some reason why the relfilenode might change either during or
as a result of recovery? Unless I have missed the point again, during
recovery we must recreate files with exactly the same path, name and
relfilenode as they would have originally been created, and in the same
order relative to the creation of the relation. I see no scope for
anything to be different.

Sorry for the confusion. I'm not very familiar with other DBMSs, and I
just don't know what kind of point in time recovery features they
provide. One scenario I can imagine is recovering a single table under
a different name. I'm not sure whether any other DBMS implements this,
though.

BTW, the next issue would be TRUNCATE and CREATE/DROP DATABASE.
As far as I can tell, these are not currently supported by WAL.
--
Tatsuo Ishii