Point in time recovery: recreating relation files
The current WAL recovery implementation does not recover newly created
objects such as tables. My suggested patch is:
When XLogOpenRelation fails to open the relation file, if errno is
ENOENT (no such file or directory) we should attempt to recreate the file
using smgrcreate.
This seems to work fine for tables, indexes and sequences but can anyone
see any potential problems? I have not tried this with TOAST tables;
are these handled any differently?
Is it reasonable to assume that recreating the file in this way is
safe? It seems OK to me as we only recreate the file if it does not
already exist, so we are not in danger of making a bad situation worse.
If no-one tells me this is a bad idea, I will submit a patch.
--
Marc marc@bloodnok.com
Marc Munro <marc@bloodnok.com> writes:
> The current WAL recovery implementation does not recover newly created
> objects such as tables. My suggested patch is:
> When XLogOpenRelation fails to open the relation file, if errno is
> ENOENT (no such file or directory) we should attempt to recreate the file
> using smgrcreate.
No, that's wrong. The missing ingredient is that the WAL log should
explicitly log table creations. (And also table drops.) If you look
you will find some comments showing the places where code is missing.
If you try to do it as you suggest above, then you will erroneously
recreate files that have been dropped.
regards, tom lane
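
The objection above can be illustrated with a toy replay loop. This is only a sketch, not PostgreSQL code: the WAL is modeled as a list of (kind, relfilenode) tuples, the set of files on disk as a Python set, and the function names and the relfilenode value 16384 are made up for illustration. The first function mimics the "recreate on ENOENT" proposal, where creations and drops are not logged; the second assumes explicit create and drop records.

```python
# Hypothetical toy model: a WAL is a list of (kind, relfilenode) tuples and
# the "disk" is a set of relfilenodes whose files currently exist.
EXAMPLE_WAL = [("create", 16384), ("insert", 16384), ("drop", 16384)]


def replay_recreate_on_missing(wal, files):
    """Replay under the 'recreate on ENOENT' proposal.

    Create and drop are not logged in this scheme, so those records are
    ignored; any data record naming a missing file recreates the file.
    """
    files = set(files)
    for kind, relfilenode in wal:
        if kind == "insert" and relfilenode not in files:
            files.add(relfilenode)  # ENOENT, so the file is recreated
    return files


def replay_with_logged_ddl(wal, files):
    """Replay when table creations and drops are explicit WAL records."""
    files = set(files)
    for kind, relfilenode in wal:
        if kind == "create":
            files.add(relfilenode)
        elif kind == "drop":
            files.discard(relfilenode)
        elif kind == "insert" and relfilenode not in files:
            raise RuntimeError(f"record names unknown file {relfilenode}")
    return files
```

Replaying EXAMPLE_WAL from an empty disk leaves the dropped table's file behind in the first scheme, while the second ends with no file, which matches the state before the crash.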
On Wed, 2002-02-27 at 19:44, Tom Lane wrote:
> No, that's wrong. The missing ingredient is that the WAL log should
> explicitly log table creations. (And also table drops.) If you look
> you will find some comments showing the places where code is missing.
> If you try to do it as you suggest above, then you will erroneously
> recreate files that have been dropped.
OK, that makes sense. I will take another look. Thanks.
--
Marc marc@bloodnok.com
> No, that's wrong. The missing ingredient is that the WAL log should
> explicitly log table creations. (And also table drops.) If you look
> you will find some comments showing the places where code is missing.
I'm wondering where we could record the LSN when creating or dropping
tables.
> If you try to do it as you suggest above, then you will erroneously
> recreate files that have been dropped.
Yes, but I think we need to compare the log's LSN and the table's LSN.
--
Tatsuo Ishii
Tatsuo Ishii <t-ishii@sra.co.jp> writes:
>> No, that's wrong. The missing ingredient is that the WAL log should
>> explicitly log table creations. (And also table drops.) If you look
>> you will find some comments showing the places where code is missing.
> I'm wondering where we could record the LSN when creating or dropping
> tables.
Um, why would that matter?
regards, tom lane
> Tatsuo Ishii <t-ishii@sra.co.jp> writes:
>>> No, that's wrong. The missing ingredient is that the WAL log should
>>> explicitly log table creations. (And also table drops.) If you look
>>> you will find some comments showing the places where code is missing.
>> I'm wondering where we could record the LSN when creating or dropping
>> tables.
> Um, why would that matter?
As I understand it, to avoid redoing the same change two or more times
during recovery, we need to compare the LSN in the object against the
LSN in the WAL log.
--
Tatsuo Ishii
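
The LSN comparison described above is the usual guard for page-level redo, and a small sketch may help show why it exists. Everything here is an assumption made for illustration: the Page and WalRecord classes, the counter field, and the integer LSNs are invented, and the change is deliberately non-idempotent (an increment) so that applying it twice would be visibly wrong.

```python
from dataclasses import dataclass


@dataclass
class Page:
    """Hypothetical data page: remembers the LSN of the last change applied."""
    lsn: int = 0
    counter: int = 0


@dataclass(frozen=True)
class WalRecord:
    """Hypothetical WAL record: 'add delta to the page counter'."""
    lsn: int
    delta: int


def redo(page, record):
    """Apply record to page unless the page already reflects it.

    Returns True if the change was applied, False if it was skipped.
    """
    if page.lsn >= record.lsn:
        return False  # page is already at or past this record
    page.counter += record.delta
    page.lsn = record.lsn
    return True
```

With this guard, replaying the same records a second time changes nothing, because each page's stored LSN is already at or beyond the record's LSN.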
Tatsuo Ishii <t-ishii@sra.co.jp> writes:
>>> I'm wondering where we could record the LSN when creating or dropping
>>> tables.
>> Um, why would that matter?
> As I understand it, to avoid redoing the same change two or more times
> during recovery, we need to compare the LSN in the object against the
> LSN in the WAL log.
But undo/redo checking on file creation or deletion is trivial: either
the kernel has the file or it doesn't. We do not need any other check
AFAICS.
regards, tom lane
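
The point that file creation and deletion need no LSN check can be sketched with ordinary filesystem calls. This is an illustration only, assuming redo of a create means "make sure the file exists" and redo of a drop means "make sure it does not"; the function names and the file name 16384 are hypothetical, and the real storage manager is not modeled.

```python
import os
import tempfile


def redo_create(path):
    """Ensure the file exists; harmless if it is already there."""
    # O_CREAT without O_EXCL succeeds whether or not the file exists.
    fd = os.open(path, os.O_CREAT | os.O_RDWR, 0o600)
    os.close(fd)


def redo_drop(path):
    """Ensure the file is gone; harmless if it was already removed."""
    try:
        os.unlink(path)
    except FileNotFoundError:
        pass  # the kernel says it does not exist, so there is nothing to do


def demo():
    """Apply each redo action twice and report whether the file exists."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "16384")
        redo_create(path)
        redo_create(path)
        after_create = os.path.exists(path)
        redo_drop(path)
        redo_drop(path)
        after_drop = os.path.exists(path)
    return after_create, after_drop
```

Both actions are idempotent by construction, so replaying them any number of times reaches the same end state without consulting an LSN.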
>> As I understand it, to avoid redoing the same change two or more times
>> during recovery, we need to compare the LSN in the object against the
>> LSN in the WAL log.
> But undo/redo checking on file creation or deletion is trivial: either
> the kernel has the file or it doesn't. We do not need any other check
> AFAICS.
Are you saying that the table creation log record would contain a
relfilenode? I'm not sure the relfilenode is the same before and after
recovery if we consider point-in-time recovery.
--
Tatsuo Ishii
Tatsuo Ishii <t-ishii@sra.co.jp> writes:
>> But undo/redo checking on file creation or deletion is trivial: either
>> the kernel has the file or it doesn't. We do not need any other check
>> AFAICS.
> Are you saying that the table creation log record would contain a
> relfilenode?
Sure. What else would it contain?
> I'm not sure the relfilenode is the same before and after
> recovery if we consider point-in-time recovery.
Considering that all the WAL entries concerning updates to the table
will name it by relfilenode, we'd better be prepared to ensure that
the relfilenode doesn't change over recovery.
regards, tom lane
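
Why the relfilenode has to stay fixed can be seen in a small model where every record locates its file purely from the identifiers it carries. This is a sketch under stated assumptions: the RelFileNode tuple, the base/<database oid>/<relfilenode> path layout, and the record shapes are simplifications chosen for illustration, not the actual WAL record format.

```python
from typing import NamedTuple


class RelFileNode(NamedTuple):
    """Hypothetical physical identifier carried by every WAL record."""
    db_oid: int
    relfilenode: int


def relation_path(node):
    """Derive the file path from the identifiers alone (assumed layout)."""
    return f"base/{node.db_oid}/{node.relfilenode}"


def replay(wal):
    """Replay (kind, node, payload) records into a dict of path -> rows."""
    files = {}
    for kind, node, payload in wal:
        path = relation_path(node)
        if kind == "create":
            files.setdefault(path, [])
        elif kind == "insert":
            # Raises KeyError if the record names a file that redo never created.
            files[path].append(payload)
    return files
```

If the creation record were replayed under a different relfilenode than the one named by the later update records, those updates would find no file to apply to, which is the failure the reply above warns about.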
Could someone explain to this poor newbie (who is hoping to implement
this) exactly what the issue is here? Like Tom, I could originally see
no reason to worry about the LSN for file creation but I am very
concerned that I have failed to grasp Tatsuo's concerns.
Is there some reason why the relfilenode might change either during or
as a result of recovery? Unless I have missed the point again, during
recovery we must recreate files with exactly the same path, name and
relfilenode as they would have originally been created, and in the same
order relative to the creation of the relation. I see no scope for
anything to be different.
On Wed, 2002-03-06 at 21:29, Tom Lane wrote:
> Tatsuo Ishii <t-ishii@sra.co.jp> writes:
>>> But undo/redo checking on file creation or deletion is trivial: either
>>> the kernel has the file or it doesn't. We do not need any other check
>>> AFAICS.
>> Are you saying that the table creation log record would contain a
>> relfilenode?
> Sure. What else would it contain?
>> I'm not sure the relfilenode is the same before and after
>> recovery if we consider point-in-time recovery.
> Considering that all the WAL entries concerning updates to the table
> will name it by relfilenode, we'd better be prepared to ensure that
> the relfilenode doesn't change over recovery.
> regards, tom lane
--
Marc marc@bloodnok.com
> Could someone explain to this poor newbie (who is hoping to implement
> this) exactly what the issue is here? Like Tom, I could originally see
> no reason to worry about the LSN for file creation but I am very
> concerned that I have failed to grasp Tatsuo's concerns.
> Is there some reason why the relfilenode might change either during or
> as a result of recovery? Unless I have missed the point again, during
> recovery we must recreate files with exactly the same path, name and
> relfilenode as they would have originally been created, and in the same
> order relative to the creation of the relation. I see no scope for
> anything to be different.
Sorry for the confusion. I'm not very familiar with other DBMSs, and I
just don't know what kind of point-in-time recovery features they
provide. One scenario I could imagine is recovering a single table
under a different name. I'm not sure whether any other DBMS implements
this, though.
BTW, the next issues would be TRUNCATE and CREATE/DROP DATABASE.
As far as I can tell, these are not currently supported by WAL.
--
Tatsuo Ishii