automatic restore point
Hi, I'm new to the hackers list, but I'd like to propose an "automatic restore point" feature.
This feature automatically creates a restore point just before a large change is made to the database. It is useful when the change turns out to be accidental.
The following is a description of "automatic restore point".
【Background】
When a DBA makes an operational mistake, for example accidentally dropping a table, the database is restored from a file system backup and recovered up to a point given by a time or a transaction ID. The transaction ID has to be identified from the WAL.
However, using a time or a transaction ID has the following problems.
-Time
・You need to remember the time of the mistaken operation.
(The time can be identified from the WAL, but that takes time and effort.)
・It is difficult to specify a precise point.
-Transaction ID
・It takes time and effort to identify the transaction ID.
To solve these problems, I'd like to propose a feature that records recovery points automatically.
【Feature Description】
In PostgreSQL, there is a backup control function, "pg_create_restore_point()".
A user can create a named restore point with "pg_create_restore_point()" and later recover to that named point.
The proposal is to execute "pg_create_restore_point()" automatically before the commands listed below, so that a restore point (recovery point) always exists.
The name of the recovery point is the date and time at which the command was executed.
When this happens, the target resource (database name, table name) and the recovery point name are written as a message to the PostgreSQL server log.
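For reference, creating a restore point by hand today looks roughly like the following. This is only a sketch: the restore point name and the table name are arbitrary examples, and it assumes a role that is allowed to execute pg_create_restore_point() (superusers by default) on a server with wal_level set to replica or higher.

```sql
-- Create a named restore point manually before a risky change.
-- 'before_truncate_testtb' is a hypothetical label chosen for this example.
SELECT pg_create_restore_point('before_truncate_testtb');

-- The risky command itself (testtb is an example table).
TRUNCATE testtb;
```

The proposed feature would issue the first statement implicitly, so the user does not have to remember it.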
- Commands to which this feature would apply
・TRUNCATE
・DROP
・DELETE (without a WHERE clause)
・UPDATE (without a WHERE clause)
・COPY FROM
【How to use】
1) After one of the above commands has been executed by mistake, find in the server log the command and recovery point name that match the affected resource.
ex) Message for a TRUNCATE executed at 2018/6/1 12:30:30 (database name: testdb, table name: testtb)
set recovery point. operation = 'truncate'
database = 'testdb' relation = 'testtb' recovery_point_name = '2018-06-01-12:30:30'
2) Follow the procedure in section 25.3.4, "Recovering Using a Continuous Archive Backup", of the PostgreSQL documentation.
※Set "recovery_target_name = 'recovery_point_name'" in recovery.conf.
【Setting file】
Set the following in postgresql.conf.
auto_create_restore_point = on # Turns automatic recording of recovery points on or off. The default value is 'off'.
So what do you think about it? Do you think it is useful?
Also, when recovering under the current specification, tables other than the one being restored also return to their state at the specified recovery point.
So, I’m looking for ways to recover only specific tables. Do you have any ideas?
------
Naoki Yotsunaga
On Mon, Jun 25, 2018 at 6:17 PM, Yotsunaga, Naoki <
yotsunaga.naoki@jp.fujitsu.com> wrote:
So what do you think about it? Do you think it is useful?
The cost/benefit ratio seems low...
Also, when recovering with the current specification, tables other than the
returned table also return to the state of the specified recovery point.
So, I’m looking for ways to recover only specific tables. Do you have any
ideas?
...and this lowers it even further.
I'd rather spend effort making the initial execution of said commands less
likely. Something like:
TRUNCATE table YES_I_REALLY_WANT_TO_DO_THIS;
which will fail if you don't add the keyword "YES_I..." to the end of the
command and the system was set up to require it.
Or, less annoyingly:
BEGIN;
SET LOCAL perform_dangerous_action = true; --can we require local?
TRUNCATE table;
COMMIT;
David J.
On 25 June 2018 at 21:33, David G. Johnston <david.g.johnston@gmail.com>
wrote:
On Mon, Jun 25, 2018 at 6:17 PM, Yotsunaga, Naoki <
yotsunaga.naoki@jp.fujitsu.com> wrote:
So what do you think about it? Do you think it is useful?
I'd rather spend effort making the initial execution of said commands less
likely. Something like:
TRUNCATE table YES_I_REALLY_WANT_TO_DO_THIS;
I think an optional setting making DELETE and UPDATE without a WHERE clause
illegal would be handy. Obviously this would have to be optional for
backward compatibility. Perhaps even just a GUC setting, with the intent
being that one would set it in .psqlrc so that omitting the WHERE clause at
the command line would just be a syntax error. If one actually does need to
affect the whole table one can just say WHERE TRUE. For applications, which
presumably have their SQL queries tightly controlled and pre-written
anyway, this would most likely not be particularly useful.
Why not turn autocommit off in the session or in the .psqlrc file (\set AUTOCOMMIT off), or use BEGIN and then ROLLBACK?
What would be nice is if a syntax error didn’t abort the transaction when autocommit is off, since I'm a bad typist.
On Tue, Jun 26, 2018 at 12:04:59AM -0400, Rui DeSousa wrote:
Why not turn autocommit off in the session or in the .psqlrc file (\set AUTOCOMMIT off), or use BEGIN and then ROLLBACK?
What would be nice is if a syntax error didn’t abort the transaction when autocommit is off, since I'm a bad typist.
I think you'll get that behavior with ON_ERROR_ROLLBACK.
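A minimal sketch of the psql settings in question, assuming they are placed in ~/.psqlrc. AUTOCOMMIT and ON_ERROR_ROLLBACK are existing psql variables; picking "interactive" rather than "on" is just one possible choice.

```sql
-- Do not commit each statement automatically; an explicit COMMIT is required.
\set AUTOCOMMIT off
-- In interactive sessions, wrap each statement in an implicit savepoint so
-- that a failed statement (e.g. a typo) does not abort the whole transaction.
\set ON_ERROR_ROLLBACK interactive
```

With these set, a mistaken DELETE or UPDATE can still be undone with ROLLBACK as long as it has not been committed.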
Justin
On Jun 26, 2018, at 12:37 AM, Justin Pryzby <pryzby@telsasoft.com> wrote:
I think you'll get that behavior with ON_ERROR_ROLLBACK.
Awesome. Thanks!
On Mon, Jun 25, 2018 at 11:01:06PM -0400, Isaac Morland wrote:
I think an optional setting making DELETE and UPDATE without a WHERE clause
illegal would be handy. Obviously this would have to be optional for
backward compatibility. Perhaps even just a GUC setting, with the intent
being that one would set it in .psqlrc so that omitting the WHERE clause at
the command line would just be a syntax error. If one actually does need to
affect the whole table one can just say WHERE TRUE. For applications, which
presumably have their SQL queries tightly controlled and pre-written
anyway, this would most likely not be particularly useful.
There was a patch doing exactly that which was discussed last year:
https://commitfest.postgresql.org/13/948/
/messages/by-id/20160721045746.GA25043@fetter.org
What was proposed was rather limiting though, see my messages on the
thread. Using a hook, it is simple enough to develop an extension
which does that.
--
Michael
On Tue, Jun 26, 2018 at 01:17:31AM +0000, Yotsunaga, Naoki wrote:
The following is a description of "automatic restore point".
【Background】
When DBA's operation failure, for example DBA accidently drop table,
the database is restored from the file system backup and recovered by
using time or transaction ID. The transaction ID is identified from
WAL.
In order to solve the above problem,
I'd like propose a feature to implement automatic recording function
of recovery point.
There is also recovery_target_lsn which is new as of v10. This
parameter is way better than having to track down time or XID, which is
a reason why I developed it. Please note that this is also one of the
reasons why it is possible to delay WAL replay on standbys, so that an
operator has room to fix such operator errors. Having of course cold
backups with a proper WAL archive and a correct retention policy never
hurts.
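As an illustration of what relying on recovery_target_lsn involves, the current WAL position can be noted before a risky change. This is only a sketch; the column aliases are made up, and pg_current_wal_lsn() is the function name as of v10.

```sql
-- Record the current WAL write position and time before a risky change.
-- The aliases are arbitrary names chosen for this example.
SELECT pg_current_wal_lsn() AS lsn_before_change,
       now()                AS time_before_change;
```

The LSN returned here is the kind of value that would later be given as recovery_target_lsn.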
【Setting file】
Set postgres.conf.
auto_create_restore_point = on # Switch on/off automatic recording
function of recovery point. The default value is 'off'.
So what do you think about it? Do you think is it useful?
So basically what you are looking for here is a way to enforce a restore
point to be created depending on a set of pre-defined conditions? How
would you define and choose those?
Also, when recovering with the current specification, tables other
than the returned table also return to the state of the specified
recovery point.
So, I’m looking for ways to recover only specific tables. Do you have
any ideas?
Why not use the utility hook to filter out the commands you'd
like to forbid, in this case TRUNCATE or a DROP TABLE on a given
relation? Or why not simply use an event trigger at your application
level so that you can actually *prevent* the error from happening in the
first place? With the last option you don't have to write C code, but this
would not filter TRUNCATE. In short, what you propose looks over-complicated
to me and there are options on the table which allow the problem you are
trying to solve to not happen at all. You could also use the utility
hook to log or register somewhere the XID/time/LSN associated with a given
command and then use it as your restore point. This could also happen
out of core.
--
Michael
Hi. Thanks for the comments.
My explanation of the background of the proposal was inadequate,
so let me explain again.
I assume the following situation.
A user needs to make a quick, seemingly simple fix to an important production database. The user composes the query, gives it a once-over, and lets it run. Seconds later the user realizes that they forgot the WHERE clause, dropped the wrong table, or made another serious mistake, and interrupts the query, but the damage has been done.
Also, the user did not record the time and did not look at the LSN position.
Certainly, I thought about reducing the possibility of executing the wrong command, but I concluded that the possibility cannot be completely eliminated.
So I proposed the “automatic restore point”.
With this feature, a user can recover quickly and reliably even after a mistaken operation.
I'd rather spend effort making the initial execution of said commands less likely.
I think that the feature to prohibit DELETE and UPDATE without a WHERE clause, mentioned in the later response, is a good approach.
But I think that it is impossible to completely eliminate mistakes with the other commands.
For example, dropping the wrong table.
-----
Naoki Yotsunaga
Hi. Thanks for the comments.
There is also recovery_target_lsn which is new as of v10.
With this method, it is necessary to look at the LSN position before the operation.
But I am assuming a user who did not look at it beforehand.
So I think that this method is not appropriate.
So basically what you are looking for here is a way to enforce a restore point to be created depending on a set of pre-defined conditions?
How would you define and choose those?
I understand this as a question about how to configure which commands the feature applies to.
Ex) DROP = on
TRUNCATE = off
Is my interpretation right?
If so, the feature applies to all of the commands listed above.
When the feature is turned on, it works whenever any of those commands is executed.
-------
Naoki Yotsunaga
On Tue, Jul 03, 2018 at 01:07:41AM +0000, Yotsunaga, Naoki wrote:
There is also recovery_target_lsn which is new as of v10.
In this method, it is necessary to look at a lsn position before operating.
But I assume the user who did not look it before operating.
So I think that this method is not appropriate.
You should avoid top-posting on the mailing lists, this breaks the
consistency of the thread.
So basically what you are looking for here is a way to enforce a
restore point to be created depending on a set of pre-defined
conditions? How would you define and choose those?
I understand that I was asked how to set up a command to apply this function.
Ex) DROP = on
TRUNCATE = off
Is my interpretation right?
If my interpretation is correct, all the above commands will be
applied.
When this function is turned on, this function works when all the
above commands are executed.
Yeah, but based on which factors are you able to decide that such
conditions are enough to say that this feature fully meets the
user's needs, and how can you be sure that this is not going to result in
an additional maintenance burden if you need to define a new set of
conditions in the future? For example, an operator may issue a costly
ALTER TABLE which causes a full table rewrite, which could also be an
operation that you would like to prevent. Having a set of GUCs which
define such low-level behavior is not really user-friendly.
--
Michael
On Tue, Jul 03, 2018 at 01:06:31AM +0000, Yotsunaga, Naoki wrote:
I'd rather spend effort making the initial execution of said commands
less likely.
I think that the function to prohibit DELETE and UPDATE without a
WHERE clause in the later response is good way.
This has popped up already in the lists in the past.
But I think that it is impossible to completely eliminate the failure
of the other commands. For example, drop the wrong table.
This kind of thing is heavily application-dependent. For example, you
would likely not care if an operator, who has newly joined the team in
charge of the maintenance of this data, unfortunately drops a table
which includes logs from 10 years back, and you would very likely care
about a dropped table which has users' login data. My point is that you
need to carefully design the shape of the configuration you would use,
so that any application's admin would be able to cope with it. For example,
allowing exclusion filters with regular expressions could be a good idea
to dig into. You also need to think about it so that it is backward
compatible.
--
Michael
On Mon, 2 Jul 2018 at 20:07, Yotsunaga, Naoki
<yotsunaga.naoki@jp.fujitsu.com> wrote:
Hi. Thanks for comments.
Explanation of the background of the function proposal was inadequate.
So, I explain again.
I assume the following situation.
User needs to make a quick, seemingly simple fix to an important production database. User composes the query, gives it an once-over, and lets it run. Seconds later user realizes that user forgot the WHERE clause, dropped the wrong table, or made another serious mistake, and interrupts the query, but the damage has been done.
Also user did not record the time and did not look at a lsn position.
Thinking about Michael's suggestion of using event triggers, you can
create an event trigger to run pg_create_restore_point() on DROP,
here's a simple example of how that should look:
https://www.postgresql.org/docs/current/static/functions-event-triggers.html
You can also create a normal trigger BEFORE TRUNCATE to create a
restore point just before running the TRUNCATE command.
Those would run *in the background* (you don't need to call them
manually), you can use them right now, and they won't affect performance for
people not wanting this "functionality".
BTW, Michael's suggestion also included the idea of recording
xid/time/lsn, which can be done through triggers too
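As a rough sketch of the two triggers described above (not a tested recipe): the function, trigger and table names are made up for illustration, and the session running the DROP or TRUNCATE needs permission to execute pg_create_restore_point(), which only superusers have by default.

```sql
-- Event trigger function: create a restore point when a DROP TABLE starts.
-- All object names here are hypothetical examples.
CREATE OR REPLACE FUNCTION create_restore_point_on_drop()
RETURNS event_trigger
LANGUAGE plpgsql
AS $$
BEGIN
    PERFORM pg_create_restore_point(
        'before_drop_' || to_char(clock_timestamp(), 'YYYY-MM-DD-HH24:MI:SS'));
END;
$$;

CREATE EVENT TRIGGER restore_point_before_drop
    ON ddl_command_start
    WHEN TAG IN ('DROP TABLE')
    EXECUTE PROCEDURE create_restore_point_on_drop();

-- Ordinary statement-level trigger: create a restore point before TRUNCATE.
CREATE OR REPLACE FUNCTION create_restore_point_on_truncate()
RETURNS trigger
LANGUAGE plpgsql
AS $$
BEGIN
    -- left() keeps the label short enough for a restore point name.
    PERFORM pg_create_restore_point(left(
        'before_truncate_'
        || to_char(clock_timestamp(), 'YYYY-MM-DD-HH24:MI:SS')
        || '_' || TG_TABLE_NAME, 63));
    RETURN NULL;  -- the return value is ignored for statement-level triggers
END;
$$;

-- testtb is an example table; the trigger has to be created per table.
CREATE TRIGGER restore_point_before_truncate
    BEFORE TRUNCATE ON testtb
    FOR EACH STATEMENT
    EXECUTE PROCEDURE create_restore_point_on_truncate();
```

The restore point name created this way can then be given as recovery_target_name during recovery.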
--
Jaime Casanova www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
-----Original Message-----
From: Michael Paquier [mailto:michael@paquier.xyz]
Sent: Tuesday, July 3, 2018 10:22 AM
This kind of thing is heavily application-dependent. For example, you would likely not care if an operator, who has newly-joined the team in charge of the maintenance of this data, drops unfortunately a table which includes logs from 10 years back, and you would very likely care about a table dropped which has user's login data. My point is that you need to carefully design the shape of the configuration you would use, so as any application's admin would be able to cope with it, for example allowing exclusion filters with regular expressions could be a good idea to dig into. And also you need to think about it so as it is backward compatible.
Thanks for the comments.
Does that mean that which table matters depends on the application (user)?
For example, there are two tables, A and B. It is fine if table A disappears, but it is a problem if table B disappears. So when table B is dropped, the automatic restore point should work, and for table A it should not.
So, it is difficult to implement the automatic restore point in PostgreSQL by default.
Is my interpretation right?
---
Naoki Yotsunaga
-----Original Message-----
From: Jaime Casanova [mailto:jaime.casanova@2ndquadrant.com]
Sent: Tuesday, July 3, 2018 11:06 AM
Thinking on Michael's suggestion of using event triggers, you can create an event trigger to run pg_create_restore_point() on DROP, here's a simple example of how that should look:
https://www.postgresql.org/docs/current/static/functions-event-triggers.html
You can also create a normal trigger BEFORE TRUNCATE to create a restore point just before running the TRUNCATE command.
Thanks for the comments.
I understand now.
---
Naoki Yotsunaga
-----Original Message-----
From: Yotsunaga, Naoki [mailto:yotsunaga.naoki@jp.fujitsu.com]
Sent: Friday, July 6, 2018 5:05 PM
Does that mean that which table matters depends on the application (user)?
For example, there are two tables, A and B. It is fine if table A disappears, but it is a problem if table B disappears. So when table B is dropped, the automatic restore point should work, and for table A it should not.
So, it is difficult to implement the automatic restore point in PostgreSQL by default.
Is my interpretation right?
In addition to my previous comment, I would like to ask the following.
What would you do if your customer dropped a table and asked you to restore it?
Everyone is thinking about how to avoid the operational mistake, but I’m thinking about what happens after the user’s mistake.
What I mean is that not all users will set things up in advance.
For example, if you configure the safeguards described in the manual, you will not drop a table by mistake. However, not all users make those settings.
For such users, I think it is necessary to have a feature that makes it easy to restore data after a mistaken operation, without configuring anything in advance.
That is why I proposed this feature.
---
Naoki Yotsunaga
On Wed, Jul 11, 2018 at 06:11:01AM +0000, Yotsunaga, Naoki wrote:
I want to hear about the following in addition to the previous
comment. What would you do if your customer dropped the table and asked you to
restore it?
I can think of 4 answers off the top of my head:
1) Don't do that.
2) Implement safe-guards using utility hooks or event triggers.
3) Have a delayed standby, just in case, if you don't believe that your
administrators are skilled enough.
4) Have backups and a WAL archive.
Everyone is thinking what to do to avoid operation failure, but I’m
thinking about after the user’s failure.
What I mean is that not all users will set up in advance.
For example, if you make the settings described in the manual, you
will not drop the table by operation failure. However, not all users
do that setting.
For such users, I think that it is necessary to have a function to
easily restore data after failing operation without setting anything
in advance. So I proposed this function.
Well, if you put in place correct measures from the start you would not
have problems. It seems to me that there is no point in implementing
something which is a solution for a very narrow case, where the user has
shot his own foot to begin with. Having backups anyway is mandatory by
the way, standby replicas are not backups.
--
Michael
-----Original Message-----
From: Michael Paquier [mailto:michael@paquier.xyz]
Sent: Wednesday, July 11, 2018 3:34 PM
Well, if you put in place correct measures from the start you would not have problems.
It seems to me that there is no point in implementing something which is a solution for a very narrow case, where the user has shot his own foot to begin with.
Having backups anyway is mandatory by the way, standby replicas are not backups.
I think that AWS's Undo function and Oracle's Flashback function exist to save such users; they are features to guard against human error.
So, how about PostgreSQL implementing such a feature?
Also, as an approach to achieving the goal, I thought about writing the LSN to the server log when a specific command is executed.
I do not think the PostgreSQL source code would become complicated by implementing this.
Do you feel it is too complicated?
-------
Naoki Yotsunaga
On Fri, Jul 13, 2018 at 08:16:00AM +0000, Yotsunaga, Naoki wrote:
Do you feel it is too complicated?
In short, yes.
--
Michael
-----Original Message-----
From: Yotsunaga, Naoki [mailto:yotsunaga.naoki@jp.fujitsu.com]
Sent: Tuesday, June 26, 2018 10:18 AM
To: Postgres hackers <pgsql-hackers@postgresql.org>
Subject: automatic restore point
Hi, I have attached a patch that writes the LSN to the server log before executing specific commands that can accidentally erase data.
The detailed background has been presented before.
In short: after a DBA's operational mistake erases data, it is necessary to perform PITR immediately.
Currently the information needed for PITR cannot be obtained easily, and I would like to solve that.
The specification has changed from the first proposal.
-Target command
DROP TABLE
TRUNCATE
-Setting file
postgresql.conf
log_recovery_points = on # The default value is 'off'. When turned on, the LSN is written to the server log whenever DROP TABLE or TRUNCATE is executed.
-How to use
1) After one of the above commands has been executed by mistake, find in the server log the command and recovery point that match the affected resource.
ex) LOG: recovery_point_lsn: 0/201BB70
STATEMENT: drop table test ;
2) Follow the procedure in section 25.3.4, "Recovering Using a Continuous Archive Backup", of the PostgreSQL documentation.
*Set "recovery_target_lsn = 'recovery_point_lsn'" in recovery.conf.
Although it was pointed out earlier that the source would become complicated, we were able to add the feature with about 20 lines of code.
What do you think about it? Do you think it is useful?
------
Naoki Yotsunaga