PITR Phase 2 - Design Planning

Started by Simon Riggs over 21 years ago · 34 messages
#1 Simon Riggs
simon@2ndquadrant.com

Since Phase 1 is functioning and should hopefully be complete soon, we can
now start thinking about Phase 2: full recovery to a point in time.

Previous thinking was that a command-line switch would be used to
specify recovery to a given point in time, rather than the default, which
is to recover all the way to the end of the (available) xlogs.

Recovering to a specific point in time forces us to consider the
granularity of time. We could recover:
1. to the end of a full transaction log file
2. to the end of a full transaction

Transaction log files currently have timestamps, so that is
straightforward, but probably not the best we can do. We would
roll forward until the xlog file time > desired point in time.

To make (2) work, we would have to have a timestamp associated with each
transaction. This could be in one of two places:
1. the transaction record in the clog
2. the log record in the xlog
We would then recover the xlog record by record, until we found a record
with a timestamp > desired point-in-time.

Currently, neither of these places has a timestamp. Hmmmm. We can't use
pg_control because we are assuming that it needs recovery...

I can't see any general way of adding a timestamp in anything less than 2
bytes. We don't need a timezone. The timestamp could be the number of
seconds since the last checkpoint, since that interval is already limited
by a GUC that forces checkpoints every so often. Although the code avoids
a checkpoint if no updates have taken place, we wouldn't be too remiss to
force a checkpoint every 32,000 seconds (about 9 hours). That assumes the
accuracy of the point-in-time is on the order of seconds. If we went to
0.1-second accuracy, we would have to force a checkpoint every 40 minutes
or so. All of that seems too restrictive. If we went to milliseconds, we
could use a 4-byte value and would only need a forced checkpoint every
284 hours, or about 1.5 weeks.
Thoughts?

The clog uses 2 bits per transaction, so even 2 extra bytes per transaction
would make the clog 9 times larger than originally intended. This could
well cause it to segment more quickly, and I'm sure no one would be happy
with that. So, let's not add anything to the clog.

The alternative is to make the last part of the XlogHeader record a
timestamp value, increasing the size of each xlog write. It might be
possible to make this part of the header optional depending upon whether
PITR is required, but my preference is to avoid such dynamic coding.

So, I propose:

- appending 8 bytes of date/time data to the xlog file header record
- appending a 4-byte time offset to each xlog record
- altering the recovery logic to compare the calculated time of each
xlog record (file header + offset) against the desired point-in-time,
delivered to it by GUC.
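
To illustrate the comparison in SQL terms (purely illustrative; the
constants are invented and none of this is implemented yet):

  -- A record's time is the 8-byte file header time plus its own
  -- 4-byte offset; the record is applied while that time is <= the
  -- desired point-in-time.
  SELECT '2004-04-26 17:00:00'::timestamp          -- file header time
         + 123456 * interval '1 millisecond'       -- per-record offset
         <= '2004-04-26 17:05:00'::timestamp       -- desired point-in-time
         AS apply_this_record;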

Input is sought from anybody with detailed NTP knowledge, since the
workings of NTP drift correction may have some subtle interplay with this
proposal.

Also, while that code is being altered, some additional log messages
should be emitted, with timing information, as recovery of each new xlog
starts, to allow DBAs watching a recovery to estimate expected completion
times, which is essential in long recovery situations.

I am also considering any changes that may be required to prepare the
way for a future implementation of parallel redo recovery.

Best regards, Simon Riggs, 2ndQuadrant
http://www.2ndquadrant.com

#2 Bruce Momjian
pgman@candle.pha.pa.us
In reply to: Simon Riggs (#1)
Re: PITR Phase 2 - Design Planning

Simon Riggs wrote:

Transaction log files currently have timestamps, so that is
straightforward, but probably not the best we can do. We would
roll forward until the xlog file time > desired point in time.

To make (2) work, we would have to have a timestamp associated with each
transaction. This could be in one of two places:
1. the transaction record in the clog
2. the log record in the xlog
We would then recover the xlog record by record, until we found a record
with a timestamp > desired point-in-time.

Currently, neither of these places has a timestamp. Hmmmm. We can't use
pg_control because we are assuming that it needs recovery...

I can't see any general way of adding a timestamp in anything less than 2
bytes. We don't need a timezone. The timestamp could be the number of
seconds since the last checkpoint, since that interval is already limited
by a GUC that forces checkpoints every so often. Although the code avoids
a checkpoint if no updates have taken place, we wouldn't be too remiss to
force a checkpoint every 32,000 seconds (about 9 hours). That assumes the
accuracy of the point-in-time is on the order of seconds. If we went to
0.1-second accuracy, we would have to force a checkpoint every 40 minutes
or so. All of that seems too restrictive. If we went to milliseconds, we
could use a 4-byte value and would only need a forced checkpoint every
284 hours, or about 1.5 weeks.
Thoughts?

I was thinking --- how would someone know the time to use for restore?
Certainly they will not know it to subsecond accuracy? Probably second-level
accuracy is enough, _except_ when they want everything restored up to a
DROP TABLE transaction or some major problem. Is there a way to give
users a list of transactions on a log backup? Can we show them the
username, database, or commands or something? Would they be able to
restore up to a specific transaction in that case?

Basically, we could give them sub-second recovery, but what value would
it be?

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073
#3 Alvaro Herrera
alvherre@dcc.uchile.cl
In reply to: Bruce Momjian (#2)
Re: PITR Phase 2 - Design Planning

On Mon, Apr 26, 2004 at 05:05:41PM -0400, Bruce Momjian wrote:

Simon Riggs wrote:

Transaction log files currently have timestamps, so that is
straightforward, but probably not the best we can do. We would
roll forward until the xlog file time > desired point in time.

I was thinking --- how would someone know the time to use for restore?

I think there should be a way to get a TransactionId and restore up to
that point. It'd be cool, but not required, if the system showed what
valid TransactionIds there are, and roughly what they did (the xlog
code already has "describers" everywhere AFAICS).

--
Alvaro Herrera (<alvherre[a]dcc.uchile.cl>)
"El realista sabe lo que quiere; el idealista quiere lo que sabe" (An�nimo)

#4 Simon Riggs
simon@2ndquadrant.com
In reply to: Bruce Momjian (#2)
Re: PITR Phase 2 - Design Planning

On Mon, 2004-04-26 at 22:05, Bruce Momjian wrote:

Simon Riggs wrote:

Transaction log files currently have timestamps, so that is
straightforward, but probably not the best we can do. We would
roll forward until the xlog file time > desired point in time.

To make (2) work, we would have to have a timestamp associated with each
transaction. This could be in one of two places:
1. the transaction record in the clog
2. the log record in the xlog
We would then recover the xlog record by record, until we found a record
with a timestamp > desired point-in-time.

Currently, neither of these places has a timestamp. Hmmmm. We can't use
pg_control because we are assuming that it needs recovery...

I can't see any general way of adding a timestamp in anything less than 2
bytes. We don't need a timezone. The timestamp could be the number of
seconds since the last checkpoint, since that interval is already limited
by a GUC that forces checkpoints every so often. Although the code avoids
a checkpoint if no updates have taken place, we wouldn't be too remiss to
force a checkpoint every 32,000 seconds (about 9 hours). That assumes the
accuracy of the point-in-time is on the order of seconds. If we went to
0.1-second accuracy, we would have to force a checkpoint every 40 minutes
or so. All of that seems too restrictive. If we went to milliseconds, we
could use a 4-byte value and would only need a forced checkpoint every
284 hours, or about 1.5 weeks.
Thoughts?

I was thinking --- how would someone know the time to use for restore?
Certainly they will not know it to subsecond accuracy? Probably second-level
accuracy is enough, _except_ when they want everything restored up to a
DROP TABLE transaction or some major problem. Is there a way to give
users a list of transactions on a log backup? Can we show them the
username, database, or commands or something? Would they be able to
restore up to a specific transaction in that case?

Basically, we could give them sub-second recovery, but what value would
it be?

Yes, you remind me of a whole train of thought...

There should be a switch to allow you to specify the txnid you wish to
recover up until as well.

You raise the point of how you know what time to recover to. That is in
fact the very hardest part of recovery for a DBA. That's a good reason
for being able to list xlog contents, as you can with Oracle. Sounds
like we need an XlogMiner utility...

Can we show them the username, database, or commands or something?

Yes, that sounds fairly straightforward, possibly using a modification of
the ReadRecord functions at the bottom of xlog.c, which is why security
of the xlogs is important.

It's also a good reason for being able to pause and restart recovery, so
you can see what it's like before continuing further.

Usually you are trying to sync up the contents of the database with all
of the other things that were being updated too. Often these will either
define the required transaction contents or give a time to use.

Whatever level of time accuracy you choose, we would always need to
handle the case where multiple transactions have been committed with
exactly the same time (after rounding) and yet we may wish to split
them. Rolling forward to a txnid would help there.

Best regards, Simon

#5 Simon Riggs
simon@2ndquadrant.com
In reply to: Alvaro Herrera (#3)
Re: PITR Phase 2 - Design Planning

On Mon, 2004-04-26 at 23:01, Alvaro Herrera wrote:

On Mon, Apr 26, 2004 at 05:05:41PM -0400, Bruce Momjian wrote:

Simon Riggs wrote:

Transaction log files currently have timestamps, so that is
straightforward, but probably not the best we can do. We would
roll forward until the xlog file time > desired point in time.

I was thinking --- how would someone know the time to use for restore?

I think there should be a way to get a TransactionId and restore up to
that point. It'd be cool, but not required, if the system showed what
valid TransactionIds there are, and roughly what they did (the xlog
code already has "describers" everywhere AFAICS).

You're right, I think we should start by implementing the rollforward to
a txnid before we consider the rollforward to a specified point-in-time.
All the hooks for that are already there...

Best regards, Simon Riggs

#6 Bruce Momjian
pgman@candle.pha.pa.us
In reply to: Simon Riggs (#5)
Re: PITR Phase 2 - Design Planning

Simon Riggs wrote:

On Mon, 2004-04-26 at 23:01, Alvaro Herrera wrote:

On Mon, Apr 26, 2004 at 05:05:41PM -0400, Bruce Momjian wrote:

Simon Riggs wrote:

Transaction log files currently have timestamps, so that is
straightforward, but probably not the best we can do. We would
roll forward until the xlog file time > desired point in time.

I was thinking --- how would someone know the time to use for restore?

I think there should be a way to get a TransactionId and restore up to
that point. It'd be cool, but not required, if the system showed what
valid TransactionIds there are, and roughly what they did (the xlog
code already has "describers" everywhere AFAICS).

You're right, I think we should start by implementing the rollforward to
a txnid before we consider the rollforward to a specified point-in-time.
All the hooks for that are already there...

Yep, sounds like a plan.
-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073
#7 Hans-Jürgen Schönig
postgres@cybertec.at
In reply to: Simon Riggs (#1)
Re: PITR Phase 2 - Design Planning

Simon Riggs wrote:

Since Phase 1 is functioning and should hopefully be complete soon, we can
now start thinking about Phase 2: full recovery to a point in time.

...

So, I propose:

- appending 8 bytes of date/time data to the xlog file header record
- appending a 4-byte time offset to each xlog record
- altering the recovery logic to compare the calculated time of each
xlog record (file header + offset) against the desired point-in-time,
delivered to it by GUC.

Input is sought from anybody with detailed NTP knowledge, since the
workings of NTP drift correction may have some subtle interplay with this
proposal.

Simon,

I have one question which might be important: if we use timestamps
inside the WAL system to find out where to stop, what happens if
somebody changes the time of the system (e.g. correcting the system
clock by calling ntpdate)? Wouldn't it confuse the PITR system? How do
you plan to handle that? Unfortunately, time is not something that can
be used as a key (at least not from my point of view).

Just some lousy ideas early in the morning ...

Regards,

Hans

--
Cybertec Geschwinde u Schoenig
Schoengrabern 134, A-2020 Hollabrunn, Austria
Tel: +43/2952/30706 or +43/664/233 90 75
www.cybertec.at, www.postgresql.at, kernel.cybertec.at

#8 Richard Huxton
dev@archonet.com
In reply to: Bruce Momjian (#6)
Re: PITR Phase 2 - Design Planning

On Tuesday 27 April 2004 00:32, Bruce Momjian wrote:

Simon Riggs wrote:

On Mon, 2004-04-26 at 23:01, Alvaro Herrera wrote:

On Mon, Apr 26, 2004 at 05:05:41PM -0400, Bruce Momjian wrote:

I was thinking --- how would someone know the time to use for
restore?

I think there should be a way to get a TransactionId and restore up to
that point. It'd be cool, but not required, if the system showed what
valid TransactionIds there are, and roughly what they did (the xlog
code already has "describers" everywhere AFAICS).

You're right, I think we should start by implementing the rollforward to
a txnid before we consider the rollforward to a specified point-in-time.
All the hooks for that are already there...

Yep, sounds like a plan.

Speaking as a DBA, what I usually want to do is restore to "immediately before
I started the payroll calculation". An actual wall-clock time is mostly
irrelevant to me.

Suggestion: How about a pg_trans_note table (crap name, I know) that only
permits inserts, recording (backend pid, timestamp, note). My app inserts
"starting payroll calc" and "ending payroll calc" entries because those are
the points I might wish to sync to. If I want to sync for each individual
transaction in my calculations, my app can do that too.
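
A minimal sketch (all names are placeholders, and pg_backend_pid() is
just an assumption for grabbing the backend pid):

  CREATE TABLE pg_trans_note (
      backend_pid integer,
      noted_at    timestamp NOT NULL DEFAULT now(),
      note        text NOT NULL
  );

  -- the app brackets interesting work with insert-only markers:
  INSERT INTO pg_trans_note (backend_pid, note)
  VALUES (pg_backend_pid(), 'starting payroll calc');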

From a usability point of view you might want to automatically insert rows
on client connection, table creation, etc. You could also delete any rows
more than a week old when archiving WAL files.

--
Richard Huxton
Archonet Ltd

#9 Bruno Wolff III
bruno@wolff.to
In reply to: Richard Huxton (#8)
Re: PITR Phase 2 - Design Planning

On Tue, Apr 27, 2004 at 10:38:45 +0100,
Richard Huxton <dev@archonet.com> wrote:

Speaking as a DBA, what I usually want to do is restore to "immediately before
I started the payroll calculation". An actual wall-clock time is mostly
irrelevant to me.

For long running transactions where you want to recover as much as possible,
one might also want to recover up until just before a specific transaction
committed (as opposed to started).

#10 Simon Riggs
simon@2ndquadrant.com
In reply to: Hans-Jürgen Schönig (#7)
Re: PITR Phase 2 - Design Planning

On Tue, 2004-04-27 at 08:56, Hans-Jürgen Schönig wrote:

Simon Riggs wrote:

Since Phase 1 is functioning and should hopefully be complete soon, we can
now start thinking about Phase 2: full recovery to a point in time.

Previous thinking was that a command-line switch would be used to
specify recovery to a given point in time, rather than the default, which
is to recover all the way to the end of the (available) xlogs.

Recovering to a specific point in time forces us to consider the
granularity of time. We could recover:
1. to the end of a full transaction log file
2. to the end of a full transaction

Transaction log files currently have timestamps, so that is
straightforward, but probably not the best we can do. We would
roll forward until the xlog file time > desired point in time.

To make (2) work, we would have to have a timestamp associated with each
transaction.

...

So, I propose:

- appending 8 bytes of date/time data to the xlog file header record
- appending a 4-byte time offset to each xlog record
- altering the recovery logic to compare the calculated time of each
xlog record (file header + offset) against the desired point-in-time,
delivered to it by GUC.

Input is sought from anybody with detailed NTP knowledge, since the
workings of NTP drift correction may have some subtle interplay with this
proposal.

I have one question which might be important: if we use timestamps
inside the WAL system to find out where to stop, what happens if
somebody changes the time of the system (e.g. correcting the system
clock by calling ntpdate)? Wouldn't it confuse the PITR system? How do
you plan to handle that? Unfortunately, time is not something that can
be used as a key (at least not from my point of view).

Interesting question.

The timestamps written within WAL would be "just data" once written. The
recovery process would refer only to that timestamp data, and so would not
refer to any external notion of time. This is required to ensure that
the same recovery will happen identically, no matter how many times you
choose to re-run it (or are forced to by external circumstances).

If you change the time of the system backwards, this might invalidate
the transaction log history... it would effectively create two (or
more) xlog records with the same timestamp on them, and it would be
logically indeterminate which one should limit recovery. In practical
terms, I would implement this as "apply all records <= PIT". This means
recovery would stop only when the time became larger than the PIT, which
would only occur at the second (or last) record that had a timestamp
equal to the PIT. I guess I could put a WARNING in to say "time
just went backwards...spoohw"
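
In SQL terms (again purely illustrative), "apply all records <= PIT"
keeps applying through duplicated timestamps and stops at the first
record strictly later than the target:

  -- both records at 14:00:00 are applied; the 14:00:01 record is not
  SELECT ts, ts <= timestamp '2004-04-27 14:00:00' AS apply
  FROM (VALUES (timestamp '2004-04-27 14:00:00'),
               (timestamp '2004-04-27 14:00:00'),
               (timestamp '2004-04-27 14:00:01')) AS r(ts);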

Overall, I'd refer back to the points Bruce raised - you certainly do
need a way of finding out the time to recover to, and as others have
said also, time isn't the only desirable "recovery point".

Best regards, Simon Riggs

#11 Simon Riggs
simon@2ndquadrant.com
In reply to: Richard Huxton (#8)
Re: PITR Phase 2 - Design Planning

On Tue, 2004-04-27 at 10:38, Richard Huxton wrote:

On Tuesday 27 April 2004 00:32, Bruce Momjian wrote:

Simon Riggs wrote:

On Mon, 2004-04-26 at 23:01, Alvaro Herrera wrote:

On Mon, Apr 26, 2004 at 05:05:41PM -0400, Bruce Momjian wrote:

I was thinking --- how would someone know the time to use for
restore?

I think there should be a way to get a TransactionId and restore up to
that point. It'd be cool, but not required, if the system showed what
valid TransactionIds there are, and roughly what they did (the xlog
code already has "describers" everywhere AFAICS).

You're right, I think we should start by implementing the rollforward to
a txnid before we consider the rollforward to a specified point-in-time.
All the hooks for that are already there...

Yep, sounds like a plan.

Speaking as a DBA, what I usually want to do is restore to "immediately before
I started the payroll calculation". An actual wall-clock time is mostly
irrelevant to me.

Suggestion: How about a pg_trans_note table (crap name, I know) that only
permits inserts, recording (backend pid, timestamp, note). My app inserts
"starting payroll calc" and "ending payroll calc" entries because those are
the points I might wish to sync to. If I want to sync for each individual
transaction in my calculations, my app can do that too.

From a usability point of view you might want to automatically insert rows
on client connection, table creation, etc. You could also delete any rows
more than a week old when archiving WAL files.

I'm not sure I like your implementation, but the overall idea is great.

I'd suggest extending the CHECKPOINT command so you can say:
CHECKPOINT <text message>
e.g. CHECKPOINT 'starting payroll Feb04';
(I'm sure some other DBMS does this... head spinning, can't recall...)
The text could just appear in the xlog record data packet...

That could then be used as the target recovery point.

Best Regards, Simon Riggs

#12 Simon Riggs
simon@2ndquadrant.com
In reply to: Bruno Wolff III (#9)
Re: PITR Phase 2 - Design Planning

On Tue, 2004-04-27 at 18:43, Bruno Wolff III wrote:

On Tue, Apr 27, 2004 at 10:38:45 +0100,
Richard Huxton <dev@archonet.com> wrote:

Speaking as a DBA, what I usually want to do is restore to "immediately before
I started the payroll calculation". An actual wall-clock time is mostly
irrelevant to me.

For long running transactions where you want to recover as much as possible,
one might also want to recover up until just before a specific transaction
committed (as opposed to started).

Sounds like the difference between > and >=, so should be possible...

Best Regards, Simon Riggs

#13 Rod Taylor
pg@rbt.ca
In reply to: Simon Riggs (#10)
Re: PITR Phase 2 - Design Planning

Overall, I'd refer back to the points Bruce raised - you certainly do
need a way of finding out the time to recover to, and as others have
said also, time isn't the only desirable "recovery point".

Wouldn't it be sufficient to simply use the transaction ID and ensure
that all the parameters the user might want to use to find that ID can
be made available in the log files?

#14 Simon Riggs
simon@2ndquadrant.com
In reply to: Rod Taylor (#13)
Re: PITR Phase 2 - Design Planning

On Tue, 2004-04-27 at 21:56, Rod Taylor wrote:

Overall, I'd refer back to the points Bruce raised - you certainly do
need a way of finding out the time to recover to, and as others have
said also, time isn't the only desirable "recovery point".

Wouldn't it be sufficient to simply use the transaction ID and ensure
that all the parameters the user might want to use to find that ID can
be made available in the log files?

Yes, of course, all methods of locating a particular xlog file to stop
at are effectively equivalent. The discussion is mostly about what is
convenient for the user in a real recovery situation.

From all that has been said so far, I would implement:

1. Recovery to a specific txnid, which is fairly straightforward
2. Recovery to a specific date/time, either by:
   a) implementing a log inspection tool that shows the txnid for a PIT, or
   b) implementing recovery to a PIT directly
3. Recovery to a named checkpoint

Best Regards, Simon Riggs

#15 Rod Taylor
pg@rbt.ca
In reply to: Simon Riggs (#14)
Re: PITR Phase 2 - Design Planning

On Tue, 2004-04-27 at 17:36, Simon Riggs wrote:

On Tue, 2004-04-27 at 21:56, Rod Taylor wrote:

Overall, I'd refer back to the points Bruce raised - you certainly do
need a way of finding out the time to recover to, and as others have
said also, time isn't the only desirable "recovery point".

Wouldn't it be sufficient to simply use the transaction ID and ensure
that all the parameters the user might want to use to find that ID can
be made available in the log files?

Yes, of course, all methods of locating a particular xlog file to stop
at are effectively equivalent. The discussion is mostly about what is
convenient for the user in a real recovery situation.

I see... The first thing I would need to do is look at /var/log/pgsql. At
that point it really doesn't matter what the identifier is, so long as
the identifier is there.

#16 Simon Riggs
simon@2ndquadrant.com
In reply to: Rod Taylor (#15)
Re: PITR Phase 2 - Design Planning

On Tue, 2004-04-27 at 23:11, Rod Taylor wrote:

On Tue, 2004-04-27 at 17:36, Simon Riggs wrote:

On Tue, 2004-04-27 at 21:56, Rod Taylor wrote:

Overall, I'd refer back to the points Bruce raised - you certainly do
need a way of finding out the time to recover to, and as others have
said also, time isn't the only desirable "recovery point".

Wouldn't it be sufficient to simply use the transaction ID and ensure
that all the parameters the user might want to use to find that ID can
be made available in the log files?

Yes, of course, all methods of locating a particular xlog file to stop
at are effectively equivalent. The discussion is mostly about what is
convenient for the user in a real recovery situation.

I see... The first thing I would need to do is look at /var/log/pgsql. At
that point it really doesn't matter what the identifier is, so long as
the identifier is there.

PITR works on the assumption that /var/log/pgsql no longer exists at
all. It is suitable for use in bare-metal recovery situations, as well
as in usage-induced situations.

You pick up the pieces, work out what the best identifier is, then plan
on using that... it might not be a pgsql log; it might be:
i) literal wall-clock time ("the power went off about 2")
ii) other systems' logs
iii) etc.

Best Regards, Simon Riggs

#17 Simon Riggs
simon@2ndquadrant.com
In reply to: Bruce Momjian (#6)
Re: PITR Phase 2 - Design Planning

On Fri, 2004-05-28 at 00:02, Peter Galbavy wrote:

Bruno Wolff III wrote:

For long running transactions where you want to recover as much as possible,
one might also want to recover up until just before a specific transaction
committed (as opposed to started).

If your DB has died and you are recovering it, how do you re-establish a
session so that a transaction can complete? Don't all client
connections assume that a transaction has failed if the connection to
the DB fails?

Reasonable question...

You re-establish the connection, but cannot resume the failed transaction.

PostgreSQL already has crash recovery... this is for restore-from-backup
scenarios.

Best Regards, Simon Riggs

#18 Jim C. Nasby
jim@nasby.net
In reply to: Simon Riggs (#1)
Re: PITR Phase 2 - Design Planning

Another idea would be to provide some means to roll a database forwards
and backwards. If you're doing a recovery because you did something like
an accidental UPDATE SET field = blah without a WHERE clause, what you
really care about is going up to the point right before that update. If
there's a log viewer with enough detail, that would suffice; otherwise
you'd need to test for some specific condition.

Of course timestamps would still be useful in this scenario since they
can get you close to the transaction in question.
--
Jim C. Nasby, Database Consultant jim@nasby.net
Member: Triangle Fraternity, Sports Car Club of America
Give your computer some brain candy! www.distributed.net Team #1828

Windows: "Where do you want to go today?"
Linux: "Where do you want to go tomorrow?"
FreeBSD: "Are you guys coming, or what?"

#19 Bruce Momjian
pgman@candle.pha.pa.us
In reply to: Simon Riggs (#14)
Re: PITR Phase 2 - Design Planning

Simon Riggs wrote:

On Tue, 2004-04-27 at 21:56, Rod Taylor wrote:

Overall, I'd refer back to the points Bruce raised - you certainly do
need a way of finding out the time to recover to, and as others have
said also, time isn't the only desirable "recovery point".

Wouldn't it be sufficient to simply use the transaction ID and ensure
that all the parameters the user might want to use to find that ID can
be made available in the log files?

Yes, of course, all methods of locating a particular xlog file to stop
at are effectively equivalent. The discussion is mostly about what is
convenient for the user in a real recovery situation.

From all that has been said so far, I would implement:

1. Recovery to a specific txnid, which is fairly straightforward
2. Recovery to a specific date/time, either by:
   a) implementing a log inspection tool that shows the txnid for a PIT, or
   b) implementing recovery to a PIT directly
3. Recovery to a named checkpoint

What if we added transaction id to log_line_prefix? The user could then
log all queries and find the xid where they want to stop, but of course
that assumes they have enabled such logging, and they have access to the
logs.
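
i.e. something like this in postgresql.conf; the %x escape for the xid is
the invention here:

  # hypothetical -- assumes a new %x escape that prints the xid
  log_line_prefix = '%t [%p] xid=%x '
  log_statement = true    # log all statements so the xid can be found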

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073
#20 Bruno Wolff III
bruno@wolff.to
In reply to: Bruce Momjian (#6)
Re: PITR Phase 2 - Design Planning

On Thu, May 27, 2004 at 23:02:42 +0000,
Peter Galbavy <peter.galbavy@knowtion.net> wrote:

Bruno Wolff III wrote:

For long running transactions where you want to recover as much as
possible, one might also want to recover up until just before a specific
transaction committed (as opposed to started).

If your DB has died and you are recovering it, how do you re-establish a
session so that a transaction can complete? Don't all client
connections assume that a transaction has failed if the connection to
the DB fails?

The context of my suggestion was for recovering up until a transaction which
messed things up was committed. I did not want the problem transaction to
be committed. If the problem transaction ran for a long time, there might
be other transactions that I want to keep, if possible, that committed
after the problem transaction started and before it ended.

#21 Peter Galbavy
peter.galbavy@knowtion.net
In reply to: Bruno Wolff III (#20)
Re: PITR Phase 2 - Design Planning

Bruno Wolff III wrote:

The context of my suggestion was for recovering up until a transaction which
messed things up was committed. I did not want the problem transaction to
be committed. If the problem transaction ran for a long time, there might
be other transactions that I want to keep, if possible, that committed
after the problem transaction started and before it ended.

Ah! followed by Eek! Now I see the light. It's very bright and painful.

What I can see is that expressing this accurately and unambiguously is
going to be _difficult_. How do you accurately know the point just
before a transaction completed? There must be a good subset of
candidates that can be labelled.

Is there any way to label/name a transaction, with the label kept
somewhere? Like "begin transaction 'bigtransaction26';" - is there any
allowance in the SQL standard for naming transactions?

PS I have fixed my system clock - apologies for my earlier reply being a
month ahead.

rgds,
--
Peter

#22 Simon Riggs
simon@2ndquadrant.com
In reply to: Bruce Momjian (#19)
Re: PITR Phase 2 - Design Planning

On Wed, 2004-04-28 at 05:00, Bruce Momjian wrote:

Simon Riggs wrote:

On Tue, 2004-04-27 at 21:56, Rod Taylor wrote:

Overall, I'd refer back to the points Bruce raised - you certainly do
need a way of finding out the time to recover to, and as others have
said also, time isn't the only desirable "recovery point".

Wouldn't it be sufficient to simply use the transaction ID and ensure
that all the parameters the user might want to use to find that ID can
be made available in the log files?

Yes, of course, all methods of locating a particular xlog file to stop
at are effectively equivalent. The discussion is mostly about what is
convenient for the user in a real recovery situation.

From all that has been said so far, I would implement:

1. Recovery to a specific txnid, which is fairly straightforward
2. Recovery to a specific date/time, either by:
   a) implementing a log inspection tool that shows the txnid for a PIT, or
   b) implementing recovery to a PIT directly
3. Recovery to a named checkpoint

What if we added transaction id to log_line_prefix? The user could then
log all queries and find the xid where they want to stop, but of course
that assumes they have enabled such logging, and they have access to the
logs.

Good thinking.

I'll have a look at this and come back to you.

Best Regards, Simon Riggs

#23 Andrew Dunstan
andrew@dunslane.net
In reply to: Simon Riggs (#22)
Re: PITR Phase 2 - Design Planning

Simon Riggs wrote:

On Wed, 2004-04-28 at 05:00, Bruce Momjian wrote:

What if we added transaction id to log_line_prefix? The user could then
log all queries and find the xid where they want to stop, but of course
that assumes they have enabled such logging, and they have access to the
logs.

Good thinking.

I'll have a look at this and come back to you.

log_statement-triggered logging happens very early in the process - if
you are logging them all, it happens before the statements are even
parsed. Would we have an xid to log sensibly at that stage?

Perhaps with log_min_duration_statement = 0 we would (i.e. log statements
when finished rather than when started).

cheers

andrew

#24 Simon Riggs
simon@2ndquadrant.com
In reply to: Andrew Dunstan (#23)
Re: PITR Phase 2 - Design Planning

On Wed, 2004-04-28 at 18:35, Andrew Dunstan wrote:

Simon Riggs wrote:

On Wed, 2004-04-28 at 05:00, Bruce Momjian wrote:

What if we added transaction id to log_line_prefix? The user could then
log all queries and find the xid where they want to stop, but of course
that assumes they have enabled such logging, and they have access to the
logs.

Good thinking.

I'll have a look at this and come back to you.

log_statement-triggered logging happens very early in the process - if
you are logging them all, it happens before the statements are even
parsed. Would we have an xid to log sensibly at that stage?

Perhaps with log_min_duration_statement = 0 we would (i.e. log statements
when finished rather than when started).

Let's call this XLogSpy.

For starters, we only need to look at write transactions. Many read-only
transactions would not need to be examined beyond knowing that they were
read-only.

Remembering that we're using xlogs for recovery, we perhaps should not
assume that we have anything other than that which has been archived.
Currently, that is just the xlogs. So really we need to work off what is
within them, and right now that isn't much at all.

We might optionally include the parsed statement data in the xlog, since
this might be smaller than the exact text itself and would also allow us
to filter the xlogs based upon any aspect of those lists.

Not really happy with any of these ideas yet.

Best Regards, Simon Riggs

#25 Andrew Dunstan
andrew@dunslane.net
In reply to: Simon Riggs (#24)
Re: PITR Phase 2 - Design Planning

Simon Riggs said:

On Wed, 2004-04-28 at 18:35, Andrew Dunstan wrote:

Simon Riggs wrote:

On Wed, 2004-04-28 at 05:00, Bruce Momjian wrote:

What if we added transaction id to log_line_prefix? The user could
then log all queries and find the xid where they want to stop, but
of course that assumes they have enabled such logging, and they have
access to the logs.

Good thinking.

I'll have a look at this and come back to you.

log_statement-triggered logging happens very early in the process - if
you are logging them all, it happens before the statements are even
parsed. Would we have an xid to log sensibly at that stage?

Perhaps with log_min_duration_statement = 0 we would (i.e. log statements
when finished rather than when started).

Let's call this XLogSpy.

For starters, we only need to look at write transactions. Many read-only
transactions would not need to be examined beyond knowing that they were
read-only.

Remembering that we're using xlogs for recovery, we perhaps should not
assume that we have anything other than that which has been archived.
Currently, that is just the xlogs. So really we need to work off what is
within them, and right now that isn't much at all.

We might optionally include the parsed statement data in the xlog, since
this might be smaller than the exact text itself and would also allow us
to filter the xlogs based upon any aspect of those lists.

Not really happy with any of these ideas yet.

I don't see how this relates to the log_line_prefix idea, or are you
abandoning that?

cheers

andrew

#26 Peter Eisentraut
peter_e@gmx.net
In reply to: Simon Riggs (#11)
Re: PITR Phase 2 - Design Planning

Simon Riggs wrote:

I'd suggest extending the CHECKPOINT command so you can say:
CHECKPOINT <text message>
e.g. CHECKPOINT 'starting payroll Feb04';
(I'm sure some other DBMS does this... head spinning, can't recall...)
The text could just appear in the xlog record data packet...

I believe you are thinking of a savepoint rather than a checkpoint.

#27 Alvaro Herrera
alvherre@dcc.uchile.cl
In reply to: Peter Eisentraut (#26)
Re: PITR Phase 2 - Design Planning

On Thu, Apr 29, 2004 at 05:09:19PM +0200, Peter Eisentraut wrote:

Simon Riggs wrote:

I'd suggest extending the CHECKPOINT command so you can say:
CHECKPOINT <text message>
e.g. CHECKPOINT 'starting payroll Feb04';
(I'm sure some other DBMS does this... head spinning, can't recall...)
The text could just appear in the xlog record data packet...

I believe you are thinking of a savepoint rather than a checkpoint.

But a "savepoint" has a very precise meaning in the SQL standard, which
relates to points in a transaction you can roll back to. I don't think
you want to overload with this other meaning, which I see as putting a
special mark in the XLog -- completely unrelated.

ISTM Checkpoint 'something' would flush all xlogs, insert some kind of
special xlog record with 'something' on it, and then archive everything
(up to that point?).

--
Alvaro Herrera (<alvherre[a]dcc.uchile.cl>)
"XML!" Exclaimed C++. "What are you doing here? You're not a programming
language."
"Tell that to the people who use me," said XML.

#28 Simon Riggs
simon@2ndquadrant.com
In reply to: Peter Eisentraut (#26)
Re: PITR Phase 2 - Design Planning

On Thu, 2004-04-29 at 16:09, Peter Eisentraut wrote:

Simon Riggs wrote:

I'd suggest extending the CHECKPOINT command so you can say:
CHECKPOINT <text message>
e.g. CHECKPOINT 'starting payroll Feb04';
(I'm sure some other DBMS does this... head spinning, can't recall...)
The text could just appear in the xlog record data packet...

I believe you are thinking of a savepoint rather than a checkpoint.

Perhaps that was the inspiration, but no, I definitely meant a
CHECKPOINT.

But now you come to mention it, it would be better just to have a
command that simply wrote a named record to the xlog, so it can be
searched for later...

COMMENT [IN TRANSACTION LOG] 'starting payroll Feb04'
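
Usage might look something like this (entirely hypothetical syntax on
both sides, just to show the shape of it):

  -- write a named marker record into the transaction log:
  COMMENT IN TRANSACTION LOG 'starting payroll Feb04';

  -- and recovery could later be told to stop there, e.g. via a GUC:
  -- recovery_target_comment = 'starting payroll Feb04'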

Best Regards, Simon Riggs

#29 Mike Mascari
mascarm@mascari.com
In reply to: Simon Riggs (#28)
Re: PITR Phase 2 - Design Planning

Simon Riggs wrote:

On Thu, 2004-04-29 at 16:09, Peter Eisentraut wrote:

Perhaps that was the inspiration, but no, I definitely meant a
CHECKPOINT.

But now you come to mention it, it would be better just to have a
command that simply wrote a named record to the xlog, so it can
be searched for later...

COMMENT [IN TRANSACTION LOG] 'starting payroll Feb04'

FWIW,

Oracle's COMMIT syntax has an optional COMMENT specifier which is
used for documenting a distributed transaction. In-doubt
transactions can then be manually committed or aborted by
referencing the transaction associated with the comment. Example:

COMMIT WORK COMMENT 'A complex distributed Tx';

Perhaps there is some common ground between the 2PC implementation
and PITR?

Mike Mascari

#30 Peter Eisentraut
peter_e@gmx.net
In reply to: Alvaro Herrera (#27)
Re: PITR Phase 2 - Design Planning

Alvaro Herrera wrote:

But a "savepoint" has a very precise meaning in the SQL standard,
which relates to points in a transaction you can roll back to. I
don't think you want to overload with this other meaning, which I see
as putting a special mark in the XLog -- completely unrelated.

They are completely unrelated because you're considering them on two
different levels. From the user interface level they are both a place
to roll back to. Whether they are inside the same transaction or not
is like the difference between a normal and a holdable cursor, but it's
the same interface.

#31 jihuang
jihuang@iis.sinica.edu.tw
In reply to: Bruce Momjian (#6)
ERROR: heapgettup: failed ReadBuffer

I put 360,000+ rows in a table, and now any SELECT, UPDATE, ANALYZE, ...
command fails. The log shows "ERROR: heapgettup: failed ReadBuffer",
but any INSERT SQL command succeeds.

The table schema is:

    Column     |            Type             | Modifiers
---------------+-----------------------------+-----------
 test_id       | integer                     | not null
 snapshot      | timestamp without time zone |
 ip_client     | inet                        |
 ip_server     | inet                        |
 conn_time     | integer                     |
 response_time | integer                     |
 response_head | character varying           |
Check constraints:
    "invalid_conn_time" CHECK (conn_time >= 0)
    "invalid_resp_time" CHECK (response_time >= 0)

I didn't create any index. Does anyone know why this happens, or have a
suggestion for saving the unreadable database? Or is there anything I
misconfigured?

Thanks for your help.

June-Yen

#32 Tom Lane
tgl@sss.pgh.pa.us
In reply to: jihuang (#31)
Re: ERROR: heapgettup: failed ReadBuffer

jihuang <jihuang@iis.sinica.edu.tw> writes:

I put 360,000+ rows in a table, and now any SELECT, UPDATE, ANALYZE, ...
command fails. The log shows "ERROR: heapgettup: failed ReadBuffer",

What Postgres version is this? AFAICS that error has been impossible
for quite some time ...

regards, tom lane

#33 jihuang
jihuang@iis.sinica.edu.tw
In reply to: Tom Lane (#32)
Re: ERROR: heapgettup: failed ReadBuffer

Thanks for your real-time response!

The problem was solved after I upgraded PostgreSQL from 7.3.4 to 7.4.2.

By the way, is there any bug-tracking website for PostgreSQL?
I followed [HOMEPAGE] -> [DEVELOPERS] and found nothing resembling
Bugzilla-like items; following [GBORG], it's PostgreSQL-related projects,
but nothing for PostgreSQL itself?

Let me show an advertisement... quoting from O'Reilly's Developer Weblogs:

RT Foundry is being developed in Taiwan as part of the Open Foundry
Project, which is aimed at encouraging FS/OSS development in Taiwan. The
foundry is SF-like, except using better technologies (RT for bug/request
tracking, Subversion for source control, etc.)...

The following link is the issue and comments log for solving the problem
I described:
http://rt.openfoundry.org/Foundry/Project/Tracker/Display.html?Queue=90&id=2653

There are some Chinese characters mixed in, but I just want to show that
hosting a dedicated issue/bug tracking system may improve a software
project's evolution.

June-Yen

Tom Lane wrote:

jihuang <jihuang@iis.sinica.edu.tw> writes:

I put 360,000+ rows in a table, and now any SELECT, UPDATE, ANALYZE, ...
command fails. The log shows "ERROR: heapgettup: failed ReadBuffer",

What Postgres version is this? AFAICS that error has been impossible
for quite some time ...

regards, tom lane

#34 Peter Galbavy
peter.galbavy@knowtion.net
In reply to: Bruno Wolff III (#9)
Re: PITR Phase 2 - Design Planning

Bruno Wolff III wrote:

For long running transactions where you want to recover as much as possible,
one might also want to recover up until just before a specific transaction
committed (as opposed to started).

If your DB has died and you are recovering it, how do you re-establish a
session so that a transaction can complete? Don't all client
connections assume that a transaction has failed if the connection to
the DB fails?

Or am I just being naive, as usual?

Peter