PITR Dead horse?

Started by Austin Gonyou almost 22 years ago · 53 messages
#1 Austin Gonyou
austin@coremetrics.com

Has this been beaten to death now? Just curious if PITR was in Dev tree
yet. Been out of the loop. TIA.
--
Austin Gonyou <austin@coremetrics.com>
Coremetrics, Inc.

#2 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Austin Gonyou (#1)
Re: PITR Dead horse?

Austin Gonyou <austin@coremetrics.com> writes:

Has this been beaten to death now? Just curious if PITR was in Dev tree
yet. Been out of the loop. TIA.

Nope... I've got some patches from Patrick Macdonald and JR Nield that I
need to integrate, but I believe those only cover some low-level changes
to the WAL log contents. There's a lot of management code yet to be
written.

regards, tom lane

#3 Tatsuo Ishii
t-ishii@sra.co.jp
In reply to: Austin Gonyou (#1)
Re: PITR Dead horse?

Has this been beaten to death now? Just curious if PITR was in Dev tree
yet. Been out of the loop. TIA.

I and my co workers are very interested in implementing PITR. We will
tackle this for 7.5 if no one objects.
--
Tatsuo Ishii

#4 Satoshi Nagayasu
snaga@snaga.org
In reply to: Tatsuo Ishii (#3)
Re: PITR Dead horse?

I and some other developers are also interested in this.
Do you think we can work together?

Tatsuo Ishii <t-ishii@sra.co.jp> wrote:

Has this been beaten to death now? Just curious if PITR was in Dev tree
yet. Been out of the loop. TIA.

I and my co workers are very interested in implementing PITR. We will
tackle this for 7.5 if no one objects.
--
Tatsuo Ishii

--
NAGAYASU Satoshi <snaga@snaga.org>

#6 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tatsuo Ishii (#3)
Re: PITR Dead horse?

Tatsuo Ishii <t-ishii@sra.co.jp> writes:

I and my co workers are very interested in implementing PITR. We will
tackle this for 7.5 if no one objects.

Sounds good. I'll try to push in the work that Patrick and JR did
within the next day or two, and then you can take it from there...

regards, tom lane

#7 Tatsuo Ishii
t-ishii@sra.co.jp
In reply to: Satoshi Nagayasu (#4)
Re: PITR Dead horse?

I and some other developers are also interested in this.
Do you think we can work together?

Sure. Why not. I think it would be practical to decide who is the
leader of this project, though.
--
Tatsuo Ishii

#8 Koichi Suzuki
suzukikui@nttdata.co.jp
In reply to: Tatsuo Ishii (#3)
Re: PITR Dead horse?

Hi, This is Suzuki from NTT DATA Intellilink.

I told Bruce Momjian that I and my colleagues are interested in
implementing PITR in BOF in NY LW2004. NTT's laboratory is very
interested in this issue and I'm planning to work with them. I hope we
could cooperate.

Tatsuo Ishii wrote:

Has this been beaten to death now? Just curious if PITR was in Dev tree
yet. Been out of the loop. TIA.

I and my co workers are very interested in implementing PITR. We will
tackle this for 7.5 if no one objects.
--
Tatsuo Ishii

#9 Nicolai Tufar
ntufar@pisem.net
In reply to: Koichi Suzuki (#8)
Re: PITR Dead horse?

I would like to join this effort too.
I was afraid that people at RedHat were already
halfway through and were going to release their work
shortly. But it does not seem to be the case.

Regards,
Nicolai

-----Original Message-----
From: pgsql-hackers-owner@postgresql.org [mailto:pgsql-hackers-owner@postgresql.org] On Behalf Of Koichi Suzuki
Sent: Wednesday, February 04, 2004 11:25 AM
To: Tatsuo Ishii
Cc: austin@coremetrics.com; pgsql-hackers@postgresql.org
Subject: Re: [HACKERS] PITR Dead horse?

Hi, This is Suzuki from NTT DATA Intellilink.

I told Bruce Momjian that I and my colleagues are interested in
implementing PITR in BOF in NY LW2004. NTT's laboratory is very
interested in this issue and I'm planning to work with them. I hope we
could cooperate.

Tatsuo Ishii wrote:

Has this been beaten to death now? Just curious if PITR was in Dev tree
yet. Been out of the loop. TIA.

I and my co workers are very interested in implementing PITR. We will
tackle this for 7.5 if no one objects.
--
Tatsuo Ishii

#10 Marc G. Fournier
scrappy@postgresql.org
In reply to: Tatsuo Ishii (#7)
Re: PITR Dead horse?

On Wed, 4 Feb 2004, Tatsuo Ishii wrote:

I and some other developers are also interested in this.
Do you think we can work together?

Sure. Why not. I think it would be practical to decide who is the
leader of this project, though.

Is this something large enough, like the win32 stuff, that having a side
list for discussions is worth setting up?

----
Marc G. Fournier Hub.Org Networking Services (http://www.hub.org)
Email: scrappy@hub.org Yahoo!: yscrappy ICQ: 7615664

#11 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Marc G. Fournier (#10)
Re: PITR Dead horse?

"Marc G. Fournier" <scrappy@postgresql.org> writes:

Is this something large enough, like the win32 stuff, that having a side
list for discussions is worth setting up?

In terms of the amount of code to be written, I expect it's larger than
the win32 porting effort. And it should be mostly pretty separate from
hacking the core backend, since most of what remains to do is writing
external management utilities (I think).

I've been dissatisfied with having the separate pgsql-hackers-win32
list; I feel it just fragments the discussion, and people tend to end up
crossposting to -hackers anyway. But a separate list for PITR work
might be a good idea despite that experience, since it seems like it'd
be a more separable project.

Any other opinions out there?

regards, tom lane

#12 Simon Riggs
simon@2ndquadrant.com
In reply to: Tom Lane (#11)
Re: PITR Dead horse?

Tom Lane wrote
"Marc G. Fournier" <scrappy@postgresql.org> writes:

Is this something large enough, like the win32 stuff, that having a side
list for discussions is worth setting up?

In terms of the amount of code to be written, I expect it's larger than
the win32 porting effort. And it should be mostly pretty separate from
hacking the core backend, since most of what remains to do is writing
external management utilities (I think).

Yes it is! I'd like to start the discussion about PITR and try to go
through some functional requirements and how those might be implemented.
The Win32 port has a self-evident set of functional requirements; I'm
not sure that the PITR stuff is as clear - so I couldn't pass any
judgement at all (even if I did know the code well enough) on how big a
coding task that is, but I can see that the analysis and discussion is
large indeed.

I've been dissatisfied with having the separate pgsql-hackers-win32
list; I feel it just fragments the discussion, and people tend to end up
crossposting to -hackers anyway. But a separate list for PITR work
might be a good idea despite that experience, since it seems like it'd
be a more separable project.

I'd vote for a new list dedicated to discussing "Robustness" issues,
such as PITR and the fsync/sync issues. IMHO, PostgreSQL has the
Functionality and Performance, it just needs some rock-solid analysis of
where-things-can-go-wrong with it, so that the business data centre
people will be able to use it with absolute confidence...even if the
answer is "we've got every base covered". For me, the issues about
robustness are as much to do with risk reduction and confidence building
as they are about specific features in that area. [Wow, I expect some
flames on those comments!]

The list probably would remain clearly differentiated, in the same way
[Performance] covers lots of areas not discussed in [Hackers].

Not hung up on the name either, just something that indicates
breadth-of-scope, e.g. Availability or Data Protection or Resilience
etc..; maybe the Advocates would like to name it? It might even be a
press-release: "PostgreSQL community focuses new efforts towards
Robustness features for its next major release".

Best Regards, Simon Riggs

#13 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Simon Riggs (#12)
Re: PITR Dead horse?

"Simon Riggs" <simon@2ndquadrant.com> writes:

I'd vote for a new list dedicated to discussing "Robustness" issues,
such as PITR and the fsync/sync issues. IMHO, PostgreSQL has the
Functionality and Performance, it just needs some rock-solid analysis of
where-things-can-go-wrong with it, so that the business data centre
people will be able to use it with absolute confidence...even if the
answer is "we've got every base covered". For me, the issues about
robustness are as much to do with risk reduction and confidence building
as they are about specific features in that area. [Wow, I expect some
flames on those comments!]

You're right. Exactly where do you expect to find the expertise and
interest to do such an analysis? On pghackers, that's where. There's
no reason to invent a new mailing list for what should be a continuing
topic in pghackers. And to the extent that you were to move such a
discussion somewhere else, you'd just risk losing the attention of the
pair of eyeballs that might notice a hole in your analysis.

Not hung up on the name either, just something that indicates
breadth-of-scope, e.g. Availability or Data Protection or Resilience
etc..; maybe the Advocates would like to name it? It might even be a
press-release: "PostgreSQL community focuses new efforts towards
Robustness features for its next major release".

I think such a press release would be counterproductive, as it would
immediately make people question whether we have reliability problems.

regards, tom lane

#14 Nicolai Tufar
ntufar@pisem.net
In reply to: Simon Riggs (#12)
Re: PITR Dead horse?

Totally agree. Robustness and rock-solidness are the only
things missing for PostgreSQL to become the killer of certain
commercial enterprise databases out there. And the only thing
that is missing in this respect is PITR. PITR should be there:
INGRES had it in '84, and some people ask why PostgreSQL does
not have it.

I am well versed in the internals of the "PITR" features of a certain
leading enterprise-class database out there, and would like to
contribute (write code) to this effort as much as I can.

Best regards,
Nicolai Tufar

-----Original Message-----
From: pgsql-hackers-owner@postgresql.org [mailto:pgsql-hackers-owner@postgresql.org] On Behalf Of Simon Riggs
Sent: Thursday, February 05, 2004 1:33 AM
To: 'Tom Lane'; 'Marc G. Fournier'
Cc: 'Tatsuo Ishii'; snaga@snaga.org; austin@coremetrics.com; pgsql-hackers@postgresql.org
Subject: Re: [HACKERS] PITR Dead horse?

Tom Lane wrote
"Marc G. Fournier" <scrappy@postgresql.org> writes:

Is this something large enough, like the win32 stuff, that having a side
list for discussions is worth setting up?

In terms of the amount of code to be written, I expect it's larger than
the win32 porting effort. And it should be mostly pretty separate from
hacking the core backend, since most of what remains to do is writing
external management utilities (I think).

Yes it is! I'd like to start the discussion about PITR and try to go
through some functional requirements and how those might be implemented.
The Win32 port has a self-evident set of functional requirements; I'm
not sure that the PITR stuff is as clear - so I couldn't pass any
judgement at all (even if I did know the code well enough) on how big a
coding task that is, but I can see that the analysis and discussion is
large indeed.

I've been dissatisfied with having the separate pgsql-hackers-win32
list; I feel it just fragments the discussion, and people tend to end up
crossposting to -hackers anyway. But a separate list for PITR work
might be a good idea despite that experience, since it seems like it'd
be a more separable project.

I'd vote for a new list dedicated to discussing "Robustness" issues,
such as PITR and the fsync/sync issues. IMHO, PostgreSQL has the
Functionality and Performance, it just needs some rock-solid analysis of
where-things-can-go-wrong with it, so that the business data centre
people will be able to use it with absolute confidence...even if the
answer is "we've got every base covered". For me, the issues about
robustness are as much to do with risk reduction and confidence building
as they are about specific features in that area. [Wow, I expect some
flames on those comments!]

The list probably would remain clearly differentiated, in the same way
[Performance] covers lots of areas not discussed in [Hackers].

Not hung up on the name either, just something that indicates
breadth-of-scope, e.g. Availability or Data Protection or Resilience
etc..; maybe the Advocates would like to name it? It might even be a
press-release: "PostgreSQL community focuses new efforts towards
Robustness features for its next major release".

Best Regards, Simon Riggs

#15 Josh Berkus
josh@agliodbs.com
In reply to: Nicolai Tufar (#14)
Re: PITR Dead horse?

Simon,

I'd vote for a new list dedicated to discussing "Robustness" issues,
such as PITR and the fsync/sync issues.

<snip>

The list probably would remain clearly differentiated, in the same way
[Performance] covers lots of areas not discussed in [Hackers].

Actually, Simon, you're welcome to bring this discussion over to PERFORMANCE.
We discuss scalability and HA on Performance frequently, and I don't feel
that the discussion you refer to would be out of place.

But Tom is right that you need the feedback of a lot of the people on Hackers
once you start discussing a code solution, so there's not much point in
starting a new mailing list that all the same people need to subscribe to.
Certainly Jan had enough trouble getting meaningful feedback on the sync
issue here; on his own list he'd still be talking to himself.

As far as promoting an image of reliability, that belongs on Advocacy. The
image and the reality don't sync much; we're already about 500% more reliable
than MS SQL Server, but ask any ten CIOs what they think. That's just
marketing.

--
-Josh Berkus
Aglio Database Solutions
San Francisco

#16 Dave Page
dpage@vale-housing.co.uk
In reply to: Josh Berkus (#15)
Re: PITR Dead horse?

-----Original Message-----
From: Nicolai Tufar [mailto:ntufar@pisem.net]
Sent: 05 February 2004 00:01
To: pgsql-hackers@postgresql.org
Subject: Re: [HACKERS] PITR Dead horse?

Totally agree. Robustness and rock-solidness are the only
things missing for PostgreSQL to become the killer of certain
commercial enterprise databases out there.

Well I've only been using PostgreSQL since 1997 and the *only* release I
ever had problems with was 6.3.2. We also use(d) Informix SE, DB2,
Unidata and SQL Server and only Informix and Unidata come close to the
robustness of PostgreSQL - and they're not the ones we need to worry
about.

Now I'm not saying we shouldn't be continually looking to improve
things, but I don't think this is quite the problem you imply.

Regards, Dave.

#17 Dave Page
dpage@vale-housing.co.uk
In reply to: Dave Page (#16)
Re: PITR Dead horse?

-----Original Message-----
From: Nicolai Tufar [mailto:ntufar@pisem.net]
Sent: 05 February 2004 08:15
To: Dave Page
Subject: RE: [HACKERS] PITR Dead horse?

-----Original Message-----
From: Dave Page [mailto:dpage@vale-housing.co.uk]

Well I've only been using PostgreSQL since 1997 and the *only* release I
ever had problems with was 6.3.2. We also use(d) Informix SE, DB2,
Unidata and SQL Server and only Informix and Unidata come close to the
robustness of PostgreSQL - and they're not the ones we need to worry
about.

Don't know. But apparently different users will have
different demands from a database.

Of course, but I would argue that my claim that PostgreSQL is reliable
is backed up by the lack of people posting messages like 'we had a
powercut and now my DB is hosed'.

Now I'm not saying we shouldn't be continually looking to improve
things, but I don't think this is quite the problem you imply.

For the customers I am dealing with it is quite a problem, believe me.

Do they have specific problems with the reliability of PostgreSQL then?
Perhaps you could post details of how things have gone wrong for them
(assuming you haven't already - I don't recall anything on -hackers
recently).

Regards, Dave

#18 Rod Taylor
rbt@rbt.ca
In reply to: Dave Page (#17)
Re: PITR Dead horse?

Don't know. But apparently different users will have
different demands from a database.

Of course, but I would argue that my claim that PostgreSQL is reliable
is backed up by the lack of people posting messages like 'we had a
powercut and now my DB is hosed'.

One thing we could use (and I have no idea how to do it) is a "This
hardware is not appropriate for a database" test kit.

Something to detect lying disks, battery-backed write caches that aren't
so battery-backed, etc.

--
Rod Taylor <rbt [at] rbt [dot] ca>

Build A Brighter Lamp :: Linux Apache {middleware} PostgreSQL
PGP Key: http://www.rbt.ca/rbtpub.asc
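
Rod's probe idea can be sketched in a few lines: time repeated one-block write-plus-fsync cycles and compare the average against the drive's rotational latency. This is only an illustration of the technique (the function name and the 512-byte write size are invented for this sketch); the pg_test_fsync utility that later shipped with PostgreSQL does this kind of timing properly.

```python
import os
import time

def avg_fsync_ms(path, iterations=50):
    """Average time for a 512-byte write followed by fsync, in ms.

    A 7200 rpm disk takes roughly 8 ms per platter rotation, so a
    plain spinning disk that consistently reports well under 1 ms
    here is almost certainly acknowledging writes from a volatile
    cache, i.e. "lying" about fsync durability.
    """
    fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o600)
    try:
        start = time.perf_counter()
        for _ in range(iterations):
            os.write(fd, b"\0" * 512)
            os.fsync(fd)
        elapsed = time.perf_counter() - start
    finally:
        os.close(fd)
        os.unlink(path)
    return elapsed * 1000.0 / iterations
```

On a controller with a working battery-backed write cache, sub-millisecond averages are expected and fine; the point of the probe is to know which case you are in.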

#19 Bruce Momjian
pgman@candle.pha.pa.us
In reply to: Dave Page (#16)
Re: PITR Dead horse?

Dave Page wrote:

-----Original Message-----
From: Nicolai Tufar [mailto:ntufar@pisem.net]
Sent: 05 February 2004 00:01
To: pgsql-hackers@postgresql.org
Subject: Re: [HACKERS] PITR Dead horse?

Totally agree. Robustness and rock-solidness are the only
things missing for PostgreSQL to become the killer of certain
commercial enterprise databases out there.

Well I've only been using PostgreSQL since 1997 and the *only* release I
ever had problems with was 6.3.2. We also use(d) Informix SE, DB2,
Unidata and SQL Server and only Informix and Unidata come close to the
robustness of PostgreSQL - and they're not the ones we need to worry
about.

Now I'm not saying we shouldn't be continually looking to improve
things, but I don't think this is quite the problem you imply.

I assume he was talking about the lack of data recovery in cases of hard
drive failure --- we now require you to restore from backup or use a
replicated machine/drive setup.

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073
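
The recovery model Bruce is alluding to, restoring a base backup and then replaying archived WAL up to a chosen moment, is what eventually shipped. As a rough sketch using the interface PostgreSQL 8.0 later adopted (the parameter names are from that later release; the archive directory is a made-up example):

```ini
# postgresql.conf: hand each completed WAL segment to an archive
# (%p = path of the segment file, %f = its file name)
archive_command = 'cp "%p" /mnt/server/archivedir/"%f"'

# recovery.conf, placed in a restored base backup: fetch segments
# back from the archive and stop replay at a chosen point in time
restore_command = 'cp /mnt/server/archivedir/%f "%p"'
recovery_target_time = '2004-02-05 12:00:00'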

#20 Bruce Momjian
pgman@candle.pha.pa.us
In reply to: Tatsuo Ishii (#3)
Re: PITR Dead horse?

Tatsuo Ishii wrote:

Has this been beaten to death now? Just curious if PITR was in Dev tree
yet. Been out of the loop. TIA.

I and my co workers are very interested in implementing PITR. We will
tackle this for 7.5 if no one objects.

I have put up a PITR project page:

http://momjian.postgresql.org/main/writings/pgsql/project

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073

#21 Bruce Momjian
pgman@candle.pha.pa.us
In reply to: Koichi Suzuki (#8)
Re: PITR Dead horse?

Koichi Suzuki wrote:

Hi, This is Suzuki from NTT DATA Intellilink.

I told Bruce Momjan that I and my colleagues are interested in
implementing PITR in BOF in NY LW2004. NTT's laboratory is very
interested in this issue and I'm planning to work with them. I hope we
could cooperate.

Yes, I am going to focus on this next week when I return. With Win32
moving along, PITR is my next big target. I want to get things moving.
The first step is for Tom to get the PITR WAL patches in. Then we need
to discuss what else we need and get those on the PITR project page:

http://momjian.postgresql.org/main/writings/pgsql/project

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073

#22 Bruce Momjian
pgman@candle.pha.pa.us
In reply to: Nicolai Tufar (#9)
Re: PITR Dead horse?

Nicolai Tufar wrote:

I would like to join this effort too.
I was afraid that people at RedHat were already
halfway through and were going to release their work
shortly. But it does not seem to be the case.

We are a long way away from completion:

http://momjian.postgresql.org/main/writings/pgsql/project

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073

#23 Bruce Momjian
pgman@candle.pha.pa.us
In reply to: Tom Lane (#11)
Re: PITR Dead horse?

Tom Lane wrote:

"Marc G. Fournier" <scrappy@postgresql.org> writes:

Is this something large enough, like the win32 stuff, that having a side
list for discussions is worth setting up?

In terms of the amount of code to be written, I expect it's larger than
the win32 porting effort. And it should be mostly pretty separate from
hacking the core backend, since most of what remains to do is writing
external management utilities (I think).

I've been dissatisfied with having the separate pgsql-hackers-win32
list; I feel it just fragments the discussion, and people tend to end up
crossposting to -hackers anyway. But a separate list for PITR work
might be a good idea despite that experience, since it seems like it'd
be a more separable project.

I think the win32 email list has worked well. What it has allowed is
people who want to track only win32 to get only those emails. It
doesn't help people already on hackers because hacker input is needed.

There are currently 102 Win32 subscribers, and most are not on the
hackers list.

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073

#24 Bruce Momjian
pgman@candle.pha.pa.us
In reply to: Marc G. Fournier (#10)
Re: PITR Dead horse?

Marc G. Fournier wrote:

On Wed, 4 Feb 2004, Tatsuo Ishii wrote:

I and some other developers are also interested in this.
Do you think we can work together?

Sure. Why not. I think it would be practical to decide who is the
leader of this project, though.

Is this something large enough, like the win32 stuff, that having a side
list for discussions is worth setting up?

Yes, I would like to have such a list, and will advertise it on the PITR
project page:

http://momjian.postgresql.org/main/writings/pgsql/project

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073

#25 Bruce Momjian
pgman@candle.pha.pa.us
In reply to: Bruce Momjian (#19)
Re: PITR Dead horse?

Bruce Momjian wrote:

Dave Page wrote:

-----Original Message-----
From: Nicolai Tufar [mailto:ntufar@pisem.net]
Sent: 05 February 2004 00:01
To: pgsql-hackers@postgresql.org
Subject: Re: [HACKERS] PITR Dead horse?

Totally agree. Robustness and rock-solidness are the only
things missing for PostgreSQL to become the killer of certain
commercial enterprise databases out there.

Well I've only been using PostgreSQL since 1997 and the *only* release I
ever had problems with was 6.3.2. We also use(d) Informix SE, DB2,
Unidata and SQL Server and only Informix and Unidata come close to the
robustness of PostgreSQL - and they're not the ones we need to worry
about.

Now I'm not saying we shouldn't be continually looking to improve
things, but I don't think this is quite the problem you imply.

I assume he was talking about the lack of data recovery in cases of hard
drive failure --- we now require you to restore from backup or use a
replicated machine/drive setup.

I retract this email. He clearly was talking about PostgreSQL
reliability, and Dave is right, it is pretty much a non-issue, though
maybe mindshare needs some help.

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073

#26 Marc G. Fournier
scrappy@postgresql.org
In reply to: Bruce Momjian (#24)
Re: PITR Dead horse?

pgsql-hackers-pitr@postgresql.org

I set myself as owner, since I didn't figure it was something you really
needed added to your plate? :) Just means you don't have to go through
and do the Approvals for postings when they need it, I'll just do it as my
normal stuff ...

On Thu, 5 Feb 2004, Bruce Momjian wrote:

Marc G. Fournier wrote:

On Wed, 4 Feb 2004, Tatsuo Ishii wrote:

I and some other developers are also interested in this.
Do you think we can work together?

Sure. Why not. I think it would be practical to decide who is the
leader of this project, though.

Is this something large enough, like the win32 stuff, that having a side
list for discussions is worth setting up?

Yes, I would like to have such a list, and will advertise it on the PITR
project page:

http://momjian.postgresql.org/main/writings/pgsql/project

--
Bruce Momjian                        |  http://candle.pha.pa.us
pgman@candle.pha.pa.us               |  (610) 359-1001
+  If your life is a hard drive,     |  13 Roberts Road
+  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073

----
Marc G. Fournier Hub.Org Networking Services (http://www.hub.org)
Email: scrappy@hub.org Yahoo!: yscrappy ICQ: 7615664

#27 Austin Gonyou
austin@coremetrics.com
In reply to: Bruce Momjian (#20)
Re: PITR Dead horse?

Wow. What a wonderful response. Thanks all!

On Thu, 2004-02-05 at 08:57, Bruce Momjian wrote:

Tatsuo Ishii wrote:

Has this been beaten to death now? Just curious if PITR was in Dev tree
yet. Been out of the loop. TIA.

I and my co workers are very interested in implementing PITR. We will
tackle this for 7.5 if no one objects.

I have put up a PITR project page:

http://momjian.postgresql.org/main/writings/pgsql/project

--
Austin Gonyou <austin@coremetrics.com>
Coremetrics, Inc.

#28 Nicolai Tufar
ntufar@pisem.net
In reply to: Dave Page (#17)
Re: PITR Dead horse?

-----Original Message-----
From: Dave Page [mailto:dpage@vale-housing.co.uk]
Sent: Thursday, February 05, 2004 11:02 AM
To: ntufar@pisem.net; pgsql-hackers@postgresql.org
Subject: RE: [HACKERS] PITR Dead horse?
Of course, but I would argue that my claim that PostgreSQL is reliable
is backed up by the lack of people posting messages like 'we had a
powercut and now my DB is hosed'.

It's not like that. It's more like 'what will happen if we had a
powercut/disk failure/cpu failure/memory failure, etc., etc.' and the
answer I have to give is 'why, there is PITR of course!'. No other
answer will pass in the enterprise world. Those people are not
open-minded; they'd rather be safe than sorry.

Do they have specific problems with the reliability of PostgreSQL

then?

Perhaps you could post details of how things have gone wrong for them
(assuming you haven't already - I don't recall anything on -hackers
recently).

Nothing remarkable. PostgreSQL just works. But as I said before,
in the enterprise world, a good night's sleep is treasured above all.

Regards, Dave

#29 Dave Page
dpage@vale-housing.co.uk
In reply to: Nicolai Tufar (#28)
Re: PITR Dead horse?

-----Original Message-----
From: Nicolai Tufar [mailto:ntufar@pisem.net]
Sent: 05 February 2004 17:35
To: Dave Page; pgsql-hackers@postgresql.org
Subject: RE: [HACKERS] PITR Dead horse?

-----Original Message-----
From: Dave Page [mailto:dpage@vale-housing.co.uk]
Sent: Thursday, February 05, 2004 11:02 AM
To: ntufar@pisem.net; pgsql-hackers@postgresql.org
Subject: RE: [HACKERS] PITR Dead horse?
Of course, but I would argue that my claim that PostgreSQL is reliable
is backed up by the lack of people posting messages like 'we had a
powercut and now my DB is hosed'.

It's not like that. It's more like 'what will happen if we
had a powercut/ disk failure/cpu failure/memory failure, etc,
etc.' and that answer I have to give is 'why, there is PITR
of course!'. No other answer will pass in enterprise world.
Those people are not open-minded, they'd rather be safe than sorry.

Ahh, that's not quite what I thought you meant. It sounded like you were
questioning the reliability of PostgreSQL, not its ability to be
recovered to the point of failure.

Do they have specific problems with the reliability of PostgreSQL

then?

Perhaps you could post details of how things have gone

wrong for them

(assuming you haven't already - I don't recall anything on -hackers
recently).

Nothing remarkable. PostgreSQL just works. But as I said
before, in the enterprise world, a good night's sleep is treasured
above all.

My SQL2K servers give me far more sleepless nights than PostgreSQL ever
did!

Regards, Dave.

#30 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Dave Page (#29)
Re: PITR Dead horse?

"Dave Page" <dpage@vale-housing.co.uk> writes:

Ahh, that's not quite what I thought you meant. It sounded like you were
questioning the reliability of PostgreSQL, not it's ability to be
recovered to point of failure.

I think the waters got muddied a bit by the suggestion elsewhere in the
thread (not from Nicolai, IIRC) that we needed a mailing list to talk
about reliability issues in general. We know we need PITR to help us
become a more credible enterprise-grade database; so that discussion is
short and sweet. What people were confused about was whether there were
enough other issues to need ongoing discussion.

regards, tom lane

#31 Nicolai Tufar
ntufar@pisem.net
In reply to: Dave Page (#29)
Re: PITR Dead horse?

-----Original Message-----
From: Dave Page [mailto:dpage@vale-housing.co.uk]
My SQL2K servers give me far more sleepless nights than PostgreSQL

ever

did!

You bet! I totally agree with you.
Technicians like you, me, and most people on this list
already know that PostgreSQL is stable and reliable.
It is management that needs to be convinced, and for this
we need to have PITR in the feature list.

Regards,
Nicolai

#32 Austin Gonyou
austin@coremetrics.com
In reply to: Nicolai Tufar (#31)
Re: PITR Dead horse?

On Thu, 2004-02-05 at 14:00, Nicolai Tufar wrote:

-----Original Message-----
From: Dave Page [mailto:dpage@vale-housing.co.uk]
My SQL2K servers give me far more sleepless nights than PostgreSQL

ever

did!

You bet! I totally agree with you.
Technicians like you, me, and most people on this list
already know that PostgreSQL is stable and reliable.
It is management that needs to be convinced, and for this
we need to have PITR in the feature list.

As Bruce previously stated, I believe, the mindshare department needs
some work. For this, PITR is a necessity, and it also matters when
comparing features with other DBs that people and businesses are
currently familiar with.

--
Austin Gonyou <austin@coremetrics.com>
Coremetrics, Inc.

#33Chester Kustarz
chester@arbor.net
In reply to: Bruce Momjian (#23)
Re: PITR Dead horse?

I do not see the win32 list on http://www.postgresql.org/lists.html
How would I find out about it and join? I probably did not subscribe
to hackers when it started.


On Thu, 5 Feb 2004, Bruce Momjian wrote:

I think the win32 email list has worked well. What it has allowed is
people who want to track only win32 to get only those emails. It
doesn't help people already on hackers because hacker input is needed.

#34Bruce Momjian
pgman@candle.pha.pa.us
In reply to: Austin Gonyou (#32)
Re: PITR Dead horse?

Austin Gonyou wrote:

On Thu, 2004-02-05 at 14:00, Nicolai Tufar wrote:

-----Original Message-----
From: Dave Page [mailto:dpage@vale-housing.co.uk]
My SQL2K servers give me far more sleepless nights than PostgreSQL ever
did!

You bet! I totally agree with you.
Technicians like you, me, and most people on this list
already know that PostgreSQL is stable and reliable.
It is management that needs to be convinced, and for this
we need to have PITR in the feature list.

Regards, Dave.

As previously stated by Bruce I believe, the mindshare department needs
some work. For this, the PITR is a necessity, but also when comparing
features with other DBs that people and businesses are currently
familiar with.

PITR is required to recover all data after total hardware failure. It
isn't just a mindshare issue.

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073
#35Bruce Momjian
pgman@candle.pha.pa.us
In reply to: Marc G. Fournier (#26)
Re: PITR Dead horse?

Marc G. Fournier wrote:

pgsql-hackers-pitr@postgresql.org

I set myself as owner, since I didn't figure it was something you really
needed added to your plate? :) Just means you don't have to go through
and do the Approvals for postings when they need it, I'll just do it as my
normal stuff ...

OK, I have added the mailing list to the web page:

http://momjian.postgresql.org/main/writings/pgsql/project

and have subscribed myself.

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073
#36Christopher Browne
cbbrowne@acm.org
In reply to: Austin Gonyou (#32)
Re: PITR Dead horse?

A long time ago, in a galaxy far, far away, pgman@candle.pha.pa.us (Bruce Momjian) wrote:

Austin Gonyou wrote:

As previously stated by Bruce I believe, the mindshare department needs
some work. For this, the PITR is a necessity, but also when comparing
features with other DBs that people and businesses are currently
familiar with.

PITR is required to recover all data after total hardware failure. It
isn't just a mindshare issue.

One of the valuable "use cases" of PITR is in replication, and
correspondingly, one of the valuable "use cases" of replication is in
doing major version upgrades.

As a result, a _really valuable thing_ would be for the "PITR reader"
process to be able to read data from "more elderly" versions of
PostgreSQL.

That may not prove practical, but the more flexible it is, the more
useful it certainly is...
--
"cbbrowne","@","cbbrowne.com"
http://www.ntlug.org/~cbbrowne/wp.html
Space Corps Directive #997: Work done by an officer's doppleganger in
a parallel universe cannot be claimed as overtime. -- Red Dwarf

#37scott.marlowe
scott.marlowe@ihs.com
In reply to: Rod Taylor (#18)
Re: PITR Dead horse?

On Thu, 5 Feb 2004, Rod Taylor wrote:

Don't know. But apparently different users will have
different demands from a database.

Of course, but I would argue that my claim that PostgreSQL is reliable
is backed up by the lack of people posting messages like 'we had a
powercut and now my DB is hosed'.

One thing we could use (and I have no idea how to do it) is a "This
hardware is not appropriate for a database" test kit.

Something to detect lying disks, battery backed write cache that isn't
so battery backed, etc.

but I'm not sure you can test that without power off tests... so, it
would have to be a test that kinda started up then told you to pull the
plug on the box. Even a kernel panic wouldn't detect it because the drive
would still be powered up.

Or, you could have a test that checked what kind of drive it was (IDE
versus SCSI) and maybe had a table of drives that are known to lie,
possibly even by version, should drives of the same model stop lying
halfway through production due to fixes in their firmware.

I'd guess the table would still have to be built the old-fashioned way,
by doing power-off tests.
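About the simplest automated probe I can imagine (a sketch only - it proves
nothing that a pull-the-plug test would, it just flags physically impossible
numbers) is timing synced writes; an honest rotating disk can't complete more
fsync'd writes per second than roughly its rotation rate:

```shell
#!/bin/sh
# Sketch of a "lying disk" probe: time N synced 8k writes. A drive
# honestly flushing its cache manages on the order of its rotational
# rate (well under ~500/sec); tens of thousands per second means some
# cache is acknowledging writes it hasn't made durable.

probe_sync_rate() {
    f=$1; n=${2:-200}
    start=$(date +%s)
    i=0
    while [ "$i" -lt "$n" ]; do
        # conv=fsync makes dd fsync() the file before exiting
        dd if=/dev/zero of="$f" bs=8192 count=1 conv=fsync 2>/dev/null
        i=$((i + 1))
    done
    elapsed=$(( $(date +%s) - start ))
    if [ "$elapsed" -eq 0 ]; then elapsed=1; fi
    rm -f "$f"
    echo $((n / elapsed))
}

rate=$(probe_sync_rate ./fsync_probe.dat 50)
echo "synced writes/sec: $rate"
```

One-second timer granularity makes this very rough; it only distinguishes
"plausible" from "impossible".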

#38Simon Riggs
simon@2ndquadrant.com
In reply to: Bruce Momjian (#35)
Re: [HACKERS] PITR Dead horse?

Bruce Momjian
Marc G. Fournier wrote:

pgsql-hackers-pitr@postgresql.org

I set myself as owner, since I didn't figure it was something you really
needed added to your plate? :) Just means you don't have to go through
and do the Approvals for postings when they need it, I'll just do it as my
normal stuff ...

OK, I have added the mailing list to the web page:

http://momjian.postgresql.org/main/writings/pgsql/project

and have subscribed myself.

Sorry, I've joined also - I thought Marc's post was a suggestion rather
than a reality. I checked http://www.postgresql.org/lists.html but it
wasn't listed, so I thought it didn't exist yet.

Is there a digest of all previous postings? Or another way to access
them?

Regards, Simon

#39Bruce Momjian
pgman@candle.pha.pa.us
In reply to: Simon Riggs (#38)
Re: [HACKERS] PITR Dead horse?

Simon Riggs wrote:

Bruce Momjian
Marc G. Fournier wrote:

pgsql-hackers-pitr@postgresql.org

I set myself as owner, since I didn't figure it was something you really
needed added to your plate? :) Just means you don't have to go through
and do the Approvals for postings when they need it, I'll just do it as my
normal stuff ...

OK, I have added the mailing list to the web page:

http://momjian.postgresql.org/main/writings/pgsql/project

and have subscribed myself.

Sorry, I've joined also - I thought Marc's post was a suggestion rather
than a reality. I checked http://www.postgresql.org/lists.html but it
wasn't listed, so I thought it didn't exist yet.

Is there a digest of all previous postings? Or another way to access
them?

No one has said anything on the list yet. :-)

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073
#40Alvaro Herrera
alvherre@dcc.uchile.cl
In reply to: scott.marlowe (#37)
Re: PITR Dead horse?

On Mon, Feb 09, 2004 at 10:04:56AM -0700, scott.marlowe wrote:

On Thu, 5 Feb 2004, Rod Taylor wrote:

One thing we could use (and I have no idea how to do it) is a "This
hardware is not appropriate for a database" test kit.

Something to detect lying disks, battery backed write cache that isn't
so battery backed, etc.

but I'm not sure you can test that without power off tests... so, it
would have to be a test that kinda started up then told you to pull the
plug on the box. Even a kernel panic wouldn't detect it because the drive
would still be powered up.

Try UMLSIM, umlsim.sourceforge.net

--
Alvaro Herrera (<alvherre[a]dcc.uchile.cl>)
"El destino baraja y nosotros jugamos" (A. Schopenhauer)

#41Simon Riggs
simon@2ndquadrant.com
In reply to: Bruce Momjian (#39)
Re: [HACKERS] PITR Dead right

Bruce Momjian
No one has said anything on the list yet. :-)

Great. Well, I'm planning to be on-line tomorrow night to begin thinking
this through and making some proposals.

If anybody's out there and intending to help, please shout about it on
this list now - Marc's set it up for us, so we should use it, or shut it
down!

Looking forward to working with y'all.

Very best regards, Simon Riggs

#42Fred Moyer
fred@redhotpenguin.com
In reply to: Simon Riggs (#41)
Re: [HACKERS] PITR Dead right

On Wed, 2004-02-11 at 21:10, Simon Riggs wrote:

Bruce Momjian
No one has said anything on the list yet. :-)

Great. Well, I'm planning to be on-line tomorrow night to begin thinking
this through and making some proposals.

If anybody's out there and intending to help, please shout about it on
this list now - Marc's set it up for us, so we should use it, or shut it
down!

I haven't been involved on the development side of PostgreSQL yet but
have managed to make my way to this list with an interest in working
on PITR. Let me know what I can do to help, I don't mind doing grunt
work if it helps push this along.

Looking forward to working with y'all.

I am also looking forward to working on this.

Hope I can contribute to this great project,

Regards,
Fred

#43Bruce Momjian
pgman@candle.pha.pa.us
In reply to: Koichi Suzuki (#8)
Re: PITR Dead horse?

Koichi Suzuki wrote:

Hi, This is Suzuki from NTT DATA Intellilink.

I told Bruce Momjian that I and my colleagues are interested in
implementing PITR at the BOF at NY LW2004. NTT's laboratory is very
interested in this issue and I'm planning to work with them. I hope we
can cooperate.

I assume everyone is on the PITR mailing list so you can get involved in
discussions when they start:

http://momjian.postgresql.org/main/writings/pgsql/project

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073
#44Mark Kirkwood
markir@paradise.net.nz
In reply to: Bruce Momjian (#39)
Re: [HACKERS] PITR Dead horse?

Tom recently posted the PITR patch to PATCHES - maybe we should send it
to this list too. That might kick-start the discussion...

regards

Mark

Bruce Momjian wrote:


No one has said anything on the list yet. :-)

#45Simon Riggs
simon@2ndquadrant.com
In reply to: Fred Moyer (#42)
Proposals for PITR

PITR for PostgreSQL 7.5+
===========================

I'm posting a few thoughts and plans on how to progress PITR. Initially,
I'd suggest chewing through some ideas to get some direction and then
split out some tasks to pursue individually or in teams.

I'm not claiming leadership, infallibility or anything really - I'm more
than happy to work together on these or other ideas for PITR. Any and
all comments or completely alternate views are welcomed.

It seems very likely that this design/plan comes much later than other
designs/thoughts from members of the PGDG community or elsewhere. All
ideas, corrections or offerings of partial code will be gratefully
accepted. There's no competition here - if you do it first and it works,
great. I'll buy you a beer.

I'll update this as information arrives and the picture changes:

OVERVIEW
PITR means Point-in-Time-Recovery and is an extension to the current
functionality of PostgreSQL. Achieving PITR will be a mixture of OS
features and integrated dbms features. My suggested first priority is to
sort out the latter and not reinvent-the-wheel too much on OS-provided
features.

OVERALL PITR REQUIREMENTS
- A full backup of the database
- A backup of a set of time-sequenceable changes that can be used to
"rollforward" from the point that the full backup was taken to the
desired point in time.
- Best practice, possibly offered by backup management software, to
implement both types of backup using a planned & structured approach

SUGGESTED IMPLEMENTATION GOALS
1. Exactly accurate PITR for databases
2. Robustness of code
3. Close to zero performance impact on normal (backed up) system
4. Good overall performance of total recovery procedures to minimise
outage
5. Other normal PostgreSQL code givens: Berkeley licence, portable ANSI
C

GENERAL PITR RECOVERY SCENARIO
The general PITR recovery scenario requires:
A - Take a full database backup
B - Take regular log file backups
C - Restore a full backup of a database.
If you cannot do this, hide your face in shame.
If it fails, retry it until it works. If it never does, you're out of
luck. If it succeeds, but portions of the database are corrupt, it may
still be possible to continue if the critical sections of the dbms, such
as the system tables, are still intact.
D - Restore a full set of log records.
If you cannot do this, you will have to accept that you have lost data.
If it fails, retry it until it works. If it never does, at least you
have the full backup point, even if that is out of date (slightly).
E - Locate the point in the set of log files that matches the state at
which the full backup was taken.
If this cannot occur or errors, it may be because you have a set of log
files that don't match your full backup. In that case, you must accept
that the full backup you have is the best you've got.
F - Decide somehow what the point in time is you would like to recover
to
G - Issue commands to define the Point-in-Time to which you would like
to recover.
H - Rollforward from the point of the backup to the defined PIT.
If this fails because of corrupted log files, then you may choose to
either: try to carry on restoring past any corruption point, or give up
and be happy with however far you've got.
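As a concrete (and entirely hypothetical) operator's run-through of steps C-H
above - the --recover-until switch doesn't exist, it just stands in for
whatever syntax step G ends up defining:

```shell
#!/bin/sh
# Hypothetical sketch of restore steps C-H. Only the tar/cp parts are
# real; the final command is echoed, not run, because --recover-until
# is an invented placeholder for step G.

pitr_restore() {
    base_tar=$1; wal_archive=$2; pgdata=$3; target=$4

    # C - restore the full backup
    mkdir -p "$pgdata"
    tar -xf "$base_tar" -C "$pgdata"

    # D/E - put the archived segments where replay will look for them
    mkdir -p "$pgdata/pg_xlog"
    cp "$wal_archive"/* "$pgdata/pg_xlog/" 2>/dev/null || true

    # F/G/H - declare the target; the postmaster would roll forward to
    # it and stop
    echo "postmaster -D $pgdata --recover-until='$target'"
}
```

Called as e.g. pitr_restore /backup/base.tar /backup/wal $PGDATA
'2004-02-13 17:30:00'.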

SUGGESTED FEATURES TO BE IMPLEMENTED
My suggestion is that the PostgreSQL PITR team should try to provide
facilities to allow B, E, F, G and H.
Partial facilities already exist to achieve:
A - either using pg_dump or via a full OS file copy on a shutdown
database
G - rollforward on WAL logs is already possible, but we need to
implement "stop at point in time" logic.

It shouldn't be our role to implement file handling: backup and restore
at a later time can be done many ways, so let the user pick their
preferred method, according to their preferences, data volume, available
funds etc.

We should assume that the system onto which the restore takes place is
completely different from the system on which the backup was taken. We
may need to alter config files prior to starting the restored database.
Any other pre-requisites?

COMPARISON WITH EXISTING TODO ENTRIES
- Add entries to WAL files so it contains enough information for
recovery after a full backup
This is assumed to be complete. [I'm not sure where the role of pg_clog
is in all of this, so this may be a broken assumption.] Comments?

- Write application to archive WAL files to tape, disk, or network
Probably need to do first part, but I'm arguing not to do the copy to
tape..

- Determine how the archive writer will interact with the checkpoint
process and WAL file recycling
Required for above

- Write an application that will recover from a pg_dump or tar backup
and then recover WAL files to a specific time
Seems possible, but is possible without this initially, so do other
stuff first

SUGGESTED PHASING OF IMPLEMENTATION
Here's a suggested phasing of work, to move towards full PITR. Clearly
other ways are also possible...

Phase 0: Planning & Discussion, including writing detailed test plan

Thread 1
Phase 1.1: Implement backup & full restore from OS image (B, E)
Phase 1.2: Implement PITR (add G and modify for H)
Phase 1.3: Implement WAL log inspection facilities (to improve step F)

Thread 2
Phase 2.1: Implement step A/C using pg_dump

Overall, doing this is going to require lots of investigation and
discussion. Most importantly, it's going to involve a fair amount of
detailed and fairly intrusive testing, i.e. complete database wipe and
restore.
Please don't anyone reading this think I'm going to cut acres of
code...we'll need lots of help from all sorts of people, especially from
veteran PostgreSQL developers.

No time plans. We have time to get it right.

FEATURE OVERVIEWS & INVESTIGATIONS/IMPLEMENTATIONS REQUIRED
B - Backing up WAL log files
-Ordinarily, when old log segment files are no longer needed, they are
recycled (renamed to become the next segments in the numbered sequence).
This means that the data within them must be copied from there to
another location
AFTER postgres has closed that file
BEFORE it is renamed and recycled
Spotting that window of opportunity fairly accurately is important. We
will need some mechanism to recognise the availability of a file for
copying, then lock the file and copy it away.
Think about
-what will happen if postgres tries to reuse the file while we are still
copying? Does postgres hang, waiting for the copy to complete (which may
be a long time if the copy is hanging because of a disk-full condition)?
This may already be catered for in the code since the recycle logic can
handle not-ready-to-recycle conditions (not sure).
-Is it possible to allow read-only queries to continue in this case?
-what will happen if copy fails?
-who will do the copying? Should we use another sub-process of
postmaster to do the copying, or should we try to use a separate program
entirely?
-what will happen if that process slows down? What will we do if it
fails?
-how will we notice that a WAL log is now ready for copying? How will we
communicate that to the archiver process?
-The manual indicates that the current WAL format is bulky and would
require some compressed format to be implemented. Initially, I suggest
ignoring this and simply relying on OS or hardware/tape compression.

E - Tee up log files to backup state
Should be fairly straightforward when using a full OS file backup of a
correctly shutdown database.
-Investigate use of WAL file names, to see if any problems exist there.

G - Issue command for Recovery point in time
With full OS file backup, if the database is shutdown correctly, then we
will need a way to tell the database "you think you're up to date, but
you're not - I've added some more WAL files into the directories, so
roll forward on those now please".
-Decide when and how to issue command.
-Decide syntax for command?
RECOVER DATABASE TO TIME <time>, if implemented as an SQL statement.
Could also implement a new switch on postmaster, to tell it to come up
in recovery mode even though it thinks it doesn't need to?

H - Modify WAL rollforward
Rollforward until a point in time should be fairly straightforward. The
current implementation, once started, runs until it runs out of files to
apply, so simply change the termination test. Where to store the
point-in-time variable? Global? Pass it as a parameter when the WAL
rollforward logic is invoked as a command?

A/C - using pg_dump
Following restore from pg_dump, pg_control will not be set at the same
point it was at when the backup was taken. As a result, step E would
incorrectly identify the starting position for the rollforward, which
would either result in a "cannot find correct log" message, or
attempting to rollforward with completely inappropriate WAL data. Some
means of getting round this issue needs to be investigated. Idea: modify
pg_dump to record the transactionid at the start of the pg_dump and have
it generate as the LAST SQL statement in the dump a statement to modify
the system tables (gasp!) to reflect the equivalent state at backup.
Maybe; investigation required. This route may well be complicated by
National Language issues etc, whereas the full backup method will at
least ignore all of that.

What else do we need to do?

What complications are there?

= = = = = = =

#46Tom Lane
tgl@sss.pgh.pa.us
In reply to: Mark Kirkwood (#44)
Re: [HACKERS] PITR Dead horse?

Mark Kirkwood <markir@paradise.net.nz> writes:

Tom recently posted the PITR patch to PATCHES - maybe we should sent it
to this list too. That might kick start the discussion...

Actually, I tried to send it here first. This list bounced it because
it was over 40K (which I think is the standard limit for all the lists
except PATCHES). So, if you want to look at it, go look in the PATCHES
archives:
http://archives.postgresql.org/pgsql-patches/2004-02/msg00134.php

(BTW, Marc, I still don't see an entry for the -pitr list at
http://archives.postgresql.org/ ...)

regards, tom lane

#47Marc G. Fournier
scrappy@postgresql.org
In reply to: Tom Lane (#46)
Re: [HACKERS] PITR Dead horse?

On Thu, 12 Feb 2004, Tom Lane wrote:

(BTW, Marc, I still don't see an entry for the -pitr list at
http://archives.postgresql.org/ ...)

fixed ...

----
Marc G. Fournier Hub.Org Networking Services (http://www.hub.org)
Email: scrappy@hub.org Yahoo!: yscrappy ICQ: 7615664

#48Tom Lane
tgl@sss.pgh.pa.us
In reply to: Simon Riggs (#45)
Re: Proposals for PITR

"Simon Riggs" <simon@2ndquadrant.com> writes:

The general PITR recovery scenario requires:
A - Take a full database backup
B - Take regular log file backups
C - Restore a full backup of a database.

Note that pg_dump has absolutely zero to do with this. The base for a
PITR recovery scenario has to be a physical dump, not a logical dump;
else WAL replay won't work.

The conclusion that we came to in previous discussions was that the
physical dump can be taken simply by doing "tar" (or your favorite
alternative tool) on the $PGDATA directory tree --- you could optionally
exclude the pg_xlog directory but must include all else. It is okay to
do this while the database is live, so long as the series of log files
you will use to recover starts from a checkpoint that occurred before
you began the tar run. When you do this the contents of the tar archive
will be an inconsistent snapshot of the database and therefore not
directly useful. However, if you load the tar archive and then replay
WAL from a prior checkpoint to any time after completion of the tar run,
you will have brought the database back to consistency.

Part A of the problem therefore only requires tar plus enough management
software to keep track of the start and stop times of the tar run and
correlate the start time to the WAL log series. It is worth pointing
out though that you need to start up your Part B solution (archiving off
WAL files as they come free) *before* you start taking your base dump.

(When we get around to implementing tablespaces, things might get more
complicated, since you'd need to remember to archive tablespace trees
that might not live under $PGDATA. But we can ignore that refinement
for now, I think.)
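The management wrapper for part A can then be as thin as tar plus
bookkeeping - a sketch (the paths and the .info sidecar format are arbitrary
inventions here):

```shell
#!/bin/sh
# Sketch of part A: tar the live cluster and record the bracketing
# times, so the restore knows which WAL span it must replay to bring
# the (inconsistent) snapshot back to consistency.

base_backup() {
    pgdata=$1; out=$2

    start=$(date -u +'%Y-%m-%d %H:%M:%S')
    # pg_xlog may be excluded: replay will come from the archived copies
    tar -cf "$out" --exclude=pg_xlog -C "$pgdata" .
    stop=$(date -u +'%Y-%m-%d %H:%M:%S')

    # keep the times with the archive; replay must start at a
    # checkpoint before $start and run to at least $stop
    {
        echo "start=$start"
        echo "stop=$stop"
    } > "$out.info"
}
```

The point of the .info file is exactly Tom's "enough management software to
keep track of the start and stop times of the tar run".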

We should assume that the system onto which the restore takes place is
completely different from the system on which the backup was taken.

It can't be too different, since you will need a binary-compatible
server; you won't be able to move across machine architectures this way.

- Add entries to WAL files so it contains enough information for
recovery after a full backup
This is assumed to be complete. [I'm not sure where the role of pg_clog
is in all of this, so this may be a broken assumption.] Comments?

I believe that the major problems (lack of recording of file creation/
deletion) were solved by the J.R. Nield patches I applied last week.
There may be some minor issues left to fix but I can't think of any
showstoppers.

- Write application to archive WAL files to tape, disk, or network
Probably need to do first part, but I'm arguing not to do the copy to
tape..

I'd like to somehow see this handled by a user-supplied program or
script. What we mainly need is to define a good API that lets the
archiver program understand which WAL segment files to archive when.

B - Backing up WAL log files
-Ordinarily, when old log segment files are no longer needed, they are
recycled (renamed to become the next segments in the numbered sequence).
This means that the data within them must be copied from there to
another location
AFTER postgres has closed that file
BEFORE it is renamed and recycled

My inclination would be to change the backend code so that as soon as a
WAL segment is completed, it is flagged as being ready to dump to tape
(or wherever). Possibly the easiest way to do this is to rename the
segment file somehow, perhaps "nnn" becomes "nnn.full". Then, after the
archiver process has properly dumped the file, reflag it as being dumped
(perhaps rename to "nnn.done"). Obviously there are any number of ways
we could do this flagging, and depending on an OS rename facility might
not be the best.

A segment then can be recycled when it is both (a) older than the latest
checkpoint and (b) flagged as dumped. Note that this approach allows
dumping of a file to start before the first time at which it could be
recycled. In the event of a crash and restart, WAL replay has to be
able to find the flagged segments, so the flagging mechanism can't be
one that would make this impossible.
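From the archiver's side that protocol is pleasantly dumb - a sketch using
the strawman .full/.done suffixes (the real flagging mechanism is still to
be decided):

```shell
#!/bin/sh
# Sketch of a user-side archiver for the rename-based flagging above:
# completed segments show up as nnn.full, get copied somewhere safe,
# and are renamed nnn.done so the recycler knows they may be reused.

archive_ready_segments() {
    xlogdir=$1; archive=$2

    for seg in "$xlogdir"/*.full; do
        [ -e "$seg" ] || continue                      # nothing ready
        # a failed copy leaves the .full flag intact for a retry
        cp "$seg" "$archive/$(basename "$seg" .full)" || return 1
        mv "$seg" "${seg%.full}.done"                  # flag: dumped
    done
}
```

Run periodically (or on notification from the backend), this satisfies the
rule that a segment is recycled only once it is both past the latest
checkpoint and flagged as dumped.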

Think about
-what will happen if postgres tries to reuse file while we are still
copying.

This is a non-problem; segments that haven't been recycled can't become
reuse targets.

-what will happen if copy fails?

This is the archiver's problem to deal with. It only gets to be a
serious problem when you run out of disk space for WAL segment files,
so in most scenarios there is time for manual intervention to fix any
such problem and restart the archiver.

-Manual indicates that current WAL format is bulky and would require
some compressed format to be implemented. Initially, I suggest ignoring
this and simply relying of OS or hardware/tape compression methods.

This is definitely something we could leave for later.

With full OS file backup, if the database is shutdown correctly, then we
will need a way to tell the database "you think you're up to date, but
you're not - I've added some more WAL files into the directories, so
roll forward on those now please".

I do not think this is an issue either, because my vision of this does
not include tar backups of shutdown databases. What will be backed up
is a live database, therefore the postmaster will definitely know that
it needs to perform WAL replay. What we will need is hooks to make sure
that the full set of required log files is available. It's entirely
possible that that set of log files exceeds available disk space, so it
needs to be possible to run WAL replay incrementally, loading and then
replaying additional log segments after deleting old ones.

Possibly we could do this with some postmaster command-line switches.
J. R. Nield's patch embodied an "interactive recovery" backend mode,
which I didn't like in detail but the general idea is not necessarily
wrong.

regards, tom lane

#49Simon Riggs
simon@2ndquadrant.com
In reply to: Tom Lane (#48)
Re: Proposals for PITR

Tom,

Thanks for your many comments and practical suggestions - most of which
I think I should be able to bash something out once I've got my new dev
env sorted.

I'll update the proposal into a design document with some of my earlier
blah taken out and all of your clarifications put in.

There's a few comments on stuff below:

Best Regards, Simon Riggs

Tom Lane [mailto:tgl@sss.pgh.pa.us] writes

- Write application to archive WAL files to tape, disk, or network
Probably need to do first part, but I'm arguing not to do the copy to
tape..

I'd like to somehow see this handled by a user-supplied program or
script. What we mainly need is to define a good API that lets the
archiver program understand which WAL segment files to archive when.

B - Backing up WAL log files
-Ordinarily, when old log segment files are no longer needed, they are
recycled (renamed to become the next segments in the numbered sequence).
This means that the data within them must be copied from there to
another location
AFTER postgres has closed that file
BEFORE it is renamed and recycled

My inclination would be to change the backend code so that as soon as a
WAL segment is completed, it is flagged as being ready to dump to tape
(or wherever). Possibly the easiest way to do this is to rename the
segment file somehow, perhaps "nnn" becomes "nnn.full". Then, after the
archiver process has properly dumped the file, reflag it as being dumped
(perhaps rename to "nnn.done"). Obviously there are any number of ways
we could do this flagging, and depending on an OS rename facility might
not be the best.

A segment then can be recycled when it is both (a) older than the latest
checkpoint and (b) flagged as dumped. Note that this approach allows
dumping of a file to start before the first time at which it could be
recycled. In the event of a crash and restart, WAL replay has to be
able to find the flagged segments, so the flagging mechanism can't be
one that would make this impossible.

That sort of API doesn't do much for my sense of truth-and-beauty, but
it will work and allow us to get to the testing stage, where I'm sure we
will discover many things. When that knowledge is gained *we* can
refactor.

Spawning new post to think through the API in more detail.

With full OS file backup, if the database is shutdown correctly, then we
will need a way to tell the database "you think you're up to date, but
you're not - I've added some more WAL files into the directories, so
roll forward on those now please".

I do not think this is an issue either, because my vision of this does
not include tar backups of shutdown databases. What will be backed up
is a live database, therefore the postmaster will definitely know that
it needs to perform WAL replay. What we will need is hooks to make sure
that the full set of required log files is available.

OK, again lets go for it on that assumption.

Longer term, I would feel more comfortable with a specific "backup
state". Relying on a side-effect of crash recovery for disaster recovery
doesn't give me a warm feeling. BUT, that feeling is for later, not now.

It's entirely
possible that that set of log files exceeds available disk space, so it
needs to be possible to run WAL replay incrementally, loading and then
needs to be possible to run WAL replay incrementally, loading and then
replaying additional log segments after deleting old ones.
Possibly we could do this with some postmaster command-line switches.
J. R. Nield's patch embodied an "interactive recovery" backend mode,
which I didn't like in detail but the general idea is not necessarily
wrong.

Again, yes, though I will for now aim at the assumption that recovery
can be completed within available disk space, with this as an immediate
add-on when we have something that works.

That is also the basis for a "warm standby" solution: copy the tar to a
new (similar, as you say) system, then repeatedly move new WAL logs
across to it, then start up in recover-only mode.

"Recover-only" mode would be initiated by a command line switch, as you
say. This would recover all of the WAL logs, then immediately shutdown
again.
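A warm-standby cycle then reduces to a sketch like this, run from cron or a
loop (--recover-only is my invented stand-in for that command-line switch):

```shell
#!/bin/sh
# Sketch of one warm-standby cycle: pull across newly archived
# segments, then replay-and-stop. --recover-only is hypothetical; it
# stands for "recover all available WAL, then immediately shut down".

standby_cycle() {
    primary_wal=$1; pgdata=$2

    mkdir -p "$pgdata/pg_xlog"
    cp "$primary_wal"/* "$pgdata/pg_xlog/" 2>/dev/null || true

    # echoed, not run - the switch doesn't exist yet
    echo "postmaster -D $pgdata --recover-only"
}
```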

The extension to that is what Oli Sennhauser has suggested, which is to
allow the second system to come up in read-only mode.

Best Regards, Simon Riggs

#50Simon Riggs
simon@2ndquadrant.com
In reply to: Tom Lane (#48)
No title

Tom Lane [mailto:tgl@sss.pgh.pa.us]
Re: [pgsql-hackers-pitr] Proposals for PITR

"Simon Riggs" <simon@2ndquadrant.com> writes:
The general PITR recovery scenario requires:
A - Take a full database backup
B - Take regular log file backups
C - Restore a full backup of a database.

Note that pg_dump has absolutely zero to do with this. The base for a
PITR recovery scenario has to be a physical dump, not a logical dump;
else WAL replay won't work.

Yes, I agree, I only included it because it was on the TODO. I'll cut it
out of the further thinking on PITR, in case it confuses the issue in
design/usage.

Regards, Simon

#51Bruce Momjian
pgman@candle.pha.pa.us
In reply to: Simon Riggs (#50)
Re:

Simon Riggs wrote:

Tom Lane [mailto:tgl@sss.pgh.pa.us]
Re: [pgsql-hackers-pitr] Proposals for PITR

"Simon Riggs" <simon@2ndquadrant.com> writes:
The general PITR recovery scenario requires:
A - Take a full database backup
B - Take regular log file backups
C - Restore a full backup of a database.

Note that pg_dump has absolutely zero to do with this. The base for a
PITR recovery scenario has to be a physical dump, not a logical dump;
else WAL replay won't work.

Yes, I agree, I only included it because it was on the TODO. I'll cut it
out of the further thinking on PITR, in case it confuses the issue in
design/usage.

I removed the pg_dump mention on the web page. Thanks.

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073
#52Simon Riggs
simon@2ndquadrant.com
In reply to: Tom Lane (#48)
Archival API

Tom Lane [mailto:tgl@sss.pgh.pa.us]

Simon Riggs wrote
- Write application to archive WAL files to tape, disk, or network
Probably need to do first part, but I'm arguing not to do the copy to
tape..

I'd like to somehow see this handled by a user-supplied program or
script. What we mainly need is to define a good API that lets the
archiver program understand which WAL segment files to archive when.

B - Backing up WAL log files
-Ordinarily, when old log segment files are no longer needed, they are
recycled (renamed to become the next segments in the numbered sequence).
This means that the data within them must be copied from there to
another location
AFTER postgres has closed that file
BEFORE it is renamed and recycled

My inclination would be to change the backend code so that as soon as a
WAL segment is completed, it is flagged as being ready to dump to tape
(or wherever). Possibly the easiest way to do this is to rename the
segment file somehow, perhaps "nnn" becomes "nnn.full". Then, after the
archiver process has properly dumped the file, reflag it as being dumped
(perhaps rename to "nnn.done"). Obviously there are any number of ways
we could do this flagging, and depending on an OS rename facility might
not be the best.

Yes, that would be the correct time to begin archive.

The way the code is currently written there is a slot in MoveOfflineLogs
which looks to see if XLOG_archive_dir is set before entering a section
which is empty apart from a message. That routine doesn't get called
until we're about to recycle the files, which means we've lost our
window of opportunity to archive them. Making the number of files larger
doesn't affect its being called last. I'm going to ignore that
"hint" and any patch will include deletion of that code to avoid later
confusion.

The log switch and close occur during XLogWrite, when it is established
that there is no more room in the current log file for the record being
written.

The file-flagging mechanism only allows a single archiver program to
operate, so I'll structure it as a new function, XLogArchiveNotify(),
to which we can add extra behaviour later to improve/change things.
That way we have a home for the API.

A segment then can be recycled when it is both (a) older than the latest
checkpoint and (b) flagged as dumped. Note that this approach allows
dumping of a file to start before the first time at which it could be
recycled. In the event of a crash and restart, WAL replay has to be
able to find the flagged segments, so the flagging mechanism can't be
one that would make this impossible.

The number of WAL logs is effectively tunable anyway, because it depends
on the number of checkpoint segments, so we can increase that if there
are issues with archival speed vs. txn rate.

The rename is always safe because the log file names never wrap.

However, I'm loath to touch the files, in case something crashes
somewhere and we are left with recovery failing because of an
unlocatable file. (To paraphrase one of the existing code comments, only
the truly paranoid survive.) A similar way is to have a "buddy" file,
which indicates whether the segment is full and ready for archival:
i.e. when we close file "nnn" we also write a (nearly) empty file called
"nnn.full". That file can then be deleted later BY THE archiver once
archival has finished, allowing the segment to be recycled by
InstallXLogFileSegment(). (Would require at least 6 more file
descriptors, but I'm not sure if that's an issue.)

InstallXLogFileSegment() can check for XLogArchiveBusy() to see whether
it is allowed to reuse or allocate a new one. In initial implementation
this would just test to see whether "nnn.full" still exists. This will
allow a range of behaviour to be catered for, such as long waits while
manual tape mounts are requested by the archiver etc..

So in summary, the API is:

Archiver initialises and waits on notify
Postgresql initialises
...then
Postgresql fills log, switches and close it, then calls
XLogArchiveNotify()
Archiver moves log somewhere safe, then sets state such that...
...sometime later
Postgresql checks XLogArchiveBusy() to see if it's safe to recycle the
file, and discovers the state set by the archiver

The API is completely unintrusive on current tried and tested operation,
and leaves the archiver(s) free to act as they choose, outside of the
address space of PostgreSQL. That way we don't have to update regression
tests with destructive, non-manual crash tests to show that it works.
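The buddy-file handshake described above can be sketched as a pair of small functions. The names XLogArchiveNotify() and XLogArchiveBusy() come from this proposal, but the bodies below are only an assumed illustration of the ".full" flag-file idea, not the actual patch:

```c
#include <assert.h>
#include <stdio.h>
#include <unistd.h>

/* Called by the backend as soon as a WAL segment is closed: create the
 * (nearly) empty buddy file "nnn.full" to tell the archiver the segment
 * is ready.  Returns 0 on success, -1 on failure. */
static int
XLogArchiveNotify(const char *segpath)
{
    char    flagpath[1024];
    FILE   *fp;

    snprintf(flagpath, sizeof(flagpath), "%s.full", segpath);
    fp = fopen(flagpath, "w");
    if (fp == NULL)
        return -1;
    fclose(fp);
    return 0;
}

/* Called by InstallXLogFileSegment() before recycling: the segment is
 * still busy while "nnn.full" exists, i.e. the archiver has not yet
 * finished with it and deleted the flag. */
static int
XLogArchiveBusy(const char *segpath)
{
    char    flagpath[1024];

    snprintf(flagpath, sizeof(flagpath), "%s.full", segpath);
    return access(flagpath, F_OK) == 0;
}
```

The archiver's side of the contract is simply: copy the segment somewhere safe, then unlink the ".full" file, however long that takes (manual tape mounts included).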

Clearly, we wouldn't want WAL logs to hang around too long, so we need
an initiation method for the archival process; otherwise, we'll be
writing "nnn.full" notifications without anybody ever deleting them.
This could be set at startup with an archive_log_mode parameter (OK, the
name's been used before, but if the cap fits, wear it), or by setting a
maximum limit on the number of archive logs, and there are a few other
ideas, none of which I like.

Hmmmm...any listeners got any ideas here? How do we want this to work?

Anybody want to write a more complex archiver process to act as more
than just a test harness?

Best regards,

Simon Riggs

#53Greg Stark
gsstark@mit.edu
In reply to: scott.marlowe (#37)
Re: PITR Dead horse?

"scott.marlowe" <scott.marlowe@ihs.com> writes:

but I'm not sure you can test that without power off tests...

Well the approach that's been taken manually on the list is to look at the
timing results and conclude they're just physically impossible.

Doing this automatically could be interesting. If the tool were given a
partition to act on directly, it would be able to intentionally write to
blocks in reverse order, doing an fsync between each block, and test
whether the bandwidth is low enough to conclude that a full rotation had
been completed between writes.

Doing the same on the filesystem would be less reliable, but might also
be an interesting test, since the OS might make fsync lie outright, or
the filesystem might have additional intelligence that forces the drive
to sync to the platters before fsync returns.
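The filesystem variant of that timing argument could be automated roughly as below. This is a sketch only, not the tool under discussion; the function name and the interpretation threshold are assumptions (one platter rotation at 7200 rpm is about 8.3 ms, so a drive that honours fsync should sustain at most roughly 120 single-block write+fsync cycles per second).

```c
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/time.h>
#include <unistd.h>

/* Time nwrites single-block writes, each followed by fsync(), and
 * return the achieved rate in writes per second (-1.0 on error).
 * Rates far above ~120/sec on a 7200 rpm disk suggest the write
 * cache (or the OS) is lying about fsync. */
static double
fsync_writes_per_second(const char *path, int nwrites)
{
    int             fd;
    char            block[8192];
    struct timeval  start, end;
    double          elapsed;
    int             i;

    fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return -1.0;
    memset(block, 0, sizeof(block));

    gettimeofday(&start, NULL);
    for (i = 0; i < nwrites; i++)
    {
        if (write(fd, block, sizeof(block)) != (ssize_t) sizeof(block) ||
            fsync(fd) != 0)
        {
            close(fd);
            return -1.0;
        }
    }
    gettimeofday(&end, NULL);
    close(fd);

    elapsed = (end.tv_sec - start.tv_sec) +
              (end.tv_usec - start.tv_usec) / 1e6;
    return elapsed > 0 ? nwrites / elapsed : -1.0;
}
```

As noted above, a result on a filesystem is only suggestive: a battery-backed controller cache can legitimately return from fsync much faster than a rotation.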

--
greg