Proposed WAL changes

Started by Tom Laneabout 25 years ago44 messageshackers
Jump to latest
#1Tom Lane
tgl@sss.pgh.pa.us

I have just sent to the pgsql-patches list a rather large set of
proposed diffs for the WAL code. These changes:

* Store two past checkpoint locations, not just one, in pg_control.
On startup, we fall back to the older checkpoint if the newer one
is unreadable. Also, a physical copy of the newest checkpoint record
is kept in pg_control for possible use in disaster recovery (ie,
complete loss of pg_xlog). Also add a version number for pg_control
itself. Remove archdir from pg_control; it ought to be a GUC
parameter, not a special case (not that it's implemented yet anyway).

* Suppress successive checkpoint records when nothing has been entered
in the WAL log since the last one. This is not so much to avoid I/O
as to make it actually useful to keep track of the last two
checkpoints. If the things are right next to each other then there's
not a lot of redundancy gained...

* Change CRC scheme to a true 64-bit CRC, not a pair of 32-bit CRCs
on alternate bytes.

* Fix XLOG record length handling so that it will work at BLCKSZ = 32k.

* Change XID allocation to work more like OID allocation, so that we
can flush XID alloc info to the log before there is any chance an XID
will appear in heap files.

* Fix a number of minor bugs, such as off-by-one logic for XLOG file
wraparound at the 4 gig mark.

* Add documentation and clean up some coding infelicities; move file
format declarations out to include files where planned contrib
utilities can get at them.

Before committing this stuff, I intend to prepare a contrib utility that
can be used to reset pg_control and pg_xlog. This is mainly for
disaster recovery purposes, but as a side benefit it will allow people
to update 7.1beta installations to this new code without doing initdb.
I need to update contrib/pg_controldata, too.

regards, tom lane

#2Hiroshi Inoue
Inoue@tpf.co.jp
In reply to: Tom Lane (#1)
RE: Proposed WAL changes

-----Original Message-----
From: Tom Lane

I have just sent to the pgsql-patches list a rather large set of
proposed diffs for the WAL code. These changes:

* Store two past checkpoint locations, not just one, in pg_control.
On startup, we fall back to the older checkpoint if the newer one
is unreadable. Also, a physical copy of the newest checkpoint record
is kept in pg_control for possible use in disaster recovery (ie,
complete loss of pg_xlog). Also add a version number for pg_control
itself. Remove archdir from pg_control; it ought to be a GUC
parameter, not a special case (not that it's implemented yet anyway).

Is archdir really a GUC parameter ?

Regards,
Hiroshi Inoue

#3Tom Lane
tgl@sss.pgh.pa.us
In reply to: Hiroshi Inoue (#2)
Re: Proposed WAL changes

"Hiroshi Inoue" <Inoue@tpf.co.jp> writes:

Remove archdir from pg_control; it ought to be a GUC
parameter, not a special case (not that it's implemented yet anyway).

Is archdir really a GUC parameter ?

Why shouldn't it be? I see nothing wrong with changing it on-the-fly.

regards, tom lane

#4Zeugswetter Andreas SB
ZeugswetterA@wien.spardat.at
In reply to: Tom Lane (#3)
AW: Proposed WAL changes

Remove archdir from pg_control; it ought to be a GUC
parameter, not a special case (not that it's implemented

yet anyway).

Is archdir really a GUC parameter ?

Why shouldn't it be? I see nothing wrong with changing it on-the-fly.

Yes, I think this is a good change, like all others except XID assignment :-)

Andreas

#5Hiroshi Inoue
Inoue@tpf.co.jp
In reply to: Zeugswetter Andreas SB (#4)
RE: Proposed WAL changes

-----Original Message-----
From: Zeugswetter Andreas SB

Remove archdir from pg_control; it ought to be a GUC
parameter, not a special case (not that it's implemented

yet anyway).

Is archdir really a GUC parameter ?

Why shouldn't it be? I see nothing wrong with changing it on-the-fly.

Yes, I think this is a good change, like all others except XID
assignment :-)

Could GUC parameters be changed permanently e.g. by SET command ?

For example,
1) start postmaster
2) set archdir to '....'
3) shutdown postmaster

Does PostgreSQL remember the archdir parameter ?

Regards,
Hiroshi Inoue

#6Tom Lane
tgl@sss.pgh.pa.us
In reply to: Hiroshi Inoue (#5)
Re: Proposed WAL changes

"Hiroshi Inoue" <Inoue@tpf.co.jp> writes:

Could GUC parameters be changed permanently e.g. by SET command ?

That's what postgresql.conf is for ...

regards, tom lane

#7Hiroshi Inoue
Inoue@tpf.co.jp
In reply to: Hiroshi Inoue (#5)
Re: Proposed WAL changes

Tom Lane wrote:

"Hiroshi Inoue" <Inoue@tpf.co.jp> writes:

Could GUC parameters be changed permanently e.g. by SET command ?

That's what postgresql.conf is for ...

Do I have to send SIGHUP after changing postgresql.conf ?

Regards,
Hiroshi Inoue

#8Zeugswetter Andreas SB
ZeugswetterA@wien.spardat.at
In reply to: Hiroshi Inoue (#7)
AW: Proposed WAL changes

Could GUC parameters be changed permanently e.g. by SET command ?

For example,
1) start postmaster
2) set archdir to '....'
3) shutdown postmaster

I thought the intended way to change a GUC parameter permanently was to
edit data/postgresql.conf . No ?

Andreas

#9Hiroshi Inoue
Inoue@tpf.co.jp
In reply to: Zeugswetter Andreas SB (#8)
Re: AW: Proposed WAL changes

Zeugswetter Andreas SB wrote:

Could GUC parameters be changed permanently e.g. by SET command ?

For example,
1) start postmaster
2) set archdir to '....'
3) shutdown postmaster

I thought the intended way to change a GUC parameter permanently was to
edit data/postgresql.conf . No ?

What I've thought is to implement a new command to
change archdir under WAL's control.
If it's different from Vadim's plan I don't object.

Regards,
Hiroshi Inoue

#10Mikheev, Vadim
vmikheev@SECTORBASE.COM
In reply to: Zeugswetter Andreas SB (#8)
Re: AW: Proposed WAL changes

I thought the intended way to change a GUC parameter permanently was to
edit data/postgresql.conf . No ?

What I've thought is to implement a new command to
change archdir under WAL's control.
If it's different from Vadim's plan I don't object.

Actually, I have no concrete plans for archdir yet - this
one is for WAL based BAR we should discuss in future.
So, I don't see why to remove archdir from pg_control now.

Vadim

#11Mikheev, Vadim
vmikheev@SECTORBASE.COM
In reply to: Mikheev, Vadim (#10)
Re: Proposed WAL changes

I have just sent to the pgsql-patches list a rather large set of

Please send it to me directly - pgsql-patches' archieve is dated by Feb -:(

proposed diffs for the WAL code. These changes:

* Store two past checkpoint locations, not just one, in pg_control.
On startup, we fall back to the older checkpoint if the newer one
is unreadable. Also, a physical copy of the newest checkpoint record

And what to do if older one is unreadable too?
(Isn't it like using 2 x CRC32 instead of CRC64 ? -:))
And what to do if pg_control was lost? (We already discussed that we
should read all logs from newest to oldest ones to find checkpoint).
And why to keep old log files with older checkpoint?

is kept in pg_control for possible use in disaster recovery (ie,
complete loss of pg_xlog). Also add a version number for pg_control

Mmmm, how recovery is possible if log was lost? All what could be done
with DB in the event of corrupted/lost log is dumping data from tables
*asis*, without any guarantee about consistency. How checkpoint' content
could be useful?

I feel that the fact that

WAL can't help in the event of disk errors

is often overlooked.

itself. Remove archdir from pg_control; it ought to be a GUC
parameter, not a special case (not that it's implemented yet anyway).

I would discuss WAL based BAR management before deciding how to
store/assign archdir. On the other hand it's easy to add archdir
to pg_control later -:)

* Change CRC scheme to a true 64-bit CRC, not a pair of 32-bit CRCs
on alternate bytes.

Great if you've found reliable CRC64 impl!

* Fix XLOG record length handling so that it will work at BLCKSZ = 32k.

Case I've overlooked -:(
(Though, I always considered BLCKSZ > 8K as temp hack -:))

* Change XID allocation to work more like OID allocation, so that we
can flush XID alloc info to the log before there is any chance an XID
will appear in heap files.

I didn't read you postings about this yet.

* Add documentation and clean up some coding infelicities; move file
format declarations out to include files where planned contrib
utilities can get at them.

Thanks for that!

Before committing this stuff, I intend to prepare a contrib utility that
can be used to reset pg_control and pg_xlog. This is mainly for
disaster recovery purposes, but as a side benefit it will allow people

Once again, I would call this "disaster *dump* purposes" -:)
After such operation DB shouldn't be used for anything but dump!

to update 7.1beta installations to this new code without doing initdb.
I need to update contrib/pg_controldata, too.

Vadim

#12Bruce Momjian
bruce@momjian.us
In reply to: Mikheev, Vadim (#10)
Re: AW: Proposed WAL changes

I thought the intended way to change a GUC parameter permanently was to
edit data/postgresql.conf . No ?

What I've thought is to implement a new command to
change archdir under WAL's control.
If it's different from Vadim's plan I don't object.

Actually, I have no concrete plans for archdir yet - this
one is for WAL based BAR we should discuss in future.
So, I don't see why to remove archdir from pg_control now.

Maybe we can get BAR in 7.1.X so maybe we should have the option to add
it.

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
#13Tom Lane
tgl@sss.pgh.pa.us
In reply to: Hiroshi Inoue (#7)
Re: Proposed WAL changes

Hiroshi Inoue <Inoue@tpf.co.jp> writes:

Do I have to send SIGHUP after changing postgresql.conf ?

In general, yes. I think that for the (still vaporware) archdir option,
you might not need to: archdir will only be looked at by the checkpoint
subprocess, and I think that a newly spawned backend will reread
postgresql.conf anyway. Peter, is that correct?

regards, tom lane

#14Tom Lane
tgl@sss.pgh.pa.us
In reply to: Mikheev, Vadim (#10)
Re: AW: Proposed WAL changes

"Vadim Mikheev" <vmikheev@sectorbase.com> writes:

So, I don't see why to remove archdir from pg_control now.

I didn't like the space consumption. I think it's important that the
pg_control data fit in less than 512 bytes so that it doesn't cross
physical sectors on the disk. This reduces the odds of being left
with a corrupted pg_control due to partial write during power loss.

That's a second-order consideration, possibly, but I can see no
redeeming social advantage whatever to having archdir in pg_control
rather than in postgresql.conf where all the other system parameters
live. Unless you've got one, it's coming out...

regards, tom lane

#15Tom Lane
tgl@sss.pgh.pa.us
In reply to: Mikheev, Vadim (#11)
Re: Proposed WAL changes

"Vadim Mikheev" <vmikheev@sectorbase.com> writes:

I have just sent to the pgsql-patches list a rather large set of

Please send it to me directly - pgsql-patches' archieve is dated by Feb -:(

Done under separate cover.

proposed diffs for the WAL code. These changes:

* Store two past checkpoint locations, not just one, in pg_control.
On startup, we fall back to the older checkpoint if the newer one
is unreadable. Also, a physical copy of the newest checkpoint record

And what to do if older one is unreadable too?
(Isn't it like using 2 x CRC32 instead of CRC64 ? -:))

Then you lose --- but two checkpoints gives you twice the chance of
recovery (probably more, actually, since it's much more likely that
the previous checkpoint will have reached disk safely).

And what to do if pg_control was lost? (We already discussed that we
should read all logs from newest to oldest ones to find checkpoint).

If you have valid WAL files and broken pg_control, then reading the WAL
files is a way to recover. If you have valid pg_control and broken WAL
files, you have a big problem, but using pg_control to generate a new
empty WAL will at least let you get at your heap files.

And why to keep old log files with older checkpoint?

Not much point in remembering the older checkpoint location if the
associated WAL file is removed...

Mmmm, how recovery is possible if log was lost? All what could be done
with DB in the event of corrupted/lost log is dumping data from tables
*asis*, without any guarantee about consistency.

Exactly. That is still better than not being able to dump the data at
all.

* Change XID allocation to work more like OID allocation, so that we
can flush XID alloc info to the log before there is any chance an XID
will appear in heap files.

I didn't read you postings about this yet.

See later discussion --- Andreas convinced me that flushing NEXTXID
records to disk isn't really needed after all. (I didn't take the flush
out of my patch yet, but will do so.) I still want to leave the NEXTXID
records in there, though, because I think that XID and OID assignment
ought to work as nearly alike as possible.

Before committing this stuff, I intend to prepare a contrib utility that
can be used to reset pg_control and pg_xlog. This is mainly for
disaster recovery purposes, but as a side benefit it will allow people

Once again, I would call this "disaster *dump* purposes" -:)
After such operation DB shouldn't be used for anything but dump!

Fair enough. But we need it.

regards, tom lane

#16The Hermit Hacker
scrappy@hub.org
In reply to: Mikheev, Vadim (#11)
Re: Proposed WAL changes

On Wed, 7 Mar 2001, Vadim Mikheev wrote:

I have just sent to the pgsql-patches list a rather large set of

Please send it to me directly - pgsql-patches' archieve is dated by Feb -:(

Huh?

http://www.postgresql.org/mhonarc/pgsql-patches/2001-03/index.html

#17Peter Eisentraut
peter_e@gmx.net
In reply to: Tom Lane (#13)
Re: Proposed WAL changes

Tom Lane writes:

Hiroshi Inoue <Inoue@tpf.co.jp> writes:

Do I have to send SIGHUP after changing postgresql.conf ?

In general, yes. I think that for the (still vaporware) archdir option,
you might not need to: archdir will only be looked at by the checkpoint
subprocess, and I think that a newly spawned backend will reread
postgresql.conf anyway. Peter, is that correct?

Nope. The configuration file is only read at postmaster start and after
SIGHUP. If any starting backend would read it automatically, the admin
could never be sure about his edits.

--
Peter Eisentraut peter_e@gmx.net http://yi.org/peter-e/

In reply to: Mikheev, Vadim (#11)
Re: Proposed WAL changes

"Vadim Mikheev" <vmikheev@sectorbase.com> writes:

I feel that the fact that

WAL can't help in the event of disk errors

is often overlooked.

This is true in general. But, nevertheless, WAL can be written to
protect against predictable disk errors, when possible. Failing to
write a couple of disk blocks when the system crashes is a reasonably
predictable disk error. WAL should ideally be written to work
correctly in that situation.

Ian

---------------------------(end of broadcast)---------------------------
TIP 102: An atom-blaster is a good weapon, but it can point both ways.
-- Isaac Asimov

#19Mikheev, Vadim
vmikheev@SECTORBASE.COM
In reply to: Ian Lance Taylor (#18)
RE: AW: Proposed WAL changes

What I've thought is to implement a new command to
change archdir under WAL's control.
If it's different from Vadim's plan I don't object.

Actually, I have no concrete plans for archdir yet - this
one is for WAL based BAR we should discuss in future.
So, I don't see why to remove archdir from pg_control now.

Maybe we can get BAR in 7.1.X so maybe we should have the
option to add it.

So, it's better to leave archdir in pg_control now - if we'll
decide that GUC is better place then we'll just ignore archdir
in pg_control. But if it will be better to have it in pg_control
then we'll not be able to add it there.

Vadim

#20Mikheev, Vadim
vmikheev@SECTORBASE.COM
In reply to: Mikheev, Vadim (#19)
RE: Proposed WAL changes

I have just sent to the pgsql-patches list a rather large set of

Please send it to me directly - pgsql-patches' archieve is

dated by Feb -:(

Huh?

http://www.postgresql.org/mhonarc/pgsql-patches/2001-03/index.html

But there was nothing there yesterday -:)
(Seems archive wasn't updated that time).

Vadim

#21Mikheev, Vadim
vmikheev@SECTORBASE.COM
In reply to: Mikheev, Vadim (#20)
#22Mikheev, Vadim
vmikheev@SECTORBASE.COM
In reply to: Mikheev, Vadim (#21)
#23Alex Pilosov
alex@pilosoft.com
In reply to: Mikheev, Vadim (#21)
#24Nathan Myers
ncm@zembu.com
In reply to: Tom Lane (#15)
#25Nathan Myers
ncm@zembu.com
In reply to: Mikheev, Vadim (#21)
#26Tom Lane
tgl@sss.pgh.pa.us
In reply to: Mikheev, Vadim (#19)
#27Hiroshi Inoue
Inoue@tpf.co.jp
In reply to: Mikheev, Vadim (#19)
#28Tom Lane
tgl@sss.pgh.pa.us
In reply to: Hiroshi Inoue (#27)
In reply to: Mikheev, Vadim (#21)
#30Hiroshi Inoue
Inoue@tpf.co.jp
In reply to: Mikheev, Vadim (#19)
#31Tom Lane
tgl@sss.pgh.pa.us
In reply to: Hiroshi Inoue (#30)
#32Hiroshi Inoue
Inoue@tpf.co.jp
In reply to: Mikheev, Vadim (#19)
#33Mikheev, Vadim
vmikheev@SECTORBASE.COM
In reply to: Hiroshi Inoue (#32)
#34Tom Lane
tgl@sss.pgh.pa.us
In reply to: Mikheev, Vadim (#33)
#35Mikheev, Vadim
vmikheev@SECTORBASE.COM
In reply to: Tom Lane (#34)
#36Tom Lane
tgl@sss.pgh.pa.us
In reply to: Mikheev, Vadim (#35)
#37Mikheev, Vadim
vmikheev@SECTORBASE.COM
In reply to: Tom Lane (#36)
In reply to: Mikheev, Vadim (#33)
#39Mikheev, Vadim
vmikheev@SECTORBASE.COM
In reply to: Mikheev, Vadim (#33)
#40Tom Lane
tgl@sss.pgh.pa.us
In reply to: Mikheev, Vadim (#37)
#41Mikheev, Vadim
vmikheev@SECTORBASE.COM
In reply to: Tom Lane (#40)
#42Tom Lane
tgl@sss.pgh.pa.us
In reply to: Mikheev, Vadim (#41)
#43Mikheev, Vadim
vmikheev@SECTORBASE.COM
In reply to: Tom Lane (#42)
#44Christopher Masto
chris@netmonger.net
In reply to: Tom Lane (#1)