WAL & RC1 status
I am *not* feeling good about pushing out an RC1 release candidate
today.
I've been going through the WAL code, trying to understand it and
document it. I've found a number of minor problems and several major
ones ("major" meaning "can't really fix without an incompatible file
format change, hence initdb"). I've reported the major problems to
the mailing lists but gotten almost no feedback about what to do.
In addition, I'm still looking for the bug that I originally went in to
find: Scott Parish's report of being unable to restart after a normal
shutdown of beta4. Examination of his WAL log shows some pretty serious
lossage (see attached dump). My current theory is that the
buffer-slinging logic in xlog.c dropped one or more whole buffers' worth
of log records, but I haven't figured out exactly how.
I want to veto putting out an RC1 until these issues are resolved...
comments?
regards, tom lane
...
0/00599890: prv 0/00599854; xprv 0/00599854; xid 18871; RM 10 info 00 len 65
0/005998F4: prv 0/00599890; xprv 0/00599890; xid 18871; RM 11 info 90 len 50
0/00599948: prv 0/005998F4; xprv 0/005998F4; xid 18871; RM 1 info 00 len 4
commit: 2001-02-26 17:19:57
0/0059996C: prv 0/00599948; xprv 0/00000000; xid 0; RM 0 info 00 len 32
checkpoint: redo 0/0059996C; undo 0/00000000; sui 29; nextxid 18903; nextoid 35195; online
-- this is the last normal-looking checkpoint record. Judging from the
-- commit timestamps surrounding prior checkpoints, checkpoints were
-- happening every five minutes approximately on the 5-minute mark, so
-- this one happened about 17:20. (There really should be a timestamp
-- in the checkpoint records...)
0/005999AC: prv 0/0059996C; xprv 0/00000000; xid 18923; RM 10 info 08 len 8226; bkpb 1
0/0059B9FC: prv 0/005999AC; xprv 0/005999AC; xid 18923; RM 11 info 98 len 8226; bkpb 1
0/0059DA4C: prv 0/0059B9FC; xprv 0/0059B9FC; xid 18923; RM 10 info 00 len 72
0/0059DAB4: prv 0/0059DA4C; xprv 0/0059DA4C; xid 18923; RM 11 info 90 len 26
0/0059DAF0: prv 0/0059DAB4; xprv 0/0059DAB4; xid 18923; RM 10 info 00 len 72
0/0059DB58: prv 0/0059DAF0; xprv 0/0059DAF0; xid 18923; RM 11 info 90 len 26
0/0059DB94: prv 0/0059DB58; xprv 0/0059DB58; xid 18923; RM 10 info 00 len 72
0/0059DBFC: prv 0/0059DB94; xprv 0/0059DB94; xid 18923; RM 11 info 90 len 26
0/0059DC38: prv 0/0059DBFC; xprv 0/0059DBFC; xid 18923; RM 10 info 08 len 8226; bkpb 1
0/0059FC88: prv 0/0059DC38; xprv 0/0059DC38; xid 18923; RM 11 info 98 len 8226; bkpb 1
0/005A1CD8: prv 0/0059FC88; xprv 0/0059FC88; xid 18923; RM 1 info 00 len 4
commit: 2001-02-26 17:21:10
0/005A1CFC: prv 0/005A1CD8; xprv 0/00000000; xid 18951; RM 15 info 00 len 100
0/005A1D80: prv 0/005A1CFC; xprv 0/005A1CFC; xid 18951; RM 10 info 00 len 72
0/005A1DE8: prv 0/005A1D80; xprv 0/005A1D80; xid 18951; RM 11 info 90 len 26
0/005A1E24: prv 0/005A1DE8; xprv 0/005A1DE8; xid 18951; RM 10 info 00 len 72
0/005A1E8C: prv 0/005A1E24; xprv 0/005A1E24; xid 18951; RM 11 info 90 len 26
0/005A1EC8: prv 0/005A1E8C; xprv 0/005A1E8C; xid 18951; RM 10 info 00 len 72
0/005A1F30: prv 0/005A1EC8; xprv 0/005A1EC8; xid 18951; RM 11 info 90 len 26
0/005A1F6C: prv 0/005A1F30; xprv 0/005A1F30; xid 18951; RM 10 info 00 len 72
0/005A1FD4: prv 0/005A1F6C; xprv 0/005A1F6C; xid 18951; RM 11 info 90 len 26
0/005A201C: prv 0/005A1FD4; xprv 0/005A1FD4; xid 18951; RM 10 info 00 len 65
0/005A2080: prv 0/005A201C; xprv 0/005A201C; xid 18951; RM 11 info 98 len 8226; bkpb 1
0/005A40D0: prv 0/005A2080; xprv 0/005A2080; xid 18951; RM 1 info 00 len 4
commit: 2001-02-26 17:21:33
0/005A40F4: prv 0/005A40D0; xprv 0/00000000; xid 18986; RM 10 info 00 len 72
0/005A415C: prv 0/005A40F4; xprv 0/005A40F4; xid 18986; RM 11 info 90 len 26
0/005A4198: prv 0/005A415C; xprv 0/005A415C; xid 18986; RM 10 info 00 len 72
0/005A4200: prv 0/005A4198; xprv 0/005A4198; xid 18986; RM 11 info 90 len 26
0/005A423C: prv 0/005A4200; xprv 0/005A4200; xid 18986; RM 10 info 00 len 72
0/005A42A4: prv 0/005A423C; xprv 0/005A423C; xid 18986; RM 11 info 90 len 26
0/005A42E0: prv 0/005A42A4; xprv 0/005A42A4; xid 18986; RM 10 info 00 len 72
0/005A4348: prv 0/005A42E0; xprv 0/005A42E0; xid 18986; RM 11 info 90 len 26
0/005A4384: prv 0/005A4348; xprv 0/005A4348; xid 18986; RM 10 info 00 len 65
0/005A43E8: prv 0/005A4384; xprv 0/005A4384; xid 18986; RM 11 info 90 len 50
0/005A443C: prv 0/005A43E8; xprv 0/005A43E8; xid 18986; RM 1 info 00 len 4
commit: 2001-02-26 17:22:20
0/005A4460: prv 0/005A443C; xprv 0/00000000; xid 19020; RM 10 info 00 len 72
0/005A44C8: prv 0/005A4460; xprv 0/005A4460; xid 19020; RM 11 info 90 len 26
0/005A4504: prv 0/005A44C8; xprv 0/005A44C8; xid 19020; RM 10 info 00 len 72
0/005A456C: prv 0/005A4504; xprv 0/005A4504; xid 19020; RM 11 info 90 len 26
0/005A45A8: prv 0/005A456C; xprv 0/005A456C; xid 19020; RM 10 info 00 len 72
0/005A4610: prv 0/005A45A8; xprv 0/005A45A8; xid 19020; RM 11 info 90 len 26
0/005A464C: prv 0/005A4610; xprv 0/005A4610; xid 19020; RM 10 info 00 len 72
0/005A46B4: prv 0/005A464C; xprv 0/005A464C; xid 19020; RM 11 info 90 len 26
0/005A46F0: prv 0/005A46B4; xprv 0/005A46B4; xid 19020; RM 10 info 00 len 65
0/005A4754: prv 0/005A46F0; xprv 0/005A46F0; xid 19020; RM 11 info 90 len 50
0/005A47A8: prv 0/005A4754; xprv 0/005A4754; xid 19020; RM 1 info 00 len 4
commit: 2001-02-26 17:24:34
0/005A47CC: prv 0/005A47A8; xprv 0/00000000; xid 19115; RM 10 info 00 len 76
0/005A4838: prv 0/005A47CC; xprv 0/005A47CC; xid 19115; RM 11 info 90 len 26
0/005A4874: prv 0/005A4838; xprv 0/005A4838; xid 19115; RM 10 info 00 len 80
0/005A48E4: prv 0/005A4874; xprv 0/005A4874; xid 19115; RM 11 info 90 len 26
0/005A4920: prv 0/005A48E4; xprv 0/005A48E4; xid 19115; RM 10 info 00 len 76
0/005A498C: prv 0/005A4920; xprv 0/005A4920; xid 19115; RM 11 info 90 len 26
0/005A49C8: prv 0/005A498C; xprv 0/005A498C; xid 19115; RM 10 info 00 len 76
0/005A4A34: prv 0/005A49C8; xprv 0/005A49C8; xid 19115; RM 11 info 90 len 26
0/005A4A70: prv 0/005A4A34; xprv 0/005A4A34; xid 19115; RM 10 info 00 len 65
0/005A4AD4: prv 0/005A4A70; xprv 0/005A4A70; xid 19115; RM 11 info 90 len 50
0/005A4B28: prv 0/005A4AD4; xprv 0/005A4AD4; xid 19115; RM 1 info 00 len 4
commit: 2001-02-26 17:26:02
ReadRecord: record with zero len at 0/005A4B4C
-- My dump program is unhappy here because the rest of the page is zero.
-- Given that there is a continuation record at the start of the next
-- page, there certainly should have been record(s) here. But it's
-- worse than that: check the commit timestamps and the xid numbers
-- before and after the discontinuity. Did time go backwards here?
-- Also notice the back-pointers in the first valid record on the next
-- page; they point not into the zeroed space, which would suggest a
-- mere failure to write a buffer after filling it, but into the middle
-- of one of the valid records on the prior page. It almost looks like
-- page 5A6000 came from a completely different run than page 5A4000.
Unexpected page info flags 0001 at offset 5A6000
Skipping unexpected continuation record at offset 5A6000
0/005A6904: prv 0/005A48B4(?); xprv 0/005A48B4; xid 19047; RM 11 info 98 len 8226; bkpb 1
0/005A8954: prv 0/005A6904; xprv 0/005A6904; xid 19047; RM 10 info 00 len 72
0/005A89BC: prv 0/005A8954; xprv 0/005A8954; xid 19047; RM 11 info 90 len 26
0/005A89F8: prv 0/005A89BC; xprv 0/005A89BC; xid 19047; RM 10 info 00 len 72
0/005A8A60: prv 0/005A89F8; xprv 0/005A89F8; xid 19047; RM 11 info 90 len 26
0/005A8A9C: prv 0/005A8A60; xprv 0/005A8A60; xid 19047; RM 10 info 00 len 72
0/005A8B04: prv 0/005A8A9C; xprv 0/005A8A9C; xid 19047; RM 11 info 90 len 26
0/005A8B40: prv 0/005A8B04; xprv 0/005A8B04; xid 19047; RM 10 info 08 len 8226; bkpb 1
0/005AAB90: prv 0/005A8B40; xprv 0/005A8B40; xid 19047; RM 11 info 98 len 8226; bkpb 1
0/005ACBE0: prv 0/005AAB90; xprv 0/005AAB90; xid 19047; RM 1 info 00 len 4
commit: 2001-02-26 17:25:38
0/005ACC04: prv 0/005ACBE0; xprv 0/00000000; xid 19088; RM 10 info 00 len 72
0/005ACC6C: prv 0/005ACC04; xprv 0/005ACC04; xid 19088; RM 11 info 90 len 26
0/005ACCA8: prv 0/005ACC6C; xprv 0/005ACC6C; xid 19088; RM 10 info 00 len 72
0/005ACD10: prv 0/005ACCA8; xprv 0/005ACCA8; xid 19088; RM 11 info 90 len 26
0/005ACD4C: prv 0/005ACD10; xprv 0/005ACD10; xid 19088; RM 10 info 00 len 72
0/005ACDB4: prv 0/005ACD4C; xprv 0/005ACD4C; xid 19088; RM 11 info 90 len 26
0/005ACDF0: prv 0/005ACDB4; xprv 0/005ACDB4; xid 19088; RM 10 info 00 len 72
0/005ACE58: prv 0/005ACDF0; xprv 0/005ACDF0; xid 19088; RM 11 info 90 len 26
0/005ACE94: prv 0/005ACE58; xprv 0/005ACE58; xid 19088; RM 10 info 00 len 65
0/005ACEF8: prv 0/005ACE94; xprv 0/005ACE94; xid 19088; RM 11 info 90 len 50
0/005ACF4C: prv 0/005ACEF8; xprv 0/005ACEF8; xid 19088; RM 1 info 00 len 4
commit: 2001-02-26 17:26:43
0/005ACF70: prv 0/005ACF4C; xprv 0/00000000; xid 19109; RM 10 info 00 len 72
0/005ACFD8: prv 0/005ACF70; xprv 0/005ACF70; xid 19109; RM 11 info 90 len 26
0/005AD014: prv 0/005ACFD8; xprv 0/005ACFD8; xid 19109; RM 10 info 00 len 72
0/005AD07C: prv 0/005AD014; xprv 0/005AD014; xid 19109; RM 11 info 90 len 26
0/005AD0B8: prv 0/005AD07C; xprv 0/005AD07C; xid 19109; RM 10 info 00 len 72
0/005AD120: prv 0/005AD0B8; xprv 0/005AD0B8; xid 19109; RM 11 info 90 len 26
0/005AD15C: prv 0/005AD120; xprv 0/005AD120; xid 19109; RM 10 info 00 len 72
0/005AD1C4: prv 0/005AD15C; xprv 0/005AD15C; xid 19109; RM 11 info 90 len 26
0/005AD200: prv 0/005AD1C4; xprv 0/005AD1C4; xid 19109; RM 10 info 00 len 65
0/005AD264: prv 0/005AD200; xprv 0/005AD200; xid 19109; RM 11 info 98 len 8226; bkpb 1
0/005AF2B4: prv 0/005AD264; xprv 0/005AD264; xid 19109; RM 1 info 00 len 4
commit: 2001-02-26 17:26:59
0/005AF2D8: prv 0/005AF2B4; xprv 0/00000000; xid 19224; RM 10 info 00 len 72
0/005AF340: prv 0/005AF2D8; xprv 0/005AF2D8; xid 19224; RM 11 info 90 len 26
0/005AF37C: prv 0/005AF340; xprv 0/005AF340; xid 19224; RM 10 info 00 len 72
0/005AF3E4: prv 0/005AF37C; xprv 0/005AF37C; xid 19224; RM 11 info 90 len 26
0/005AF420: prv 0/005AF3E4; xprv 0/005AF3E4; xid 19224; RM 10 info 00 len 72
0/005AF488: prv 0/005AF420; xprv 0/005AF420; xid 19224; RM 11 info 90 len 26
0/005AF4C4: prv 0/005AF488; xprv 0/005AF488; xid 19224; RM 10 info 00 len 72
0/005AF52C: prv 0/005AF4C4; xprv 0/005AF4C4; xid 19224; RM 11 info 90 len 26
0/005AF568: prv 0/005AF52C; xprv 0/005AF52C; xid 19224; RM 10 info 00 len 65
0/005AF5CC: prv 0/005AF568; xprv 0/005AF568; xid 19224; RM 11 info 90 len 50
0/005AF620: prv 0/005AF5CC; xprv 0/005AF5CC; xid 19224; RM 1 info 00 len 4
commit: 2001-02-26 17:28:39
0/005AF644: prv 0/005AF620; xprv 0/00000000; xid 19229; RM 10 info 00 len 72
0/005AF6AC: prv 0/005AF644; xprv 0/005AF644; xid 19229; RM 11 info 90 len 26
0/005AF6E8: prv 0/005AF6AC; xprv 0/005AF6AC; xid 19229; RM 10 info 00 len 72
0/005AF750: prv 0/005AF6E8; xprv 0/005AF6E8; xid 19229; RM 11 info 90 len 26
0/005AF78C: prv 0/005AF750; xprv 0/005AF750; xid 19229; RM 10 info 00 len 72
0/005AF7F4: prv 0/005AF78C; xprv 0/005AF78C; xid 19229; RM 11 info 90 len 26
0/005AF830: prv 0/005AF7F4; xprv 0/005AF7F4; xid 19229; RM 10 info 00 len 72
0/005AF898: prv 0/005AF830; xprv 0/005AF830; xid 19229; RM 11 info 90 len 26
0/005AF8D4: prv 0/005AF898; xprv 0/005AF898; xid 19229; RM 10 info 00 len 65
0/005AF938: prv 0/005AF8D4; xprv 0/005AF8D4; xid 19229; RM 11 info 90 len 50
0/005AF98C: prv 0/005AF938; xprv 0/005AF938; xid 19229; RM 1 info 00 len 4
commit: 2001-02-26 17:28:50
0/005AF9B0: prv 0/005AF98C; xprv 0/00000000; xid 0; RM 0 info 00 len 32
checkpoint: redo 0/005AF9B0; undo 0/00000000; sui 30; nextxid 19243; nextoid 43387; online
-- This is the only checkpoint record present in the log after the
-- normal-looking one at 17:20. There should have been checkpoints
-- at 17:25, 17:30, 17:35, 17:40, 17:45, not to mention one from the
-- eventual shutdown which seems to have been done around 17:49.
-- From the surrounding timestamps this one must be either 17:30 or 17:35.
-- What's even nastier (and the immediate cause of Scott's inability to
-- restart) is that the pg_control file's checkPoint pointer points to
-- 0/005AF9F0, which is *not* the location of this checkpoint, but of
-- the record after it.
-- Is that meaningful, or just random coincidence? Can't tell yet.
-- Oh BTW, the timestamp in the pg_control file is 2001-02-26 17:34:09,
-- which does not correspond to any scheduled checkpoint.
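As a rough cross-check of the arithmetic in the note above, the sketch below lists the checkpoints that a strict five-minute schedule would have produced between the last normal checkpoint and the shutdown, then narrows down which slots the surviving record could occupy. It assumes checkpoints land exactly on the five-minute mark and takes 17:20:00 as the time of the last normal one; the function names are made up for illustration and come from no PostgreSQL tool.

```python
from datetime import datetime, timedelta


def expected_checkpoints(last_seen, shutdown, interval_min=5):
    """Scheduled checkpoint times after last_seen, up to and including shutdown."""
    step = timedelta(minutes=interval_min)
    times = []
    t = last_seen + step
    while t <= shutdown:
        times.append(t)
        t += step
    return times


def slots_between(schedule, commit_before, commit_after):
    """Scheduled times that fall strictly between two commit timestamps."""
    return [t for t in schedule if commit_before < t < commit_after]


def at(hms):
    """Build a timestamp on the day of the dump (2001-02-26)."""
    return datetime.strptime("2001-02-26 " + hms, "%Y-%m-%d %H:%M:%S")


# Last normal checkpoint assumed at 17:20:00; final commit in the log is 17:49:06.
schedule = expected_checkpoints(at("17:20:00"), at("17:49:06"))
print([t.strftime("%H:%M") for t in schedule])
# -> ['17:25', '17:30', '17:35', '17:40', '17:45']

# The surviving checkpoint sits between the commits at 17:28:50 and 17:35:13.
survivor = slots_between(schedule, at("17:28:50"), at("17:35:13"))
print([t.strftime("%H:%M") for t in survivor])
# -> ['17:30', '17:35']
```

Under those assumptions, five scheduled checkpoints are expected and only one record is present, which matches the count given in the note.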
0/005AF9F0: prv 0/005AF9B0; xprv 0/00000000; xid 19444; RM 10 info 08 len 8226; bkpb 1
0/005B1A40: prv 0/005AF9F0; xprv 0/005AF9F0; xid 19444; RM 11 info 98 len 8226; bkpb 1
0/005B3A90: prv 0/005B1A40; xprv 0/005B1A40; xid 19444; RM 10 info 00 len 80
0/005B3B00: prv 0/005B3A90; xprv 0/005B3A90; xid 19444; RM 11 info 90 len 26
0/005B3B3C: prv 0/005B3B00; xprv 0/005B3B00; xid 19444; RM 10 info 00 len 72
0/005B3BA4: prv 0/005B3B3C; xprv 0/005B3B3C; xid 19444; RM 11 info 90 len 26
0/005B3BE0: prv 0/005B3BA4; xprv 0/005B3BA4; xid 19444; RM 10 info 00 len 72
0/005B3C48: prv 0/005B3BE0; xprv 0/005B3BE0; xid 19444; RM 11 info 90 len 26
0/005B3C84: prv 0/005B3C48; xprv 0/005B3C48; xid 19444; RM 10 info 08 len 8226; bkpb 1
0/005B5CD4: prv 0/005B3C84; xprv 0/005B3C84; xid 19444; RM 11 info 98 len 8226; bkpb 1
0/005B7D24: prv 0/005B5CD4; xprv 0/005B5CD4; xid 19444; RM 1 info 00 len 4
commit: 2001-02-26 17:35:13
0/005B7D48: prv 0/005B7D24; xprv 0/00000000; xid 19495; RM 10 info 00 len 72
0/005B7DB0: prv 0/005B7D48; xprv 0/005B7D48; xid 19495; RM 11 info 90 len 26
0/005B7DEC: prv 0/005B7DB0; xprv 0/005B7DB0; xid 19495; RM 10 info 00 len 72
0/005B7E54: prv 0/005B7DEC; xprv 0/005B7DEC; xid 19495; RM 11 info 90 len 26
0/005B7E90: prv 0/005B7E54; xprv 0/005B7E54; xid 19495; RM 10 info 00 len 72
0/005B7EF8: prv 0/005B7E90; xprv 0/005B7E90; xid 19495; RM 11 info 90 len 26
0/005B7F34: prv 0/005B7EF8; xprv 0/005B7EF8; xid 19495; RM 10 info 00 len 72
0/005B7F9C: prv 0/005B7F34; xprv 0/005B7F34; xid 19495; RM 11 info 90 len 26
0/005B7FD8: prv 0/005B7F9C; xprv 0/005B7F9C; xid 19495; RM 10 info 00 len 69
0/005B804C: prv 0/005B7FD8; xprv 0/005B7FD8; xid 19495; RM 11 info 98 len 8226; bkpb 1
0/005BA09C: prv 0/005B804C; xprv 0/005B804C; xid 19495; RM 1 info 00 len 4
commit: 2001-02-26 17:36:32
0/005BA0C0: prv 0/005BA09C; xprv 0/00000000; xid 19527; RM 10 info 00 len 72
0/005BA128: prv 0/005BA0C0; xprv 0/005BA0C0; xid 19527; RM 11 info 90 len 26
0/005BA164: prv 0/005BA128; xprv 0/005BA128; xid 19527; RM 10 info 00 len 76
0/005BA1D0: prv 0/005BA164; xprv 0/005BA164; xid 19527; RM 11 info 90 len 26
0/005BA20C: prv 0/005BA1D0; xprv 0/005BA1D0; xid 19527; RM 10 info 00 len 72
0/005BA274: prv 0/005BA20C; xprv 0/005BA20C; xid 19527; RM 11 info 90 len 26
0/005BA2B0: prv 0/005BA274; xprv 0/005BA274; xid 19527; RM 10 info 00 len 72
0/005BA318: prv 0/005BA2B0; xprv 0/005BA2B0; xid 19527; RM 11 info 90 len 26
0/005BA354: prv 0/005BA318; xprv 0/005BA318; xid 19527; RM 10 info 00 len 65
0/005BA3B8: prv 0/005BA354; xprv 0/005BA354; xid 19527; RM 11 info 90 len 50
0/005BA40C: prv 0/005BA3B8; xprv 0/005BA3B8; xid 19527; RM 1 info 00 len 4
commit: 2001-02-26 17:37:59
0/005BA430: prv 0/005BA40C; xprv 0/00000000; xid 19540; RM 10 info 00 len 72
0/005BA498: prv 0/005BA430; xprv 0/005BA430; xid 19540; RM 11 info 90 len 26
0/005BA4D4: prv 0/005BA498; xprv 0/00000000; xid 19540; RM 15 info 00 len 100
0/005BA558: prv 0/005BA4D4; xprv 0/005BA4D4; xid 19540; RM 10 info 00 len 76
0/005BA5C4: prv 0/005BA558; xprv 0/005BA558; xid 19540; RM 11 info 90 len 26
0/005BA600: prv 0/005BA5C4; xprv 0/005BA5C4; xid 19540; RM 10 info 00 len 72
0/005BA668: prv 0/005BA600; xprv 0/005BA600; xid 19540; RM 11 info 90 len 26
0/005BA6A4: prv 0/005BA668; xprv 0/005BA668; xid 19540; RM 10 info 00 len 72
0/005BA70C: prv 0/005BA6A4; xprv 0/005BA6A4; xid 19540; RM 11 info 90 len 26
0/005BA748: prv 0/005BA70C; xprv 0/005BA70C; xid 19540; RM 10 info 00 len 65
0/005BA7AC: prv 0/005BA748; xprv 0/005BA748; xid 19540; RM 11 info 90 len 50
0/005BA800: prv 0/005BA7AC; xprv 0/005BA7AC; xid 19540; RM 1 info 00 len 4
commit: 2001-02-26 17:39:03
0/005BA824: prv 0/005BA800; xprv 0/00000000; xid 19605; RM 10 info 00 len 72
0/005BA88C: prv 0/005BA824; xprv 0/005BA824; xid 19605; RM 11 info 90 len 26
0/005BA8C8: prv 0/005BA88C; xprv 0/005BA88C; xid 19605; RM 10 info 00 len 72
0/005BA930: prv 0/005BA8C8; xprv 0/005BA8C8; xid 19605; RM 11 info 90 len 26
0/005BA96C: prv 0/005BA930; xprv 0/005BA930; xid 19605; RM 10 info 00 len 72
0/005BA9D4: prv 0/005BA96C; xprv 0/005BA96C; xid 19605; RM 11 info 90 len 26
0/005BAA10: prv 0/005BA9D4; xprv 0/005BA9D4; xid 19605; RM 10 info 00 len 72
0/005BAA78: prv 0/005BAA10; xprv 0/005BAA10; xid 19605; RM 11 info 90 len 26
0/005BAAB4: prv 0/005BAA78; xprv 0/005BAA78; xid 19605; RM 10 info 00 len 65
0/005BAB18: prv 0/005BAAB4; xprv 0/005BAAB4; xid 19605; RM 11 info 90 len 50
0/005BAB6C: prv 0/005BAB18; xprv 0/005BAB18; xid 19605; RM 1 info 00 len 4
commit: 2001-02-26 17:41:09
0/005BAB90: prv 0/005BAB6C; xprv 0/00000000; xid 19610; RM 10 info 00 len 72
0/005BABF8: prv 0/005BAB90; xprv 0/005BAB90; xid 19610; RM 11 info 90 len 26
0/005BAC34: prv 0/005BABF8; xprv 0/005BABF8; xid 19610; RM 10 info 00 len 72
0/005BAC9C: prv 0/005BAC34; xprv 0/005BAC34; xid 19610; RM 11 info 90 len 26
0/005BACD8: prv 0/005BAC9C; xprv 0/005BAC9C; xid 19610; RM 10 info 00 len 72
0/005BAD40: prv 0/005BACD8; xprv 0/005BACD8; xid 19610; RM 11 info 90 len 26
0/005BAD7C: prv 0/005BAD40; xprv 0/005BAD40; xid 19610; RM 10 info 00 len 72
0/005BADE4: prv 0/005BAD7C; xprv 0/005BAD7C; xid 19610; RM 11 info 90 len 26
0/005BAE20: prv 0/005BADE4; xprv 0/005BADE4; xid 19610; RM 10 info 00 len 65
0/005BAE84: prv 0/005BAE20; xprv 0/005BAE20; xid 19610; RM 11 info 90 len 50
0/005BAED8: prv 0/005BAE84; xprv 0/005BAE84; xid 19610; RM 1 info 00 len 4
commit: 2001-02-26 17:41:11
0/005BAEFC: prv 0/005BAED8; xprv 0/00000000; xid 19718; RM 10 info 00 len 72
0/005BAF64: prv 0/005BAEFC; xprv 0/005BAEFC; xid 19718; RM 11 info 90 len 26
0/005BAFA0: prv 0/005BAF64; xprv 0/005BAF64; xid 19718; RM 10 info 00 len 72
0/005BB008: prv 0/005BAFA0; xprv 0/005BAFA0; xid 19718; RM 11 info 90 len 26
0/005BB044: prv 0/005BB008; xprv 0/005BB008; xid 19718; RM 10 info 00 len 72
0/005BB0AC: prv 0/005BB044; xprv 0/005BB044; xid 19718; RM 11 info 90 len 26
0/005BB0E8: prv 0/005BB0AC; xprv 0/005BB0AC; xid 19718; RM 10 info 00 len 72
0/005BB150: prv 0/005BB0E8; xprv 0/005BB0E8; xid 19718; RM 11 info 90 len 26
0/005BB18C: prv 0/005BB150; xprv 0/005BB150; xid 19718; RM 10 info 00 len 65
0/005BB1F0: prv 0/005BB18C; xprv 0/005BB18C; xid 19718; RM 11 info 90 len 50
0/005BB244: prv 0/005BB1F0; xprv 0/005BB1F0; xid 19718; RM 1 info 00 len 4
commit: 2001-02-26 17:44:57
0/005BB268: prv 0/005BB244; xprv 0/00000000; xid 19775; RM 10 info 00 len 72
0/005BB2D0: prv 0/005BB268; xprv 0/005BB268; xid 19775; RM 11 info 90 len 26
0/005BB30C: prv 0/005BB2D0; xprv 0/005BB2D0; xid 19775; RM 10 info 00 len 72
0/005BB374: prv 0/005BB30C; xprv 0/005BB30C; xid 19775; RM 11 info 90 len 26
0/005BB3B0: prv 0/005BB374; xprv 0/005BB374; xid 19775; RM 10 info 00 len 72
0/005BB418: prv 0/005BB3B0; xprv 0/005BB3B0; xid 19775; RM 11 info 90 len 26
0/005BB454: prv 0/005BB418; xprv 0/005BB418; xid 19775; RM 10 info 00 len 72
0/005BB4BC: prv 0/005BB454; xprv 0/005BB454; xid 19775; RM 11 info 90 len 26
0/005BB4F8: prv 0/005BB4BC; xprv 0/005BB4BC; xid 19775; RM 10 info 00 len 65
0/005BB55C: prv 0/005BB4F8; xprv 0/005BB4F8; xid 19775; RM 11 info 90 len 50
0/005BB5B0: prv 0/005BB55C; xprv 0/005BB55C; xid 19775; RM 1 info 00 len 4
commit: 2001-02-26 17:47:38
0/005BB5D4: prv 0/005BB5B0; xprv 0/00000000; xid 19827; RM 10 info 00 len 72
0/005BB63C: prv 0/005BB5D4; xprv 0/005BB5D4; xid 19827; RM 11 info 90 len 26
0/005BB678: prv 0/005BB63C; xprv 0/005BB63C; xid 19827; RM 10 info 00 len 72
0/005BB6E0: prv 0/005BB678; xprv 0/005BB678; xid 19827; RM 11 info 90 len 26
0/005BB71C: prv 0/005BB6E0; xprv 0/005BB6E0; xid 19827; RM 10 info 00 len 72
0/005BB784: prv 0/005BB71C; xprv 0/005BB71C; xid 19827; RM 11 info 90 len 26
0/005BB7C0: prv 0/005BB784; xprv 0/005BB784; xid 19827; RM 10 info 00 len 72
0/005BB828: prv 0/005BB7C0; xprv 0/005BB7C0; xid 19827; RM 11 info 90 len 26
0/005BB864: prv 0/005BB828; xprv 0/005BB828; xid 19827; RM 10 info 00 len 65
0/005BB8C8: prv 0/005BB864; xprv 0/005BB864; xid 19827; RM 11 info 90 len 50
0/005BB91C: prv 0/005BB8C8; xprv 0/005BB8C8; xid 19827; RM 1 info 00 len 4
commit: 2001-02-26 17:49:00
0/005BB940: prv 0/005BB91C; xprv 0/00000000; xid 19832; RM 10 info 00 len 72
0/005BB9A8: prv 0/005BB940; xprv 0/005BB940; xid 19832; RM 11 info 90 len 26
0/005BB9E4: prv 0/005BB9A8; xprv 0/005BB9A8; xid 19832; RM 10 info 00 len 72
0/005BBA4C: prv 0/005BB9E4; xprv 0/005BB9E4; xid 19832; RM 11 info 90 len 26
0/005BBA88: prv 0/005BBA4C; xprv 0/005BBA4C; xid 19832; RM 10 info 00 len 72
0/005BBAF0: prv 0/005BBA88; xprv 0/005BBA88; xid 19832; RM 11 info 90 len 26
0/005BBB2C: prv 0/005BBAF0; xprv 0/005BBAF0; xid 19832; RM 10 info 00 len 72
0/005BBB94: prv 0/005BBB2C; xprv 0/005BBB2C; xid 19832; RM 11 info 90 len 26
0/005BBBD0: prv 0/005BBB94; xprv 0/005BBB94; xid 19832; RM 10 info 00 len 65
0/005BBC34: prv 0/005BBBD0; xprv 0/005BBBD0; xid 19832; RM 11 info 90 len 50
0/005BBC88: prv 0/005BBC34; xprv 0/005BBC34; xid 19832; RM 1 info 00 len 4
commit: 2001-02-26 17:49:06
ReadRecord: record with zero len at 0/005BBCAC
-- this is where the log actually ends --- zeroes from here out.
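The checks described in the annotations (back-pointers that do not name the preceding record, xids that drop, commit times that run backwards) can be applied mechanically to a dump in the format shown above. What follows is a minimal sketch, not the dump program used for this report: the regular expressions are inferred only from the lines visible here, and `find_anomalies` is a hypothetical name. A drop in xid is only a hint, since concurrent transactions need not appear in xid order.

```python
import re
from datetime import datetime

# Record lines look like:
# "0/005A6904: prv 0/005A48B4(?); xprv 0/005A48B4; xid 19047; RM 11 ..."
RECORD_RE = re.compile(
    r"^([0-9A-F]+/[0-9A-F]+): prv ([0-9A-F]+/[0-9A-F]+)(?:\(\?\))?; "
    r"xprv [0-9A-F]+/[0-9A-F]+; xid (\d+);"
)
COMMIT_RE = re.compile(r"^commit: (\d{4}-\d\d-\d\d \d\d:\d\d:\d\d)$")


def to_int(pos):
    """Convert a 'hi/lo' hex position such as '0/005A4B28' to one integer."""
    hi, lo = pos.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)


def to_pos(value):
    """Inverse of to_int, using the same layout as the dump."""
    return "%X/%08X" % (value >> 32, value & 0xFFFFFFFF)


def find_anomalies(lines):
    """Return tuples describing suspicious spots in a list of dump lines."""
    anomalies = []
    starts = []         # start positions of the records seen so far
    prev_xid = None     # xid of the previous record (xid 0 is skipped)
    prev_commit = None  # timestamp of the previous commit line
    for raw in lines:
        line = raw.strip()
        rec = RECORD_RE.match(line)
        if rec:
            addr, prv, xid = rec.group(1), rec.group(2), int(rec.group(3))
            if starts and to_int(prv) != starts[-1]:
                # Report the nearest record that starts at or before prv,
                # to show whether prv lands in the middle of a known record.
                earlier = [s for s in starts if s <= to_int(prv)]
                nearest = to_pos(earlier[-1]) if earlier else None
                anomalies.append(("bad-prv", addr, prv, nearest))
            if xid != 0:
                if prev_xid is not None and xid < prev_xid:
                    anomalies.append(("xid-went-back", addr, prev_xid, xid))
                prev_xid = xid
            starts.append(to_int(addr))
            continue
        com = COMMIT_RE.match(line)
        if com:
            stamp = datetime.strptime(com.group(1), "%Y-%m-%d %H:%M:%S")
            if prev_commit is not None and stamp < prev_commit:
                anomalies.append(("time-went-back", com.group(1)))
            prev_commit = stamp
    return anomalies


# Condensed from the dump above; intermediate records are omitted.
SAMPLE = """\
0/005A4874: prv 0/005A4838; xprv 0/005A4838; xid 19115; RM 10 info 00 len 80
0/005A48E4: prv 0/005A4874; xprv 0/005A4874; xid 19115; RM 11 info 90 len 26
commit: 2001-02-26 17:26:02
0/005A6904: prv 0/005A48B4(?); xprv 0/005A48B4; xid 19047; RM 11 info 98 len 8226; bkpb 1
commit: 2001-02-26 17:25:38
"""
for item in find_anomalies(SAMPLE.splitlines()):
    print(item)
```

On the condensed sample, the record at 0/005A6904 is flagged because its back-pointer 0/005A48B4 falls between the record starts at 0/005A4874 and 0/005A48E4, which is the "middle of a valid record" case noted at the discontinuity.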
> I am *not* feeling good about pushing out an RC1 release candidate
> today.
>
> I've been going through the WAL code, trying to understand it and
> document it. I've found a number of minor problems and several major
> ones ("major" meaning "can't really fix without an incompatible file
> format change, hence initdb"). I've reported the major problems to
> the mailing lists but gotten almost no feedback about what to do.
>
> In addition, I'm still looking for the bug that I originally went in to
> find: Scott Parish's report of being unable to restart after a normal
> shutdown of beta4. Examination of his WAL log shows some pretty serious
> lossage (see attached dump). My current theory is that the
> buffer-slinging logic in xlog.c dropped one or more whole buffers' worth
> of log records, but I haven't figured out exactly how.
>
> I want to veto putting out an RC1 until these issues are resolved...
> comments?
I was not sure how to respond. Requiring an initdb at this stage seems
like it could be a pretty major blow to beta testers. However, if we
will have 7.1 problems with WAL that can not be fixed without a file
format change, we will have problems down the road. Is there a version
number in the WAL file? Can we put conditional code in there to create
new log file records with an updated format?
--
Bruce Momjian | http://candle.pha.pa.us
pgman@candle.pha.pa.us | (610) 853-3000
+ If your life is a hard drive, | 830 Blythe Avenue
+ Christ can be your backup. | Drexel Hill, Pennsylvania 19026
Bruce Momjian <pgman@candle.pha.pa.us> writes:
> Is there a version number in the WAL file?
catversion.h will do fine, no?
> Can we put conditional code in there to create
> new log file records with an updated format?
The WAL stuff is *far* too complex already. I've spent a week studying
it and I only partially understand it. I will not consent to trying to
support multiple log file formats concurrently.
regards, tom lane
On Fri, 2 Mar 2001, Tom Lane wrote:
> I am *not* feeling good about pushing out an RC1 release candidate
> today.
>
> I've been going through the WAL code, trying to understand it and
> document it. I've found a number of minor problems and several major
> ones ("major" meaning "can't really fix without an incompatible file
> format change, hence initdb"). I've reported the major problems to
> the mailing lists but gotten almost no feedback about what to do.
>
> In addition, I'm still looking for the bug that I originally went in to
> find: Scott Parish's report of being unable to restart after a normal
> shutdown of beta4. Examination of his WAL log shows some pretty serious
> lossage (see attached dump). My current theory is that the
> buffer-slinging logic in xlog.c dropped one or more whole buffers' worth
> of log records, but I haven't figured out exactly how.
>
> I want to veto putting out an RC1 until these issues are resolved...
> comments?
Will second it ... Vadim is supposed to be back on the 6th, and Peter has
a couple of changes to configure he wants to do this weekend for the JDBC
stuff ... Thomas and I are in SF the end of next week for some meetings,
so if you can pop off a summary of what you've found to either of us, and
assuming that Vadim doesn't get caught up by then, we can bring them up
"in person" at that time ... ?
> Bruce Momjian <pgman@candle.pha.pa.us> writes:
> > Is there a version number in the WAL file?
>
> catversion.h will do fine, no?
>
> > Can we put conditional code in there to create
> > new log file records with an updated format?
>
> The WAL stuff is *far* too complex already. I've spent a week studying
> it and I only partially understand it. I will not consent to trying to
> support multiple log file formats concurrently.
Well, I was thinking a few things. Right now, if we update the
catversion.h, we will require a dump/reload. If we can update just the
WAL version stamp, that will allow us to fix WAL format problems without
requiring people to dump/reload. I can imagine this would be valuable
if we find we need to make changes in 7.1.1, where we can not require
dump/reload.
--
Bruce Momjian | http://candle.pha.pa.us
pgman@candle.pha.pa.us | (610) 853-3000
+ If your life is a hard drive, | 830 Blythe Avenue
+ Christ can be your backup. | Drexel Hill, Pennsylvania 19026
Bruce Momjian <pgman@candle.pha.pa.us> writes:
> Well, I was thinking a few things. Right now, if we update the
> catversion.h, we will require a dump/reload. If we can update just the
> WAL version stamp, that will allow us to fix WAL format problems without
> requiring people to dump/reload.
Since there is not a separate WAL version stamp, introducing one now
would certainly force an initdb. I don't mind adding one if you think
it's useful; another 4 bytes in pg_control won't hurt anything. But
it's not going to save anyone's bacon on this cycle.
At least one of my concerns (single point of failure) would require a
change to the layout of pg_control, which would force initdb anyway.
Anyone want to propose a third version# for pg_control?
regards, tom lane
> Bruce Momjian <pgman@candle.pha.pa.us> writes:
> > Well, I was thinking a few things. Right now, if we update the
> > catversion.h, we will require a dump/reload. If we can update just the
> > WAL version stamp, that will allow us to fix WAL format problems without
> > requiring people to dump/reload.
>
> Since there is not a separate WAL version stamp, introducing one now
> would certainly force an initdb. I don't mind adding one if you think
> it's useful; another 4 bytes in pg_control won't hurt anything. But
> it's not going to save anyone's bacon on this cycle.
Having a version number of binary files has saved me many times because
I can add a little 'if' to allow upward binary compatibility without
breaking old binary files. I think we should have one.
I see our btree files, but I don't see one in heap. I am going to
recommend that for 7.2. All our files should have versions just in case
we ever need it. Some day, we may be able to skip dump/reload for major
versions.
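To make the "little 'if'" concrete, here is a minimal sketch of a reader that branches on a version stamp in a file header. Everything in it is hypothetical: the magic number, the two layouts, and the name `read_file` are invented for illustration and are not the layout of pg_control, the WAL, or any other PostgreSQL file.

```python
import struct

DEMO_MAGIC = 0x44454D4F        # hypothetical magic number, not PostgreSQL's
HEADER = struct.Struct("<II")  # hypothetical header: magic, format version


def read_file(blob):
    """Decode a tiny made-up file whose layout depends on its version stamp."""
    magic, version = HEADER.unpack_from(blob, 0)
    if magic != DEMO_MAGIC:
        raise ValueError("not a demo file")
    if version == 1:
        # Old layout: a single 32-bit counter follows the header.
        (counter,) = struct.unpack_from("<I", blob, HEADER.size)
        return {"version": 1, "counter": counter, "checksum": None}
    if version == 2:
        # New layout: the counter is followed by a 32-bit checksum field.
        counter, checksum = struct.unpack_from("<II", blob, HEADER.size)
        return {"version": 2, "counter": counter, "checksum": checksum}
    raise ValueError("unsupported format version %d" % version)


old = HEADER.pack(DEMO_MAGIC, 1) + struct.pack("<I", 7)
new = HEADER.pack(DEMO_MAGIC, 2) + struct.pack("<II", 7, 0xABCD)
print(read_file(old))
print(read_file(new))
```

The cost of this approach is that every supported version adds a branch that has to be kept working, which is the complexity concern raised earlier in the thread about reading multiple log formats concurrently.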
--
Bruce Momjian | http://candle.pha.pa.us
pgman@candle.pha.pa.us | (610) 853-3000
+ If your life is a hard drive, | 830 Blythe Avenue
+ Christ can be your backup. | Drexel Hill, Pennsylvania 19026
> I've been going through the WAL code, trying to understand it and
> document it. I've found a number of minor problems and several major
> ones ("major" meaning "can't really fix without an incompatible file
> format change, hence initdb"). I've reported the major problems to
> the mailing lists but gotten almost no feedback about what to do.
Sorry for the "no feedback", but I've assumed that this will be more
productively discussed with Vadim in the loop. I don't disagree with
your observations, but of course that is from a position of happy
ignorance :)
> ... I want to veto putting out an RC1 until these issues are resolved...
> comments?
OK with me.
- Thomas
From: Bruce Momjian [SMTP:pgman@candle.pha.pa.us]
Sent: Friday, March 02, 2001 9:54 AM
To: Tom Lane
Cc: pgsql-core@postgresql.org; pgsql-hackers@postgresql.org
Subject: Re: [HACKERS] WAL & RC1 status
> Bruce Momjian <pgman@candle.pha.pa.us> writes:
> > Is there a version number in the WAL file?
>
> catversion.h will do fine, no?
>
> > Can we put conditional code in there to create
> > new log file records with an updated format?
While it may be unfortunate to have to do an initdb at this point in
the beta cycle, it is a beta and that is part of the deal. PostgreSQL has the
reputation of being the highest quality open-source database and we should do
nothing to tarnish that. Release it when it's ready and not before.
> Bruce Momjian <pgman@candle.pha.pa.us> writes:
> > Well, I was thinking a few things. Right now, if we update the
> > catversion.h, we will require a dump/reload. If we can update just the
> > WAL version stamp, that will allow us to fix WAL format problems without
> > requiring people to dump/reload.
>
> Since there is not a separate WAL version stamp, introducing one now
> would certainly force an initdb. I don't mind adding one if you think
> it's useful; another 4 bytes in pg_control won't hurt anything. But
> it's not going to save anyone's bacon on this cycle.
>
> At least one of my concerns (single point of failure) would require a
> change to the layout of pg_control, which would force initdb anyway.
> Anyone want to propose a third version# for pg_control?
I now remember Hiroshi complaining about major WAL problems also,
particularly corrupt WAL files preventing the database from starting.
--
Bruce Momjian | http://candle.pha.pa.us
pgman@candle.pha.pa.us | (610) 853-3000
+ If your life is a hard drive, | 830 Blythe Avenue
+ Christ can be your backup. | Drexel Hill, Pennsylvania 19026
On Fri, Mar 02, 2001 at 10:54:04AM -0500, Bruce Momjian wrote:
> > Bruce Momjian <pgman@candle.pha.pa.us> writes:
> > > Is there a version number in the WAL file?
> >
> > catversion.h will do fine, no?
> >
> > > Can we put conditional code in there to create
> > > new log file records with an updated format?
> >
> > The WAL stuff is *far* too complex already. I've spent a week studying
> > it and I only partially understand it. I will not consent to trying to
> > support multiple log file formats concurrently.
>
> Well, I was thinking a few things. Right now, if we update the
> catversion.h, we will require a dump/reload. If we can update just the
> WAL version stamp, that will allow us to fix WAL format problems without
> requiring people to dump/reload. I can imagine this would be valuable
> if we find we need to make changes in 7.1.1, where we can not require
> dump/reload.
It Seems to Me that after an orderly shutdown, the WAL files should be,
effectively, slag -- they should contain no deltas from the current
table contents. In practice that means the only part of the format that
*should* matter is whatever it takes to discover that they really are
slag.
That *should* mean that, at worst, a change to the WAL file format should
only require doing an orderly shutdown, and then (perhaps) running a simple
program to generate a new-format empty WAL. It ought not to require an
initdb.
Of course the details of the current implementation may interfere with
that ideal, but it seems a worthy goal for the next beta, if it's not
possible already. Given the opportunity to change the current WAL format,
it ought to be possible to avoid even needing to run a program to generate
an empty WAL.
Nathan Myers
ncm@zembu.com
It Seems to Me that after an orderly shutdown, the WAL files should be,
effectively, slag -- they should contain no deltas from the current
table contents. In practice that means the only part of the format that
*should* matter is whatever it takes to discover that they really are
slag.
That *should* mean that, at worst, a change to the WAL file format should
only require doing an orderly shutdown, and then (perhaps) running a simple
program to generate a new-format empty WAL. It ought not to require an
initdb.
Of course the details of the current implementation may interfere with
that ideal, but it seems a worthy goal for the next beta, if it's not
possible already. Given the opportunity to change the current WAL format,
it ought to be possible to avoid even needing to run a program to generate
an empty WAL.
This was my question too. If we are just changing WAL, why can't we
just have them stop the postmaster, install the new binaries, and
restart?
Tom told me on the phone that there was a magic number in the WAL log
file, and I see it now:
#define XLOG_PAGE_MAGIC 0x17345168
Couldn't we just have our new beta ignore WAL pages with this entry,
knowing that startup/shutdown creates new WAL files anyway?
Besides sparing the beta users inconvenience, testing is easier for
everyone if we don't require a dump/reload for every WAL format change.
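A minimal sketch of the idea above, in C: read the magic number at the start of a WAL page and treat any page that carries the old value as stale. Only XLOG_PAGE_MAGIC and its value 0x17345168 come from this thread. The new-format value, the assumption that the magic occupies the first 4 bytes of a page, and the function names are all hypothetical. Skipping old pages is only reasonable after a clean shutdown, when they hold nothing that still needs replay.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define XLOG_PAGE_MAGIC_OLD 0x17345168u /* value quoted in the thread */
#define XLOG_PAGE_MAGIC_NEW 0x17345169u /* hypothetical new-format value */

typedef enum { PAGE_CURRENT, PAGE_OLD_FORMAT, PAGE_UNKNOWN } PageKind;

/* Classify a WAL page by the 4-byte magic assumed to sit at its start. */
PageKind
classify_wal_page(const unsigned char *page, size_t len)
{
    uint32_t magic;

    assert(page != NULL);
    if (len < sizeof(magic))
        return PAGE_UNKNOWN;
    memcpy(&magic, page, sizeof(magic)); /* copy avoids unaligned reads */
    if (magic == XLOG_PAGE_MAGIC_NEW)
        return PAGE_CURRENT;
    if (magic == XLOG_PAGE_MAGIC_OLD)
        return PAGE_OLD_FORMAT;
    return PAGE_UNKNOWN;
}

/* Old-format pages may be skipped only if the last shutdown was clean. */
int
can_ignore_page(PageKind kind, int clean_shutdown)
{
    return kind == PAGE_OLD_FORMAT && clean_shutdown;
}
```

A page with an unrecognized magic is deliberately not ignorable here: it may be corruption rather than an old format.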
--
Bruce Momjian | http://candle.pha.pa.us
pgman@candle.pha.pa.us | (610) 853-3000
+ If your life is a hard drive, | 830 Blythe Avenue
+ Christ can be your backup. | Drexel Hill, Pennsylvania 19026
ncm@zembu.com (Nathan Myers) writes:
It Seems to Me that after an orderly shutdown, the WAL files should be,
effectively, slag -- they should contain no deltas from the current
table contents. In practice that means the only part of the format that
*should* matter is whatever it takes to discover that they really are
slag.
That *should* mean that, at worst, a change to the WAL file format should
only require doing an orderly shutdown, and then (perhaps) running a simple
program to generate a new-format empty WAL. It ought not to require an
initdb.
Excellent point, considering that we were already thinking of making a
handy-dandy little utility to remove broken WAL files... Shouldn't take
much more than that to build something that also reformats pg_control.
Thanks for the suggestion!
regards, tom lane
I've reported the major problems to the mailing lists
but gotten almost no feedback about what to do.
I can't comment without access to code -:(
commit: 2001-02-26 17:19:57
0/0059996C: prv 0/00599948; xprv 0/00000000; xid 0;
RM 0 info 00 len 32
checkpoint: redo 0/0059996C; undo 0/00000000; sui 29;
nextxid 18903; nextoid 35195; online
-- this is the last normal-looking checkpoint record.
-- Judging from the commit timestamps surrounding prior
-- checkpoints, checkpoints were happening every five
-- minutes approximately on the 5-minute mark, so
You can't count on this: the postmaster starts the checkpoint
"maker" 5 minutes *after* the previous checkpoint was created,
not 5 minutes after the "maker" last started. And a checkpoint can
take *minutes*.
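To illustrate why checkpoints drift off the 5-minute mark under this rule: if the timer restarts only when a checkpoint completes, each checkpoint begins one interval plus one checkpoint duration after the previous one began. A small C sketch, assuming a fixed 300-second interval and a constant checkpoint duration; the function name is hypothetical and is not taken from the PostgreSQL source.

```c
#include <assert.h>

#define CHECKPOINT_INTERVAL_SECS 300 /* the five-minute timer discussed above */

/*
 * Start time, in seconds after the first checkpoint began, of checkpoint
 * number n (counting from 0), when the timer restarts only after the
 * previous checkpoint has finished.
 */
long
checkpoint_start_time(int n, long duration_secs)
{
    assert(n >= 0 && duration_secs >= 0);
    return (long) n * (CHECKPOINT_INTERVAL_SECS + duration_secs);
}
```

With 90-second checkpoints, checkpoint_start_time(2, 90) gives 780 seconds rather than the 600 one would expect from a wall-clock schedule.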
-- this one happened about 17:20.
-- (There really should be a timestamp
-- in the checkpoint records...)
Agreed.
commit: 2001-02-26 17:26:02
ReadRecord: record with zero len at 0/005A4B4C
-- My dump program is unhappy here because the rest
-- of the page is zero. Given that there is a
-- continuation record at the start of the next
-- page, there certainly should have been record(s)
-- here. But it's worse than that: check the commit
-- timestamps and the xid numbers before and after the
-- discontinuity. Did time go backwards here?
Commit timestamps are created *before* XLogInsert call,
which can suspend backend for some time (in multi-user
env). Random xid-s are also ok, generally.
-- Also notice the back-pointers in the first valid
-- record on the next page; they point not into the
-- zeroed space, which would suggest a mere failure
-- to write a buffer after filling it, but into the
-- middle of one of the valid records on the prior
-- page. It almost looks like page 5A6000 came from
-- a completely different run than page 5A4000.
Unexpected page info flags 0001 at offset 5A6000
Skipping unexpected continuation record at offset 5A6000
0/005A6904: prv 0/005A48B4(?); xprv 0/005A48B4; xid 19047;
^^^^^^^^^^ ^^^^^^^^^^
Same. So, TX 19047 really inserted record at 0/005A48B4
position.
-- What's even nastier (and the immediate cause of
-- Scott's inability to restart) is that the pg_control
-- file's checkPoint pointer points to 0/005AF9F0, which
-- is *not* the location of this checkpoint, but of
-- the record after it.
Well, well. The checkpoint position is taken from
MyLastRecord - I wonder how this internal var could
pick up "invalid" data from a concurrent backend.
Ok, we're leaving Krasnoyarsk in 8 hrs and should
arrive SF Feb 5 ~ 10pm.
Vadim
Vadim Mikheev <vadim4o@email.com> writes:
-- Judging from the commit timestamps surrounding prior
-- checkpoints, checkpoints were happening every five
-- minutes approximately on the 5-minute mark, so
You can't count on this: the postmaster starts the checkpoint
"maker" 5 minutes *after* the previous checkpoint was created,
not 5 minutes after the "maker" last started. And a checkpoint can
take *minutes*.
Good point, although with so little going on (this is the *whole*
relevant section of the log), that seems unlikely.
-- here. But it's worse than that: check the commit
-- timestamps and the xid numbers before and after the
-- discontinuity. Did time go backwards here?
Commit timestamps are created *before* XLogInsert call,
which can suspend backend for some time (in multi-user
env). Random xid-s are also ok, generally.
Hmm ... maybe. Though again, this installation doesn't seem to have
been busy enough to cause a commit to be delayed for very long.
What I realized after posting that analysis is that the last checkpoint
record has SUI 30 whereas the earlier ones have SUI 29 ... so there was
a system restart in there somewhere. That still leaves me wondering
about the discontinuity and broken back-link, but it may account for
the "missing" checkpoint records --- perhaps they weren't generated
because the system wasn't up the entire interval.
-- What's even nastier (and the immediate cause of
-- Scott's inability to restart) is that the pg_control
-- file's checkPoint pointer points to 0/005AF9F0, which
-- is *not* the location of this checkpoint, but of
-- the record after it.
Well, well. The checkpoint position is taken from
MyLastRecord - I wonder how this internal var could
pick up "invalid" data from a concurrent backend.
I have not been able to figure that one out either.
Ok, we're leaving Krasnoyarsk in 8 hrs and should
arrive SF Feb 5 ~ 10pm.
Have a safe trip!
regards, tom lane
Since there is not a separate WAL version stamp, introducing one now
would certainly force an initdb. I don't mind adding one if you think
it's useful; another 4 bytes in pg_control won't hurt anything. But
it's not going to save anyone's bacon on this cycle.
Yes, if initdb, that would probably be a good idea.
Imho the initdb now is not a real issue, since all beta testers
know that for serious issues there might be an initdb after beta started.
At least one of my concerns (single point of failure) would require a
change to the layout of pg_control, which would force initdb anyway.
Was that the "only one checkpoint back in time in pg_control" issue?
One issue with too many checkpoints in pg_control is that you then need
to keep more logs, and in my pgbench tests the log space was a real issue,
even for the one-checkpoint case. I think a utility to recreate a busted pg_control
would add a lot more stability than one more checkpoint in pg_control.
We should probably have criteria in addition to time that can trigger a
checkpoint, like N logs filled since the last checkpoint. I do not think
reducing the checkpoint interval is a solution for once-in-a-while heavy activity.
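The combined trigger suggested here could look roughly like the following C sketch: a checkpoint becomes due when either the time limit or the filled-log-file limit is reached. The struct, field names, and function are hypothetical illustrations, not existing PostgreSQL code or settings.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical policy: two independent limits, either of which forces a checkpoint. */
typedef struct CheckpointPolicy
{
    long max_secs;     /* longest allowed time between checkpoints */
    int  max_segments; /* log files filled before a checkpoint is forced */
} CheckpointPolicy;

/* Returns true when either limit has been reached since the last checkpoint. */
bool
checkpoint_due(const CheckpointPolicy *policy,
               long secs_since_last, int segments_since_last)
{
    assert(policy != NULL);
    return secs_since_last >= policy->max_secs ||
           segments_since_last >= policy->max_segments;
}
```

Under a burst of heavy activity the segment limit fires first and bounds log space; on a quiet system the timer still guarantees periodic checkpoints.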
Andreas
Zeugswetter Andreas SB <ZeugswetterA@Wien.Spardat.at> writes:
At least one of my concerns (single point of failure) would require a
change to the layout of pg_control, which would force initdb anyway.
Was that the "only one checkpoint back in time in pg_control" issue ?
Yes.
One issue about too many checkpoints in pg_control, is that you then
need to keep more logs, and in my pgbench tests the log space was a
real issue, even for the one checkpoint case. I think a utility to
recreate a busted pg_control would add a lot more stability, than one
more checkpoint in pg_control.
Well, there is a big difference between 1 and 2 checkpoints stored in
pg_control. I don't intend to go further than 2. But I disagree about
a log-reset utility being more useful than an extra checkpoint. The
utility would be for manual recovery after a disaster, and it wouldn't
offer 100% recovery: you couldn't be sure that the last few transactions
had been applied atomically, ie, all or none. (Perhaps pg_log got
updated to show them committed, but not all of their tuple changes made
it to disk; how will you know?) If you can back up to the prior
checkpoint and then roll forward, you *do* have a shot at guaranteeing
a consistent database state after loss of the primary checkpoint.
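The fallback described above can be sketched in C as follows: the control data keeps two checkpoint locations, and recovery starts from the newest one whose record can actually be read. The struct layout, names, and the idea of passing validity as flags are assumptions made for illustration; this is not the real pg_control format.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical log position: log file id plus byte offset. */
typedef struct RecPtr
{
    uint32_t xlogid;
    uint32_t xrecoff;
} RecPtr;

/* Hypothetical control data holding the two most recent checkpoints. */
typedef struct ControlSketch
{
    RecPtr checkPoint;     /* most recent checkpoint */
    RecPtr prevCheckPoint; /* the one before it */
} ControlSketch;

/*
 * Pick the checkpoint to start recovery from. The caller reports whether
 * a valid checkpoint record was found at each location. Returns NULL when
 * neither is usable, which is the case that needs manual recovery.
 */
const RecPtr *
choose_checkpoint(const ControlSketch *ctl,
                  bool primary_valid, bool previous_valid)
{
    assert(ctl != NULL);
    if (primary_valid)
        return &ctl->checkPoint;
    if (previous_valid)
        return &ctl->prevCheckPoint;
    return NULL;
}
```

Falling back only works if the log files from the previous checkpoint onward are still on disk, which is the extra log-space cost raised earlier in the thread.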
We should probably have additional criteria to time, that can trigger a
checkpoint, like N logs filled since last checkpoint.
Perhaps. I don't have time to work on that now, but we can certainly
improve the strategy in future releases.
regards, tom lane
Zeugswetter Andreas SB <ZeugswetterA@Wien.Spardat.at> writes:
At least one of my concerns (single point of failure) would require a
change to the layout of pg_control, which would force initdb anyway.
Was that the "only one checkpoint back in time in pg_control" issue ?
Yes.
Is changing pg_control the thing that is going to require the initdb?
One issue about too many checkpoints in pg_control, is that you then
need to keep more logs, and in my pgbench tests the log space was a
real issue, even for the one checkpoint case. I think a utility to
recreate a busted pg_control would add a lot more stability, than one
more checkpoint in pg_control.
Well, there is a big difference between 1 and 2 checkpoints stored in
pg_control. I don't intend to go further than 2. But I disagree about
a log-reset utility being more useful than an extra checkpoint.
Yes I agree, I thought there was already one additional checkpoint info in
pg_control.
The
utility would be for manual recovery after a disaster, and it wouldn't
offer 100% recovery: you couldn't be sure that the last few transactions
had been applied atomically, ie, all or none. (Perhaps pg_log got
updated to show them committed, but not all of their tuple changes made
it to disk; how will you know?) If you can back up to the prior
checkpoint and then roll forward, you *do* have a shot at guaranteeing
a consistent database state after loss of the primary checkpoint.
Yes, but a consistent db can only be guaranteed if all txlog logs up to the
crash are either rolled forward or at least the physical log pages are written
back to disk.
The consequence is imho, that a good utility to reset the logs should keep
all "physical log" pages, and only clear the log from all other records
[optionally starting at the position that hinders rollforward].
Andreas