New XLOG record indicating WAL-skipping
On Wed, Dec 9, 2009 at 6:25 PM, Fujii Masao <masao.fujii@gmail.com> wrote:
Here is the patch:
- Write an XLOG UNLOGGED record in WAL if WAL-logging is skipped for only
the reason that WAL archiving is not enabled and such record has not been
written yet.
- Cause archive recovery to end if an XLOG UNLOGGED record is found during
it.
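As a rough illustration of the second point in the list above, the following sketch models replay as a loop over record kinds. It is not PostgreSQL code: MockRecKind and mock_replay() are hypothetical names invented here, and real WAL replay involves far more state. It only shows the intended behaviour, namely that archive recovery stops at the first marker while crash recovery passes over it.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical record kinds; real WAL records carry far more state. */
typedef enum MockRecKind
{
    MOCK_REC_ORDINARY,
    MOCK_REC_UNLOGGED
} MockRecKind;

/*
 * Replay records in order and return how many were applied.
 * During archive recovery, replay stops at the first UNLOGGED marker;
 * during crash recovery the marker is a no-op and replay continues.
 */
static size_t
mock_replay(const MockRecKind *recs, size_t nrecs, bool in_archive_recovery)
{
    size_t applied = 0;

    while (applied < nrecs)
    {
        if (recs[applied] == MOCK_REC_UNLOGGED && in_archive_recovery)
            break;
        applied++;
    }
    return applied;
}
```

With a stream of two ordinary records, one marker, and one more ordinary record, this model applies two records in archive recovery and all four in crash recovery.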
Here's an updated version of my "New XLOG record indicating WAL-skipping" patch.
http://archives.postgresql.org/pgsql-hackers/2009-12/msg00788.php
This is rebased to CVS HEAD.
Regards,
--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
Attachments:
log_unlogged_op_0105.patch (text/x-patch; charset=US-ASCII)
*** a/src/backend/access/heap/heapam.c
--- b/src/backend/access/heap/heapam.c
***************
*** 1976,1981 **** heap_insert(Relation relation, HeapTuple tup, CommandId cid,
--- 1976,1988 ----
PageSetTLI(page, ThisTimeLineID);
}
+ /*
+ * Write an XLOG UNLOGGED record if WAL-logging is skipped for the reason
+ * that WAL archiving is not enabled.
+ */
+ if (options & HEAP_INSERT_SKIP_WAL && !relation->rd_istemp)
+ XLogSkipLogging();
+
END_CRIT_SECTION();
UnlockReleaseBuffer(buffer);
*** a/src/backend/access/nbtree/nbtsort.c
--- b/src/backend/access/nbtree/nbtsort.c
***************
*** 215,220 **** _bt_leafbuild(BTSpool *btspool, BTSpool *btspool2)
--- 215,227 ----
*/
wstate.btws_use_wal = XLogArchivingActive() && !wstate.index->rd_istemp;
+ /*
+ * Write an XLOG UNLOGGED record if WAL-logging is skipped for the reason
+ * that WAL archiving is not enabled.
+ */
+ if (!XLogArchivingActive() && !wstate.index->rd_istemp)
+ XLogSkipLogging();
+
/* reserve the metapage */
wstate.btws_pages_alloced = BTREE_METAPAGE + 1;
wstate.btws_pages_written = 0;
*** a/src/backend/access/transam/xlog.c
--- b/src/backend/access/transam/xlog.c
***************
*** 560,565 **** XLogInsert(RmgrId rmid, uint8 info, XLogRecData *rdata)
--- 560,566 ----
bool updrqst;
bool doPageWrites;
bool isLogSwitch = (rmid == RM_XLOG_ID && info == XLOG_SWITCH);
+ bool isLogUnlogged = (rmid == RM_XLOG_ID && info == XLOG_UNLOGGED);
/* cross-check on whether we should be here or not */
if (!XLogInsertAllowed())
***************
*** 709,717 **** begin:;
* error checking in ReadRecord. This means that all callers of
* XLogInsert must supply at least some not-in-a-buffer data. However, we
* make an exception for XLOG SWITCH records because we don't want them to
! * ever cross a segment boundary.
*/
! if (len == 0 && !isLogSwitch)
elog(PANIC, "invalid xlog record length %u", len);
START_CRIT_SECTION();
--- 710,718 ----
* error checking in ReadRecord. This means that all callers of
* XLogInsert must supply at least some not-in-a-buffer data. However, we
* make an exception for XLOG SWITCH records because we don't want them to
! * ever cross a segment boundary. XLOG UNLOGGED records are also an exception.
*/
! if (len == 0 && !isLogSwitch && !isLogUnlogged)
elog(PANIC, "invalid xlog record length %u", len);
START_CRIT_SECTION();
***************
*** 3593,3600 **** ReadRecord(XLogRecPtr *RecPtr, int emode)
got_record:;
/*
! * xl_len == 0 is bad data for everything except XLOG SWITCH, where it is
! * required.
*/
if (record->xl_rmid == RM_XLOG_ID && record->xl_info == XLOG_SWITCH)
{
--- 3594,3601 ----
got_record:;
/*
! * xl_len == 0 is bad data for everything except XLOG SWITCH and XLOG UNLOGGED,
! * where it is required.
*/
if (record->xl_rmid == RM_XLOG_ID && record->xl_info == XLOG_SWITCH)
{
***************
*** 3606,3611 **** got_record:;
--- 3607,3622 ----
goto next_record_is_invalid;
}
}
+ else if (record->xl_rmid == RM_XLOG_ID && record->xl_info == XLOG_UNLOGGED)
+ {
+ if (record->xl_len != 0)
+ {
+ ereport(emode,
+ (errmsg("invalid xlog unlogged operation record at %X/%X",
+ RecPtr->xlogid, RecPtr->xrecoff)));
+ goto next_record_is_invalid;
+ }
+ }
else if (record->xl_len == 0)
{
ereport(emode,
***************
*** 3801,3806 **** got_record:;
--- 3812,3830 ----
*/
readOff = XLogSegSize - XLOG_BLCKSZ;
}
+
+ /*
+ * Special processing if it's an XLOG UNLOGGED record and we are doing
+ * an archive recovery.
+ */
+ if (record->xl_rmid == RM_XLOG_ID && record->xl_info == XLOG_UNLOGGED &&
+ InArchiveRecovery)
+ {
+ ereport(emode,
+ (errmsg("unlogged operation record is found at %X/%X",
+ RecPtr->xlogid, RecPtr->xrecoff)));
+ goto next_record_is_invalid;
+ }
return (XLogRecord *) buffer;
next_record_is_invalid:;
***************
*** 7204,7209 **** RequestXLogSwitch(void)
--- 7228,7263 ----
}
/*
+ * Write an XLOG UNLOGGED record.
+ */
+ void
+ XLogSkipLogging(void)
+ {
+ XLogRecData rdata;
+ static bool skipped = false;
+
+ /*
+ * If this backend has already written an XLOG UNLOGGED record,
+ * we need to do nothing here.
+ *
+ * We could reduce the number of XLOG UNLOGGED records written
+ * by sharing the flag 'skipped' between backends, but that is
+ * not worthwhile.
+ */
+ if (skipped)
+ return;
+ skipped = true;
+
+ /* XLOG UNLOGGED, alone among xlog record types, has no data */
+ rdata.buffer = InvalidBuffer;
+ rdata.data = NULL;
+ rdata.len = 0;
+ rdata.next = NULL;
+
+ XLogInsert(RM_XLOG_ID, XLOG_UNLOGGED, &rdata);
+ }
+
+ /*
* XLOG resource manager's routines
*
* Definitions of info values are in include/catalog/pg_control.h, though
***************
*** 7345,7350 **** xlog_redo(XLogRecPtr lsn, XLogRecord *record)
--- 7399,7408 ----
LWLockRelease(ControlFileLock);
}
}
+ else if (info == XLOG_UNLOGGED)
+ {
+ /* nothing to do here */
+ }
}
void
***************
*** 7394,7399 **** xlog_desc(StringInfo buf, uint8 xl_info, char *rec)
--- 7452,7461 ----
appendStringInfo(buf, "backup end: %X/%X",
startpoint.xlogid, startpoint.xrecoff);
}
+ else if (info == XLOG_UNLOGGED)
+ {
+ appendStringInfo(buf, "xlog unlogged");
+ }
else
appendStringInfo(buf, "UNKNOWN");
}
*** a/src/backend/commands/cluster.c
--- b/src/backend/commands/cluster.c
***************
*** 801,806 **** copy_heap_data(Oid OIDNewHeap, Oid OIDOldHeap, Oid OIDOldIndex)
--- 801,813 ----
*/
use_wal = XLogArchivingActive() && !NewHeap->rd_istemp;
+ /*
+ * Write an XLOG UNLOGGED record if WAL-logging is skipped for the reason
+ * that WAL archiving is not enabled.
+ */
+ if (!XLogArchivingActive() && !NewHeap->rd_istemp)
+ XLogSkipLogging();
+
/* use_wal off requires rd_targblock be initially invalid */
Assert(NewHeap->rd_targblock == InvalidBlockNumber);
*** a/src/backend/commands/tablecmds.c
--- b/src/backend/commands/tablecmds.c
***************
*** 7050,7055 **** copy_relation_data(SMgrRelation src, SMgrRelation dst,
--- 7050,7062 ----
*/
use_wal = XLogArchivingActive() && !istemp;
+ /*
+ * Write an XLOG UNLOGGED record if WAL-logging is skipped for the reason
+ * that WAL archiving is not enabled.
+ */
+ if (!XLogArchivingActive() && !istemp)
+ XLogSkipLogging();
+
nblocks = smgrnblocks(src, forkNum);
for (blkno = 0; blkno < nblocks; blkno++)
*** a/src/include/access/xlog.h
--- b/src/include/access/xlog.h
***************
*** 260,265 **** extern XLogRecPtr GetRedoRecPtr(void);
--- 260,267 ----
extern XLogRecPtr GetInsertRecPtr(void);
extern void GetNextXidAndEpoch(TransactionId *xid, uint32 *epoch);
+ extern void XLogSkipLogging(void);
+
extern void StartupProcessMain(void);
#endif /* XLOG_H */
*** a/src/include/catalog/pg_control.h
--- b/src/include/catalog/pg_control.h
***************
*** 63,68 **** typedef struct CheckPoint
--- 63,69 ----
#define XLOG_NEXTOID 0x30
#define XLOG_SWITCH 0x40
#define XLOG_BACKUP_END 0x50
+ #define XLOG_UNLOGGED 0x60
/* System status indicator */
Fujii Masao wrote:
On Wed, Dec 9, 2009 at 6:25 PM, Fujii Masao <masao.fujii@gmail.com> wrote:
Here is the patch:
- Write an XLOG UNLOGGED record in WAL if WAL-logging is skipped for only
the reason that WAL archiving is not enabled and such record has not been
written yet.
- Cause archive recovery to end if an XLOG UNLOGGED record is found during
it.
Here's an updated version of my "New XLOG record indicating WAL-skipping" patch.
http://archives.postgresql.org/pgsql-hackers/2009-12/msg00788.php
Thanks!
I don't like special-casing UNLOGGED records in XLogInsert and
ReadRecord(). Those functions are complicated enough already. The
special handling from XLogInsert() (and a few other places) is only
required because the UNLOGGED records carry no payload. That's easy to
avoid, just add some payload to them, doesn't matter what it is. And I
don't think ReadRecord() is the right place to emit the errors/warnings,
that belongs naturally in xlog_redo().
It might be useful to add some information in the records telling why
WAL-logging was skipped. It might turn out to be useful in debugging.
That also conveniently adds payload to the records, to avoid the
special-casing in XLogInsert() :-).
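A minimal sketch of the payload idea, under stated assumptions: MockRecData is a hypothetical stand-in for the payload fields of XLogRecData, and make_unlogged_record() is a name invented for this illustration. The point is only that a NUL-terminated reason string gives the record a non-zero length, so the zero-length check needs no exception.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for XLogRecData, reduced to its payload fields. */
typedef struct MockRecData
{
    const char *data;
    size_t      len;
} MockRecData;

/*
 * Format a reason such as: CLUSTER on "t1" into the caller's buffer
 * and describe it as a record payload. The length includes the
 * terminating NUL so a reader can treat the payload as a C string.
 */
static MockRecData
make_unlogged_record(char *buf, size_t bufsize,
                     const char *operation, const char *relname)
{
    MockRecData rdata;

    snprintf(buf, bufsize, "%s on \"%s\"", operation, relname);
    rdata.data = buf;
    rdata.len = strlen(buf) + 1;
    return rdata;
}
```

Because snprintf() always terminates the buffer when bufsize is non-zero, the payload length is at least 1 even for an empty reason, which is what removes the need for a special case.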
I think it's a premature optimization to skip writing the records if
we've written in the same session already. Especially with the 'reason'
information added to the records, it's nice to have a record of each
such operation. All operations that skip WAL-logging are heavy enough
that an additional WAL record will make no difference. I can see that it
was required to avoid the flooding from heap_insert(), but we can move
the XLogSkipLogging() call from heap_insert() to heap_sync().
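To make the flooding argument concrete, here is a small counting model. MockWal and both bulk_load_* functions are hypothetical and exist only for this sketch; they stand in for "emit one marker per inserted tuple" versus "emit one marker when the relation is synced".

```c
#include <stddef.h>

/* Hypothetical counter standing in for the WAL stream. */
typedef struct MockWal
{
    unsigned long unlogged_records;
} MockWal;

static void
mock_report_unlogged(MockWal *wal)
{
    wal->unlogged_records++;
}

/* Marker emitted from the per-tuple path: one record per tuple. */
static void
bulk_load_mark_per_insert(MockWal *wal, size_t ntuples)
{
    for (size_t i = 0; i < ntuples; i++)
        mock_report_unlogged(wal);
}

/* Marker emitted from the sync path: one record per bulk operation. */
static void
bulk_load_mark_at_sync(MockWal *wal, size_t ntuples)
{
    for (size_t i = 0; i < ntuples; i++)
    {
        /* tuple inserted without WAL; nothing is recorded here */
    }
    mock_report_unlogged(wal);
}
```

For a load of 1000 tuples the first placement produces 1000 markers and the second produces one, which is why the per-backend suppression flag becomes unnecessary once the call lives in the sync path.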
Attached is an updated patch, doing the above. Am I missing anything?
--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
Attachments:
log_unlogged_op_0115.patch (text/x-diff)
Index: src/backend/access/heap/heapam.c
===================================================================
RCS file: /cvsroot/pgsql/src/backend/access/heap/heapam.c,v
retrieving revision 1.282
diff -u -r1.282 heapam.c
--- src/backend/access/heap/heapam.c 14 Jan 2010 11:08:00 -0000 1.282
+++ src/backend/access/heap/heapam.c 15 Jan 2010 11:27:44 -0000
@@ -5074,10 +5074,16 @@
void
heap_sync(Relation rel)
{
+ char reason[NAMEDATALEN + 30];
+
/* temp tables never need fsync */
if (rel->rd_istemp)
return;
+ snprintf(reason, sizeof(reason), "heap inserts on \"%s\"",
+ RelationGetRelationName(rel));
+ XLogSkipLogging(reason);
+
/* main heap */
FlushRelationBuffers(rel);
/* FlushRelationBuffers will have opened rd_smgr */
Index: src/backend/access/nbtree/nbtsort.c
===================================================================
RCS file: /cvsroot/pgsql/src/backend/access/nbtree/nbtsort.c,v
retrieving revision 1.122
diff -u -r1.122 nbtsort.c
--- src/backend/access/nbtree/nbtsort.c 15 Jan 2010 09:19:00 -0000 1.122
+++ src/backend/access/nbtree/nbtsort.c 15 Jan 2010 11:27:44 -0000
@@ -215,6 +215,18 @@
*/
wstate.btws_use_wal = XLogIsNeeded() && !wstate.index->rd_istemp;
+ /*
+ * Write an XLOG UNLOGGED record if WAL-logging was skipped because
+ * WAL archiving is not enabled.
+ */
+ if (!wstate.btws_use_wal && !wstate.index->rd_istemp)
+ {
+ char reason[NAMEDATALEN + 20];
+ snprintf(reason, sizeof(reason), "b-tree build on \"%s\"",
+ RelationGetRelationName(wstate.index));
+ XLogSkipLogging(reason);
+ }
+
/* reserve the metapage */
wstate.btws_pages_alloced = BTREE_METAPAGE + 1;
wstate.btws_pages_written = 0;
Index: src/backend/access/transam/xlog.c
===================================================================
RCS file: /cvsroot/pgsql/src/backend/access/transam/xlog.c,v
retrieving revision 1.358
diff -u -r1.358 xlog.c
--- src/backend/access/transam/xlog.c 15 Jan 2010 09:19:00 -0000 1.358
+++ src/backend/access/transam/xlog.c 15 Jan 2010 11:27:45 -0000
@@ -7562,6 +7562,31 @@
}
/*
+ * Write an XLOG UNLOGGED record, indicating that some operation was
+ * performed on data that we fsync()'d directly to disk, skipping
+ * WAL-logging.
+ *
+ * Such operations screw up archive recovery, so we complain if we see
+ * these records during archive recovery. That shouldn't happen in a
+ * correctly configured server, but you can induce it by temporarily
+ * disabling archiving and restarting, so it's good to at least get a
+ * warning of silent data loss in such cases. These records serve no
+ * other purpose and are simply ignored during crash recovery.
+ */
+void
+XLogSkipLogging(char *reason)
+{
+ XLogRecData rdata;
+
+ rdata.buffer = InvalidBuffer;
+ rdata.data = reason;
+ rdata.len = strlen(reason) + 1;
+ rdata.next = NULL;
+
+ XLogInsert(RM_XLOG_ID, XLOG_UNLOGGED, &rdata);
+}
+
+/*
* XLOG resource manager's routines
*
* Definitions of info values are in include/catalog/pg_control.h, though
@@ -7703,6 +7728,19 @@
LWLockRelease(ControlFileLock);
}
}
+ else if (info == XLOG_UNLOGGED)
+ {
+ if (InArchiveRecovery)
+ {
+ /*
+ * Note: We don't print the reason string from the record,
+ * because that gets added as a line using xlog_desc()
+ */
+ ereport(WARNING,
+ (errmsg("unlogged operation performed, data may be missing"),
+ errhint("This can happen if you temporarily disable archive_mode without taking a new base backup.")));
+ }
+ }
}
void
@@ -7752,6 +7790,12 @@
appendStringInfo(buf, "backup end: %X/%X",
startpoint.xlogid, startpoint.xrecoff);
}
+ else if (info == XLOG_UNLOGGED)
+ {
+ char *reason = rec;
+
+ appendStringInfo(buf, "unlogged operation: %s", reason);
+ }
else
appendStringInfo(buf, "UNKNOWN");
}
Index: src/backend/commands/cluster.c
===================================================================
RCS file: /cvsroot/pgsql/src/backend/commands/cluster.c,v
retrieving revision 1.193
diff -u -r1.193 cluster.c
--- src/backend/commands/cluster.c 15 Jan 2010 09:19:01 -0000 1.193
+++ src/backend/commands/cluster.c 15 Jan 2010 11:27:45 -0000
@@ -821,6 +821,18 @@
*/
use_wal = XLogIsNeeded() && !NewHeap->rd_istemp;
+ /*
+ * Write an XLOG UNLOGGED record if WAL-logging was skipped because
+ * WAL archiving is not enabled.
+ */
+ if (!use_wal && !NewHeap->rd_istemp)
+ {
+ char reason[NAMEDATALEN + 20];
+ snprintf(reason, sizeof(reason), "CLUSTER on \"%s\"",
+ RelationGetRelationName(NewHeap));
+ XLogSkipLogging(reason);
+ }
+
/* use_wal off requires rd_targblock be initially invalid */
Assert(NewHeap->rd_targblock == InvalidBlockNumber);
Index: src/backend/commands/tablecmds.c
===================================================================
RCS file: /cvsroot/pgsql/src/backend/commands/tablecmds.c,v
retrieving revision 1.315
diff -u -r1.315 tablecmds.c
--- src/backend/commands/tablecmds.c 15 Jan 2010 09:19:01 -0000 1.315
+++ src/backend/commands/tablecmds.c 15 Jan 2010 11:27:45 -0000
@@ -7018,6 +7018,19 @@
heap_close(pg_class, RowExclusiveLock);
+ /*
+ * Write an XLOG UNLOGGED record if WAL-logging was skipped because
+ * WAL archiving is not enabled.
+ */
+ if (!XLogIsNeeded() && !rel->rd_istemp)
+ {
+ char reason[NAMEDATALEN + 40];
+ snprintf(reason, sizeof(reason), "ALTER TABLE SET TABLESPACE on \"%s\"",
+ RelationGetRelationName(rel));
+
+ XLogSkipLogging(reason);
+ }
+
relation_close(rel, NoLock);
/* Make sure the reltablespace change is visible */
@@ -7046,6 +7059,10 @@
/*
* We need to log the copied data in WAL iff WAL archiving/streaming is
* enabled AND it's not a temp rel.
+ *
+ * Note: If you change the conditions here, update the conditions in
+ * ATExecSetTableSpace() for when an XLOG UNLOGGED record is written
+ * to match.
*/
use_wal = XLogIsNeeded() && !istemp;
Index: src/include/access/xlog.h
===================================================================
RCS file: /cvsroot/pgsql/src/include/access/xlog.h,v
retrieving revision 1.96
diff -u -r1.96 xlog.h
--- src/include/access/xlog.h 15 Jan 2010 09:19:06 -0000 1.96
+++ src/include/access/xlog.h 15 Jan 2010 11:27:45 -0000
@@ -278,6 +278,7 @@
extern void CreateCheckPoint(int flags);
extern bool CreateRestartPoint(int flags);
extern void XLogPutNextOid(Oid nextOid);
+extern void XLogSkipLogging(char *reason);
extern XLogRecPtr GetRedoRecPtr(void);
extern XLogRecPtr GetInsertRecPtr(void);
extern XLogRecPtr GetWriteRecPtr(void);
Index: src/include/catalog/pg_control.h
===================================================================
RCS file: /cvsroot/pgsql/src/include/catalog/pg_control.h,v
retrieving revision 1.48
diff -u -r1.48 pg_control.h
--- src/include/catalog/pg_control.h 4 Jan 2010 12:50:50 -0000 1.48
+++ src/include/catalog/pg_control.h 15 Jan 2010 11:27:45 -0000
@@ -63,6 +63,7 @@
#define XLOG_NEXTOID 0x30
#define XLOG_SWITCH 0x40
#define XLOG_BACKUP_END 0x50
+#define XLOG_UNLOGGED 0x60
/* System status indicator */
On Fri, Jan 15, 2010 at 11:28 AM, Heikki Linnakangas
<heikki.linnakangas@enterprisedb.com> wrote:
I can see that it
was required to avoid the flooding from heap_insert(), but we can move
the XLogSkipLogging() call from heap_insert() to heap_sync().
Attached is an updated patch, doing the above. Am I missing anything?
Hm, perhaps the timing is actually important? What if someone takes a
hot backup while an unlogged operation is in progress? The checkpoint
can occur and finish, and the backup can finish, all while the unlogged
operation is happening. Then the replica can start restoring archived
logs from that point forward. In the original coding it sounds like
the replica would never notice the unlogged operation, which might not
have been synced before the start of the initial hot backup. If the
record is emitted when the sync begins, then the replica would be in
trouble if the checkpoint begins before the operation completes but
finishes after the sync has begun and the record has been emitted.
It seems like it's important that the record occur only after the sync
*completes* to be sure that if the replica doesn't see the record then
it knows the sync was done before its initial backup image was taken.
--
greg
Greg Stark wrote:
What if someone takes a hot backup while an unlogged operation is in progress?
Can't do that, pg_start_backup() throws an error if archive_mode=off.
--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
On Fri, Jan 15, 2010 at 8:28 PM, Heikki Linnakangas
<heikki.linnakangas@enterprisedb.com> wrote:
I don't like special-casing UNLOGGED records in XLogInsert and
ReadRecord(). Those functions are complicated enough already. The
special handling from XLogInsert() (and a few other places) is only
required because the UNLOGGED records carry no payload. That's easy to
avoid, just add some payload to them, doesn't matter what it is. And I
don't think ReadRecord() is the right place to emit the errors/warnings,
that belongs naturally in xlog_redo().
It might be useful to add some information in the records telling why
WAL-logging was skipped. It might turn out to be useful in debugging.
That also conveniently adds payload to the records, to avoid the
special-casing in XLogInsert() :-).
I think it's a premature optimization to skip writing the records if
we've written in the same session already. Especially with the 'reason'
information added to the records, it's nice to have a record of each
such operation. All operations that skip WAL-logging are heavy enough
that an additional WAL record will make no difference. I can see that it
was required to avoid the flooding from heap_insert(), but we can move
the XLogSkipLogging() call from heap_insert() to heap_sync().
Attached is an updated patch, doing the above. Am I missing anything?
Thanks a lot! Your change seems to be OK.
Regards,
--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
On Sat, Jan 16, 2010 at 3:16 PM, Fujii Masao <masao.fujii@gmail.com> wrote:
Attached is an updated patch, doing the above. Am I missing anything?
Thanks a lot! Your change seems to be OK.
We'll need to do some more work after the following patch
has been committed.
http://archives.postgresql.org/pgsql-hackers/2010-01/msg01715.php
Regards,
--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
On Fri, 2010-01-15 at 13:28 +0200, Heikki Linnakangas wrote:
I think it's a premature optimization to skip writing the records if
we've written in the same session already. Especially with the 'reason'
information added to the records, it's nice to have a record of each
such operation. All operations that skip WAL-logging are heavy enough
that an additional WAL record will make no difference. I can see that it
was required to avoid the flooding from heap_insert(), but we can move
the XLogSkipLogging() call from heap_insert() to heap_sync().
Can we call that XLogReportUnloggedStatement() or similar?
XLogSkipLogging() sounds like a request rather than a mark/report/record
type of action.
Attached is an updated patch, doing the above. Am I missing anything?
Sounds OK and works with Hot Standby.
--
Simon Riggs www.2ndQuadrant.com
Simon Riggs wrote:
Can we call that XLogReportUnloggedStatement() or similar?
XLogSkipLogging() sounds like a request rather than a mark/report/record
type of action.
Agreed. I vote for XLogReportUnloggedOperation(). I'll change it to that
before committing, unless Fujii beats me to it.
--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
On Mon, Jan 18, 2010 at 9:17 PM, Heikki Linnakangas
<heikki.linnakangas@enterprisedb.com> wrote:
Agreed. I vote for XLogReportUnloggedOperation().
OK.
I'll change it to that
before committing, unless Fujii beats me to it.
Yeah, please go ahead.
Regards,
--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center