pgsql: Reduce WAL activity for page splits: > Currently, an index split

Started by Nonamealmost 19 years ago7 messages
#1Noname
momjian@postgresql.org

Log Message:
-----------
Reduce WAL activity for page splits:

Currently, an index split writes all the data on the split page to
WAL. That's a lot of WAL traffic. The tuples that are copied to the
right page need to be WAL logged, but the tuples that stay on the
original page don't.

Heikki Linnakangas

Modified Files:
--------------
pgsql/src/backend/access/nbtree:
nbtinsert.c (r1.149 -> r1.150)
(http://developer.postgresql.org/cvsweb.cgi/pgsql/src/backend/access/nbtree/nbtinsert.c.diff?r1=1.149&r2=1.150)
nbtxlog.c (r1.41 -> r1.42)
(http://developer.postgresql.org/cvsweb.cgi/pgsql/src/backend/access/nbtree/nbtxlog.c.diff?r1=1.41&r2=1.42)
pgsql/src/include/access:
nbtree.h (r1.110 -> r1.111)
(http://developer.postgresql.org/cvsweb.cgi/pgsql/src/include/access/nbtree.h.diff?r1=1.110&r2=1.111)

#2Stefan Kaltenbrunner
stefan@kaltenbrunner.cc
In reply to: Noname (#1)
Re: [COMMITTERS] pgsql: Reduce WAL activity for page splits: > Currently, an index split

Bruce Momjian wrote:

Log Message:
-----------
Reduce WAL activity for page splits:

Currently, an index split writes all the data on the split page to
WAL. That's a lot of WAL traffic. The tuples that are copied to the
right page need to be WAL logged, but the tuples that stay on the
original page don't.

this patch killed both of my OpenBSD buildfarm members

a backtrace of the crash shows:

(gdb) bt
#0 0x1c055504 in _bt_split (rel=0x835fcab0, buf=53, firstright=50,
newitemoff=54, newitemsz=76,
newitem=0x8210a364, newitemonleft=0 '\0') at nbtinsert.c:956
#1 0x1c05468b in _bt_insertonpg (rel=0x835fcab0, buf=53,
stack=0x8210a078, keysz=2, scankey=0x8210a3f0,
itup=0x8210a364, afteritem=0, split_only_page=0 '\0') at
nbtinsert.c:538
#2 0x1c053fae in _bt_doinsert (rel=0x835fcab0, itup=0x8210a364,
index_is_unique=1 '\001', heapRel=0x7ec0ee7c)
at nbtinsert.c:141
#3 0x1c05a1c4 in btinsert (fcinfo=0x2) at nbtree.c:224
#4 0x1c2028fc in FunctionCall6 (flinfo=0x8bc6d1f0, arg1=2, arg2=2,
arg3=2, arg4=2, arg5=2, arg6=2) at fmgr.c:1267
#5 0x1c052d20 in index_insert (indexRelation=0x835fcab0,
values=0xcf7defd0, isnull=0xcf7df050 "",
heap_t_ctid=0x7df53e50, heapRelation=0x7ec0ee7c, check_uniqueness=1
'\001') at indexam.c:196
#6 0x1c08f87e in CatalogIndexInsert (indstate=0x2,
heapTuple=0x8210a224) at indexing.c:124
#7 0x1c08f902 in CatalogUpdateIndexes (heapRel=0x7ec0ee7c,
heapTuple=0x7df53e4c) at indexing.c:149
#8 0x1c17255c in SetRelationRuleStatus (relationId=10951, relHasRules=1
'\001', relIsBecomingView=0 '\0')
at rewriteSupport.c:80
#9 0x1c16f074 in DefineQueryRewrite (stmt=0x82ca7e78) at
rewriteDefine.c:432
#10 0x1c18c822 in PortalRunUtility (portal=0x82cb201c, query=0x81897484,
dest=0x3c0c8ff4,
completionTag=0xcf7df2a0 "") at pquery.c:1063
#11 0x1c18cb12 in PortalRunMulti (portal=0x82cb201c, dest=0x3c0c8ff4,
altdest=0x3c0c8ff4,
completionTag=0xcf7df2a0 "") at pquery.c:1131
#12 0x1c18c28a in PortalRun (portal=0x82cb201c, count=2147483647,
dest=0x3c0c8ff4, altdest=0x3c0c8ff4,
completionTag=0xcf7df2a0 "") at pquery.c:700
#13 0x1c187d13 in exec_simple_query (
query_string=0x84a2501c "/*\n * PostgreSQL System Views\n *\n *
Copyright (c) 1996-2007, PostgreSQL Global Development Group\n *\n *
$PostgreSQL: pgsql/src/backend/catalog/system_views.sql,v 1.35
2007/01/05 22:19:25 momjian Exp $\n"...) at postgres.c:939
#14 0x1c18abb2 in PostgresMain (argc=1, argv=0xcf7df480,
username=0x80ba55b0 "mastermind") at postgres.c:3423
#15 0x1c122102 in main (argc=10, argv=0xcf7df47c) at main.c:186

Stefan

#3Heikki Linnakangas
heikki@enterprisedb.com
In reply to: Stefan Kaltenbrunner (#2)
Re: Re: [COMMITTERS] pgsql: Reduce WAL activity for page splits: > Currently, an index split

Stefan Kaltenbrunner wrote:

Bruce Momjian wrote:

Log Message:
-----------
Reduce WAL activity for page splits:

Currently, an index split writes all the data on the split page to
WAL. That's a lot of WAL traffic. The tuples that are copied to the
right page need to be WAL logged, but the tuples that stay on the
original page don't.

this patch killed both of my OpenBSD buildfarm members

Sorry about that.. I'll take a look at it.

a backtrace of the crash shows:

(gdb) bt
#0 0x1c055504 in _bt_split (rel=0x835fcab0, buf=53, firstright=50,
newitemoff=54, newitemsz=76,
newitem=0x8210a364, newitemonleft=0 '\0') at nbtinsert.c:956
#1 0x1c05468b in _bt_insertonpg (rel=0x835fcab0, buf=53,
stack=0x8210a078, keysz=2, scankey=0x8210a3f0,
itup=0x8210a364, afteritem=0, split_only_page=0 '\0') at
nbtinsert.c:538
#2 0x1c053fae in _bt_doinsert (rel=0x835fcab0, itup=0x8210a364,
index_is_unique=1 '\001', heapRel=0x7ec0ee7c)
at nbtinsert.c:141
#3 0x1c05a1c4 in btinsert (fcinfo=0x2) at nbtree.c:224
#4 0x1c2028fc in FunctionCall6 (flinfo=0x8bc6d1f0, arg1=2, arg2=2,
arg3=2, arg4=2, arg5=2, arg6=2) at fmgr.c:1267
#5 0x1c052d20 in index_insert (indexRelation=0x835fcab0,
values=0xcf7defd0, isnull=0xcf7df050 "",
heap_t_ctid=0x7df53e50, heapRelation=0x7ec0ee7c, check_uniqueness=1
'\001') at indexam.c:196
#6 0x1c08f87e in CatalogIndexInsert (indstate=0x2,
heapTuple=0x8210a224) at indexing.c:124
#7 0x1c08f902 in CatalogUpdateIndexes (heapRel=0x7ec0ee7c,
heapTuple=0x7df53e4c) at indexing.c:149
#8 0x1c17255c in SetRelationRuleStatus (relationId=10951, relHasRules=1
'\001', relIsBecomingView=0 '\0')
at rewriteSupport.c:80
#9 0x1c16f074 in DefineQueryRewrite (stmt=0x82ca7e78) at
rewriteDefine.c:432
#10 0x1c18c822 in PortalRunUtility (portal=0x82cb201c, query=0x81897484,
dest=0x3c0c8ff4,
completionTag=0xcf7df2a0 "") at pquery.c:1063
#11 0x1c18cb12 in PortalRunMulti (portal=0x82cb201c, dest=0x3c0c8ff4,
altdest=0x3c0c8ff4,
completionTag=0xcf7df2a0 "") at pquery.c:1131
#12 0x1c18c28a in PortalRun (portal=0x82cb201c, count=2147483647,
dest=0x3c0c8ff4, altdest=0x3c0c8ff4,
completionTag=0xcf7df2a0 "") at pquery.c:700
#13 0x1c187d13 in exec_simple_query (
query_string=0x84a2501c "/*\n * PostgreSQL System Views\n *\n *
Copyright (c) 1996-2007, PostgreSQL Global Development Group\n *\n *
$PostgreSQL: pgsql/src/backend/catalog/system_views.sql,v 1.35
2007/01/05 22:19:25 momjian Exp $\n"...) at postgres.c:939
#14 0x1c18abb2 in PostgresMain (argc=1, argv=0xcf7df480,
username=0x80ba55b0 "mastermind") at postgres.c:3423
#15 0x1c122102 in main (argc=10, argv=0xcf7df47c) at main.c:186

Stefan

---------------------------(end of broadcast)---------------------------
TIP 9: In versions below 8.0, the planner will ignore your desire to
choose an index scan if your joining column's datatypes do not
match

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

#4Heikki Linnakangas
heikki@enterprisedb.com
In reply to: Stefan Kaltenbrunner (#2)
1 attachment(s)
Re: Re: [COMMITTERS] pgsql: Reduce WAL activity for page splits: > Currently, an index split

Stefan Kaltenbrunner wrote:

Bruce Momjian wrote:

Log Message:
-----------
Reduce WAL activity for page splits:

Currently, an index split writes all the data on the split page to
WAL. That's a lot of WAL traffic. The tuples that are copied to the
right page need to be WAL logged, but the tuples that stay on the
original page don't.

this patch killed both of my OpenBSD buildfarm members

lopaque was referenced after pfreeing the temp page it pointed to. Also
later in the function the LSN of the left page was set, but again using
a pointer to the pfreed temp copy instead of the real shared memory buffer.

Here's a fix.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

Attachments:

lopaque-ref-after-pfree-fix.patchtext/x-patch; name=lopaque-ref-after-pfree-fix.patchDownload
Index: src/backend/access/nbtree/nbtinsert.c
===================================================================
RCS file: /home/hlinnaka/pgcvsrepository/pgsql/src/backend/access/nbtree/nbtinsert.c,v
retrieving revision 1.150
diff -c -r1.150 nbtinsert.c
*** src/backend/access/nbtree/nbtinsert.c	8 Feb 2007 05:05:53 -0000	1.150
--- src/backend/access/nbtree/nbtinsert.c	8 Feb 2007 09:36:05 -0000
***************
*** 953,959 ****
  		xlrec.rightsib = BufferGetBlockNumber(rbuf);
  		xlrec.firstright = firstright;
  		xlrec.rnext = ropaque->btpo_next;
! 		xlrec.level = lopaque->btpo.level;
  
  		rdata[0].data = (char *) &xlrec;
  		rdata[0].len = SizeOfBtreeSplit;
--- 953,959 ----
  		xlrec.rightsib = BufferGetBlockNumber(rbuf);
  		xlrec.firstright = firstright;
  		xlrec.rnext = ropaque->btpo_next;
! 		xlrec.level = ropaque->btpo.level;
  
  		rdata[0].data = (char *) &xlrec;
  		rdata[0].len = SizeOfBtreeSplit;
***************
*** 962,968 ****
  		lastrdata = &rdata[0];
  
  		/* Log downlink on non-leaf pages. */
! 		if (lopaque->btpo.level > 0)
  		{
  			lastrdata->next = lastrdata + 1;
  			lastrdata++;
--- 962,968 ----
  		lastrdata = &rdata[0];
  
  		/* Log downlink on non-leaf pages. */
! 		if (ropaque->btpo.level > 0)
  		{
  			lastrdata->next = lastrdata + 1;
  			lastrdata++;
***************
*** 1040,1047 ****
  
  		recptr = XLogInsert(RM_BTREE_ID, xlinfo, rdata);
  
! 		PageSetLSN(leftpage, recptr);
! 		PageSetTLI(leftpage, ThisTimeLineID);
  		PageSetLSN(rightpage, recptr);
  		PageSetTLI(rightpage, ThisTimeLineID);
  		if (!P_RIGHTMOST(ropaque))
--- 1040,1047 ----
  
  		recptr = XLogInsert(RM_BTREE_ID, xlinfo, rdata);
  
! 		PageSetLSN(origpage, recptr);
! 		PageSetTLI(origpage, ThisTimeLineID);
  		PageSetLSN(rightpage, recptr);
  		PageSetTLI(rightpage, ThisTimeLineID);
  		if (!P_RIGHTMOST(ropaque))
#5Stefan Kaltenbrunner
stefan@kaltenbrunner.cc
In reply to: Heikki Linnakangas (#4)
Re: Re: [COMMITTERS] pgsql: Reduce WAL activity for page splits: > Currently, an index split

Heikki Linnakangas wrote:

Stefan Kaltenbrunner wrote:

Bruce Momjian wrote:

Log Message:
-----------
Reduce WAL activity for page splits:

Currently, an index split writes all the data on the split page to
WAL. That's a lot of WAL traffic. The tuples that are copied to the
right page need to be WAL logged, but the tuples that stay on the
original page don't.

this patch killed both of my OpenBSD buildfarm members

lopaque was referenced after pfreeing the temp page it pointed to. Also
later in the function the LSN of the left page was set, but again using
a pointer to the pfreed temp copy instead of the real shared memory buffer.

Here's a fix.

confirmed - with that patch -HEAD passes a full regression test run at
least on emu.

Stefan

#6Alvaro Herrera
alvherre@commandprompt.com
In reply to: Heikki Linnakangas (#4)
Re: Re: [COMMITTERS] pgsql: Reduce WAL activity for page splits: > Currently, an index split

Heikki Linnakangas wrote:

Stefan Kaltenbrunner wrote:

Bruce Momjian wrote:

Log Message:
-----------
Reduce WAL activity for page splits:

Currently, an index split writes all the data on the split page to
WAL. That's a lot of WAL traffic. The tuples that are copied to the
right page need to be WAL logged, but the tuples that stay on the
original page don't.

this patch killed both of my OpenBSD buildfarm members

Please note that zebra died for a different reason the second time (not
enough shared memory).

lopaque was referenced after pfreeing the temp page it pointed to. Also
later in the function the LSN of the left page was set, but again using
a pointer to the pfreed temp copy instead of the real shared memory buffer.

Applied, thanks.

--
Alvaro Herrera http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

#7Stefan Kaltenbrunner
stefan@kaltenbrunner.cc
In reply to: Alvaro Herrera (#6)
Re: Re: [COMMITTERS] pgsql: Reduce WAL activity for page splits: > Currently, an index split

Alvaro Herrera wrote:

Heikki Linnakangas wrote:

Stefan Kaltenbrunner wrote:

Bruce Momjian wrote:

Log Message:
-----------
Reduce WAL activity for page splits:

Currently, an index split writes all the data on the split page to
WAL. That's a lot of WAL traffic. The tuples that are copied to the
right page need to be WAL logged, but the tuples that stay on the
original page don't.

this patch killed both of my OpenBSD buildfarm members

Please note that zebra died for a different reason the second time (not
enough shared memory).

yeah that's because initdb is leaving the shm-segments there on a failure.
This resulted in the out-of-shm failure on the following run which I
have manually cleared now ...

Stefan