Deferring some AtStart* allocations?

Started by Andres Freundover 11 years ago24 messages
#1Andres Freund
andres@2ndquadrant.com

Hi,

In workloads with mostly simple statements, memory allocations are the
primary bottleneck. Some of the allocations are unlikely to be avoidable
without major work, but others seem superflous in many scenarios.

Why aren't we delaying allocations in e.g. AtStart_Inval(),
AfterTriggerBeginXact() to when the data structures are acutally used?
In most transactions neither will be?

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#2Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andres Freund (#1)
Re: Deferring some AtStart* allocations?

Andres Freund <andres@2ndquadrant.com> writes:

Why aren't we delaying allocations in e.g. AtStart_Inval(),
AfterTriggerBeginXact() to when the data structures are acutally used?

Aren't we? Neither of those would be doing much work certainly.

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#3Andres Freund
andres@2ndquadrant.com
In reply to: Tom Lane (#2)
Re: Deferring some AtStart* allocations?

On 2014-06-29 19:52:23 -0400, Tom Lane wrote:

Andres Freund <andres@2ndquadrant.com> writes:

Why aren't we delaying allocations in e.g. AtStart_Inval(),
AfterTriggerBeginXact() to when the data structures are acutally used?

Aren't we? Neither of those would be doing much work certainly.

They are perhaps not doing much in absolute terms, but it's a fair share
of the processing overhead for simple statements. AfterTriggerBeginXact()
is called unconditionally from StartTransaction() and does three
MemoryContextAlloc()s. AtStart_Inval() one.
I think they should just be initialized whenever the memory is used?
Doesn't look too complicated to me.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#4Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andres Freund (#3)
Re: Deferring some AtStart* allocations?

Andres Freund <andres@2ndquadrant.com> writes:

On 2014-06-29 19:52:23 -0400, Tom Lane wrote:

Andres Freund <andres@2ndquadrant.com> writes:

Why aren't we delaying allocations in e.g. AtStart_Inval(),
AfterTriggerBeginXact() to when the data structures are acutally used?

Aren't we? Neither of those would be doing much work certainly.

They are perhaps not doing much in absolute terms, but it's a fair share
of the processing overhead for simple statements. AfterTriggerBeginXact()
is called unconditionally from StartTransaction() and does three
MemoryContextAlloc()s. AtStart_Inval() one.
I think they should just be initialized whenever the memory is used?
Doesn't look too complicated to me.

Meh. Even "SELECT 1" is going to be doing *far* more pallocs than that to
get through raw parsing, parse analysis, planning, and execution startup.
If you can find a few hundred pallocs we can avoid in trivial queries,
it would get interesting; but I'll be astonished if saving 4 is measurable.

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#5Andres Freund
andres@2ndquadrant.com
In reply to: Tom Lane (#4)
Re: Deferring some AtStart* allocations?

On 2014-06-29 21:12:49 -0400, Tom Lane wrote:

Andres Freund <andres@2ndquadrant.com> writes:

On 2014-06-29 19:52:23 -0400, Tom Lane wrote:

Andres Freund <andres@2ndquadrant.com> writes:

Why aren't we delaying allocations in e.g. AtStart_Inval(),
AfterTriggerBeginXact() to when the data structures are acutally used?

Aren't we? Neither of those would be doing much work certainly.

They are perhaps not doing much in absolute terms, but it's a fair share
of the processing overhead for simple statements. AfterTriggerBeginXact()
is called unconditionally from StartTransaction() and does three
MemoryContextAlloc()s. AtStart_Inval() one.
I think they should just be initialized whenever the memory is used?
Doesn't look too complicated to me.

Meh. Even "SELECT 1" is going to be doing *far* more pallocs than that to
get through raw parsing, parse analysis, planning, and execution
startup.

The quick test I ran used prepared statements - there the number of
memory allocations is *much* lower...

If you can find a few hundred pallocs we can avoid in trivial queries,
it would get interesting; but I'll be astonished if saving 4 is measurable.

I only noticed it because it shows up in profiles. I doubt it'll even
remotely be noticeable without using prepared statements, but a lot of
people do use those.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#6Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#4)
1 attachment(s)
Re: Deferring some AtStart* allocations?

On Sun, Jun 29, 2014 at 9:12 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Meh. Even "SELECT 1" is going to be doing *far* more pallocs than that to
get through raw parsing, parse analysis, planning, and execution startup.
If you can find a few hundred pallocs we can avoid in trivial queries,
it would get interesting; but I'll be astonished if saving 4 is measurable.

I got nerd-sniped by this problem today, probably after staring at the
profiling data very similar to what led Andres to ask the question in
the first place. The gains are indeed not measurable on a
macrobenchmark, but I had to write the patch to figure that out, so
here it is.

Although the gain isn't a measurable percentage of total runtime, it
does appear to be a significant percentage of the palloc traffic on
trivial queries. I did 30-second SELECT-only pgbench runs at scale
factor 10, using prepared queries. According to perf, on unpatched
master, StartTransactionCommand accounts for 11.46% of the calls to
AllocSetAlloc. (I imagine this is by runtime, not by call count, but
it probably doesn't matter much either way.) But with this patch,
StartTransactionCommand's share drops to 4.43%. Most of that is
coming from AtStart_Inval(), which wouldn't be hard to fix either.

I'm inclined to think that tamping down palloc traffic is worthwhile
even if we can't directly measure the savings on a macrobenchmark.
AllocSetAlloc is often at or near the top of profiling results, but
there's rarely a single caller responsible for enough of those calls
for to make it worth optimizing. But that means that if we refuse to
take patches that save just a few pallocs, we're basically giving up
on ever improving anything in this area, and that doesn't seem like a
good idea either; it's only going to get slowly worse over time as we
add more features.

BTW, if anyone's curious what the biggest source of AllocSetAlloc
calls is on this workload, the answer is ExecInitIndexScan(), which
looks to be indirectly responsible for almost half the total (or just
over half, with the patch applied). That apparently calls a lot of
different things that allocate small amounts of memory, and it
accounts for nearly all of the AllocSetAlloc traffic that originates
from PortalStart(). I haven't poked around in there enough to know
whether there's anything worth optimizing, but given the degree to
which this patch shifts the profile, I bet the number that it takes to
make a material savings is more like a few dozen than a few hundred.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachments:

postpone-trigger-setup.patchtext/x-patch; charset=US-ASCII; name=postpone-trigger-setup.patchDownload
diff --git a/src/backend/commands/trigger.c b/src/backend/commands/trigger.c
index f4c0ffa..1db0666 100644
--- a/src/backend/commands/trigger.c
+++ b/src/backend/commands/trigger.c
@@ -90,6 +90,7 @@ static void AfterTriggerSaveEvent(EState *estate, ResultRelInfo *relinfo,
 					  int event, bool row_trigger,
 					  HeapTuple oldtup, HeapTuple newtup,
 					  List *recheckIndexes, Bitmapset *modifiedCols);
+static void AfterTriggerEnlargeQueryState(void);
 
 
 /*
@@ -3203,9 +3204,7 @@ typedef struct AfterTriggersData
 	int			maxtransdepth;	/* allocated len of above arrays */
 } AfterTriggersData;
 
-typedef AfterTriggersData *AfterTriggers;
-
-static AfterTriggers afterTriggers;
+static AfterTriggersData afterTriggers;
 
 static void AfterTriggerExecute(AfterTriggerEvent event,
 					Relation rel, TriggerDesc *trigdesc,
@@ -3228,7 +3227,7 @@ GetCurrentFDWTuplestore()
 {
 	Tuplestorestate *ret;
 
-	ret = afterTriggers->fdw_tuplestores[afterTriggers->query_depth];
+	ret = afterTriggers.fdw_tuplestores[afterTriggers.query_depth];
 	if (ret == NULL)
 	{
 		MemoryContext oldcxt;
@@ -3255,7 +3254,7 @@ GetCurrentFDWTuplestore()
 		CurrentResourceOwner = saveResourceOwner;
 		MemoryContextSwitchTo(oldcxt);
 
-		afterTriggers->fdw_tuplestores[afterTriggers->query_depth] = ret;
+		afterTriggers.fdw_tuplestores[afterTriggers.query_depth] = ret;
 	}
 
 	return ret;
@@ -3271,7 +3270,7 @@ static bool
 afterTriggerCheckState(AfterTriggerShared evtshared)
 {
 	Oid			tgoid = evtshared->ats_tgoid;
-	SetConstraintState state = afterTriggers->state;
+	SetConstraintState state = afterTriggers.state;
 	int			i;
 
 	/*
@@ -3282,19 +3281,22 @@ afterTriggerCheckState(AfterTriggerShared evtshared)
 		return false;
 
 	/*
-	 * Check if SET CONSTRAINTS has been executed for this specific trigger.
+	 * If constraint state exists, SET CONSTRAINTS might have been executed
+	 * either for this trigger or for all triggers.
 	 */
-	for (i = 0; i < state->numstates; i++)
+	if (state != NULL)
 	{
-		if (state->trigstates[i].sct_tgoid == tgoid)
-			return state->trigstates[i].sct_tgisdeferred;
-	}
+		/* Check for SET CONSTRAINTS for this specific trigger. */
+		for (i = 0; i < state->numstates; i++)
+		{
+			if (state->trigstates[i].sct_tgoid == tgoid)
+				return state->trigstates[i].sct_tgisdeferred;
+		}
 
-	/*
-	 * Check if SET CONSTRAINTS ALL has been executed; if so use that.
-	 */
-	if (state->all_isset)
-		return state->all_isdeferred;
+		/* Check for SET CONSTRAINTS ALL. */
+		if (state->all_isset)
+			return state->all_isdeferred;
+	}
 
 	/*
 	 * Otherwise return the default state for the trigger.
@@ -3331,8 +3333,8 @@ afterTriggerAddEvent(AfterTriggerEventList *events,
 		Size		chunksize;
 
 		/* Create event context if we didn't already */
-		if (afterTriggers->event_cxt == NULL)
-			afterTriggers->event_cxt =
+		if (afterTriggers.event_cxt == NULL)
+			afterTriggers.event_cxt =
 				AllocSetContextCreate(TopTransactionContext,
 									  "AfterTriggerEvents",
 									  ALLOCSET_DEFAULT_MINSIZE,
@@ -3375,7 +3377,7 @@ afterTriggerAddEvent(AfterTriggerEventList *events,
 				chunksize /= 2; /* too many shared records */
 			chunksize = Min(chunksize, MAX_CHUNK_SIZE);
 		}
-		chunk = MemoryContextAlloc(afterTriggers->event_cxt, chunksize);
+		chunk = MemoryContextAlloc(afterTriggers.event_cxt, chunksize);
 		chunk->next = NULL;
 		chunk->freeptr = CHUNK_DATA_START(chunk);
 		chunk->endptr = chunk->endfree = (char *) chunk + chunksize;
@@ -3706,7 +3708,7 @@ afterTriggerMarkEvents(AfterTriggerEventList *events,
 				/*
 				 * Mark it as to be fired in this firing cycle.
 				 */
-				evtshared->ats_firing_id = afterTriggers->firing_counter;
+				evtshared->ats_firing_id = afterTriggers.firing_counter;
 				event->ate_flags |= AFTER_TRIGGER_IN_PROGRESS;
 				found = true;
 			}
@@ -3898,40 +3900,28 @@ afterTriggerInvokeEvents(AfterTriggerEventList *events,
 void
 AfterTriggerBeginXact(void)
 {
-	Assert(afterTriggers == NULL);
+	/*
+	 * Initialize after-trigger state structure to empty
+	 */
+	afterTriggers.firing_counter = (CommandId) 1;		/* mustn't be 0 */
+	afterTriggers.query_depth = -1;
 
 	/*
-	 * Build empty after-trigger state structure
+	 * Verify that there is no leftover state remaining.  If these assertions
+	 * trip, it means that AfterTriggerEndXact wasn't called or didn't clean
+	 * up properly.
 	 */
-	afterTriggers = (AfterTriggers)
-		MemoryContextAlloc(TopTransactionContext,
-						   sizeof(AfterTriggersData));
-
-	afterTriggers->firing_counter = (CommandId) 1;		/* mustn't be 0 */
-	afterTriggers->state = SetConstraintStateCreate(8);
-	afterTriggers->events.head = NULL;
-	afterTriggers->events.tail = NULL;
-	afterTriggers->events.tailfree = NULL;
-	afterTriggers->query_depth = -1;
-
-	/* We initialize the arrays to a reasonable size */
-	afterTriggers->query_stack = (AfterTriggerEventList *)
-		MemoryContextAlloc(TopTransactionContext,
-						   8 * sizeof(AfterTriggerEventList));
-	afterTriggers->fdw_tuplestores = (Tuplestorestate **)
-		MemoryContextAllocZero(TopTransactionContext,
-							   8 * sizeof(Tuplestorestate *));
-	afterTriggers->maxquerydepth = 8;
-
-	/* Context for events is created only when needed */
-	afterTriggers->event_cxt = NULL;
-
-	/* Subtransaction stack is empty until/unless needed */
-	afterTriggers->state_stack = NULL;
-	afterTriggers->events_stack = NULL;
-	afterTriggers->depth_stack = NULL;
-	afterTriggers->firing_stack = NULL;
-	afterTriggers->maxtransdepth = 0;
+	Assert(afterTriggers.state == NULL);
+	Assert(afterTriggers.query_stack == NULL);
+	Assert(afterTriggers.fdw_tuplestores == NULL);
+	Assert(afterTriggers.maxquerydepth == 0);
+	Assert(afterTriggers.event_cxt == NULL);
+	Assert(afterTriggers.events.head == NULL);
+	Assert(afterTriggers.state_stack == NULL);
+	Assert(afterTriggers.events_stack == NULL);
+	Assert(afterTriggers.depth_stack == NULL);
+	Assert(afterTriggers.firing_stack == NULL);
+	Assert(afterTriggers.maxtransdepth == 0);
 }
 
 
@@ -3939,48 +3929,15 @@ AfterTriggerBeginXact(void)
  * AfterTriggerBeginQuery()
  *
  *	Called just before we start processing a single query within a
- *	transaction (or subtransaction).  Set up to record AFTER trigger
- *	events queued by the query.  Note that it is allowed to have
- *	nested queries within a (sub)transaction.
+ *	transaction (or subtransaction).  Most of the real work gets deferred
+ *	until somebody actually tries to queue a trigger event.
  * ----------
  */
 void
 AfterTriggerBeginQuery(void)
 {
-	AfterTriggerEventList *events;
-
-	/* Must be inside a transaction */
-	Assert(afterTriggers != NULL);
-
 	/* Increase the query stack depth */
-	afterTriggers->query_depth++;
-
-	/*
-	 * Allocate more space in the query stack if needed.
-	 */
-	if (afterTriggers->query_depth >= afterTriggers->maxquerydepth)
-	{
-		/* repalloc will keep the stack in the same context */
-		int			old_alloc = afterTriggers->maxquerydepth;
-		int			new_alloc = old_alloc * 2;
-
-		afterTriggers->query_stack = (AfterTriggerEventList *)
-			repalloc(afterTriggers->query_stack,
-					 new_alloc * sizeof(AfterTriggerEventList));
-		afterTriggers->fdw_tuplestores = (Tuplestorestate **)
-			repalloc(afterTriggers->fdw_tuplestores,
-					 new_alloc * sizeof(Tuplestorestate *));
-		/* Clear newly-allocated slots for subsequent lazy initialization. */
-		memset(afterTriggers->fdw_tuplestores + old_alloc,
-			   0, (new_alloc - old_alloc) * sizeof(Tuplestorestate *));
-		afterTriggers->maxquerydepth = new_alloc;
-	}
-
-	/* Initialize this query's list to empty */
-	events = &afterTriggers->query_stack[afterTriggers->query_depth];
-	events->head = NULL;
-	events->tail = NULL;
-	events->tailfree = NULL;
+	afterTriggers.query_depth++;
 }
 
 
@@ -4002,11 +3959,18 @@ AfterTriggerEndQuery(EState *estate)
 	AfterTriggerEventList *events;
 	Tuplestorestate *fdw_tuplestore;
 
-	/* Must be inside a transaction */
-	Assert(afterTriggers != NULL);
-
 	/* Must be inside a query, too */
-	Assert(afterTriggers->query_depth >= 0);
+	Assert(afterTriggers.query_depth >= 0);
+
+	/*
+	 * If we never even got as far as initializing the event stack, there
+	 * certainly won't be any events, so exit quickly.
+	 */
+	if (afterTriggers.query_depth >= afterTriggers.maxquerydepth)
+	{
+		afterTriggers.query_depth--;
+		return;
+	}
 
 	/*
 	 * Process all immediate-mode triggers queued by the query, and move the
@@ -4032,10 +3996,10 @@ AfterTriggerEndQuery(EState *estate)
 	 */
 	for (;;)
 	{
-		events = &afterTriggers->query_stack[afterTriggers->query_depth];
-		if (afterTriggerMarkEvents(events, &afterTriggers->events, true))
+		events = &afterTriggers.query_stack[afterTriggers.query_depth];
+		if (afterTriggerMarkEvents(events, &afterTriggers.events, true))
 		{
-			CommandId	firing_id = afterTriggers->firing_counter++;
+			CommandId	firing_id = afterTriggers.firing_counter++;
 
 			/* OK to delete the immediate events after processing them */
 			if (afterTriggerInvokeEvents(events, firing_id, estate, true))
@@ -4046,15 +4010,15 @@ AfterTriggerEndQuery(EState *estate)
 	}
 
 	/* Release query-local storage for events, including tuplestore if any */
-	fdw_tuplestore = afterTriggers->fdw_tuplestores[afterTriggers->query_depth];
+	fdw_tuplestore = afterTriggers.fdw_tuplestores[afterTriggers.query_depth];
 	if (fdw_tuplestore)
 	{
 		tuplestore_end(fdw_tuplestore);
-		afterTriggers->fdw_tuplestores[afterTriggers->query_depth] = NULL;
+		afterTriggers.fdw_tuplestores[afterTriggers.query_depth] = NULL;
 	}
-	afterTriggerFreeEventList(&afterTriggers->query_stack[afterTriggers->query_depth]);
+	afterTriggerFreeEventList(&afterTriggers.query_stack[afterTriggers.query_depth]);
 
-	afterTriggers->query_depth--;
+	afterTriggers.query_depth--;
 }
 
 
@@ -4075,18 +4039,15 @@ AfterTriggerFireDeferred(void)
 	AfterTriggerEventList *events;
 	bool		snap_pushed = false;
 
-	/* Must be inside a transaction */
-	Assert(afterTriggers != NULL);
-
-	/* ... but not inside a query */
-	Assert(afterTriggers->query_depth == -1);
+	/* Must not be inside a query */
+	Assert(afterTriggers.query_depth == -1);
 
 	/*
 	 * If there are any triggers to fire, make sure we have set a snapshot for
 	 * them to use.  (Since PortalRunUtility doesn't set a snap for COMMIT, we
 	 * can't assume ActiveSnapshot is valid on entry.)
 	 */
-	events = &afterTriggers->events;
+	events = &afterTriggers.events;
 	if (events->head != NULL)
 	{
 		PushActiveSnapshot(GetTransactionSnapshot());
@@ -4099,7 +4060,7 @@ AfterTriggerFireDeferred(void)
 	 */
 	while (afterTriggerMarkEvents(events, NULL, false))
 	{
-		CommandId	firing_id = afterTriggers->firing_counter++;
+		CommandId	firing_id = afterTriggers.firing_counter++;
 
 		if (afterTriggerInvokeEvents(events, firing_id, NULL, true))
 			break;				/* all fired */
@@ -4132,7 +4093,7 @@ void
 AfterTriggerEndXact(bool isCommit)
 {
 	/*
-	 * Forget everything we know about AFTER triggers.
+	 * Forget the pending-events list.
 	 *
 	 * Since all the info is in TopTransactionContext or children thereof, we
 	 * don't really need to do anything to reclaim memory.  However, the
@@ -4140,10 +4101,39 @@ AfterTriggerEndXact(bool isCommit)
 	 * soon as possible --- especially if we are aborting because we ran out
 	 * of memory for the list!
 	 */
-	if (afterTriggers && afterTriggers->event_cxt)
-		MemoryContextDelete(afterTriggers->event_cxt);
+	if (afterTriggers.event_cxt)
+	{
+		MemoryContextDelete(afterTriggers.event_cxt);
+		afterTriggers.event_cxt = NULL;
+		afterTriggers.events.head = NULL;
+		afterTriggers.events.tail = NULL;
+		afterTriggers.events.tailfree = NULL;
+	}
+
+	/*
+	 * Forget any subtransaction state as well.  Since this can't be very
+	 * large, we let the eventual reset of TopTransactionContext free the
+	 * memory instead of doing it here.
+	 */
+	afterTriggers.state_stack = NULL;
+	afterTriggers.events_stack = NULL;
+	afterTriggers.depth_stack = NULL;
+	afterTriggers.firing_stack = NULL;
+	afterTriggers.maxtransdepth = 0;
+
+
+	/*
+	 * Forget the query stack and constrant-related state information.  As
+	 * with the subtransaction state information, we don't bother freeing the
+	 * memory here.
+	 */
+	afterTriggers.query_stack = NULL;
+	afterTriggers.fdw_tuplestores = NULL;
+	afterTriggers.maxquerydepth = 0;
+	afterTriggers.state = NULL;
 
-	afterTriggers = NULL;
+	/* No more afterTriggers manipulation until next transaction starts. */
+	afterTriggers.query_depth = -1;
 }
 
 /*
@@ -4157,56 +4147,49 @@ AfterTriggerBeginSubXact(void)
 	int			my_level = GetCurrentTransactionNestLevel();
 
 	/*
-	 * Ignore call if the transaction is in aborted state.  (Probably
-	 * shouldn't happen?)
-	 */
-	if (afterTriggers == NULL)
-		return;
-
-	/*
 	 * Allocate more space in the stacks if needed.  (Note: because the
 	 * minimum nest level of a subtransaction is 2, we waste the first couple
 	 * entries of each array; not worth the notational effort to avoid it.)
 	 */
-	while (my_level >= afterTriggers->maxtransdepth)
+	while (my_level >= afterTriggers.maxtransdepth)
 	{
-		if (afterTriggers->maxtransdepth == 0)
+		if (afterTriggers.maxtransdepth == 0)
 		{
 			MemoryContext old_cxt;
 
 			old_cxt = MemoryContextSwitchTo(TopTransactionContext);
 
 #define DEFTRIG_INITALLOC 8
-			afterTriggers->state_stack = (SetConstraintState *)
+			afterTriggers.state_stack = (SetConstraintState *)
 				palloc(DEFTRIG_INITALLOC * sizeof(SetConstraintState));
-			afterTriggers->events_stack = (AfterTriggerEventList *)
+			afterTriggers.events_stack = (AfterTriggerEventList *)
 				palloc(DEFTRIG_INITALLOC * sizeof(AfterTriggerEventList));
-			afterTriggers->depth_stack = (int *)
+			afterTriggers.depth_stack = (int *)
 				palloc(DEFTRIG_INITALLOC * sizeof(int));
-			afterTriggers->firing_stack = (CommandId *)
+			afterTriggers.firing_stack = (CommandId *)
 				palloc(DEFTRIG_INITALLOC * sizeof(CommandId));
-			afterTriggers->maxtransdepth = DEFTRIG_INITALLOC;
+			afterTriggers.maxtransdepth = DEFTRIG_INITALLOC;
 
 			MemoryContextSwitchTo(old_cxt);
 		}
 		else
 		{
 			/* repalloc will keep the stacks in the same context */
-			int			new_alloc = afterTriggers->maxtransdepth * 2;
+			int			new_alloc = afterTriggers.maxtransdepth * 2;
 
-			afterTriggers->state_stack = (SetConstraintState *)
-				repalloc(afterTriggers->state_stack,
+			afterTriggers.state_stack = (SetConstraintState *)
+				repalloc(afterTriggers.state_stack,
 						 new_alloc * sizeof(SetConstraintState));
-			afterTriggers->events_stack = (AfterTriggerEventList *)
-				repalloc(afterTriggers->events_stack,
+			afterTriggers.events_stack = (AfterTriggerEventList *)
+				repalloc(afterTriggers.events_stack,
 						 new_alloc * sizeof(AfterTriggerEventList));
-			afterTriggers->depth_stack = (int *)
-				repalloc(afterTriggers->depth_stack,
+			afterTriggers.depth_stack = (int *)
+				repalloc(afterTriggers.depth_stack,
 						 new_alloc * sizeof(int));
-			afterTriggers->firing_stack = (CommandId *)
-				repalloc(afterTriggers->firing_stack,
+			afterTriggers.firing_stack = (CommandId *)
+				repalloc(afterTriggers.firing_stack,
 						 new_alloc * sizeof(CommandId));
-			afterTriggers->maxtransdepth = new_alloc;
+			afterTriggers.maxtransdepth = new_alloc;
 		}
 	}
 
@@ -4215,10 +4198,10 @@ AfterTriggerBeginSubXact(void)
 	 * is not saved until/unless changed.  Likewise, we don't make a
 	 * per-subtransaction event context until needed.
 	 */
-	afterTriggers->state_stack[my_level] = NULL;
-	afterTriggers->events_stack[my_level] = afterTriggers->events;
-	afterTriggers->depth_stack[my_level] = afterTriggers->query_depth;
-	afterTriggers->firing_stack[my_level] = afterTriggers->firing_counter;
+	afterTriggers.state_stack[my_level] = NULL;
+	afterTriggers.events_stack[my_level] = afterTriggers.events;
+	afterTriggers.depth_stack[my_level] = afterTriggers.query_depth;
+	afterTriggers.firing_stack[my_level] = afterTriggers.firing_counter;
 }
 
 /*
@@ -4236,26 +4219,19 @@ AfterTriggerEndSubXact(bool isCommit)
 	CommandId	subxact_firing_id;
 
 	/*
-	 * Ignore call if the transaction is in aborted state.  (Probably
-	 * unneeded)
-	 */
-	if (afterTriggers == NULL)
-		return;
-
-	/*
 	 * Pop the prior state if needed.
 	 */
 	if (isCommit)
 	{
-		Assert(my_level < afterTriggers->maxtransdepth);
+		Assert(my_level < afterTriggers.maxtransdepth);
 		/* If we saved a prior state, we don't need it anymore */
-		state = afterTriggers->state_stack[my_level];
+		state = afterTriggers.state_stack[my_level];
 		if (state != NULL)
 			pfree(state);
 		/* this avoids double pfree if error later: */
-		afterTriggers->state_stack[my_level] = NULL;
-		Assert(afterTriggers->query_depth ==
-			   afterTriggers->depth_stack[my_level]);
+		afterTriggers.state_stack[my_level] = NULL;
+		Assert(afterTriggers.query_depth ==
+			   afterTriggers.depth_stack[my_level]);
 	}
 	else
 	{
@@ -4264,7 +4240,7 @@ AfterTriggerEndSubXact(bool isCommit)
 		 * AfterTriggerBeginSubXact, in which case we mustn't risk touching
 		 * stack levels that aren't there.
 		 */
-		if (my_level >= afterTriggers->maxtransdepth)
+		if (my_level >= afterTriggers.maxtransdepth)
 			return;
 
 		/*
@@ -4273,42 +4249,46 @@ AfterTriggerEndSubXact(bool isCommit)
 		 * subtransaction will not add events to query levels started in a
 		 * earlier transaction state.
 		 */
-		while (afterTriggers->query_depth > afterTriggers->depth_stack[my_level])
+		while (afterTriggers.query_depth > afterTriggers.depth_stack[my_level])
 		{
-			Tuplestorestate *ts;
-
-			ts = afterTriggers->fdw_tuplestores[afterTriggers->query_depth];
-			if (ts)
+			if (afterTriggers.query_depth < afterTriggers.maxquerydepth)
 			{
-				tuplestore_end(ts);
-				afterTriggers->fdw_tuplestores[afterTriggers->query_depth] = NULL;
+				Tuplestorestate *ts;
+
+				ts = afterTriggers.fdw_tuplestores[afterTriggers.query_depth];
+				if (ts)
+				{
+					tuplestore_end(ts);
+					afterTriggers.fdw_tuplestores[afterTriggers.query_depth] = NULL;
+				}
+
+				afterTriggerFreeEventList(&afterTriggers.query_stack[afterTriggers.query_depth]);
 			}
 
-			afterTriggerFreeEventList(&afterTriggers->query_stack[afterTriggers->query_depth]);
-			afterTriggers->query_depth--;
+			afterTriggers.query_depth--;
 		}
-		Assert(afterTriggers->query_depth ==
-			   afterTriggers->depth_stack[my_level]);
+		Assert(afterTriggers.query_depth ==
+			   afterTriggers.depth_stack[my_level]);
 
 		/*
 		 * Restore the global deferred-event list to its former length,
 		 * discarding any events queued by the subxact.
 		 */
-		afterTriggerRestoreEventList(&afterTriggers->events,
-									 &afterTriggers->events_stack[my_level]);
+		afterTriggerRestoreEventList(&afterTriggers.events,
+									 &afterTriggers.events_stack[my_level]);
 
 		/*
 		 * Restore the trigger state.  If the saved state is NULL, then this
 		 * subxact didn't save it, so it doesn't need restoring.
 		 */
-		state = afterTriggers->state_stack[my_level];
+		state = afterTriggers.state_stack[my_level];
 		if (state != NULL)
 		{
-			pfree(afterTriggers->state);
-			afterTriggers->state = state;
+			pfree(afterTriggers.state);
+			afterTriggers.state = state;
 		}
 		/* this avoids double pfree if error later: */
-		afterTriggers->state_stack[my_level] = NULL;
+		afterTriggers.state_stack[my_level] = NULL;
 
 		/*
 		 * Scan for any remaining deferred events that were marked DONE or IN
@@ -4318,8 +4298,8 @@ AfterTriggerEndSubXact(bool isCommit)
 		 * (This essentially assumes that the current subxact includes all
 		 * subxacts started after it.)
 		 */
-		subxact_firing_id = afterTriggers->firing_stack[my_level];
-		for_each_event_chunk(event, chunk, afterTriggers->events)
+		subxact_firing_id = afterTriggers.firing_stack[my_level];
+		for_each_event_chunk(event, chunk, afterTriggers.events)
 		{
 			AfterTriggerShared evtshared = GetTriggerSharedData(event);
 
@@ -4334,6 +4314,66 @@ AfterTriggerEndSubXact(bool isCommit)
 	}
 }
 
+/* ----------
+ * AfterTriggerEnlargeQueryState()
+ *
+ *	Prepare the necessary state so that we can record AFTER trigger events
+ *	queued by a query.  It is allowed to have nested queries within a
+ *	(sub)transaction, so we need to have separate state for each query
+ *	nesting level.
+ * ----------
+ */
+static void
+AfterTriggerEnlargeQueryState(void)
+{
+	int		init_depth = afterTriggers.maxquerydepth;
+
+	Assert(afterTriggers.query_depth >= afterTriggers.maxquerydepth);
+
+	if (afterTriggers.maxquerydepth == 0)
+	{
+		int			new_alloc = Max(afterTriggers.query_depth, 8);
+
+		afterTriggers.query_stack = (AfterTriggerEventList *)
+			MemoryContextAlloc(TopTransactionContext,
+							   new_alloc * sizeof(AfterTriggerEventList));
+		afterTriggers.fdw_tuplestores = (Tuplestorestate **)
+			MemoryContextAllocZero(TopTransactionContext,
+								   new_alloc * sizeof(Tuplestorestate *));
+		afterTriggers.maxquerydepth = new_alloc;
+	}
+	else
+	{
+		/* repalloc will keep the stack in the same context */
+		int			old_alloc = afterTriggers.maxquerydepth;
+		int			new_alloc = Max(afterTriggers.query_depth, old_alloc * 2);
+
+		afterTriggers.query_stack = (AfterTriggerEventList *)
+			repalloc(afterTriggers.query_stack,
+					 new_alloc * sizeof(AfterTriggerEventList));
+		afterTriggers.fdw_tuplestores = (Tuplestorestate **)
+			repalloc(afterTriggers.fdw_tuplestores,
+					 new_alloc * sizeof(Tuplestorestate *));
+		/* Clear newly-allocated slots for subsequent lazy initialization. */
+		memset(afterTriggers.fdw_tuplestores + old_alloc,
+			   0, (new_alloc - old_alloc) * sizeof(Tuplestorestate *));
+		afterTriggers.maxquerydepth = new_alloc;
+	}
+
+	/* Initialize new query lists to empty */
+	while (init_depth < afterTriggers.maxquerydepth)
+	{
+		AfterTriggerEventList *events;
+
+		events = &afterTriggers.query_stack[init_depth];
+		events->head = NULL;
+		events->tail = NULL;
+		events->tailfree = NULL;
+
+		++init_depth;
+	}
+}
+
 /*
  * Create an empty SetConstraintState with room for numalloc trigstates
  */
@@ -4417,21 +4457,19 @@ AfterTriggerSetState(ConstraintsSetStmt *stmt)
 {
 	int			my_level = GetCurrentTransactionNestLevel();
 
-	/*
-	 * Ignore call if we aren't in a transaction.  (Shouldn't happen?)
-	 */
-	if (afterTriggers == NULL)
-		return;
+	/* If we haven't already done so, initialize our state. */
+	if (afterTriggers.state == NULL)
+		afterTriggers.state = SetConstraintStateCreate(8);
 
 	/*
 	 * If in a subtransaction, and we didn't save the current state already,
 	 * save it so it can be restored if the subtransaction aborts.
 	 */
 	if (my_level > 1 &&
-		afterTriggers->state_stack[my_level] == NULL)
+		afterTriggers.state_stack[my_level] == NULL)
 	{
-		afterTriggers->state_stack[my_level] =
-			SetConstraintStateCopy(afterTriggers->state);
+		afterTriggers.state_stack[my_level] =
+			SetConstraintStateCopy(afterTriggers.state);
 	}
 
 	/*
@@ -4442,13 +4480,13 @@ AfterTriggerSetState(ConstraintsSetStmt *stmt)
 		/*
 		 * Forget any previous SET CONSTRAINTS commands in this transaction.
 		 */
-		afterTriggers->state->numstates = 0;
+		afterTriggers.state->numstates = 0;
 
 		/*
 		 * Set the per-transaction ALL state to known.
 		 */
-		afterTriggers->state->all_isset = true;
-		afterTriggers->state->all_isdeferred = stmt->deferred;
+		afterTriggers.state->all_isset = true;
+		afterTriggers.state->all_isdeferred = stmt->deferred;
 	}
 	else
 	{
@@ -4622,7 +4660,7 @@ AfterTriggerSetState(ConstraintsSetStmt *stmt)
 		foreach(lc, tgoidlist)
 		{
 			Oid			tgoid = lfirst_oid(lc);
-			SetConstraintState state = afterTriggers->state;
+			SetConstraintState state = afterTriggers.state;
 			bool		found = false;
 			int			i;
 
@@ -4637,7 +4675,7 @@ AfterTriggerSetState(ConstraintsSetStmt *stmt)
 			}
 			if (!found)
 			{
-				afterTriggers->state =
+				afterTriggers.state =
 					SetConstraintStateAddItem(state, tgoid, stmt->deferred);
 			}
 		}
@@ -4656,12 +4694,12 @@ AfterTriggerSetState(ConstraintsSetStmt *stmt)
 	 */
 	if (!stmt->deferred)
 	{
-		AfterTriggerEventList *events = &afterTriggers->events;
+		AfterTriggerEventList *events = &afterTriggers.events;
 		bool		snapshot_set = false;
 
 		while (afterTriggerMarkEvents(events, NULL, true))
 		{
-			CommandId	firing_id = afterTriggers->firing_counter++;
+			CommandId	firing_id = afterTriggers.firing_counter++;
 
 			/*
 			 * Make sure a snapshot has been established in case trigger
@@ -4715,12 +4753,8 @@ AfterTriggerPendingOnRel(Oid relid)
 	AfterTriggerEventChunk *chunk;
 	int			depth;
 
-	/* No-op if we aren't in a transaction.  (Shouldn't happen?) */
-	if (afterTriggers == NULL)
-		return false;
-
 	/* Scan queued events */
-	for_each_event_chunk(event, chunk, afterTriggers->events)
+	for_each_event_chunk(event, chunk, afterTriggers.events)
 	{
 		AfterTriggerShared evtshared = GetTriggerSharedData(event);
 
@@ -4741,9 +4775,9 @@ AfterTriggerPendingOnRel(Oid relid)
 	 * if TRUNCATE/etc is executed by a function or trigger within an updating
 	 * query on the same relation, which is pretty perverse, but let's check.
 	 */
-	for (depth = 0; depth <= afterTriggers->query_depth; depth++)
+	for (depth = 0; depth <= afterTriggers.query_depth; depth++)
 	{
-		for_each_event_chunk(event, chunk, afterTriggers->query_stack[depth])
+		for_each_event_chunk(event, chunk, afterTriggers.query_stack[depth])
 		{
 			AfterTriggerShared evtshared = GetTriggerSharedData(event);
 
@@ -4787,15 +4821,17 @@ AfterTriggerSaveEvent(EState *estate, ResultRelInfo *relinfo,
 	Tuplestorestate *fdw_tuplestore = NULL;
 
 	/*
-	 * Check state.  We use normal tests not Asserts because it is possible to
+	 * Check state.  We use a normal test not Assert because it is possible to
 	 * reach here in the wrong state given misconfigured RI triggers, in
 	 * particular deferring a cascade action trigger.
 	 */
-	if (afterTriggers == NULL)
-		elog(ERROR, "AfterTriggerSaveEvent() called outside of transaction");
-	if (afterTriggers->query_depth < 0)
+	if (afterTriggers.query_depth < 0)
 		elog(ERROR, "AfterTriggerSaveEvent() called outside of query");
 
+	/* Be sure we have enough space to record events at this query depth. */
+	if (afterTriggers.query_depth >= afterTriggers.maxquerydepth)
+		AfterTriggerEnlargeQueryState();
+
 	/*
 	 * Validate the event code and collect the associated tuple CTIDs.
 	 *
@@ -4959,7 +4995,7 @@ AfterTriggerSaveEvent(EState *estate, ResultRelInfo *relinfo,
 		new_shared.ats_relid = RelationGetRelid(rel);
 		new_shared.ats_firing_id = 0;
 
-		afterTriggerAddEvent(&afterTriggers->query_stack[afterTriggers->query_depth],
+		afterTriggerAddEvent(&afterTriggers.query_stack[afterTriggers.query_depth],
 							 &new_event, &new_shared);
 	}
 
#7Amit Kapila
amit.kapila16@gmail.com
In reply to: Robert Haas (#6)
Re: Deferring some AtStart* allocations?

On Wed, Oct 8, 2014 at 11:22 PM, Robert Haas <robertmhaas@gmail.com> wrote:

On Sun, Jun 29, 2014 at 9:12 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Meh. Even "SELECT 1" is going to be doing *far* more pallocs than that

to

get through raw parsing, parse analysis, planning, and execution

startup.

If you can find a few hundred pallocs we can avoid in trivial queries,
it would get interesting; but I'll be astonished if saving 4 is

measurable.

I got nerd-sniped by this problem today, probably after staring at the
profiling data very similar to what led Andres to ask the question in
the first place. The gains are indeed not measurable on a
macrobenchmark, but I had to write the patch to figure that out, so
here it is.

Although the gain isn't a measurable percentage of total runtime, it
does appear to be a significant percentage of the palloc traffic on
trivial queries. I did 30-second SELECT-only pgbench runs at scale
factor 10, using prepared queries. According to perf, on unpatched
master, StartTransactionCommand accounts for 11.46% of the calls to
AllocSetAlloc. (I imagine this is by runtime, not by call count, but
it probably doesn't matter much either way.) But with this patch,
StartTransactionCommand's share drops to 4.43%. Most of that is
coming from AtStart_Inval(), which wouldn't be hard to fix either.

I'm inclined to think that tamping down palloc traffic is worthwhile
even if we can't directly measure the savings on a macrobenchmark.
AllocSetAlloc is often at or near the top of profiling results, but
there's rarely a single caller responsible for enough of those calls
for to make it worth optimizing. But that means that if we refuse to
take patches that save just a few pallocs, we're basically giving up
on ever improving anything in this area, and that doesn't seem like a
good idea either; it's only going to get slowly worse over time as we
add more features.

Yeah, this makes sense, even otherwise also if somebody comes
up with a much larger patch which reduce few dozen of pallocs and shows
noticeable gain, then it will be difficult to quantify the patch based on
which particular pallocs gives improvements, as reducing few pallocs
might not give any noticeable gain.

BTW, if anyone's curious what the biggest source of AllocSetAlloc
calls is on this workload, the answer is ExecInitIndexScan(), which
looks to be indirectly responsible for almost half the total (or just
over half, with the patch applied). That apparently calls a lot of
different things that allocate small amounts of memory, and it
accounts for nearly all of the AllocSetAlloc traffic that originates
from PortalStart().

One way to dig into this problem is that we look at each individual
pallocs in PortalStart() path and try to see which one's can be
optimized, another way is that we introduce a concept of
Prepared Portals some thing similar to Prepared Statements, but for
Portal. This will not only avoid the pallocs in executor startup phase,
but can improve the performance by avoiding many other calls related
to executor startup. I understand that this will be a much bigger
project, however the gains can also be substantial for prepared
statements when all/most of the data fits in memory. It seems other
databases also have similar concepts for execution (example Oracle
uses Cursor_sharing/Shared cursors to achieve the same)

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

#8Andres Freund
andres@2ndquadrant.com
In reply to: Robert Haas (#6)
Re: Deferring some AtStart* allocations?

On 2014-10-08 13:52:14 -0400, Robert Haas wrote:

On Sun, Jun 29, 2014 at 9:12 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Meh. Even "SELECT 1" is going to be doing *far* more pallocs than that to
get through raw parsing, parse analysis, planning, and execution startup.
If you can find a few hundred pallocs we can avoid in trivial queries,
it would get interesting; but I'll be astonished if saving 4 is measurable.

I got nerd-sniped by this problem today, probably after staring at the
profiling data very similar to what led Andres to ask the question in
the first place. The gains are indeed not measurable on a
macrobenchmark, but I had to write the patch to figure that out, so
here it is.

Although the gain isn't a measurable percentage of total runtime, it
does appear to be a significant percentage of the palloc traffic on
trivial queries. I did 30-second SELECT-only pgbench runs at scale
factor 10, using prepared queries. According to perf, on unpatched
master, StartTransactionCommand accounts for 11.46% of the calls to
AllocSetAlloc. (I imagine this is by runtime, not by call count, but
it probably doesn't matter much either way.) But with this patch,
StartTransactionCommand's share drops to 4.43%. Most of that is
coming from AtStart_Inval(), which wouldn't be hard to fix either.

Interesting - in my local profile AtStart_Inval() is more pronounced
than AfterTriggerBeginQuery(). I've quickly and in a ugly fashion hacked
AtStart_Inval() out of readonly queries ontop of your patch. Together
that yields a ~3.5% performance improvement in my trivial 'SELECT * FROM
tbl WHER pkey = xxx' testcase.

I'm inclined to think that tamping down palloc traffic is worthwhile
even if we can't directly measure the savings on a macrobenchmark.
AllocSetAlloc is often at or near the top of profiling results, but
there's rarely a single caller responsible for enough of those calls
for to make it worth optimizing. But that means that if we refuse to
take patches that save just a few pallocs, we're basically giving up
on ever improving anything in this area, and that doesn't seem like a
good idea either; it's only going to get slowly worse over time as we
add more features.

I think it depends a bit on the callsites. If its somewhere that nobody
will ever care, because it's a slowpath, then we shouldn't care
either. But that's not the case here, so I do think that makes sense.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#9Robert Haas
robertmhaas@gmail.com
In reply to: Andres Freund (#8)
Re: Deferring some AtStart* allocations?

On Thu, Oct 9, 2014 at 5:34 AM, Andres Freund <andres@2ndquadrant.com> wrote:

Interesting - in my local profile AtStart_Inval() is more pronounced
than AfterTriggerBeginQuery(). I've quickly and in a ugly fashion hacked
AtStart_Inval() out of readonly queries ontop of your patch. Together
that yields a ~3.5% performance improvement in my trivial 'SELECT * FROM
tbl WHER pkey = xxx' testcase.

Whoa. Now that's clearly significant. You didn't attach the patch;
was that inadvertent, or was it too ugly for that?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#10Andres Freund
andres@2ndquadrant.com
In reply to: Robert Haas (#9)
Re: Deferring some AtStart* allocations?

On 2014-10-09 08:18:18 -0400, Robert Haas wrote:

On Thu, Oct 9, 2014 at 5:34 AM, Andres Freund <andres@2ndquadrant.com> wrote:

Interesting - in my local profile AtStart_Inval() is more pronounced
than AfterTriggerBeginQuery(). I've quickly and in a ugly fashion hacked
AtStart_Inval() out of readonly queries ontop of your patch. Together
that yields a ~3.5% performance improvement in my trivial 'SELECT * FROM
tbl WHER pkey = xxx' testcase.

Whoa. Now that's clearly significant.

Well, my guess it'll be far less noticeable in less trivial
workloads. But it does seem worthwile.

You didn't attach the patch; was that inadvertent, or was it too ugly
for that?

Far, far too ugly ;). I just removed the AtStart() call from xact.c and
sprinkled it around relevant places instead ;)

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#11Robert Haas
robertmhaas@gmail.com
In reply to: Andres Freund (#10)
1 attachment(s)
Re: Deferring some AtStart* allocations?

On Thu, Oct 9, 2014 at 8:20 AM, Andres Freund <andres@2ndquadrant.com> wrote:

On 2014-10-09 08:18:18 -0400, Robert Haas wrote:

On Thu, Oct 9, 2014 at 5:34 AM, Andres Freund <andres@2ndquadrant.com> wrote:

Interesting - in my local profile AtStart_Inval() is more pronounced
than AfterTriggerBeginQuery(). I've quickly and in a ugly fashion hacked
AtStart_Inval() out of readonly queries ontop of your patch. Together
that yields a ~3.5% performance improvement in my trivial 'SELECT * FROM
tbl WHER pkey = xxx' testcase.

Whoa. Now that's clearly significant.

Well, my guess it'll be far less noticeable in less trivial
workloads. But it does seem worthwile.

You didn't attach the patch; was that inadvertent, or was it too ugly
for that?

Far, far too ugly ;). I just removed the AtStart() call from xact.c and
sprinkled it around relevant places instead ;)

OK, here's an attempt at a real patch for that. I haven't perf-tested this.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachments:

postpone-inval-setup.patchtext/x-patch; charset=US-ASCII; name=postpone-inval-setup.patchDownload
diff --git a/src/backend/access/transam/xact.c b/src/backend/access/transam/xact.c
index 5b5d31b..651a5c4 100644
--- a/src/backend/access/transam/xact.c
+++ b/src/backend/access/transam/xact.c
@@ -1838,7 +1838,6 @@ StartTransaction(void)
 	 * initialize other subsystems for new transaction
 	 */
 	AtStart_GUC();
-	AtStart_Inval();
 	AtStart_Cache();
 	AfterTriggerBeginXact();
 
@@ -4151,7 +4150,6 @@ StartSubTransaction(void)
 	 */
 	AtSubStart_Memory();
 	AtSubStart_ResourceOwner();
-	AtSubStart_Inval();
 	AtSubStart_Notify();
 	AfterTriggerBeginSubXact();
 
diff --git a/src/backend/utils/cache/inval.c b/src/backend/utils/cache/inval.c
index a7a768e..6b6c88e 100644
--- a/src/backend/utils/cache/inval.c
+++ b/src/backend/utils/cache/inval.c
@@ -693,19 +693,32 @@ AcceptInvalidationMessages(void)
 }
 
 /*
- * AtStart_Inval
- *		Initialize inval lists at start of a main transaction.
+ * PrepareInvalidationState
+ *		Initialize inval lists for the current (sub)transaction.
  */
-void
-AtStart_Inval(void)
+static void
+PrepareInvalidationState(void)
 {
-	Assert(transInvalInfo == NULL);
-	transInvalInfo = (TransInvalidationInfo *)
+	TransInvalidationInfo *myInfo;
+
+	if (transInvalInfo != NULL &&
+		transInvalInfo->my_level == GetCurrentTransactionNestLevel())
+		return;
+
+	myInfo = (TransInvalidationInfo *)
 		MemoryContextAllocZero(TopTransactionContext,
 							   sizeof(TransInvalidationInfo));
-	transInvalInfo->my_level = GetCurrentTransactionNestLevel();
-	SharedInvalidMessagesArray = NULL;
-	numSharedInvalidMessagesArray = 0;
+	myInfo->parent = transInvalInfo;
+	myInfo->my_level = GetCurrentTransactionNestLevel();
+
+	/*
+	 * If there's any previous entry, this one should be for a deeper
+	 * nesting level.
+	 */
+	Assert(transInvalInfo == NULL ||
+		myInfo->my_level > transInvalInfo->my_level);
+
+	transInvalInfo = myInfo;
 }
 
 /*
@@ -727,24 +740,6 @@ PostPrepare_Inval(void)
 }
 
 /*
- * AtSubStart_Inval
- *		Initialize inval lists at start of a subtransaction.
- */
-void
-AtSubStart_Inval(void)
-{
-	TransInvalidationInfo *myInfo;
-
-	Assert(transInvalInfo != NULL);
-	myInfo = (TransInvalidationInfo *)
-		MemoryContextAllocZero(TopTransactionContext,
-							   sizeof(TransInvalidationInfo));
-	myInfo->parent = transInvalInfo;
-	myInfo->my_level = GetCurrentTransactionNestLevel();
-	transInvalInfo = myInfo;
-}
-
-/*
  * Collect invalidation messages into SharedInvalidMessagesArray array.
  */
 static void
@@ -803,8 +798,16 @@ xactGetCommittedInvalidationMessages(SharedInvalidationMessage **msgs,
 {
 	MemoryContext oldcontext;
 
+	/* Quick exit if we haven't done anything with invalidation messages. */
+	if (transInvalInfo == NULL)
+	{
+		*RelcacheInitFileInval = false;
+		*msgs = NULL;
+		return 0;
+	}
+
 	/* Must be at top of stack */
-	Assert(transInvalInfo != NULL && transInvalInfo->parent == NULL);
+	Assert(transInvalInfo->my_level == 1 && transInvalInfo->parent == NULL);
 
 	/*
 	 * Relcache init file invalidation requires processing both before and
@@ -904,11 +907,15 @@ ProcessCommittedInvalidationMessages(SharedInvalidationMessage *msgs,
 void
 AtEOXact_Inval(bool isCommit)
 {
+	/* Quick exit if no messages */
+	if (transInvalInfo == NULL)
+		return;
+
+	/* Must be at top of stack */
+	Assert(transInvalInfo->my_level == 1 && transInvalInfo->parent == NULL);
+
 	if (isCommit)
 	{
-		/* Must be at top of stack */
-		Assert(transInvalInfo != NULL && transInvalInfo->parent == NULL);
-
 		/*
 		 * Relcache init file invalidation requires processing both before and
 		 * after we send the SI messages.  However, we need not do anything
@@ -926,17 +933,16 @@ AtEOXact_Inval(bool isCommit)
 		if (transInvalInfo->RelcacheInitFileInval)
 			RelationCacheInitFilePostInvalidate();
 	}
-	else if (transInvalInfo != NULL)
+	else
 	{
-		/* Must be at top of stack */
-		Assert(transInvalInfo->parent == NULL);
-
 		ProcessInvalidationMessages(&transInvalInfo->PriorCmdInvalidMsgs,
 									LocalExecuteInvalidationMessage);
 	}
 
 	/* Need not free anything explicitly */
 	transInvalInfo = NULL;
+	SharedInvalidMessagesArray = NULL;
+	numSharedInvalidMessagesArray = 0;
 }
 
 /*
@@ -960,18 +966,38 @@ AtEOXact_Inval(bool isCommit)
 void
 AtEOSubXact_Inval(bool isCommit)
 {
-	int			my_level = GetCurrentTransactionNestLevel();
+	int			my_level;
 	TransInvalidationInfo *myInfo = transInvalInfo;
 
-	if (isCommit)
+	/* Quick exit if no messages. */
+	if (myInfo == NULL)
+		return;
+
+	/* Also bail out quickly if messages are not for this level. */
+	my_level = GetCurrentTransactionNestLevel();
+	if (myInfo->my_level != my_level)
 	{
-		/* Must be at non-top of stack */
-		Assert(myInfo != NULL && myInfo->parent != NULL);
-		Assert(myInfo->my_level == my_level);
+		Assert(myInfo->my_level < my_level);
+		return;
+	}
 
+	if (isCommit)
+	{
 		/* If CurrentCmdInvalidMsgs still has anything, fix it */
 		CommandEndInvalidationMessages();
 
+		/*
+		 * We create invalidation stack entries lazily, so the parent might
+		 * not have one.  Instead of creating one, moving all the data over,
+		 * and then freeing our own, we can just adjust the level of our own
+		 * entry.
+		 */
+		if (myInfo->parent == NULL || myInfo->parent->my_level < my_level - 1)
+		{
+			myInfo->my_level--;
+			return;
+		}
+
 		/* Pass up my inval messages to parent */
 		AppendInvalidationMessages(&myInfo->parent->PriorCmdInvalidMsgs,
 								   &myInfo->PriorCmdInvalidMsgs);
@@ -986,11 +1012,8 @@ AtEOSubXact_Inval(bool isCommit)
 		/* Need not free anything else explicitly */
 		pfree(myInfo);
 	}
-	else if (myInfo != NULL && myInfo->my_level == my_level)
+	else
 	{
-		/* Must be at non-top of stack */
-		Assert(myInfo->parent != NULL);
-
 		ProcessInvalidationMessages(&myInfo->PriorCmdInvalidMsgs,
 									LocalExecuteInvalidationMessage);
 
@@ -1075,6 +1098,12 @@ CacheInvalidateHeapTuple(Relation relation,
 		return;
 
 	/*
+	 * If we're not prepared to queue invalidation messages for this
+	 * subtransaction level, get ready now.
+	 */
+	PrepareInvalidationState();
+
+	/*
 	 * First let the catcache do its thing
 	 */
 	tupleRelId = RelationGetRelid(relation);
@@ -1159,6 +1188,8 @@ CacheInvalidateCatalog(Oid catalogId)
 {
 	Oid			databaseId;
 
+	PrepareInvalidationState();
+
 	if (IsSharedRelation(catalogId))
 		databaseId = InvalidOid;
 	else
@@ -1182,6 +1213,8 @@ CacheInvalidateRelcache(Relation relation)
 	Oid			databaseId;
 	Oid			relationId;
 
+	PrepareInvalidationState();
+
 	relationId = RelationGetRelid(relation);
 	if (relation->rd_rel->relisshared)
 		databaseId = InvalidOid;
@@ -1202,6 +1235,8 @@ CacheInvalidateRelcacheByTuple(HeapTuple classTuple)
 	Oid			databaseId;
 	Oid			relationId;
 
+	PrepareInvalidationState();
+
 	relationId = HeapTupleGetOid(classTuple);
 	if (classtup->relisshared)
 		databaseId = InvalidOid;
@@ -1221,6 +1256,8 @@ CacheInvalidateRelcacheByRelid(Oid relid)
 {
 	HeapTuple	tup;
 
+	PrepareInvalidationState();
+
 	tup = SearchSysCache1(RELOID, ObjectIdGetDatum(relid));
 	if (!HeapTupleIsValid(tup))
 		elog(ERROR, "cache lookup failed for relation %u", relid);
diff --git a/src/include/utils/inval.h b/src/include/utils/inval.h
index 6156e02..9842b69 100644
--- a/src/include/utils/inval.h
+++ b/src/include/utils/inval.h
@@ -25,10 +25,6 @@ typedef void (*RelcacheCallbackFunction) (Datum arg, Oid relid);
 
 extern void AcceptInvalidationMessages(void);
 
-extern void AtStart_Inval(void);
-
-extern void AtSubStart_Inval(void);
-
 extern void AtEOXact_Inval(bool isCommit);
 
 extern void AtEOSubXact_Inval(bool isCommit);
#12Andres Freund
andres@2ndquadrant.com
In reply to: Robert Haas (#11)
Re: Deferring some AtStart* allocations?

On 2014-10-09 15:01:19 -0400, Robert Haas wrote:

On Thu, Oct 9, 2014 at 8:20 AM, Andres Freund <andres@2ndquadrant.com> wrote:

On 2014-10-09 08:18:18 -0400, Robert Haas wrote:

On Thu, Oct 9, 2014 at 5:34 AM, Andres Freund <andres@2ndquadrant.com> wrote:

Interesting - in my local profile AtStart_Inval() is more pronounced
than AfterTriggerBeginQuery(). I've quickly and in a ugly fashion hacked
AtStart_Inval() out of readonly queries ontop of your patch. Together
that yields a ~3.5% performance improvement in my trivial 'SELECT * FROM
tbl WHER pkey = xxx' testcase.

Whoa. Now that's clearly significant.

Well, my guess it'll be far less noticeable in less trivial
workloads. But it does seem worthwile.

You didn't attach the patch; was that inadvertent, or was it too ugly
for that?

Far, far too ugly ;). I just removed the AtStart() call from xact.c and
sprinkled it around relevant places instead ;)

OK, here's an attempt at a real patch for that. I haven't perf-tested this.

Neato. With a really trivial SELECT:

before:
tps = 28150.794776 (excluding connections establishing)
after:
tps = 29978.767703 (excluding connections establishing)

slightly more meaningful:

before:
tps = 21272.400039 (including connections establishing)
after:
tps = 22290.703482 (excluding connections establishing)

So that's a noticeable win. Obviously it's going to be less for more
complicated stuff, but still...

I've not really looked at the patches though.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#13Robert Haas
robertmhaas@gmail.com
In reply to: Andres Freund (#12)
Re: Deferring some AtStart* allocations?

On Thu, Oct 9, 2014 at 3:53 PM, Andres Freund <andres@2ndquadrant.com> wrote:

OK, here's an attempt at a real patch for that. I haven't perf-tested this.

Neato. With a really trivial SELECT:

before:
tps = 28150.794776 (excluding connections establishing)
after:
tps = 29978.767703 (excluding connections establishing)

slightly more meaningful:

before:
tps = 21272.400039 (including connections establishing)
after:
tps = 22290.703482 (excluding connections establishing)

So that's a noticeable win. Obviously it's going to be less for more
complicated stuff, but still...

I've not really looked at the patches though.

Yeah, not bad at all for the amount of work involved. Want to
disclose the actual queries?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#14Andres Freund
andres@2ndquadrant.com
In reply to: Robert Haas (#13)
Re: Deferring some AtStart* allocations?

On 2014-10-09 17:02:02 -0400, Robert Haas wrote:

On Thu, Oct 9, 2014 at 3:53 PM, Andres Freund <andres@2ndquadrant.com> wrote:

OK, here's an attempt at a real patch for that. I haven't perf-tested this.

Neato. With a really trivial SELECT:

before:
tps = 28150.794776 (excluding connections establishing)
after:
tps = 29978.767703 (excluding connections establishing)

slightly more meaningful:

before:
tps = 21272.400039 (including connections establishing)
after:
tps = 22290.703482 (excluding connections establishing)

So that's a noticeable win. Obviously it's going to be less for more
complicated stuff, but still...

I've not really looked at the patches though.

Yeah, not bad at all for the amount of work involved. Want to
disclose the actual queries?

SELECT 1;
SELECT * FROM smalltable WHERE id = xxx;

The latter is something quite frequent in the real world, so it's not
all academic...

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#15Simon Riggs
simon@2ndQuadrant.com
In reply to: Robert Haas (#11)
Re: Deferring some AtStart* allocations?

On 9 October 2014 20:01, Robert Haas <robertmhaas@gmail.com> wrote:

OK, here's an attempt at a real patch for that. I haven't perf-tested this.

Patch looks good to me. Surprised we didn't do that before.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#16Andres Freund
andres@2ndquadrant.com
In reply to: Robert Haas (#6)
Re: Deferring some AtStart* allocations?

On 2014-10-08 13:52:14 -0400, Robert Haas wrote:

On Sun, Jun 29, 2014 at 9:12 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Meh. Even "SELECT 1" is going to be doing *far* more pallocs than that to
get through raw parsing, parse analysis, planning, and execution startup.
If you can find a few hundred pallocs we can avoid in trivial queries,
it would get interesting; but I'll be astonished if saving 4 is measurable.

I got nerd-sniped by this problem today, probably after staring at the
profiling data very similar to what led Andres to ask the question in
the first place.

I've loolked through this patch and I'm happy with it. One could argue
that the current afterTriggers == NULL checks should be replaced with a
Assert ensuring we're inside the xact. But I think you're right in
removing them, and I don't think we actually need the asserts.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#17Andres Freund
andres@2ndquadrant.com
In reply to: Robert Haas (#11)
Re: Deferring some AtStart* allocations?

Hi,

On 2014-10-09 15:01:19 -0400, Robert Haas wrote:

/*
@@ -960,18 +966,38 @@ AtEOXact_Inval(bool isCommit)

...

+		/*
+		 * We create invalidation stack entries lazily, so the parent might
+		 * not have one.  Instead of creating one, moving all the data over,
+		 * and then freeing our own, we can just adjust the level of our own
+		 * entry.
+		 */
+		if (myInfo->parent == NULL || myInfo->parent->my_level < my_level - 1)
+		{
+			myInfo->my_level--;
+			return;
+		}
+

I think this bit might not be correct. What if the subxact one level up
aborts? Then it'll miss dealing with these invalidation entries. Or am I
misunderstanding something?

I like the patch, except the above potential issue.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#18Robert Haas
robertmhaas@gmail.com
In reply to: Andres Freund (#17)
Re: Deferring some AtStart* allocations?

On Tue, Oct 21, 2014 at 12:00 PM, Andres Freund <andres@2ndquadrant.com> wrote:

On 2014-10-09 15:01:19 -0400, Robert Haas wrote:

/*
@@ -960,18 +966,38 @@ AtEOXact_Inval(bool isCommit)

...

+             /*
+              * We create invalidation stack entries lazily, so the parent might
+              * not have one.  Instead of creating one, moving all the data over,
+              * and then freeing our own, we can just adjust the level of our own
+              * entry.
+              */
+             if (myInfo->parent == NULL || myInfo->parent->my_level < my_level - 1)
+             {
+                     myInfo->my_level--;
+                     return;
+             }
+

I think this bit might not be correct. What if the subxact one level up
aborts? Then it'll miss dealing with these invalidation entries. Or am I
misunderstanding something?

One of us is. I think you're asking about a situation where we have a
transaction, and a subtransaction, and within that another
subtransaction. Only the innermost subtransaction has invalidation
messages. At the innermost level, we commit; the above code makes
those messages the responsibility of the outer subtransaction. If
that subtransaction abouts, AtEOSubXact_Inval() gets called again,
sees that it has messages (that it inherited from the innermost
subtransaction), and takes the exact same code-path that it would have
pre-patch.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#19Andres Freund
andres@2ndquadrant.com
In reply to: Robert Haas (#18)
Re: Deferring some AtStart* allocations?

On 2014-10-23 12:04:40 -0400, Robert Haas wrote:

On Tue, Oct 21, 2014 at 12:00 PM, Andres Freund <andres@2ndquadrant.com> wrote:

On 2014-10-09 15:01:19 -0400, Robert Haas wrote:

/*
@@ -960,18 +966,38 @@ AtEOXact_Inval(bool isCommit)

...

+             /*
+              * We create invalidation stack entries lazily, so the parent might
+              * not have one.  Instead of creating one, moving all the data over,
+              * and then freeing our own, we can just adjust the level of our own
+              * entry.
+              */
+             if (myInfo->parent == NULL || myInfo->parent->my_level < my_level - 1)
+             {
+                     myInfo->my_level--;
+                     return;
+             }
+

I think this bit might not be correct. What if the subxact one level up
aborts? Then it'll miss dealing with these invalidation entries. Or am I
misunderstanding something?

One of us is. I think you're asking about a situation where we have a
transaction, and a subtransaction, and within that another
subtransaction. Only the innermost subtransaction has invalidation
messages. At the innermost level, we commit; the above code makes
those messages the responsibility of the outer subtransaction.

Exactly.

If
that subtransaction abouts, AtEOSubXact_Inval() gets called again,
sees that it has messages (that it inherited from the innermost
subtransaction), and takes the exact same code-path that it would have
pre-patch.

Consider what happens if the innermost transaction commits and the
second innermost one rolls back. AtEOSubXact_Inval() won't do anything
because it doesn't have any entries itself. Even though there's still
valid cache inval entries containing the innermost xact's version of the
catalog, now not valid anymore.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#20Robert Haas
robertmhaas@gmail.com
In reply to: Andres Freund (#19)
Re: Deferring some AtStart* allocations?

On Fri, Oct 24, 2014 at 9:17 AM, Andres Freund <andres@2ndquadrant.com> wrote:

If
that subtransaction abouts, AtEOSubXact_Inval() gets called again,
sees that it has messages (that it inherited from the innermost
subtransaction), and takes the exact same code-path that it would have
pre-patch.

Consider what happens if the innermost transaction commits and the
second innermost one rolls back. AtEOSubXact_Inval() won't do anything
because it doesn't have any entries itself.

This is the part I don't understand. After the innermost transaction
commits, it *does* have entries itself. This whole block is basically
just an optimization:

+               if (myInfo->parent == NULL || myInfo->parent->my_level
< my_level - 1)
+               {
+                       myInfo->my_level--;
+                       return;
+               }

If we removed that code, then we'd just do this instead:

/* Pass up my inval messages to parent */
AppendInvalidationMessages(&myInfo->parent->PriorCmdInvalidMsgs,
&myInfo->PriorCmdInvalidMsgs);

/* Pending relcache inval becomes parent's problem too */
if (myInfo->RelcacheInitFileInval)
myInfo->parent->RelcacheInitFileInval = true;

That should be basically equivalent. I mean, we need a bit of surgery
to ensure that the parent entry actually exists before we attach stuff
to it, but that's mechanical. The whole point here though is that,
pre-patch, the parent has an empty entry and we have one with stuff in
it. What I think happens in the current code is that we take all of
the stuff in our non-empty entry and move it into the parent's empty
entry, so that the parent ends up with an entry identical to ours but
with a level one less than we had. We could do that here too,
manufacturing the empty entry and then moving our stuff into it and
then deleting our original entry, but that seems silly; just change
the level and call it good.

IOW, I don't see how I'm *changing* the logic here at all. Why
doesn't whatever problem you're concerned about exist in the current
coding, too?

Even though there's still
valid cache inval entries containing the innermost xact's version of the
catalog, now not valid anymore.

This part doesn't make sense to me either. The invalidation entries
don't put data into the caches; they take data out. When we change
stuff, we generate invalidation messages. At end of statement, once
we've done CommandCounterIncrement(), we execute those invalidation
messages locally, so that the next cache reload the updated state. At
end of transaction, once we've committed, we send those invalidation
messages to the shared queue to be executed by everyone else, so that
they also see the updated state. If we abort a transaction or
sub-transaction, we re-execute those same invalidation messages so
that we flush the new state, forcing the next set of cache reloads to
again pull in the old state. The patch isn't changing - or not
intentionally anyway - anything about any of that logic.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#21Andres Freund
andres@2ndquadrant.com
In reply to: Robert Haas (#20)
Re: Deferring some AtStart* allocations?

On 2014-10-24 09:45:59 -0400, Robert Haas wrote:

On Fri, Oct 24, 2014 at 9:17 AM, Andres Freund <andres@2ndquadrant.com> wrote:

If
that subtransaction abouts, AtEOSubXact_Inval() gets called again,
sees that it has messages (that it inherited from the innermost
subtransaction), and takes the exact same code-path that it would have
pre-patch.

Consider what happens if the innermost transaction commits and the
second innermost one rolls back. AtEOSubXact_Inval() won't do anything
because it doesn't have any entries itself.

This is the part I don't understand. After the innermost transaction
commits, it *does* have entries itself.

Sure, otherwise there'd be no problem.

This whole block is basically just an optimization:

+               if (myInfo->parent == NULL || myInfo->parent->my_level
< my_level - 1)
+               {
+                       myInfo->my_level--;
+                       return;
+               }

If we removed that code, then we'd just do this instead:

/* Pass up my inval messages to parent */
AppendInvalidationMessages(&myInfo->parent->PriorCmdInvalidMsgs,
&myInfo->PriorCmdInvalidMsgs);

/* Pending relcache inval becomes parent's problem too */
if (myInfo->RelcacheInitFileInval)
myInfo->parent->RelcacheInitFileInval = true;

Ick. I somehow misimagined that you'd just append them one layer further
up. I obviously shouldn't review code during a conference.

Even though there's still
valid cache inval entries containing the innermost xact's version of the
catalog, now not valid anymore.

This part doesn't make sense to me either. The invalidation entries
don't put data into the caches; they take data out. When we change
stuff, we generate invalidation messages.

What I was thinking was that you'd append the messages to the layer one
level deeper than the parent. Then we'd missed the invalidations when
rolling back the intermediate xact. But since I was quite mistaken
above, this isn't a problem :)

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#22Robert Haas
robertmhaas@gmail.com
In reply to: Andres Freund (#21)
Re: Deferring some AtStart* allocations?

On Fri, Oct 24, 2014 at 10:10 AM, Andres Freund <andres@2ndquadrant.com> wrote:

What I was thinking was that you'd append the messages to the layer one
level deeper than the parent. Then we'd missed the invalidations when
rolling back the intermediate xact. But since I was quite mistaken
above, this isn't a problem :)

So, you happy with the patch now?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#23Andres Freund
andres@2ndquadrant.com
In reply to: Robert Haas (#22)
Re: Deferring some AtStart* allocations?

On 2014-10-24 11:25:23 -0400, Robert Haas wrote:

On Fri, Oct 24, 2014 at 10:10 AM, Andres Freund <andres@2ndquadrant.com> wrote:

What I was thinking was that you'd append the messages to the layer one
level deeper than the parent. Then we'd missed the invalidations when
rolling back the intermediate xact. But since I was quite mistaken
above, this isn't a problem :)

So, you happy with the patch now?

Yes.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#24Robert Haas
robertmhaas@gmail.com
In reply to: Andres Freund (#23)
Re: Deferring some AtStart* allocations?

On Tue, Oct 28, 2014 at 10:16 AM, Andres Freund <andres@2ndquadrant.com> wrote:

On 2014-10-24 11:25:23 -0400, Robert Haas wrote:

On Fri, Oct 24, 2014 at 10:10 AM, Andres Freund <andres@2ndquadrant.com> wrote:

What I was thinking was that you'd append the messages to the layer one
level deeper than the parent. Then we'd missed the invalidations when
rolling back the intermediate xact. But since I was quite mistaken
above, this isn't a problem :)

So, you happy with the patch now?

Yes.

Great. Committed. Thanks for the review.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers