pgsql: Fix an O(N^2) problem in foreign key references.

Started by Kevin Grittner over 10 years ago, 16 messages
#1 Kevin Grittner
kgrittn@postgresql.org

Fix an O(N^2) problem in foreign key references.

Commit 45ba424f improved foreign key lookups during bulk updates
when the FK value does not change. When restoring a schema dump
from a database with many (say 100,000) foreign keys, this cache
would grow very large, and every ALTER TABLE command triggered a
call to InvalidateConstraintCacheCallBack(), which performs a
sequential scan of the hash table. This could cause a severe
performance regression during such a restore (including during
pg_upgrade).

The patch uses a heuristic to detect when the hash table should be
destroyed and recreated.
InvalidateConstraintCacheCallBack() adds the current size of the
hash table to a counter. When that sum reaches 1,000,000, the hash
table is flushed. This fixes the regression without noticeable
harm to the bulk update use case.
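
In rough code, the heuristic amounts to the following sketch
(reconstructed from the description above, not the committed diff; the
counter name is illustrative):

/* Sketch of the heuristic described above; the counter name is made up. */
static long ri_invalidation_effort = 0;	/* accumulated scan cost */

static void
InvalidateConstraintCacheCallBack(Datum arg, int cacheid, uint32 hashvalue)
{
	/* Each callback pays for one sequential scan of the whole table. */
	ri_invalidation_effort += hash_get_num_entries(ri_constraint_cache);

	if (ri_invalidation_effort >= 1000000)
	{
		/* Flush: destroy the table; it is rebuilt on the next lookup. */
		hash_destroy(ri_constraint_cache);
		ri_constraint_cache = NULL;
		ri_invalidation_effort = 0;
		return;
	}

	/* Otherwise fall through to the usual sequential invalidation scan. */
}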

Jan Wieck
Backpatch to 9.3 where the performance regression was introduced.

Branch
------
master

Details
-------
http://git.postgresql.org/pg/commitdiff/5ddc72887a012f6a8b85707ef27d85c274faf53d

Modified Files
--------------
src/backend/utils/adt/ri_triggers.c | 38 ++++++++++++++++++++++++++++++++---
1 file changed, 35 insertions(+), 3 deletions(-)

#2 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Kevin Grittner (#1)
Re: [COMMITTERS] pgsql: Fix an O(N^2) problem in foreign key references.

Kevin Grittner <kgrittn@postgresql.org> writes:

Fix an O(N^2) problem in foreign key references.

Judging from the buildfarm, this patch is broken under
CLOBBER_CACHE_ALWAYS. See friarbird's results in particular.
I might be too quick to finger this patch, but nothing else
lately has touched foreign-key behavior, and the foreign_key
test is where things are going south.

regards, tom lane

#3 Jan Wieck
jan@wi3ck.info
In reply to: Tom Lane (#2)
Re: [COMMITTERS] pgsql: Fix an O(N^2) problem in foreign key references.

On 09/14/2015 09:56 AM, Tom Lane wrote:

Kevin Grittner <kgrittn@postgresql.org> writes:

Fix an O(N^2) problem in foreign key references.

Judging from the buildfarm, this patch is broken under
CLOBBER_CACHE_ALWAYS. See friarbird's results in particular.
I might be too quick to finger this patch, but nothing else
lately has touched foreign-key behavior, and the foreign_key
test is where things are going south.

I'm able to reproduce that failure with CLOBBER_CACHE_ALWAYS and it
definitely is caused by this commit. Do you want to back it out for the
time being? Kevin is on vacation right now.

Regards, Jan

--
Jan Wieck
Senior Software Engineer
http://slony.info

#4 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Jan Wieck (#3)
Re: [COMMITTERS] pgsql: Fix an O(N^2) problem in foreign key references.

Jan Wieck <jan@wi3ck.info> writes:

On 09/14/2015 09:56 AM, Tom Lane wrote:

Kevin Grittner <kgrittn@postgresql.org> writes:

Fix an O(N^2) problem in foreign key references.

Judging from the buildfarm, this patch is broken under
CLOBBER_CACHE_ALWAYS. See friarbird's results in particular.
I might be too quick to finger this patch, but nothing else
lately has touched foreign-key behavior, and the foreign_key
test is where things are going south.

I'm able to reproduce that failure with CLOBBER_CACHE_ALWAYS and it
definitely is caused by this commit. Do you want to back it out for the
time being? Kevin is on vacation right now.

I'll take a look today, and either fix it if I can find the problem
quickly or back it out.

(If anyone else is already looking at it, happy to defer to you...)

regards, tom lane

#5 Jan Wieck
jan@wi3ck.info
In reply to: Tom Lane (#4)
Re: [COMMITTERS] pgsql: Fix an O(N^2) problem in foreign key references.

On 09/15/2015 09:51 AM, Tom Lane wrote:

Jan Wieck <jan@wi3ck.info> writes:

On 09/14/2015 09:56 AM, Tom Lane wrote:

Kevin Grittner <kgrittn@postgresql.org> writes:

Fix an O(N^2) problem in foreign key references.

Judging from the buildfarm, this patch is broken under
CLOBBER_CACHE_ALWAYS. See friarbird's results in particular.
I might be too quick to finger this patch, but nothing else
lately has touched foreign-key behavior, and the foreign_key
test is where things are going south.

I'm able to reproduce that failure with CLOBBER_CACHE_ALWAYS and it
definitely is caused by this commit. Do you want to back it out for the
time being? Kevin is on vacation right now.

I'll take a look today, and either fix it if I can find the problem
quickly or back it out.

(If anyone else is already looking at it, happy to defer to you...)

I am looking at it. Unfortunately I'm not getting any usable backtrace yet.

Regards, Jan

--
Jan Wieck
Senior Software Engineer
http://slony.info

#6 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tom Lane (#4)
Re: [COMMITTERS] pgsql: Fix an O(N^2) problem in foreign key references.

I wrote:

Jan Wieck <jan@wi3ck.info> writes:

I'm able to reproduce that failure with CLOBBER_CACHE_ALWAYS and it
definitely is caused by this commit. Do you want to back it out for the
time being? Kevin is on vacation right now.

I'll take a look today, and either fix it if I can find the problem
quickly or back it out.

The answer is "back it out", because this patch is fundamentally and
irretrievably broken. You can't just clobber the hashtable like that,
because there are very possibly active uses of hashtable entries in
outer function call levels. In the particular case exhibited in
http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=friarbird&dt=2015-09-15%2004%3A20%3A00
namely
TRAP: FailedAssertion("!(riinfo->constraint_id == constraintOid)", File: "/pgbuild/root/HEAD/pgsql.build/../pgsql/src/backend/utils/adt/ri_triggers.c", Line: 2838)
what's evidently happened is that a cache flush occurred during the
syscache fetch at line 2828, and the patch chose that moment to destroy
the hashtable, in particular the entry already created at line 2817.
Hashtable resets occurring later in a particular RI trigger's execution
would cause other symptoms.
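
Keyed to those line numbers, the sequence is roughly this (a paraphrase
of ri_LoadConstraintInfo, not the verbatim source):

	/* ~line 2817: find or create the cache entry */
	riinfo = (RI_ConstraintInfo *) hash_search(ri_constraint_cache,
											   (void *) &constraintOid,
											   HASH_ENTER, &found);

	/* ~line 2828: the syscache fetch may process pending invalidation
	 * messages; with the patch applied, the callback could hash_destroy()
	 * the whole table here, freeing the entry riinfo still points at */
	tup = SearchSysCache1(CONSTROID, ObjectIdGetDatum(constraintOid));

	/* ~line 2838: trips, because riinfo now points into freed memory */
	Assert(riinfo->constraint_id == constraintOid);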

The pre-existing coding is safe, despite the lack of any reference
counting on hashtable entries, because the cache flush callback never
actually destroys any entries. It just cautiously marks them invalid,
which won't actually have any effect until the next
ri_LoadConstraintInfo() call on that constraint OID.
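
For reference, the pre-existing callback, paraphrased (it matches the
code being replaced in the patches later in this thread), only flips
that flag:

static void
InvalidateConstraintCacheCallBack(Datum arg, int cacheid, uint32 hashvalue)
{
	HASH_SEQ_STATUS status;
	RI_ConstraintInfo *hentry;

	hash_seq_init(&status, ri_constraint_cache);
	while ((hentry = (RI_ConstraintInfo *) hash_seq_search(&status)) != NULL)
	{
		if (hentry->valid &&
			(hashvalue == 0 || hentry->oidHashValue == hashvalue))
			hentry->valid = false;	/* mark only; the memory stays put */
	}
}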

AFAICS, there's no way to make this work reliably without changing the
cache API somehow. Reference counts would make it possible to tell
whether it's safe to delete a particular entry, but they would be fairly
complicated to maintain (because of the need to make sure they get
decremented after an elog(ERROR)). Given that the entries are just flat
chunks of memory, another answer is to have cache lookups copy the entry
data into caller-provided storage rather than rely on the cache to stay
valid. Not sure offhand whether that would be unreasonably expensive;
sizeof(RI_ConstraintInfo) is about 600 bytes on my machine, which seems
like kind of a lot, but I guess it's not enormous. For that matter, if
we're gonna do it like that, it's not very clear that we need the
ri_constraint_cache at all, rather than just having ri_LoadConstraintInfo
fill the data from the pg_constraint syscache row every time.
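
A minimal sketch of that copy-out idea, assuming a hypothetical helper
(ri_GetConstraintInfoCopy is not an existing function):

/* Hypothetical copy-out variant: the caller supplies storage, so later
 * cache flushes cannot pull the data out from under it. */
static void
ri_GetConstraintInfoCopy(Oid constraintOid, RI_ConstraintInfo *dst)
{
	const RI_ConstraintInfo *riinfo = ri_LoadConstraintInfo(constraintOid);

	/* RI_ConstraintInfo is a flat chunk of memory (~600 bytes), so a
	 * plain struct copy suffices; no pointers into the cache survive. */
	memcpy(dst, riinfo, sizeof(RI_ConstraintInfo));
}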

In any case, this patch seems like a pretty ugly and brute-force way to
limit the cache size, since it destroys not only old and uninteresting
entries but also valid, up-to-date, recently used entries. So I'm not
very excited about thinking of some band-aid way to let this work; I don't
like the approach of destroying the entire hash table in the first place.

So I'm going to revert it and send it back for rework.

regards, tom lane

#7 Jan Wieck
jan@wi3ck.info
In reply to: Tom Lane (#6)
Re: [COMMITTERS] pgsql: Fix an O(N^2) problem in foreign key references.

On 09/15/2015 10:58 AM, Tom Lane wrote:

I wrote:

Jan Wieck <jan@wi3ck.info> writes:

I'm able to reproduce that failure with CLOBBER_CACHE_ALWAYS and it
definitely is caused by this commit. Do you want to back it out for the
time being? Kevin is on vacation right now.

I'll take a look today, and either fix it if I can find the problem
quickly or back it out.

The answer is "back it out", because this patch is fundamentally and
irretrievably broken. You can't just clobber the hashtable like that,
because there are very possibly active uses of hashtable entries in
outer function call levels.

Ok.

Would it be appropriate to use (Un)RegisterXactCallback() in case of
detecting excessive sequential scanning of that cache and remove all
invalid entries from it at that time?

Regards, Jan

--
Jan Wieck
Senior Software Engineer
http://slony.info

#8 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Jan Wieck (#7)
Re: [COMMITTERS] pgsql: Fix an O(N^2) problem in foreign key references.

Jan Wieck <jan@wi3ck.info> writes:

Would it be appropriate to use (Un)RegisterXactCallback() in case of
detecting excessive sequential scanning of that cache and remove all
invalid entries from it at that time?

Don't see how unregistering the callback would help?

In any case, I'm -1 on this entire concept of "let's zero out the cache at
randomly chosen moments". That's far too hard to test the effects of, as
we've just seen. It needs to be more predictable, and I think it ought to
be more selective too.

One idea is to limit the cache to say 1000 entries, and get rid of the
least-used one when the threshold is exceeded. We could do that fairly
cheaply if we chain the cache entries together in a doubly-linked list
(see ilist.h) and move an entry to the front when referenced. For
reasonably large cache sizes, that would pretty much guarantee that you
weren't destroying an actively-used entry. However, it would still be
a bit hard to test, because you could not safely reduce the cache size
to just 1 as a test mechanism.
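
In terms of ilist.h, that idea might look about like this (a sketch
only; the helper names are made up, and valid_link is a dlist_node
added to RI_ConstraintInfo):

#include "lib/ilist.h"

static dlist_head ri_constraint_cache_valid_list;
static int	ri_constraint_cache_valid_count = 0;

/* On a cache hit, move the entry to the front (most recently used). */
static void
ri_MarkRecentlyUsed(RI_ConstraintInfo *riinfo)
{
	dlist_delete(&riinfo->valid_link);
	dlist_push_head(&ri_constraint_cache_valid_list, &riinfo->valid_link);
}

/* Past the threshold, retire entries from the tail (least recently
 * used), marking them invalid rather than freeing them. */
static void
ri_EnforceCacheLimit(void)
{
	while (ri_constraint_cache_valid_count > 1000)
	{
		dlist_node *node = dlist_tail_node(&ri_constraint_cache_valid_list);
		RI_ConstraintInfo *victim =
			dlist_container(RI_ConstraintInfo, valid_link, node);

		victim->valid = false;
		dlist_delete(node);
		ri_constraint_cache_valid_count--;
	}
}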

Another idea is that it's not the size of the cache that's the problem,
it's the cost of finding the entries that need to be invalidated.
So we could also consider adding list links that chain only the entries
that are currently marked valid. If the list gets too long, we could mark
them all invalid and thereby reset the list to nil. This doesn't do
anything for the cache's space consumption, but that wasn't what you were
worried about anyway, was it?

regards, tom lane

#9 Jan Wieck
jan@wi3ck.info
In reply to: Tom Lane (#8)
Re: [COMMITTERS] pgsql: Fix an O(N^2) problem in foreign key references.

On 09/15/2015 11:54 AM, Tom Lane wrote:

Jan Wieck <jan@wi3ck.info> writes:

Would it be appropriate to use (Un)RegisterXactCallback() in case of
detecting excessive sequential scanning of that cache and remove all
invalid entries from it at that time?

Another idea is that it's not the size of the cache that's the problem,
it's the cost of finding the entries that need to be invalidated.
So we could also consider adding list links that chain only the entries
that are currently marked valid. If the list gets too long, we could mark
them all invalid and thereby reset the list to nil. This doesn't do
anything for the cache's space consumption, but that wasn't what you were
worried about anyway, was it?

That sounds like a workable solution to this edge case. I'll see how
that works.

Thanks, Jan

--
Jan Wieck
Senior Software Engineer
http://slony.info

#10 Jan Wieck
jan@wi3ck.info
In reply to: Jan Wieck (#9)
1 attachment
Re: [COMMITTERS] pgsql: Fix an O(N^2) problem in foreign key references.

On 09/15/2015 12:02 PM, Jan Wieck wrote:

On 09/15/2015 11:54 AM, Tom Lane wrote:

Jan Wieck <jan@wi3ck.info> writes:

Would it be appropriate to use (Un)RegisterXactCallback() in case of
detecting excessive sequential scanning of that cache and remove all
invalid entries from it at that time?

Another idea is that it's not the size of the cache that's the problem,
it's the cost of finding the entries that need to be invalidated.
So we could also consider adding list links that chain only the entries
that are currently marked valid. If the list gets too long, we could mark
them all invalid and thereby reset the list to nil. This doesn't do
anything for the cache's space consumption, but that wasn't what you were
worried about anyway, was it?

That sounds like a workable solution to this edge case. I'll see how
that works.

Attached is a complete rework of the fix from scratch, based on Tom's
suggestion.

The code now maintains a doubly linked list as suggested, but only uses
it to mark all currently valid entries as invalid when hashvalue == 0.
If a specific entry is invalidated, it performs a hash lookup for
maximum efficiency.

This code does pass the regression tests with CLOBBER_CACHE_ALWAYS,
does have the desired performance impact on restoring hundreds of
thousands of foreign key constraints, and does not show any negative
impact on bulk loading of data with foreign keys.

Regards, Jan

--
Jan Wieck
Senior Software Engineer
http://slony.info

Attachments:

pg96_ri_constraint_cache_flush_v3.patch (text/x-patch)
diff --git a/src/backend/utils/adt/ri_triggers.c b/src/backend/utils/adt/ri_triggers.c
index 61edde9..eaec737 100644
--- a/src/backend/utils/adt/ri_triggers.c
+++ b/src/backend/utils/adt/ri_triggers.c
@@ -125,6 +125,8 @@ typedef struct RI_ConstraintInfo
 												 * PK) */
 	Oid			ff_eq_oprs[RI_MAX_NUMKEYS];		/* equality operators (FK =
 												 * FK) */
+	struct RI_ConstraintInfo *prev_valid;		/* Previous valid entry */
+	struct RI_ConstraintInfo *next_valid;		/* Next valid entry */
 } RI_ConstraintInfo;
 
 
@@ -185,6 +187,7 @@ typedef struct RI_CompareHashEntry
 static HTAB *ri_constraint_cache = NULL;
 static HTAB *ri_query_cache = NULL;
 static HTAB *ri_compare_cache = NULL;
+static RI_ConstraintInfo *ri_constraint_cache_valid_list = NULL;
 
 
 /* ----------
@@ -2924,6 +2927,19 @@ ri_LoadConstraintInfo(Oid constraintOid)
 
 	ReleaseSysCache(tup);
 
+	/*
+	 * At the time a cache invalidation message is processed there may be
+	 * active references to the cache. Because of this we never remove entries
+	 * from the cache, but only mark them invalid. For efficient processing
+	 * of invalidation messages below we keep a double linked list of
+	 * currently valid entries.
+	 */
+	if (ri_constraint_cache_valid_list != NULL)
+		ri_constraint_cache_valid_list->prev_valid = riinfo;
+	riinfo->prev_valid = NULL;
+	riinfo->next_valid = ri_constraint_cache_valid_list;
+	ri_constraint_cache_valid_list = riinfo;
+
 	riinfo->valid = true;
 
 	return riinfo;
@@ -2940,17 +2956,55 @@ ri_LoadConstraintInfo(Oid constraintOid)
 static void
 InvalidateConstraintCacheCallBack(Datum arg, int cacheid, uint32 hashvalue)
 {
-	HASH_SEQ_STATUS status;
-	RI_ConstraintInfo *hentry;
+	RI_ConstraintInfo  *hentry;
+	RI_ConstraintInfo  *hnext;
+	bool				found;
 
 	Assert(ri_constraint_cache != NULL);
 
-	hash_seq_init(&status, ri_constraint_cache);
-	while ((hentry = (RI_ConstraintInfo *) hash_seq_search(&status)) != NULL)
+	if (hashvalue == 0)
 	{
-		if (hentry->valid &&
-			(hashvalue == 0 || hentry->oidHashValue == hashvalue))
+		/*
+		 * Set all valid entries in the cache to invalid and clear the
+		 * list of valid entries.
+		 */
+		hnext = ri_constraint_cache_valid_list;
+		while (hnext)
+		{
+			hentry = hnext;
+			hnext = hnext->next_valid;
+
+			Assert(hentry->valid);
+
+			hentry->prev_valid = NULL;
+			hentry->next_valid = NULL;
 			hentry->valid = false;
+		}
+		ri_constraint_cache_valid_list = NULL;
+	}
+	else
+	{
+		/*
+		 * Search for the specified entry and set that one invalid
+		 * and remove it from the list of valid entries.
+		 */
+		hentry = (RI_ConstraintInfo *) hash_search(ri_constraint_cache,
+												   (void *) &hashvalue,
+												   HASH_FIND, &found);
+		if (found && hentry->valid)
+		{
+			Assert(hentry->oidHashValue == hashvalue);
+
+			if (hentry->prev_valid != NULL)
+				hentry->prev_valid->next_valid = hentry->next_valid;
+			else
+				ri_constraint_cache_valid_list = hentry->next_valid;
+			if (hentry->next_valid != NULL)
+				hentry->next_valid->prev_valid = hentry->prev_valid;
+			hentry->prev_valid = NULL;
+			hentry->next_valid = NULL;
+			hentry->valid = false;
+		}
 	}
 }
 
#11 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Jan Wieck (#10)
1 attachment
Re: [COMMITTERS] pgsql: Fix an O(N^2) problem in foreign key references.

Jan Wieck <jan@wi3ck.info> writes:

Attached is a complete rework of the fix from scratch, based on Tom's
suggestion.

The code now maintains a doubly linked list as suggested, but only uses
it to mark all currently valid entries as invalid when hashvalue == 0.
If a specific entry is invalidated, it performs a hash lookup for
maximum efficiency.

That does not work; you're ignoring the possibility of hashvalue
collisions, and for that matter not considering that the hash value
is not equal to the hash key. I don't think your code is invalidating
the right entry at all during a foreign key constraint update, and
it certainly cannot do the right thing if there's a hash value collision.
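
Concretely: ri_constraint_cache is keyed by constraint OID, but the v3
patch probes it with the message's hash value, so the lookup compares
incommensurable things:

	/* From the v3 patch: &hashvalue points at a uint32 syscache hash
	 * value, but the table's key is an Oid (the constraint OID). */
	hentry = (RI_ConstraintInfo *) hash_search(ri_constraint_cache,
											   (void *) &hashvalue,
											   HASH_FIND, &found);

	/* And even matching on oidHashValue cannot be made exact: two
	 * constraints whose OIDs hash to the same value both need to be
	 * invalidated, but a single-entry lookup can flag at most one. */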

Attached is something closer to what I was envisioning; can you do
performance testing on it?

regards, tom lane

Attachments:

fk-cache-performance-fix-v4.patch (text/x-diff)
diff --git a/src/backend/utils/adt/ri_triggers.c b/src/backend/utils/adt/ri_triggers.c
index 61edde9..db1787a 100644
*** a/src/backend/utils/adt/ri_triggers.c
--- b/src/backend/utils/adt/ri_triggers.c
***************
*** 40,45 ****
--- 40,46 ----
  #include "commands/trigger.h"
  #include "executor/executor.h"
  #include "executor/spi.h"
+ #include "lib/ilist.h"
  #include "parser/parse_coerce.h"
  #include "parser/parse_relation.h"
  #include "miscadmin.h"
*************** typedef struct RI_ConstraintInfo
*** 125,130 ****
--- 126,132 ----
  												 * PK) */
  	Oid			ff_eq_oprs[RI_MAX_NUMKEYS];		/* equality operators (FK =
  												 * FK) */
+ 	dlist_node	valid_link;		/* Link in list of valid entries */
  } RI_ConstraintInfo;
  
  
*************** typedef struct RI_CompareHashEntry
*** 185,190 ****
--- 187,194 ----
  static HTAB *ri_constraint_cache = NULL;
  static HTAB *ri_query_cache = NULL;
  static HTAB *ri_compare_cache = NULL;
+ static dlist_head ri_constraint_cache_valid_list;
+ static int	ri_constraint_cache_valid_count = 0;
  
  
  /* ----------
*************** ri_LoadConstraintInfo(Oid constraintOid)
*** 2924,2929 ****
--- 2928,2940 ----
  
  	ReleaseSysCache(tup);
  
+ 	/*
+ 	 * For efficient processing of invalidation messages below, we keep a
+ 	 * doubly-linked list, and a count, of all currently valid entries.
+ 	 */
+ 	dlist_push_tail(&ri_constraint_cache_valid_list, &riinfo->valid_link);
+ 	ri_constraint_cache_valid_count++;
+ 
  	riinfo->valid = true;
  
  	return riinfo;
*************** ri_LoadConstraintInfo(Oid constraintOid)
*** 2936,2956 ****
   * gets enough update traffic that it's probably worth being smarter.
   * Invalidate any ri_constraint_cache entry associated with the syscache
   * entry with the specified hash value, or all entries if hashvalue == 0.
   */
  static void
  InvalidateConstraintCacheCallBack(Datum arg, int cacheid, uint32 hashvalue)
  {
! 	HASH_SEQ_STATUS status;
! 	RI_ConstraintInfo *hentry;
  
  	Assert(ri_constraint_cache != NULL);
  
! 	hash_seq_init(&status, ri_constraint_cache);
! 	while ((hentry = (RI_ConstraintInfo *) hash_seq_search(&status)) != NULL)
  	{
! 		if (hentry->valid &&
! 			(hashvalue == 0 || hentry->oidHashValue == hashvalue))
! 			hentry->valid = false;
  	}
  }
  
--- 2947,2987 ----
   * gets enough update traffic that it's probably worth being smarter.
   * Invalidate any ri_constraint_cache entry associated with the syscache
   * entry with the specified hash value, or all entries if hashvalue == 0.
+  *
+  * Note: at the time a cache invalidation message is processed there may be
+  * active references to the cache.  Because of this we never remove entries
+  * from the cache, but only mark them invalid, which is harmless to active
+  * uses.  (Any query using an entry should hold a lock sufficient to keep that
+  * data from changing under it --- but we may get cache flushes anyway.)
   */
  static void
  InvalidateConstraintCacheCallBack(Datum arg, int cacheid, uint32 hashvalue)
  {
! 	dlist_mutable_iter iter;
  
  	Assert(ri_constraint_cache != NULL);
  
! 	/*
! 	 * If the list of currently valid entries gets excessively large, we mark
! 	 * them all invalid so we can empty the list.  This arrangement avoids
! 	 * O(N^2) behavior in situations where a session touches many foreign keys
! 	 * and also does many ALTER TABLEs, such as a restore from pg_dump.
! 	 */
! 	if (ri_constraint_cache_valid_count > 1000)
! 		hashvalue = 0;			/* pretend it's a cache reset */
! 
! 	dlist_foreach_modify(iter, &ri_constraint_cache_valid_list)
  	{
! 		RI_ConstraintInfo *riinfo = dlist_container(RI_ConstraintInfo,
! 													valid_link, iter.cur);
! 
! 		if (hashvalue == 0 || riinfo->oidHashValue == hashvalue)
! 		{
! 			riinfo->valid = false;
! 			/* Remove invalidated entries from the list, too */
! 			dlist_delete(iter.cur);
! 			ri_constraint_cache_valid_count--;
! 		}
  	}
  }
  
#12 Jan Wieck
jan@wi3ck.info
In reply to: Tom Lane (#11)
Re: [COMMITTERS] pgsql: Fix an O(N^2) problem in foreign key references.

On 09/18/2015 10:47 AM, Tom Lane wrote:

Jan Wieck <jan@wi3ck.info> writes:

Attached is a complete rework of the fix from scratch, based on Tom's
suggestion.

The code now maintains a doubly linked list as suggested, but only uses
it to mark all currently valid entries as invalid when hashvalue == 0.
If a specific entry is invalidated, it performs a hash lookup for
maximum efficiency.

That does not work; you're ignoring the possibility of hashvalue
collisions, and for that matter not considering that the hash value
is not equal to the hash key. I don't think your code is invalidating
the right entry at all during a foreign key constraint update, and
it certainly cannot do the right thing if there's a hash value collision.

Attached is something closer to what I was envisioning; can you do
performance testing on it?

Sorry for the delay.

Yes, that patch also has the desired performance for restoring a schema
with hundreds of thousands of foreign key constraints.

Regards, Jan

--
Jan Wieck
Senior Software Engineer
http://slony.info

#13 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Jan Wieck (#12)
Re: [COMMITTERS] pgsql: Fix an O(N^2) problem in foreign key references.

Jan Wieck <jan@wi3ck.info> writes:

On 09/18/2015 10:47 AM, Tom Lane wrote:

Attached is something closer to what I was envisioning; can you do
performance testing on it?

Yes, that patch also has the desired performance for restoring a schema
with hundreds of thousands of foreign key constraints.

Great, thanks for checking. I'll push it in a moment.

regards, tom lane

#14 Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Tom Lane (#11)
Re: [COMMITTERS] pgsql: Fix an O(N^2) problem in foreign key references.

Tom Lane wrote:

Jan Wieck <jan@wi3ck.info> writes:

Attached is a complete rework of the fix from scratch, based on Tom's
suggestion.

The code now maintains a doubly linked list as suggested, but only uses
it to mark all currently valid entries as invalid when hashvalue == 0.
If a specific entry is invalidated, it performs a hash lookup for
maximum efficiency.

That does not work; you're ignoring the possibility of hashvalue
collisions, and for that matter not considering that the hash value
is not equal to the hash key. I don't think your code is invalidating
the right entry at all during a foreign key constraint update, and
it certainly cannot do the right thing if there's a hash value collision.

Would it make sense to remove only the few oldest entries, instead
of all of them? As is, I think this causes a storm of reloads every
once in a while, if the number of FKs in the system is large enough.
Maybe on a cache hit we could push the entry to the head of the list,
and then remove N entries from the back of the list when the threshold
is reached.

I think the assumption here is that most sessions will not reach the
threshold at all, except for those that access all tables such as
pg_dump -- that is, most sessions are short-lived. But in cases
involving connection poolers, eventually all sessions would access all
tables, and thus be subject to the storm issue.

(Of course, this is all hypothetical.)

--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#15 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Alvaro Herrera (#14)
Re: [COMMITTERS] pgsql: Fix an O(N^2) problem in foreign key references.

Alvaro Herrera <alvherre@2ndquadrant.com> writes:

Would it make sense to remove only the few oldest entries, instead
of all of them? As is, I think this causes a storm of reloads every
once in a while, if the number of FKs in the system is large enough.
Maybe on a cache hit we could push the entry to the head of the list,
and then remove N entries from the back of the list when the threshold
is reached.

Sure, there's room for optimization of that sort, although I think we
could do with some evidence that it's actually helpful. I believe that
under "production" workloads InvalidateConstraintCacheCallBack won't
get called much at all, so the problem's moot.

(FWIW, it might take less code to put the recently-used entries at the
back of the list. Then the loop in InvalidateConstraintCacheCallBack
could just invalidate/delete entries if either they're targets, or
the current list length exceeds the threshold.)

regards, tom lane

#16 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tom Lane (#15)
1 attachment
Re: [COMMITTERS] pgsql: Fix an O(N^2) problem in foreign key references.

I wrote:

Alvaro Herrera <alvherre@2ndquadrant.com> writes:

Would it make sense to remove only the few oldest entries, instead
of all of them? As is, I think this causes a storm of reloads every
once in a while, if the number of FKs in the system is large enough.
Maybe on a cache hit we could push the entry to the head of the list,
and then remove N entries from the back of the list when the threshold
is reached.

(FWIW, it might take less code to put the recently-used entries at the
back of the list. Then the loop in InvalidateConstraintCacheCallBack
could just invalidate/delete entries if either they're targets, or
the current list length exceeds the threshold.)

Specifically, we could change it as per attached. But this adds some
cycles to the mainline usage of fetching a valid cache entry, so I'd
want to see some evidence that there are use-cases it helps before
applying it.

regards, tom lane

Attachments:

kill-only-oldest-FK-cache-entries.patch (text/x-diff)
diff --git a/src/backend/utils/adt/ri_triggers.c b/src/backend/utils/adt/ri_triggers.c
index 018cb99..54376ed 100644
*** a/src/backend/utils/adt/ri_triggers.c
--- b/src/backend/utils/adt/ri_triggers.c
*************** ri_LoadConstraintInfo(Oid constraintOid)
*** 2814,2820 ****
  		ri_InitHashTables();
  
  	/*
! 	 * Find or create a hash entry.  If we find a valid one, just return it.
  	 */
  	riinfo = (RI_ConstraintInfo *) hash_search(ri_constraint_cache,
  											   (void *) &constraintOid,
--- 2814,2821 ----
  		ri_InitHashTables();
  
  	/*
! 	 * Find or create a hash entry.  If we find a valid one, just return it,
! 	 * after pushing it to the end of the list to mark it recently used.
  	 */
  	riinfo = (RI_ConstraintInfo *) hash_search(ri_constraint_cache,
  											   (void *) &constraintOid,
*************** ri_LoadConstraintInfo(Oid constraintOid)
*** 2822,2828 ****
--- 2823,2833 ----
  	if (!found)
  		riinfo->valid = false;
  	else if (riinfo->valid)
+ 	{
+ 		dlist_delete(&riinfo->valid_link);
+ 		dlist_push_tail(&ri_constraint_cache_valid_list, &riinfo->valid_link);
  		return riinfo;
+ 	}
  
  	/*
  	 * Fetch the pg_constraint row so we can fill in the entry.
*************** ri_LoadConstraintInfo(Oid constraintOid)
*** 2931,2936 ****
--- 2936,2942 ----
  	/*
  	 * For efficient processing of invalidation messages below, we keep a
  	 * doubly-linked list, and a count, of all currently valid entries.
+ 	 * The most recently used entries are at the *end* of the list.
  	 */
  	dlist_push_tail(&ri_constraint_cache_valid_list, &riinfo->valid_link);
  	ri_constraint_cache_valid_count++;
*************** ri_LoadConstraintInfo(Oid constraintOid)
*** 2948,2953 ****
--- 2954,2965 ----
   * Invalidate any ri_constraint_cache entry associated with the syscache
   * entry with the specified hash value, or all entries if hashvalue == 0.
   *
+  * We also invalidate entries if there are too many valid entries.  This rule
+  * avoids O(N^2) behavior in situations where a session touches many foreign
+  * keys and also does many ALTER TABLEs, such as a restore from pg_dump.  When
+  * we do kill entries for that reason, they'll be the least recently used
+  * ones, since they're at the front of the list.
+  *
   * Note: at the time a cache invalidation message is processed there may be
   * active references to the cache.  Because of this we never remove entries
   * from the cache, but only mark them invalid, which is harmless to active
*************** InvalidateConstraintCacheCallBack(Datum 
*** 2961,2981 ****
  
  	Assert(ri_constraint_cache != NULL);
  
- 	/*
- 	 * If the list of currently valid entries gets excessively large, we mark
- 	 * them all invalid so we can empty the list.  This arrangement avoids
- 	 * O(N^2) behavior in situations where a session touches many foreign keys
- 	 * and also does many ALTER TABLEs, such as a restore from pg_dump.
- 	 */
- 	if (ri_constraint_cache_valid_count > 1000)
- 		hashvalue = 0;			/* pretend it's a cache reset */
- 
  	dlist_foreach_modify(iter, &ri_constraint_cache_valid_list)
  	{
  		RI_ConstraintInfo *riinfo = dlist_container(RI_ConstraintInfo,
  													valid_link, iter.cur);
  
! 		if (hashvalue == 0 || riinfo->oidHashValue == hashvalue)
  		{
  			riinfo->valid = false;
  			/* Remove invalidated entries from the list, too */
--- 2973,2985 ----
  
  	Assert(ri_constraint_cache != NULL);
  
  	dlist_foreach_modify(iter, &ri_constraint_cache_valid_list)
  	{
  		RI_ConstraintInfo *riinfo = dlist_container(RI_ConstraintInfo,
  													valid_link, iter.cur);
  
! 		if (hashvalue == 0 || riinfo->oidHashValue == hashvalue ||
! 			ri_constraint_cache_valid_count > 1000)
  		{
  			riinfo->valid = false;
  			/* Remove invalidated entries from the list, too */