Small patch to fix an O(N^2) problem in foreign keys
Attached are a small patch and a script to reproduce the issue.
The problem is a cache introduced in commit 45ba4247 that improves
foreign key lookups during bulk updates when the FK value does not
change. When restoring a schema dump from a database with many (say
100,000) foreign keys, this cache grows very large and every ALTER
TABLE command triggers an InvalidateConstraintCacheCallBack(), which
does a sequential hash table scan.
The patch uses a heuristic to detect when the hash table
should be destroyed and recreated. InvalidateConstraintCacheCallBack()
adds the current size of the hash table to a counter. When that sum
reaches 1,000,000, the hash table is flushed. This improves the schema
restore of a database with 100,000 foreign keys by a factor of 3.
According to my tests, the patch does not interfere with the bulk
updates that the original feature was supposed to improve.
Regards, Jan
--
Jan Wieck
Senior Software Engineer
http://slony.info
Attachment: pg96_ri_constraint_cache_flush.patch (text/x-patch)
diff --git a/src/backend/utils/adt/ri_triggers.c b/src/backend/utils/adt/ri_triggers.c
index 61edde9..d7023ce 100644
--- a/src/backend/utils/adt/ri_triggers.c
+++ b/src/backend/utils/adt/ri_triggers.c
@@ -183,6 +183,7 @@ typedef struct RI_CompareHashEntry
* ----------
*/
static HTAB *ri_constraint_cache = NULL;
+static long ri_constraint_cache_seq_count = 0;
static HTAB *ri_query_cache = NULL;
static HTAB *ri_compare_cache = NULL;
@@ -2945,6 +2946,27 @@ InvalidateConstraintCacheCallBack(Datum arg, int cacheid, uint32 hashvalue)
Assert(ri_constraint_cache != NULL);
+ /*
+ * Prevent an O(N^2) problem when creating large numbers of foreign
+ * key constraints with ALTER TABLE, as happens at the end of restoring
+ * a pg_dump with hundreds of thousands of tables having references.
+ */
+ ri_constraint_cache_seq_count += hash_get_num_entries(ri_constraint_cache);
+ if (ri_constraint_cache_seq_count > 1000000)
+ {
+ HASHCTL ctl;
+
+ hash_destroy(ri_constraint_cache);
+ memset(&ctl, 0, sizeof(ctl));
+ ctl.keysize = sizeof(Oid);
+ ctl.entrysize = sizeof(RI_ConstraintInfo);
+ ri_constraint_cache = hash_create("RI constraint cache",
+ RI_INIT_CONSTRAINTHASHSIZE,
+ &ctl, HASH_ELEM | HASH_BLOBS);
+ ri_constraint_cache_seq_count = 0;
+ return;
+ }
+
hash_seq_init(&status, ri_constraint_cache);
while ((hentry = (RI_ConstraintInfo *) hash_seq_search(&status)) != NULL)
{
Jan Wieck <jan@wi3ck.info> wrote:

> The problem is a cache introduced in commit 45ba4247 that improves

That's a bit off; 45ba424f seems to be what you mean.

> foreign key lookups during bulk updates when the FK value does not
> change. When restoring a schema dump from a database with many (say
> 100,000) foreign keys, this cache grows very large and every ALTER
> TABLE command triggers an InvalidateConstraintCacheCallBack(), which
> does a sequential hash table scan.
>
> The patch uses a heuristic to detect when the hash table
> should be destroyed and recreated. InvalidateConstraintCacheCallBack()
> adds the current size of the hash table to a counter. When that sum
> reaches 1,000,000, the hash table is flushed. This improves the schema
> restore of a database with 100,000 foreign keys by a factor of 3.
>
> According to my tests, the patch does not interfere with the bulk
> updates that the original feature was supposed to improve.
In the real-world customer case that caused you to look into this,
I thought 45ba424f drove schema-only restore time from 2 hours to
about 25 hours, and that this patch takes it back down to 2 hours.
Am I remembering right? And this came about because it added
20-some hours to a pg_upgrade run?
If there are no objections, I will push this as a bug fix back to
9.3, where the performance regression was introduced.
--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 09/08/2015 08:49 AM, Kevin Grittner wrote:

> Jan Wieck <jan@wi3ck.info> wrote:
>
>> The problem is a cache introduced in commit 45ba4247 that improves
>
> That's a bit off; 45ba424f seems to be what you mean.

Copy paste over paper.

>> foreign key lookups during bulk updates when the FK value does not
>> change. When restoring a schema dump from a database with many (say
>> 100,000) foreign keys, this cache grows very large and every ALTER
>> TABLE command triggers an InvalidateConstraintCacheCallBack(), which
>> does a sequential hash table scan.
>>
>> The patch uses a heuristic to detect when the hash table
>> should be destroyed and recreated. InvalidateConstraintCacheCallBack()
>> adds the current size of the hash table to a counter. When that sum
>> reaches 1,000,000, the hash table is flushed. This improves the schema
>> restore of a database with 100,000 foreign keys by a factor of 3.
>>
>> According to my tests, the patch does not interfere with the bulk
>> updates that the original feature was supposed to improve.
>
> In the real-world customer case that caused you to look into this,
> I thought 45ba424f drove schema-only restore time from 2 hours to
> about 25 hours, and that this patch takes it back down to 2 hours.
> Am I remembering right? And this came about because it added
> 20-some hours to a pg_upgrade run?

From 2 hours to >50, but yes, this is that case.

> If there are no objections, I will push this as a bug fix back to
> 9.3, where the performance regression was introduced.
Regards, Jan
--
Jan Wieck
Senior Software Engineer
http://slony.info
Jan Wieck <jan@wi3ck.info> wrote:

>>> The patch uses a heuristic to detect when the hash table
>>> should be destroyed and recreated. InvalidateConstraintCacheCallBack()
>>> adds the current size of the hash table to a counter. When that sum
>>> reaches 1,000,000, the hash table is flushed. This improves the schema
>>> restore of a database with 100,000 foreign keys by a factor of 3.
>>>
>>> According to my tests, the patch does not interfere with the bulk
>>> updates that the original feature was supposed to improve.
>>
>> In the real-world customer case that caused you to look into this,
>> I thought 45ba424f drove schema-only restore time from 2 hours to
>> about 25 hours, and that this patch takes it back down to 2 hours.
>> Am I remembering right? And this came about because it added
>> 20-some hours to a pg_upgrade run?
>
> From 2 hours to >50, but yes, this is that case.

>> If there are no objections, I will push this as a bug fix back to
>> 9.3, where the performance regression was introduced.
Hearing no objections, pushed with one minor tweak Jan and I
discussed off-list -- the original patch duplicated a bit of code
that we agreed should be split into a static function and called
from both places, to protect against someone later changing one and
missing the other.
--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company