Experimenting with hash tables inside pg_dump

Started by Tom Lane, about 4 years ago, 21 messages
#1 Tom Lane
tgl@sss.pgh.pa.us
1 attachment(s)

Today, pg_dump does a lot of internal lookups via binary search
in presorted arrays. I thought it might improve matters
to replace those binary searches with hash tables, theoretically
converting O(log N) searches into O(1) searches. So I tried making
a hash table indexed by CatalogId (tableoid+oid) with simplehash.h,
and replacing as many data structures as I could with that.

This makes the code shorter and (IMO anyway) cleaner, but

(a) the executable size increases by a few KB --- apparently, even
the minimum subset of simplehash.h's functionality is code-wasteful.

(b) I couldn't measure any change in performance at all. I tried
it on the regression database and on a toy DB with 10000 simple
tables. Maybe on a really large DB you'd notice some difference,
but I'm not very optimistic now.

So this experiment feels like a failure, but I thought I'd post
the patch and results for the archives' sake. Maybe somebody
will think of a way to improve matters. Or maybe it's worth
doing just to shorten the code?

regards, tom lane

Attachments:

use-simplehash-in-pg-dump-1.patch (text/x-diff; charset=us-ascii)
diff --git a/src/bin/pg_dump/common.c b/src/bin/pg_dump/common.c
index 1f24e79665..49932fd598 100644
--- a/src/bin/pg_dump/common.c
+++ b/src/bin/pg_dump/common.c
@@ -18,6 +18,14 @@
 #include <ctype.h>
 
 #include "catalog/pg_class_d.h"
+#include "catalog/pg_collation_d.h"
+#include "catalog/pg_extension_d.h"
+#include "catalog/pg_namespace_d.h"
+#include "catalog/pg_operator_d.h"
+#include "catalog/pg_proc_d.h"
+#include "catalog/pg_publication_d.h"
+#include "catalog/pg_type_d.h"
+#include "common/hashfn.h"
 #include "fe_utils/string_utils.h"
 #include "pg_backup_archiver.h"
 #include "pg_backup_utils.h"
@@ -31,54 +39,54 @@ static int	allocedDumpIds = 0;
 static DumpId lastDumpId = 0;	/* Note: 0 is InvalidDumpId */
 
 /*
- * Variables for mapping CatalogId to DumpableObject
- */
-static bool catalogIdMapValid = false;
-static DumpableObject **catalogIdMap = NULL;
-static int	numCatalogIds = 0;
-
-/*
- * These variables are static to avoid the notational cruft of having to pass
- * them into findTableByOid() and friends.  For each of these arrays, we build
- * a sorted-by-OID index array immediately after the objects are fetched,
- * and then we use binary search in findTableByOid() and friends.  (qsort'ing
- * the object arrays themselves would be simpler, but it doesn't work because
- * pg_dump.c may have already established pointers between items.)
+ * Infrastructure for mapping CatalogId to DumpableObject
+ *
+ * We use a hash table generated by simplehash.h.  That infrastructure
+ * requires all the hash table entries to be the same size, and it also
+ * expects that it can move them around when resizing the table.  So we
+ * cannot make the DumpableObjects be elements of the hash table directly;
+ * instead, the hash table elements contain pointers to DumpableObjects.
+ *
+ * It turns out to be convenient to also use this data structure to map
+ * CatalogIds to owning extensions, if any.  Since extension membership
+ * data is read before creating most DumpableObjects, either one of dobj
+ * and ext could be NULL.
  */
-static DumpableObject **tblinfoindex;
-static DumpableObject **typinfoindex;
-static DumpableObject **funinfoindex;
-static DumpableObject **oprinfoindex;
-static DumpableObject **collinfoindex;
-static DumpableObject **nspinfoindex;
-static DumpableObject **extinfoindex;
-static DumpableObject **pubinfoindex;
-static int	numTables;
-static int	numTypes;
-static int	numFuncs;
-static int	numOperators;
-static int	numCollations;
-static int	numNamespaces;
-static int	numExtensions;
-static int	numPublications;
-
-/* This is an array of object identities, not actual DumpableObjects */
-static ExtensionMemberId *extmembers;
-static int	numextmembers;
+typedef struct _catalogIdMapEntry
+{
+	CatalogId	catId;			/* the indexed CatalogId */
+	uint32		status;			/* hash status */
+	uint32		hashval;		/* hash code for the CatalogId */
+	DumpableObject *dobj;		/* the associated DumpableObject, if any */
+	ExtensionInfo *ext;			/* owning extension, if any */
+} CatalogIdMapEntry;
+
+#define SH_PREFIX		catalogid
+#define SH_ELEMENT_TYPE	CatalogIdMapEntry
+#define SH_KEY_TYPE		CatalogId
+#define	SH_KEY			catId
+#define SH_HASH_KEY(tb, key)	hash_bytes((const unsigned char *) &(key), sizeof(CatalogId))
+#define SH_EQUAL(tb, a, b)		((a).oid == (b).oid && (a).tableoid == (b).tableoid)
+#define SH_STORE_HASH
+#define SH_GET_HASH(tb, a) (a)->hashval
+#define	SH_SCOPE		static inline
+#define SH_RAW_ALLOCATOR	pg_malloc0
+#define SH_DECLARE
+#define SH_DEFINE
+#include "lib/simplehash.h"
+
+#define CATALOGIDHASH_INITIAL_SIZE	10000
+
+static catalogid_hash *catalogIdHash = NULL;
 
 static void flagInhTables(Archive *fout, TableInfo *tbinfo, int numTables,
 						  InhInfo *inhinfo, int numInherits);
 static void flagInhIndexes(Archive *fout, TableInfo *tblinfo, int numTables);
 static void flagInhAttrs(DumpOptions *dopt, TableInfo *tblinfo, int numTables);
-static DumpableObject **buildIndexArray(void *objArray, int numObjs,
-										Size objSize);
-static int	DOCatalogIdCompare(const void *p1, const void *p2);
-static int	ExtensionMemberIdCompare(const void *p1, const void *p2);
 static void findParentsByOid(TableInfo *self,
 							 InhInfo *inhinfo, int numInherits);
 static int	strInArray(const char *pattern, char **arr, int arr_size);
-static IndxInfo *findIndexByOid(Oid oid, DumpableObject **idxinfoindex,
-								int numIndexes);
+static IndxInfo *findIndexByOid(Oid oid);
 
 
 /*
@@ -89,14 +97,16 @@ TableInfo *
 getSchemaData(Archive *fout, int *numTablesPtr)
 {
 	TableInfo  *tblinfo;
-	TypeInfo   *typinfo;
-	FuncInfo   *funinfo;
-	OprInfo    *oprinfo;
-	CollInfo   *collinfo;
-	NamespaceInfo *nspinfo;
 	ExtensionInfo *extinfo;
-	PublicationInfo *pubinfo;
 	InhInfo    *inhinfo;
+	int			numTables;
+	int			numTypes;
+	int			numFuncs;
+	int			numOperators;
+	int			numCollations;
+	int			numNamespaces;
+	int			numExtensions;
+	int			numPublications;
 	int			numAggregates;
 	int			numInherits;
 	int			numRules;
@@ -123,14 +133,12 @@ getSchemaData(Archive *fout, int *numTablesPtr)
 	 */
 	pg_log_info("reading extensions");
 	extinfo = getExtensions(fout, &numExtensions);
-	extinfoindex = buildIndexArray(extinfo, numExtensions, sizeof(ExtensionInfo));
 
 	pg_log_info("identifying extension members");
 	getExtensionMembership(fout, extinfo, numExtensions);
 
 	pg_log_info("reading schemas");
-	nspinfo = getNamespaces(fout, &numNamespaces);
-	nspinfoindex = buildIndexArray(nspinfo, numNamespaces, sizeof(NamespaceInfo));
+	(void) getNamespaces(fout, &numNamespaces);
 
 	/*
 	 * getTables should be done as soon as possible, so as to minimize the
@@ -140,19 +148,15 @@ getSchemaData(Archive *fout, int *numTablesPtr)
 	 */
 	pg_log_info("reading user-defined tables");
 	tblinfo = getTables(fout, &numTables);
-	tblinfoindex = buildIndexArray(tblinfo, numTables, sizeof(TableInfo));
 
-	/* Do this after we've built tblinfoindex */
 	getOwnedSeqs(fout, tblinfo, numTables);
 
 	pg_log_info("reading user-defined functions");
-	funinfo = getFuncs(fout, &numFuncs);
-	funinfoindex = buildIndexArray(funinfo, numFuncs, sizeof(FuncInfo));
+	(void) getFuncs(fout, &numFuncs);
 
 	/* this must be after getTables and getFuncs */
 	pg_log_info("reading user-defined types");
-	typinfo = getTypes(fout, &numTypes);
-	typinfoindex = buildIndexArray(typinfo, numTypes, sizeof(TypeInfo));
+	(void) getTypes(fout, &numTypes);
 
 	/* this must be after getFuncs, too */
 	pg_log_info("reading procedural languages");
@@ -162,8 +166,7 @@ getSchemaData(Archive *fout, int *numTablesPtr)
 	getAggregates(fout, &numAggregates);
 
 	pg_log_info("reading user-defined operators");
-	oprinfo = getOperators(fout, &numOperators);
-	oprinfoindex = buildIndexArray(oprinfo, numOperators, sizeof(OprInfo));
+	(void) getOperators(fout, &numOperators);
 
 	pg_log_info("reading user-defined access methods");
 	getAccessMethods(fout, &numAccessMethods);
@@ -196,8 +199,7 @@ getSchemaData(Archive *fout, int *numTablesPtr)
 	getDefaultACLs(fout, &numDefaultACLs);
 
 	pg_log_info("reading user-defined collations");
-	collinfo = getCollations(fout, &numCollations);
-	collinfoindex = buildIndexArray(collinfo, numCollations, sizeof(CollInfo));
+	(void) getCollations(fout, &numCollations);
 
 	pg_log_info("reading user-defined conversions");
 	getConversions(fout, &numConversions);
@@ -250,9 +252,7 @@ getSchemaData(Archive *fout, int *numTablesPtr)
 	getPolicies(fout, tblinfo, numTables);
 
 	pg_log_info("reading publications");
-	pubinfo = getPublications(fout, &numPublications);
-	pubinfoindex = buildIndexArray(pubinfo, numPublications,
-								   sizeof(PublicationInfo));
+	(void) getPublications(fout, &numPublications);
 
 	pg_log_info("reading publication membership");
 	getPublicationTables(fout, tblinfo, numTables);
@@ -375,34 +375,15 @@ flagInhIndexes(Archive *fout, TableInfo tblinfo[], int numTables)
 	int			i,
 				j,
 				k;
-	DumpableObject ***parentIndexArray;
-
-	parentIndexArray = (DumpableObject ***)
-		pg_malloc0(getMaxDumpId() * sizeof(DumpableObject **));
 
 	for (i = 0; i < numTables; i++)
 	{
-		TableInfo  *parenttbl;
 		IndexAttachInfo *attachinfo;
 
 		if (!tblinfo[i].ispartition || tblinfo[i].numParents == 0)
 			continue;
 
 		Assert(tblinfo[i].numParents == 1);
-		parenttbl = tblinfo[i].parents[0];
-
-		/*
-		 * We need access to each parent table's index list, but there is no
-		 * index to cover them outside of this function.  To avoid having to
-		 * sort every parent table's indexes each time we come across each of
-		 * its partitions, create an indexed array for each parent the first
-		 * time it is required.
-		 */
-		if (parentIndexArray[parenttbl->dobj.dumpId] == NULL)
-			parentIndexArray[parenttbl->dobj.dumpId] =
-				buildIndexArray(parenttbl->indexes,
-								parenttbl->numIndexes,
-								sizeof(IndxInfo));
 
 		attachinfo = (IndexAttachInfo *)
 			pg_malloc0(tblinfo[i].numIndexes * sizeof(IndexAttachInfo));
@@ -414,9 +395,7 @@ flagInhIndexes(Archive *fout, TableInfo tblinfo[], int numTables)
 			if (index->parentidx == 0)
 				continue;
 
-			parentidx = findIndexByOid(index->parentidx,
-									   parentIndexArray[parenttbl->dobj.dumpId],
-									   parenttbl->numIndexes);
+			parentidx = findIndexByOid(index->parentidx);
 			if (parentidx == NULL)
 				continue;
 
@@ -457,11 +436,6 @@ flagInhIndexes(Archive *fout, TableInfo tblinfo[], int numTables)
 			k++;
 		}
 	}
-
-	for (i = 0; i < numTables; i++)
-		if (parentIndexArray[i])
-			pg_free(parentIndexArray[i]);
-	pg_free(parentIndexArray);
 }
 
 /* flagInhAttrs -
@@ -596,7 +570,7 @@ flagInhAttrs(DumpOptions *dopt, TableInfo *tblinfo, int numTables)
 /*
  * AssignDumpId
  *		Given a newly-created dumpable object, assign a dump ID,
- *		and enter the object into the lookup table.
+ *		and enter the object into the lookup tables.
  *
  * The caller is expected to have filled in objType and catId,
  * but not any of the other standard fields of a DumpableObject.
@@ -615,6 +589,7 @@ AssignDumpId(DumpableObject *dobj)
 	dobj->nDeps = 0;
 	dobj->allocDeps = 0;
 
+	/* Add object to dumpIdMap[], enlarging that array if need be */
 	while (dobj->dumpId >= allocedDumpIds)
 	{
 		int			newAlloc;
@@ -637,8 +612,25 @@ AssignDumpId(DumpableObject *dobj)
 	}
 	dumpIdMap[dobj->dumpId] = dobj;
 
-	/* mark catalogIdMap invalid, but don't rebuild it yet */
-	catalogIdMapValid = false;
+	/* If it has a valid CatalogId, enter it into the hash table */
+	if (OidIsValid(dobj->catId.tableoid))
+	{
+		CatalogIdMapEntry *entry;
+		bool		found;
+
+		/* Initialize CatalogId hash table if not done yet */
+		if (catalogIdHash == NULL)
+			catalogIdHash = catalogid_create(CATALOGIDHASH_INITIAL_SIZE, NULL);
+
+		entry = catalogid_insert(catalogIdHash, dobj->catId, &found);
+		if (!found)
+		{
+			entry->dobj = NULL;
+			entry->ext = NULL;
+		}
+		Assert(entry->dobj == NULL);
+		entry->dobj = dobj;
+	}
 }
 
 /*
@@ -679,140 +671,19 @@ findObjectByDumpId(DumpId dumpId)
  * Find a DumpableObject by catalog ID
  *
  * Returns NULL for unknown ID
- *
- * We use binary search in a sorted list that is built on first call.
- * If AssignDumpId() and findObjectByCatalogId() calls were freely intermixed,
- * the code would work, but possibly be very slow.  In the current usage
- * pattern that does not happen, indeed we build the list at most twice.
  */
 DumpableObject *
 findObjectByCatalogId(CatalogId catalogId)
 {
-	DumpableObject **low;
-	DumpableObject **high;
-
-	if (!catalogIdMapValid)
-	{
-		if (catalogIdMap)
-			free(catalogIdMap);
-		getDumpableObjects(&catalogIdMap, &numCatalogIds);
-		if (numCatalogIds > 1)
-			qsort((void *) catalogIdMap, numCatalogIds,
-				  sizeof(DumpableObject *), DOCatalogIdCompare);
-		catalogIdMapValid = true;
-	}
-
-	/*
-	 * We could use bsearch() here, but the notational cruft of calling
-	 * bsearch is nearly as bad as doing it ourselves; and the generalized
-	 * bsearch function is noticeably slower as well.
-	 */
-	if (numCatalogIds <= 0)
-		return NULL;
-	low = catalogIdMap;
-	high = catalogIdMap + (numCatalogIds - 1);
-	while (low <= high)
-	{
-		DumpableObject **middle;
-		int			difference;
-
-		middle = low + (high - low) / 2;
-		/* comparison must match DOCatalogIdCompare, below */
-		difference = oidcmp((*middle)->catId.oid, catalogId.oid);
-		if (difference == 0)
-			difference = oidcmp((*middle)->catId.tableoid, catalogId.tableoid);
-		if (difference == 0)
-			return *middle;
-		else if (difference < 0)
-			low = middle + 1;
-		else
-			high = middle - 1;
-	}
-	return NULL;
-}
-
-/*
- * Find a DumpableObject by OID, in a pre-sorted array of one type of object
- *
- * Returns NULL for unknown OID
- */
-static DumpableObject *
-findObjectByOid(Oid oid, DumpableObject **indexArray, int numObjs)
-{
-	DumpableObject **low;
-	DumpableObject **high;
+	CatalogIdMapEntry *entry;
 
-	/*
-	 * This is the same as findObjectByCatalogId except we assume we need not
-	 * look at table OID because the objects are all the same type.
-	 *
-	 * We could use bsearch() here, but the notational cruft of calling
-	 * bsearch is nearly as bad as doing it ourselves; and the generalized
-	 * bsearch function is noticeably slower as well.
-	 */
-	if (numObjs <= 0)
-		return NULL;
-	low = indexArray;
-	high = indexArray + (numObjs - 1);
-	while (low <= high)
-	{
-		DumpableObject **middle;
-		int			difference;
-
-		middle = low + (high - low) / 2;
-		difference = oidcmp((*middle)->catId.oid, oid);
-		if (difference == 0)
-			return *middle;
-		else if (difference < 0)
-			low = middle + 1;
-		else
-			high = middle - 1;
-	}
-	return NULL;
-}
-
-/*
- * Build an index array of DumpableObject pointers, sorted by OID
- */
-static DumpableObject **
-buildIndexArray(void *objArray, int numObjs, Size objSize)
-{
-	DumpableObject **ptrs;
-	int			i;
+	if (catalogIdHash == NULL)
+		return NULL;			/* no objects exist yet */
 
-	if (numObjs <= 0)
+	entry = catalogid_lookup(catalogIdHash, catalogId);
+	if (entry == NULL)
 		return NULL;
-
-	ptrs = (DumpableObject **) pg_malloc(numObjs * sizeof(DumpableObject *));
-	for (i = 0; i < numObjs; i++)
-		ptrs[i] = (DumpableObject *) ((char *) objArray + i * objSize);
-
-	/* We can use DOCatalogIdCompare to sort since its first key is OID */
-	if (numObjs > 1)
-		qsort((void *) ptrs, numObjs, sizeof(DumpableObject *),
-			  DOCatalogIdCompare);
-
-	return ptrs;
-}
-
-/*
- * qsort comparator for pointers to DumpableObjects
- */
-static int
-DOCatalogIdCompare(const void *p1, const void *p2)
-{
-	const DumpableObject *obj1 = *(DumpableObject *const *) p1;
-	const DumpableObject *obj2 = *(DumpableObject *const *) p2;
-	int			cmpval;
-
-	/*
-	 * Compare OID first since it's usually unique, whereas there will only be
-	 * a few distinct values of tableoid.
-	 */
-	cmpval = oidcmp(obj1->catId.oid, obj2->catId.oid);
-	if (cmpval == 0)
-		cmpval = oidcmp(obj1->catId.tableoid, obj2->catId.tableoid);
-	return cmpval;
+	return entry->dobj;
 }
 
 /*
@@ -886,119 +757,169 @@ removeObjectDependency(DumpableObject *dobj, DumpId refId)
 
 /*
  * findTableByOid
- *	  finds the entry (in tblinfo) of the table with the given oid
+ *	  finds the DumpableObject for the table with the given oid
  *	  returns NULL if not found
  */
 TableInfo *
 findTableByOid(Oid oid)
 {
-	return (TableInfo *) findObjectByOid(oid, tblinfoindex, numTables);
+	CatalogId	catId;
+	DumpableObject *dobj;
+
+	catId.tableoid = RelationRelationId;
+	catId.oid = oid;
+	dobj = findObjectByCatalogId(catId);
+	Assert(dobj == NULL || dobj->objType == DO_TABLE);
+	return (TableInfo *) dobj;
+}
+
+/*
+ * findIndexByOid
+ *	  finds the DumpableObject for the index with the given oid
+ *	  returns NULL if not found
+ */
+static IndxInfo *
+findIndexByOid(Oid oid)
+{
+	CatalogId	catId;
+	DumpableObject *dobj;
+
+	catId.tableoid = RelationRelationId;
+	catId.oid = oid;
+	dobj = findObjectByCatalogId(catId);
+	Assert(dobj == NULL || dobj->objType == DO_INDEX);
+	return (IndxInfo *) dobj;
 }
 
 /*
  * findTypeByOid
- *	  finds the entry (in typinfo) of the type with the given oid
+ *	  finds the DumpableObject for the type with the given oid
  *	  returns NULL if not found
  */
 TypeInfo *
 findTypeByOid(Oid oid)
 {
-	return (TypeInfo *) findObjectByOid(oid, typinfoindex, numTypes);
+	CatalogId	catId;
+
+	catId.tableoid = TypeRelationId;
+	catId.oid = oid;
+	return (TypeInfo *) findObjectByCatalogId(catId);
 }
 
 /*
  * findFuncByOid
- *	  finds the entry (in funinfo) of the function with the given oid
+ *	  finds the DumpableObject for the function with the given oid
  *	  returns NULL if not found
  */
 FuncInfo *
 findFuncByOid(Oid oid)
 {
-	return (FuncInfo *) findObjectByOid(oid, funinfoindex, numFuncs);
+	CatalogId	catId;
+
+	catId.tableoid = ProcedureRelationId;
+	catId.oid = oid;
+	return (FuncInfo *) findObjectByCatalogId(catId);
 }
 
 /*
  * findOprByOid
- *	  finds the entry (in oprinfo) of the operator with the given oid
+ *	  finds the DumpableObject for the operator with the given oid
  *	  returns NULL if not found
  */
 OprInfo *
 findOprByOid(Oid oid)
 {
-	return (OprInfo *) findObjectByOid(oid, oprinfoindex, numOperators);
+	CatalogId	catId;
+
+	catId.tableoid = OperatorRelationId;
+	catId.oid = oid;
+	return (OprInfo *) findObjectByCatalogId(catId);
 }
 
 /*
  * findCollationByOid
- *	  finds the entry (in collinfo) of the collation with the given oid
+ *	  finds the DumpableObject for the collation with the given oid
  *	  returns NULL if not found
  */
 CollInfo *
 findCollationByOid(Oid oid)
 {
-	return (CollInfo *) findObjectByOid(oid, collinfoindex, numCollations);
+	CatalogId	catId;
+
+	catId.tableoid = CollationRelationId;
+	catId.oid = oid;
+	return (CollInfo *) findObjectByCatalogId(catId);
 }
 
 /*
  * findNamespaceByOid
- *	  finds the entry (in nspinfo) of the namespace with the given oid
+ *	  finds the DumpableObject for the namespace with the given oid
  *	  returns NULL if not found
  */
 NamespaceInfo *
 findNamespaceByOid(Oid oid)
 {
-	return (NamespaceInfo *) findObjectByOid(oid, nspinfoindex, numNamespaces);
+	CatalogId	catId;
+
+	catId.tableoid = NamespaceRelationId;
+	catId.oid = oid;
+	return (NamespaceInfo *) findObjectByCatalogId(catId);
 }
 
 /*
  * findExtensionByOid
- *	  finds the entry (in extinfo) of the extension with the given oid
+ *	  finds the DumpableObject for the extension with the given oid
  *	  returns NULL if not found
  */
 ExtensionInfo *
 findExtensionByOid(Oid oid)
 {
-	return (ExtensionInfo *) findObjectByOid(oid, extinfoindex, numExtensions);
+	CatalogId	catId;
+
+	catId.tableoid = ExtensionRelationId;
+	catId.oid = oid;
+	return (ExtensionInfo *) findObjectByCatalogId(catId);
 }
 
 /*
  * findPublicationByOid
- *	  finds the entry (in pubinfo) of the publication with the given oid
+ *	  finds the DumpableObject for the publication with the given oid
  *	  returns NULL if not found
  */
 PublicationInfo *
 findPublicationByOid(Oid oid)
 {
-	return (PublicationInfo *) findObjectByOid(oid, pubinfoindex, numPublications);
-}
+	CatalogId	catId;
 
-/*
- * findIndexByOid
- *		find the entry of the index with the given oid
- *
- * This one's signature is different from the previous ones because we lack a
- * global array of all indexes, so caller must pass their array as argument.
- */
-static IndxInfo *
-findIndexByOid(Oid oid, DumpableObject **idxinfoindex, int numIndexes)
-{
-	return (IndxInfo *) findObjectByOid(oid, idxinfoindex, numIndexes);
+	catId.tableoid = PublicationRelationId;
+	catId.oid = oid;
+	return (PublicationInfo *) findObjectByCatalogId(catId);
 }
 
+
 /*
- * setExtensionMembership
- *	  accept and save data about which objects belong to extensions
+ * recordExtensionMembership
+ *	  Record that the object identified by the given catalog ID
+ *	  belongs to the given extension
  */
 void
-setExtensionMembership(ExtensionMemberId *extmems, int nextmems)
+recordExtensionMembership(CatalogId catId, ExtensionInfo *ext)
 {
-	/* Sort array in preparation for binary searches */
-	if (nextmems > 1)
-		qsort((void *) extmems, nextmems, sizeof(ExtensionMemberId),
-			  ExtensionMemberIdCompare);
-	/* And save */
-	extmembers = extmems;
-	numextmembers = nextmems;
+	CatalogIdMapEntry *entry;
+	bool		found;
+
+	/* CatalogId hash table must exist, if we have an ExtensionInfo */
+	Assert(catalogIdHash != NULL);
+
+	/* Add reference to CatalogId hash */
+	entry = catalogid_insert(catalogIdHash, catId, &found);
+	if (!found)
+	{
+		entry->dobj = NULL;
+		entry->ext = NULL;
+	}
+	Assert(entry->ext == NULL);
+	entry->ext = ext;
 }
 
 /*
@@ -1008,56 +929,15 @@ setExtensionMembership(ExtensionMemberId *extmems, int nextmems)
 ExtensionInfo *
 findOwningExtension(CatalogId catalogId)
 {
-	ExtensionMemberId *low;
-	ExtensionMemberId *high;
+	CatalogIdMapEntry *entry;
 
-	/*
-	 * We could use bsearch() here, but the notational cruft of calling
-	 * bsearch is nearly as bad as doing it ourselves; and the generalized
-	 * bsearch function is noticeably slower as well.
-	 */
-	if (numextmembers <= 0)
-		return NULL;
-	low = extmembers;
-	high = extmembers + (numextmembers - 1);
-	while (low <= high)
-	{
-		ExtensionMemberId *middle;
-		int			difference;
-
-		middle = low + (high - low) / 2;
-		/* comparison must match ExtensionMemberIdCompare, below */
-		difference = oidcmp(middle->catId.oid, catalogId.oid);
-		if (difference == 0)
-			difference = oidcmp(middle->catId.tableoid, catalogId.tableoid);
-		if (difference == 0)
-			return middle->ext;
-		else if (difference < 0)
-			low = middle + 1;
-		else
-			high = middle - 1;
-	}
-	return NULL;
-}
-
-/*
- * qsort comparator for ExtensionMemberIds
- */
-static int
-ExtensionMemberIdCompare(const void *p1, const void *p2)
-{
-	const ExtensionMemberId *obj1 = (const ExtensionMemberId *) p1;
-	const ExtensionMemberId *obj2 = (const ExtensionMemberId *) p2;
-	int			cmpval;
+	if (catalogIdHash == NULL)
+		return NULL;			/* no objects exist yet */
 
-	/*
-	 * Compare OID first since it's usually unique, whereas there will only be
-	 * a few distinct values of tableoid.
-	 */
-	cmpval = oidcmp(obj1->catId.oid, obj2->catId.oid);
-	if (cmpval == 0)
-		cmpval = oidcmp(obj1->catId.tableoid, obj2->catId.tableoid);
-	return cmpval;
+	entry = catalogid_lookup(catalogIdHash, catalogId);
+	if (entry == NULL)
+		return NULL;
+	return entry->ext;
 }
 
 
diff --git a/src/bin/pg_dump/pg_backup.h b/src/bin/pg_dump/pg_backup.h
index 3c1cd858a8..6af10a85a2 100644
--- a/src/bin/pg_dump/pg_backup.h
+++ b/src/bin/pg_dump/pg_backup.h
@@ -236,6 +236,7 @@ typedef struct Archive
 
 typedef struct
 {
+	/* Note: this struct must not contain any unused bytes */
 	Oid			tableoid;
 	Oid			oid;
 } CatalogId;
diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c
index ed8ed2f266..948615a5b9 100644
--- a/src/bin/pg_dump/pg_dump.c
+++ b/src/bin/pg_dump/pg_dump.c
@@ -17901,12 +17901,10 @@ getExtensionMembership(Archive *fout, ExtensionInfo extinfo[],
 	PQExpBuffer query;
 	PGresult   *res;
 	int			ntups,
-				nextmembers,
 				i;
 	int			i_classid,
 				i_objid,
 				i_refobjid;
-	ExtensionMemberId *extmembers;
 	ExtensionInfo *ext;
 
 	/* Nothing to do if no extensions */
@@ -17931,12 +17929,7 @@ getExtensionMembership(Archive *fout, ExtensionInfo extinfo[],
 	i_objid = PQfnumber(res, "objid");
 	i_refobjid = PQfnumber(res, "refobjid");
 
-	extmembers = (ExtensionMemberId *) pg_malloc(ntups * sizeof(ExtensionMemberId));
-	nextmembers = 0;
-
 	/*
-	 * Accumulate data into extmembers[].
-	 *
 	 * Since we ordered the SELECT by referenced ID, we can expect that
 	 * multiple entries for the same extension will appear together; this
 	 * saves on searches.
@@ -17963,16 +17956,11 @@ getExtensionMembership(Archive *fout, ExtensionInfo extinfo[],
 			continue;
 		}
 
-		extmembers[nextmembers].catId = objId;
-		extmembers[nextmembers].ext = ext;
-		nextmembers++;
+		recordExtensionMembership(objId, ext);
 	}
 
 	PQclear(res);
 
-	/* Remember the data for use later */
-	setExtensionMembership(extmembers, nextmembers);
-
 	destroyPQExpBuffer(query);
 }
 
diff --git a/src/bin/pg_dump/pg_dump.h b/src/bin/pg_dump/pg_dump.h
index 29af845ece..cc55e598ec 100644
--- a/src/bin/pg_dump/pg_dump.h
+++ b/src/bin/pg_dump/pg_dump.h
@@ -647,16 +647,6 @@ typedef struct _SubscriptionInfo
 	char	   *subpublications;
 } SubscriptionInfo;
 
-/*
- * We build an array of these with an entry for each object that is an
- * extension member according to pg_depend.
- */
-typedef struct _extensionMemberId
-{
-	CatalogId	catId;			/* tableoid+oid of some member object */
-	ExtensionInfo *ext;			/* owning extension */
-} ExtensionMemberId;
-
 /*
  *	common utility functions
  */
@@ -682,7 +672,7 @@ extern NamespaceInfo *findNamespaceByOid(Oid oid);
 extern ExtensionInfo *findExtensionByOid(Oid oid);
 extern PublicationInfo *findPublicationByOid(Oid oid);
 
-extern void setExtensionMembership(ExtensionMemberId *extmems, int nextmems);
+extern void recordExtensionMembership(CatalogId catId, ExtensionInfo *ext);
 extern ExtensionInfo *findOwningExtension(CatalogId catalogId);
 
 extern void parseOidArray(const char *str, Oid *array, int arraysize);
#2 Bossart, Nathan
bossartn@amazon.com
In reply to: Tom Lane (#1)
Re: Experimenting with hash tables inside pg_dump

On 10/21/21, 3:29 PM, "Tom Lane" <tgl@sss.pgh.pa.us> wrote:

> (b) I couldn't measure any change in performance at all. I tried
> it on the regression database and on a toy DB with 10000 simple
> tables. Maybe on a really large DB you'd notice some difference,
> but I'm not very optimistic now.

I wonder how many tables you'd need to start seeing a difference.
There are certainly databases out there with many more than 10,000
tables. I'll look into this...

Nathan

#3 Andres Freund
andres@anarazel.de
In reply to: Tom Lane (#1)
Re: Experimenting with hash tables inside pg_dump

Hi,

On 2021-10-21 18:27:25 -0400, Tom Lane wrote:

> Today, pg_dump does a lot of internal lookups via binary search
> in presorted arrays. I thought it might improve matters
> to replace those binary searches with hash tables, theoretically
> converting O(log N) searches into O(1) searches. So I tried making
> a hash table indexed by CatalogId (tableoid+oid) with simplehash.h,
> and replacing as many data structures as I could with that.

That does sound like a good idea in theory...

> This makes the code shorter and (IMO anyway) cleaner, but
>
> (a) the executable size increases by a few KB --- apparently, even
> the minimum subset of simplehash.h's functionality is code-wasteful.

Hm. Surprised a bit by that. In an optimized build the difference is
smaller, at least.

optimized:
text data bss dec hex filename
448066 7048 1368 456482 6f722 src/bin/pg_dump/pg_dump
447530 7048 1496 456074 6f58a src/bin/pg_dump/pg_dump.orig

debug:
text data bss dec hex filename
516883 7024 1352 525259 803cb src/bin/pg_dump/pg_dump
509819 7024 1480 518323 7e8b3 src/bin/pg_dump/pg_dump.orig

The fact that optimization plays such a role makes me wonder if a good chunk
of the difference is the slightly more complicated find{Type,Func,...}ByOid()
functions.

> (b) I couldn't measure any change in performance at all. I tried
> it on the regression database and on a toy DB with 10000 simple
> tables. Maybe on a really large DB you'd notice some difference,
> but I'm not very optimistic now.

Did you measure runtime of pg_dump, or how much CPU it used? I think a lot of
the time the backend is a bigger bottleneck than pg_dump...

For the regression test DB the majority of the time seems to be spent below
two things:
1) libpq
2) sortDumpableObjects().

I don't think 2) hits the binary search / hashtable path?

It does seem interesting that a substantial part of the time is spent in/below
PQexec() and PQfnumber(). Especially the latter shouldn't be too hard to
optimize away...

Greetings,

Andres Freund

#4 Bossart, Nathan
bossartn@amazon.com
In reply to: Andres Freund (#3)
Re: Experimenting with hash tables inside pg_dump

On 10/21/21, 4:14 PM, "Bossart, Nathan" <bossartn@amazon.com> wrote:

> On 10/21/21, 3:29 PM, "Tom Lane" <tgl@sss.pgh.pa.us> wrote:
>
>> (b) I couldn't measure any change in performance at all. I tried
>> it on the regression database and on a toy DB with 10000 simple
>> tables. Maybe on a really large DB you'd notice some difference,
>> but I'm not very optimistic now.
>
> I wonder how many tables you'd need to start seeing a difference.
> There are certainly databases out there with many more than 10,000
> tables. I'll look into this...

Well, I tested with 200,000 tables and saw no difference with this.

Nathan

#5 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andres Freund (#3)
Re: Experimenting with hash tables inside pg_dump

Andres Freund <andres@anarazel.de> writes:

> Did you measure runtime of pg_dump, or how much CPU it used?

I was looking mostly at wall-clock runtime, though I did notice
that the CPU time looked about the same too.

> I think a lot of
> the time the backend is a bigger bottleneck than pg_dump...

Yeah, that. I tried doing a system-wide "perf" measurement, and soon
realized that a big fraction of the time for a "pg_dump -s" run is
being spent in the planner :-(. I'm currently experimenting with
PREPARE'ing pg_dump's repetitive queries, and it's looking very
promising. More later.
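
(For illustration, a minimal sketch of what PREPARE'ing one of those
repetitive queries could look like; the statement name, parameter, and
column list here are hypothetical, not the queries pg_dump actually
ended up preparing:)

    PREPARE getColumns(pg_catalog.oid) AS
      SELECT a.attnum, a.attname, a.atttypmod, a.attnotnull
      FROM pg_catalog.pg_attribute a
      WHERE a.attrelid = $1 AND a.attnum > 0
      ORDER BY a.attnum;

    -- prepared once; then executed once per table, with only the
    -- parameter changing
    EXECUTE getColumns('16384');
    EXECUTE getColumns('16402');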

regards, tom lane

#6 Andres Freund
andres@anarazel.de
In reply to: Andres Freund (#3)
Re: Experimenting with hash tables inside pg_dump

Hi,

On 2021-10-21 16:37:57 -0700, Andres Freund wrote:

> On 2021-10-21 18:27:25 -0400, Tom Lane wrote:
>
>> (a) the executable size increases by a few KB --- apparently, even
>> the minimum subset of simplehash.h's functionality is code-wasteful.
>
> Hm. Surprised a bit by that. In an optimized build the difference is
> smaller, at least.
>
> optimized:
> text data bss dec hex filename
> 448066 7048 1368 456482 6f722 src/bin/pg_dump/pg_dump
> 447530 7048 1496 456074 6f58a src/bin/pg_dump/pg_dump.orig
>
> debug:
> text data bss dec hex filename
> 516883 7024 1352 525259 803cb src/bin/pg_dump/pg_dump
> 509819 7024 1480 518323 7e8b3 src/bin/pg_dump/pg_dump.orig
>
> The fact that optimization plays such a role makes me wonder if a good chunk
> of the difference is the slightly more complicated find{Type,Func,...}ByOid()
> functions.

It's not that.

In a debug build a good chunk of it is due to a bunch of Assert()s. Another
part is that trivial helper functions like SH_PREV() don't get inlined.

The increase for an optimized build seems to boil down to pg_log_error()
invocations. If I replace those with an exit(1), the resulting binaries are
within 100 bytes of each other.

If I prevent the compiler from inlining findObjectByCatalogId() in all the
find*ByOid() routines, your version is smaller than master even without other
changes.

Greetings,

Andres Freund

#7 Andres Freund
andres@anarazel.de
In reply to: Tom Lane (#5)
Re: Experimenting with hash tables inside pg_dump

Hi,

On 2021-10-21 20:22:56 -0400, Tom Lane wrote:

> Andres Freund <andres@anarazel.de> writes:
> Yeah, that. I tried doing a system-wide "perf" measurement, and soon
> realized that a big fraction of the time for a "pg_dump -s" run is
> being spent in the planner :-(.

A trick for seeing the proportions of this easily in perf is to start both
postgres and pg_dump pinned to a specific CPU, and profile that cpu. That gets
rid of most of the noise of other programs etc.

> I'm currently experimenting with
> PREPARE'ing pg_dump's repetitive queries, and it's looking very
> promising. More later.

Good idea.

I wonder though if for some of them we should instead replace the per-object
queries with one query returning the information for all objects of a type. It
doesn't make all that much sense that we build and send one query for each
table and index.

Greetings,

Andres Freund

#8 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andres Freund (#7)
Re: Experimenting with hash tables inside pg_dump

Andres Freund <andres@anarazel.de> writes:

> I wonder though if for some of them we should instead replace the per-object
> queries with one query returning the information for all objects of a type. It
> doesn't make all that much sense that we build and send one query for each
> table and index.

The trick is the problem I alluded to in another thread: it's not safe to
do stuff like pg_get_expr() on tables we don't have lock on.

I've thought about doing something like

SELECT unsafe-functions FROM pg_class WHERE oid IN (someoid, someoid, ...)

but in cases with tens of thousands of tables, it seems unlikely that
that's going to behave all that nicely.

The *real* fix, I suppose, would be to fix all those catalog-inspection
functions so that they operate with respect to the query's snapshot.
But that's not a job I'm volunteering for. Besides which, pg_dump
still has to cope with back-rev servers where it wouldn't be safe.

regards, tom lane

#9 Andres Freund
andres@anarazel.de
In reply to: Tom Lane (#8)
1 attachment(s)
Re: Experimenting with hash tables inside pg_dump

Hi,

On 2021-10-21 22:13:22 -0400, Tom Lane wrote:

> Andres Freund <andres@anarazel.de> writes:
>
>> I wonder though if for some of them we should instead replace the per-object
>> queries with one query returning the information for all objects of a type. It
>> doesn't make all that much sense that we build and send one query for each
>> table and index.
>
> The trick is the problem I alluded to in another thread: it's not safe to
> do stuff like pg_get_expr() on tables we don't have lock on.

I was looking at getTableAttrs() - sending one query instead of #tables
queries yields a quite substantial speedup in a quick prototype. And I don't
think it changes anything around locking semantics.

> I've thought about doing something like
>
> SELECT unsafe-functions FROM pg_class WHERE oid IN (someoid, someoid, ...)
>
> but in cases with tens of thousands of tables, it seems unlikely that
> that's going to behave all that nicely.

That's kinda what I'm doing in the quick hack. But instead of using IN(...) I
made it unnest('{oid, oid, ...}'), that scales much better.
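
(Distilled down, the query shape in the attached quick hack is roughly the
following, simplified to a single array and with placeholder OIDs; the real
patch also carries along a table-index column via a two-argument unnest():)

    SELECT src.tbloid, a.attnum, a.attname
    FROM unnest('{16384, 16402, 16417}'::pg_catalog.oid[]) AS src(tbloid)
    JOIN pg_catalog.pg_attribute a
      ON a.attrelid = src.tbloid AND a.attnum > 0
    ORDER BY src.tbloid, a.attnum;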

A pg_dump --schema-only of the regression database goes from

real 0m0.675s
user 0m0.039s
sys 0m0.029s

to

real 0m0.477s
user 0m0.037s
sys 0m0.020s

which isn't half-bad.

There are a few more cases like this, I think. But most are harder because the
dumping happens one-by-one from dumpDumpableObject(). The relatively easy but
substantial cases I could find quickly were getIndexes(), getConstraints(), and
getTriggers().

To see where it's worth putting in time it'd be useful if getSchemaData() in
verbose mode printed timing information...

> The *real* fix, I suppose, would be to fix all those catalog-inspection
> functions so that they operate with respect to the query's snapshot.
> But that's not a job I'm volunteering for. Besides which, pg_dump
> still has to cope with back-rev servers where it wouldn't be safe.

Yea, that's not a small change :(. I suspect that we'd need a bunch of new
caching infrastructure to make that reasonably performant, since this
presumably couldn't use syscache etc.

Greetings,

Andres Freund

Attachments:

pg_dump-bulk-gettableattrs.diff (text/x-diff; charset=us-ascii)
diff --git c/src/bin/pg_dump/pg_dump.c i/src/bin/pg_dump/pg_dump.c
index ed8ed2f266e..7ec84428e9c 100644
--- c/src/bin/pg_dump/pg_dump.c
+++ i/src/bin/pg_dump/pg_dump.c
@@ -43,6 +43,7 @@
 #include "access/transam.h"
 #include "catalog/pg_aggregate_d.h"
 #include "catalog/pg_am_d.h"
+#include "catalog/pg_attrdef.h"
 #include "catalog/pg_attribute_d.h"
 #include "catalog/pg_authid_d.h"
 #include "catalog/pg_cast_d.h"
@@ -8397,6 +8398,14 @@ getTableAttrs(Archive *fout, TableInfo *tblinfo, int numTables)
 {
 	DumpOptions *dopt = fout->dopt;
 	PQExpBuffer q = createPQExpBuffer();
+	PQExpBuffer tblidx = createPQExpBuffer();
+	PQExpBuffer tbloid = createPQExpBuffer();
+
+	PGresult   *res;
+	bool		first;
+	int			i_tbloid;
+	int			i_idx;
+	int			i_natts;
 	int			i_attnum;
 	int			i_attname;
 	int			i_atttypname;
@@ -8417,13 +8426,28 @@ getTableAttrs(Archive *fout, TableInfo *tblinfo, int numTables)
 	int			i_attfdwoptions;
 	int			i_attmissingval;
 	int			i_atthasdef;
+	int			i_attdef;
+	int			i_attdefoid;
 
+	int			ntups;
+	int			tblidx_prev = -1;
+	Oid			tbloid_prev = InvalidOid;
+	int			natts_prev = -1;
+	int			natt_prev = -1;
+	TableInfo  *tbinfo = NULL;
+	int			tbstart = -1;
+
+	resetPQExpBuffer(tblidx);
+	resetPQExpBuffer(tbloid);
+
+	appendPQExpBufferStr(tblidx, "'{");
+	appendPQExpBufferStr(tbloid, "'{");
+
+	/* find all the user attributes and their types */
+	first = true;
 	for (int i = 0; i < numTables; i++)
 	{
 		TableInfo  *tbinfo = &tblinfo[i];
-		PGresult   *res;
-		int			ntups;
-		bool		hasdefaults;
 
 		/* Don't bother to collect info for sequences */
 		if (tbinfo->relkind == RELKIND_SEQUENCE)
@@ -8433,301 +8457,376 @@ getTableAttrs(Archive *fout, TableInfo *tblinfo, int numTables)
 		if (!tbinfo->interesting)
 			continue;
 
-		/* find all the user attributes and their types */
-
-		/*
-		 * we must read the attribute names in attribute number order! because
-		 * we will use the attnum to index into the attnames array later.
-		 */
-		pg_log_info("finding the columns and types of table \"%s.%s\"",
-					tbinfo->dobj.namespace->dobj.name,
-					tbinfo->dobj.name);
-
-		resetPQExpBuffer(q);
-
-		appendPQExpBufferStr(q,
-							 "SELECT\n"
-							 "a.attnum,\n"
-							 "a.attname,\n"
-							 "a.atttypmod,\n"
-							 "a.attstattarget,\n"
-							 "a.attstorage,\n"
-							 "t.typstorage,\n"
-							 "a.attnotnull,\n"
-							 "a.atthasdef,\n"
-							 "a.attisdropped,\n"
-							 "a.attlen,\n"
-							 "a.attalign,\n"
-							 "a.attislocal,\n"
-							 "pg_catalog.format_type(t.oid, a.atttypmod) AS atttypname,\n");
-
-		if (fout->remoteVersion >= 90000)
-			appendPQExpBufferStr(q,
-								 "array_to_string(a.attoptions, ', ') AS attoptions,\n");
+		if (first)
+		{
+			appendPQExpBuffer(tblidx, "%u", i);
+			appendPQExpBuffer(tbloid, "%u", tbinfo->dobj.catId.oid);
+			first = false;
+		}
 		else
-			appendPQExpBufferStr(q,
-								 "'' AS attoptions,\n");
+		{
+			appendPQExpBuffer(tblidx, ", %u", i);
+			appendPQExpBuffer(tbloid, ", %u", tbinfo->dobj.catId.oid);
+		}
+	}
 
-		if (fout->remoteVersion >= 90100)
+	appendPQExpBufferStr(tblidx, "}'::int[]");
+	appendPQExpBufferStr(tbloid, "}'::oid[]");
+
+	resetPQExpBuffer(q);
+	appendPQExpBufferStr(q,
+						 "SELECT\n"
+						 "src.idx,\n"
+						 "src.tbloid,\n"
+						 "count(*) OVER(PARTITION BY idx) AS natts,\n"
+						 "a.attnum,\n"
+						 "a.attname,\n"
+						 "a.atttypmod,\n"
+						 "a.attstattarget,\n"
+						 "a.attstorage,\n"
+						 "t.typstorage,\n"
+						 "a.attnotnull,\n"
+						 "a.atthasdef,\n"
+						 "a.attisdropped,\n"
+						 "a.attlen,\n"
+						 "a.attalign,\n"
+						 "a.attislocal,\n"
+						 "pg_catalog.format_type(t.oid, a.atttypmod) AS atttypname,\n");
+
+	if (fout->remoteVersion >= 90000)
+		appendPQExpBufferStr(q,
+							 "array_to_string(a.attoptions, ', ') AS attoptions,\n");
+	else
+		appendPQExpBufferStr(q,
+							 "'' AS attoptions,\n");
+
+	if (fout->remoteVersion >= 90100)
+	{
+		/*
+		 * Since we only want to dump COLLATE clauses for attributes whose
+		 * collation is different from their type's default, we use a CASE
+		 * here to suppress uninteresting attcollations cheaply.
+		 */
+		appendPQExpBufferStr(q,
+							 "CASE WHEN a.attcollation <> t.typcollation "
+							 "THEN a.attcollation ELSE 0 END AS attcollation,\n");
+	}
+	else
+		appendPQExpBufferStr(q,
+							 "0 AS attcollation,\n");
+
+	if (fout->remoteVersion >= 140000)
+		appendPQExpBuffer(q,
+						  "a.attcompression AS attcompression,\n");
+	else
+		appendPQExpBuffer(q,
+						  "'' AS attcompression,\n");
+
+	if (fout->remoteVersion >= 90200)
+		appendPQExpBufferStr(q,
+							 "pg_catalog.array_to_string(ARRAY("
+							 "SELECT pg_catalog.quote_ident(option_name) || "
+							 "' ' || pg_catalog.quote_literal(option_value) "
+							 "FROM pg_catalog.pg_options_to_table(attfdwoptions) "
+							 "ORDER BY option_name"
+							 "), E',\n    ') AS attfdwoptions,\n");
+	else
+		appendPQExpBufferStr(q,
+							 "'' AS attfdwoptions,\n");
+
+	if (fout->remoteVersion >= 100000)
+		appendPQExpBufferStr(q,
+							 "a.attidentity,\n");
+	else
+		appendPQExpBufferStr(q,
+							 "'' AS attidentity,\n");
+
+	if (fout->remoteVersion >= 110000)
+		appendPQExpBufferStr(q,
+							 "CASE WHEN a.atthasmissing AND NOT a.attisdropped "
+							 "THEN a.attmissingval ELSE null END AS attmissingval,\n");
+	else
+		appendPQExpBufferStr(q,
+							 "NULL AS attmissingval,\n");
+
+	if (fout->remoteVersion >= 120000)
+		appendPQExpBufferStr(q,
+							 "a.attgenerated,\n");
+	else
+		appendPQExpBufferStr(q,
+							 "'' AS attgenerated,\n");
+
+	/*
+	 * Get info about column defaults.  This is skipped for a data-only
+	 * dump, as it is only needed for table schemas.
+	 */
+	if (!dopt->dataOnly)
+	{
+		appendPQExpBufferStr(q,
+							 "CASE WHEN atthasdef THEN\n"
+							 "  pg_catalog.pg_get_expr(atdef.adbin, atdef.adrelid)\n"
+							 "END AS attdef,\n");
+		appendPQExpBufferStr(q, "atdef.oid AS attdefoid\n");
+	}
+	else
+	{
+		appendPQExpBufferStr(q, "NULL::text AS attdef,\n");
+		appendPQExpBufferStr(q, "NULL::oid AS attdefoid\n");
+	}
+
+	/* need left join here to not fail on dropped columns ... */
+	appendPQExpBuffer(q,
+					  "FROM\n"
+					  "  unnest(%s,\n"
+					  "         %s)\n"
+					  "    AS src(idx, tbloid)\n"
+					  "  JOIN pg_catalog.pg_attribute a\n"
+					  "    ON (src.tbloid = a.attrelid AND a.attnum > 0::pg_catalog.int2)\n"
+					  "  LEFT JOIN pg_catalog.pg_type t\n"
+					  "    ON (a.atttypid = t.oid)\n"
+					  "  LEFT JOIN pg_catalog.pg_attrdef atdef\n"
+					  "    ON (a.atthasdef AND atdef.adrelid = src.tbloid AND atdef.adnum = a.attnum)\n"
+					  "ORDER BY idx, a.attnum\n",
+					  tblidx->data,
+					  tbloid->data);
+
+	res = ExecuteSqlQuery(fout, q->data, PGRES_TUPLES_OK);
+
+	i_idx = PQfnumber(res, "idx");
+	i_tbloid = PQfnumber(res, "tbloid");
+	i_natts = PQfnumber(res, "natts");
+	i_attnum = PQfnumber(res, "attnum");
+	i_attname = PQfnumber(res, "attname");
+	i_atttypname = PQfnumber(res, "atttypname");
+	i_atttypmod = PQfnumber(res, "atttypmod");
+	i_attstattarget = PQfnumber(res, "attstattarget");
+	i_attstorage = PQfnumber(res, "attstorage");
+	i_typstorage = PQfnumber(res, "typstorage");
+	i_attidentity = PQfnumber(res, "attidentity");
+	i_attgenerated = PQfnumber(res, "attgenerated");
+	i_attisdropped = PQfnumber(res, "attisdropped");
+	i_attlen = PQfnumber(res, "attlen");
+	i_attalign = PQfnumber(res, "attalign");
+	i_attislocal = PQfnumber(res, "attislocal");
+	i_attnotnull = PQfnumber(res, "attnotnull");
+	i_attoptions = PQfnumber(res, "attoptions");
+	i_attcollation = PQfnumber(res, "attcollation");
+	i_attcompression = PQfnumber(res, "attcompression");
+	i_attfdwoptions = PQfnumber(res, "attfdwoptions");
+	i_attmissingval = PQfnumber(res, "attmissingval");
+	i_atthasdef = PQfnumber(res, "atthasdef");
+	i_attdef = PQfnumber(res, "attdef");
+	i_attdefoid = PQfnumber(res, "attdefoid");
+
+	ntups = PQntuples(res);
+
+	for (int r = 0; r < ntups; r++)
+	{
+		int tblidx;
+		int tbloid;
+		int natts;
+		int natt;
+		int attnum;
+		bool hasdefault;
+
+		tblidx = atoi(PQgetvalue(res, r, i_idx));
+		tbloid = atoi(PQgetvalue(res, r, i_tbloid));
+		natts = atoi(PQgetvalue(res, r, i_natts));
+		attnum = atoi(PQgetvalue(res, r, i_attnum));
+		natt = attnum - 1;
+
+		if (tblidx_prev == -1 || tblidx != tblidx_prev)
 		{
 			/*
-			 * Since we only want to dump COLLATE clauses for attributes whose
-			 * collation is different from their type's default, we use a CASE
-			 * here to suppress uninteresting attcollations cheaply.
+			 * Either the first row for the first table, or rows for a
+			 * different table are starting.
 			 */
-			appendPQExpBufferStr(q,
-								 "CASE WHEN a.attcollation <> t.typcollation "
-								 "THEN a.attcollation ELSE 0 END AS attcollation,\n");
+
+			/*
+			 * FIXME: verify indexes aren't running over end of index,
+			 * e.g. due to a malicious server.
+			 */
+			Assert(attnum == 1);
+
+			tbinfo = &tblinfo[tblidx];
+
+			tbinfo->numatts = natts;
+			tbinfo->attnames = (char **) pg_malloc(natts * sizeof(char *));
+			tbinfo->atttypnames = (char **) pg_malloc(natts * sizeof(char *));
+			tbinfo->atttypmod = (int *) pg_malloc(natts * sizeof(int));
+			tbinfo->attstattarget = (int *) pg_malloc(natts * sizeof(int));
+			tbinfo->attstorage = (char *) pg_malloc(natts * sizeof(char));
+			tbinfo->typstorage = (char *) pg_malloc(natts * sizeof(char));
+			tbinfo->attidentity = (char *) pg_malloc(natts * sizeof(char));
+			tbinfo->attgenerated = (char *) pg_malloc(natts * sizeof(char));
+			tbinfo->attisdropped = (bool *) pg_malloc(natts * sizeof(bool));
+			tbinfo->attlen = (int *) pg_malloc(natts * sizeof(int));
+			tbinfo->attalign = (char *) pg_malloc(natts * sizeof(char));
+			tbinfo->attislocal = (bool *) pg_malloc(natts * sizeof(bool));
+			tbinfo->attoptions = (char **) pg_malloc(natts * sizeof(char *));
+			tbinfo->attcollation = (Oid *) pg_malloc(natts * sizeof(Oid));
+			tbinfo->attcompression = (char *) pg_malloc(natts * sizeof(char));
+			tbinfo->attfdwoptions = (char **) pg_malloc(natts * sizeof(char *));
+			tbinfo->attmissingval = (char **) pg_malloc(natts * sizeof(char *));
+			tbinfo->notnull = (bool *) pg_malloc(natts * sizeof(bool));
+			tbinfo->inhNotNull = (bool *) pg_malloc(natts * sizeof(bool));
+			tbinfo->attrdefs = (AttrDefInfo **) pg_malloc(natts * sizeof(AttrDefInfo *));
+
+			tbstart = r;
+			tblidx_prev = tblidx;
+			tbloid_prev = tbloid;
+			natts_prev = natts;
+			natt_prev = 0;
+
+			pg_log_info("processing rows of table \"%s.%s\"",
+						tbinfo->dobj.namespace->dobj.name,
+						tbinfo->dobj.name);
 		}
 		else
-			appendPQExpBufferStr(q,
-								 "0 AS attcollation,\n");
-
-		if (fout->remoteVersion >= 140000)
-			appendPQExpBuffer(q,
-							  "a.attcompression AS attcompression,\n");
-		else
-			appendPQExpBuffer(q,
-							  "'' AS attcompression,\n");
-
-		if (fout->remoteVersion >= 90200)
-			appendPQExpBufferStr(q,
-								 "pg_catalog.array_to_string(ARRAY("
-								 "SELECT pg_catalog.quote_ident(option_name) || "
-								 "' ' || pg_catalog.quote_literal(option_value) "
-								 "FROM pg_catalog.pg_options_to_table(attfdwoptions) "
-								 "ORDER BY option_name"
-								 "), E',\n    ') AS attfdwoptions,\n");
-		else
-			appendPQExpBufferStr(q,
-								 "'' AS attfdwoptions,\n");
-
-		if (fout->remoteVersion >= 100000)
-			appendPQExpBufferStr(q,
-								 "a.attidentity,\n");
-		else
-			appendPQExpBufferStr(q,
-								 "'' AS attidentity,\n");
-
-		if (fout->remoteVersion >= 110000)
-			appendPQExpBufferStr(q,
-								 "CASE WHEN a.atthasmissing AND NOT a.attisdropped "
-								 "THEN a.attmissingval ELSE null END AS attmissingval,\n");
-		else
-			appendPQExpBufferStr(q,
-								 "NULL AS attmissingval,\n");
-
-		if (fout->remoteVersion >= 120000)
-			appendPQExpBufferStr(q,
-								 "a.attgenerated\n");
-		else
-			appendPQExpBufferStr(q,
-								 "'' AS attgenerated\n");
-
-		/* need left join here to not fail on dropped columns ... */
-		appendPQExpBuffer(q,
-						  "FROM pg_catalog.pg_attribute a LEFT JOIN pg_catalog.pg_type t "
-						  "ON a.atttypid = t.oid\n"
-						  "WHERE a.attrelid = '%u'::pg_catalog.oid "
-						  "AND a.attnum > 0::pg_catalog.int2\n"
-						  "ORDER BY a.attnum",
-						  tbinfo->dobj.catId.oid);
-
-		res = ExecuteSqlQuery(fout, q->data, PGRES_TUPLES_OK);
-
-		ntups = PQntuples(res);
-
-		tbinfo->numatts = ntups;
-		tbinfo->attnames = (char **) pg_malloc(ntups * sizeof(char *));
-		tbinfo->atttypnames = (char **) pg_malloc(ntups * sizeof(char *));
-		tbinfo->atttypmod = (int *) pg_malloc(ntups * sizeof(int));
-		tbinfo->attstattarget = (int *) pg_malloc(ntups * sizeof(int));
-		tbinfo->attstorage = (char *) pg_malloc(ntups * sizeof(char));
-		tbinfo->typstorage = (char *) pg_malloc(ntups * sizeof(char));
-		tbinfo->attidentity = (char *) pg_malloc(ntups * sizeof(char));
-		tbinfo->attgenerated = (char *) pg_malloc(ntups * sizeof(char));
-		tbinfo->attisdropped = (bool *) pg_malloc(ntups * sizeof(bool));
-		tbinfo->attlen = (int *) pg_malloc(ntups * sizeof(int));
-		tbinfo->attalign = (char *) pg_malloc(ntups * sizeof(char));
-		tbinfo->attislocal = (bool *) pg_malloc(ntups * sizeof(bool));
-		tbinfo->attoptions = (char **) pg_malloc(ntups * sizeof(char *));
-		tbinfo->attcollation = (Oid *) pg_malloc(ntups * sizeof(Oid));
-		tbinfo->attcompression = (char *) pg_malloc(ntups * sizeof(char));
-		tbinfo->attfdwoptions = (char **) pg_malloc(ntups * sizeof(char *));
-		tbinfo->attmissingval = (char **) pg_malloc(ntups * sizeof(char *));
-		tbinfo->notnull = (bool *) pg_malloc(ntups * sizeof(bool));
-		tbinfo->inhNotNull = (bool *) pg_malloc(ntups * sizeof(bool));
-		tbinfo->attrdefs = (AttrDefInfo **) pg_malloc(ntups * sizeof(AttrDefInfo *));
-		hasdefaults = false;
-
-		i_attnum = PQfnumber(res, "attnum");
-		i_attname = PQfnumber(res, "attname");
-		i_atttypname = PQfnumber(res, "atttypname");
-		i_atttypmod = PQfnumber(res, "atttypmod");
-		i_attstattarget = PQfnumber(res, "attstattarget");
-		i_attstorage = PQfnumber(res, "attstorage");
-		i_typstorage = PQfnumber(res, "typstorage");
-		i_attidentity = PQfnumber(res, "attidentity");
-		i_attgenerated = PQfnumber(res, "attgenerated");
-		i_attisdropped = PQfnumber(res, "attisdropped");
-		i_attlen = PQfnumber(res, "attlen");
-		i_attalign = PQfnumber(res, "attalign");
-		i_attislocal = PQfnumber(res, "attislocal");
-		i_attnotnull = PQfnumber(res, "attnotnull");
-		i_attoptions = PQfnumber(res, "attoptions");
-		i_attcollation = PQfnumber(res, "attcollation");
-		i_attcompression = PQfnumber(res, "attcompression");
-		i_attfdwoptions = PQfnumber(res, "attfdwoptions");
-		i_attmissingval = PQfnumber(res, "attmissingval");
-		i_atthasdef = PQfnumber(res, "atthasdef");
-
-		for (int j = 0; j < ntups; j++)
 		{
-			if (j + 1 != atoi(PQgetvalue(res, j, i_attnum)))
-				fatal("invalid column numbering in table \"%s\"",
-					  tbinfo->dobj.name);
-			tbinfo->attnames[j] = pg_strdup(PQgetvalue(res, j, i_attname));
-			tbinfo->atttypnames[j] = pg_strdup(PQgetvalue(res, j, i_atttypname));
-			tbinfo->atttypmod[j] = atoi(PQgetvalue(res, j, i_atttypmod));
-			tbinfo->attstattarget[j] = atoi(PQgetvalue(res, j, i_attstattarget));
-			tbinfo->attstorage[j] = *(PQgetvalue(res, j, i_attstorage));
-			tbinfo->typstorage[j] = *(PQgetvalue(res, j, i_typstorage));
-			tbinfo->attidentity[j] = *(PQgetvalue(res, j, i_attidentity));
-			tbinfo->attgenerated[j] = *(PQgetvalue(res, j, i_attgenerated));
-			tbinfo->needs_override = tbinfo->needs_override || (tbinfo->attidentity[j] == ATTRIBUTE_IDENTITY_ALWAYS);
-			tbinfo->attisdropped[j] = (PQgetvalue(res, j, i_attisdropped)[0] == 't');
-			tbinfo->attlen[j] = atoi(PQgetvalue(res, j, i_attlen));
-			tbinfo->attalign[j] = *(PQgetvalue(res, j, i_attalign));
-			tbinfo->attislocal[j] = (PQgetvalue(res, j, i_attislocal)[0] == 't');
-			tbinfo->notnull[j] = (PQgetvalue(res, j, i_attnotnull)[0] == 't');
-			tbinfo->attoptions[j] = pg_strdup(PQgetvalue(res, j, i_attoptions));
-			tbinfo->attcollation[j] = atooid(PQgetvalue(res, j, i_attcollation));
-			tbinfo->attcompression[j] = *(PQgetvalue(res, j, i_attcompression));
-			tbinfo->attfdwoptions[j] = pg_strdup(PQgetvalue(res, j, i_attfdwoptions));
-			tbinfo->attmissingval[j] = pg_strdup(PQgetvalue(res, j, i_attmissingval));
-			tbinfo->attrdefs[j] = NULL; /* fix below */
-			if (PQgetvalue(res, j, i_atthasdef)[0] == 't')
-				hasdefaults = true;
-			/* these flags will be set in flagInhAttrs() */
-			tbinfo->inhNotNull[j] = false;
+			/* FIXME: verify attrs match */
+			Assert(tbloid_prev == tbloid);
+			Assert(natts_prev == natts);
+			Assert(natt_prev + 1 == natt);
+			natt_prev++;
 		}
 
-		PQclear(res);
+		natt = r - tbstart;
+
+		if (natt + 1 != attnum)
+			fatal("invalid column numbering in table \"%s\"",
+				  tbinfo->dobj.name);
+		tbinfo->attnames[natt] = pg_strdup(PQgetvalue(res, r, i_attname));
+		tbinfo->atttypnames[natt] = pg_strdup(PQgetvalue(res, r, i_atttypname));
+		tbinfo->atttypmod[natt] = atoi(PQgetvalue(res, r, i_atttypmod));
+		tbinfo->attstattarget[natt] = atoi(PQgetvalue(res, r, i_attstattarget));
+		tbinfo->attstorage[natt] = *(PQgetvalue(res, r, i_attstorage));
+		tbinfo->typstorage[natt] = *(PQgetvalue(res, r, i_typstorage));
+		tbinfo->attidentity[natt] = *(PQgetvalue(res, r, i_attidentity));
+		tbinfo->attgenerated[natt] = *(PQgetvalue(res, r, i_attgenerated));
+		tbinfo->needs_override = tbinfo->needs_override || (tbinfo->attidentity[natt] == ATTRIBUTE_IDENTITY_ALWAYS);
+		tbinfo->attisdropped[natt] = (PQgetvalue(res, r, i_attisdropped)[0] == 't');
+		tbinfo->attlen[natt] = atoi(PQgetvalue(res, r, i_attlen));
+		tbinfo->attalign[natt] = *(PQgetvalue(res, r, i_attalign));
+		tbinfo->attislocal[natt] = (PQgetvalue(res, r, i_attislocal)[0] == 't');
+		tbinfo->notnull[natt] = (PQgetvalue(res, r, i_attnotnull)[0] == 't');
+		tbinfo->attoptions[natt] = pg_strdup(PQgetvalue(res, r, i_attoptions));
+		tbinfo->attcollation[natt] = atooid(PQgetvalue(res, r, i_attcollation));
+		tbinfo->attcompression[natt] = *(PQgetvalue(res, r, i_attcompression));
+		tbinfo->attfdwoptions[natt] = pg_strdup(PQgetvalue(res, r, i_attfdwoptions));
+		tbinfo->attmissingval[natt] = pg_strdup(PQgetvalue(res, r, i_attmissingval));
+		tbinfo->attrdefs[natt] = NULL; /* fix below */
+
+		/* these flags will be set in flagInhAttrs() */
+		tbinfo->inhNotNull[natt] = false;
+
+		hasdefault = PQgetvalue(res, r, i_atthasdef)[0] == 't';
 
 		/*
 		 * Get info about column defaults.  This is skipped for a data-only
 		 * dump, as it is only needed for table schemas.
 		 */
-		if (!dopt->dataOnly && hasdefaults)
+		/*
+		 * dropped columns shouldn't have defaults, but just in case,
+		 * ignore 'em
+		 */
+		if (!dopt->dataOnly && hasdefault)
 		{
-			AttrDefInfo *attrdefs;
-			int			numDefaults;
+			AttrDefInfo *attrdef;
 
-			pg_log_info("finding default expressions of table \"%s.%s\"",
-						tbinfo->dobj.namespace->dobj.name,
-						tbinfo->dobj.name);
+			/*
+			 * dropped columns shouldn't have defaults, but just in case,
+			 * ignore 'em
+			 */
+			if (tbinfo->attisdropped[natt])
+				continue;
 
-			printfPQExpBuffer(q, "SELECT tableoid, oid, adnum, "
-							  "pg_catalog.pg_get_expr(adbin, adrelid) AS adsrc "
-							  "FROM pg_catalog.pg_attrdef "
-							  "WHERE adrelid = '%u'::pg_catalog.oid",
-							  tbinfo->dobj.catId.oid);
+			attrdef = pg_malloc(sizeof(AttrDefInfo));
+			tbinfo->attrdefs[natt] = attrdef;
 
-			res = ExecuteSqlQuery(fout, q->data, PGRES_TUPLES_OK);
+			attrdef->dobj.objType = DO_ATTRDEF;
+			attrdef->dobj.catId.tableoid = AttrDefaultRelationId;
+			attrdef->dobj.catId.oid = atooid(PQgetvalue(res, r, i_attdefoid));
+			AssignDumpId(&attrdef->dobj);
+			attrdef->adtable = tbinfo;
+			attrdef->adnum = natt + 1;
+			attrdef->adef_expr = pg_strdup(PQgetvalue(res, r, i_attdef));
 
-			numDefaults = PQntuples(res);
-			attrdefs = (AttrDefInfo *) pg_malloc(numDefaults * sizeof(AttrDefInfo));
+			attrdef->dobj.name = pg_strdup(tbinfo->dobj.name);
+			attrdef->dobj.namespace = tbinfo->dobj.namespace;
 
-			for (int j = 0; j < numDefaults; j++)
+			attrdef->dobj.dump = tbinfo->dobj.dump;
+
+			/*
+			 * Figure out whether the default/generation expression should
+			 * be dumped as part of the main CREATE TABLE (or similar)
+			 * command or as a separate ALTER TABLE (or similar) command.
+			 * The preference is to put it into the CREATE command, but in
+			 * some cases that's not possible.
+			 */
+			if (tbinfo->attgenerated[natt])
 			{
-				int			adnum;
-
-				adnum = atoi(PQgetvalue(res, j, 2));
-
-				if (adnum <= 0 || adnum > ntups)
-					fatal("invalid adnum value %d for table \"%s\"",
-						  adnum, tbinfo->dobj.name);
-
 				/*
-				 * dropped columns shouldn't have defaults, but just in case,
-				 * ignore 'em
+				 * Column generation expressions cannot be dumped
+				 * separately, because there is no syntax for it.  The
+				 * !shouldPrintColumn case below will be tempted to set
+				 * them to separate if they are attached to an inherited
+				 * column without a local definition, but that would be
+				 * wrong and unnecessary, because generation expressions
+				 * are always inherited, so there is no need to set them
+				 * again in child tables, and there is no syntax for it
+				 * either.  By setting separate to false here we prevent
+				 * the "default" from being processed as its own dumpable
+				 * object, and flagInhAttrs() will remove it from the
+				 * table when it detects that it belongs to an inherited
+				 * column.
 				 */
-				if (tbinfo->attisdropped[adnum - 1])
-					continue;
-
-				attrdefs[j].dobj.objType = DO_ATTRDEF;
-				attrdefs[j].dobj.catId.tableoid = atooid(PQgetvalue(res, j, 0));
-				attrdefs[j].dobj.catId.oid = atooid(PQgetvalue(res, j, 1));
-				AssignDumpId(&attrdefs[j].dobj);
-				attrdefs[j].adtable = tbinfo;
-				attrdefs[j].adnum = adnum;
-				attrdefs[j].adef_expr = pg_strdup(PQgetvalue(res, j, 3));
-
-				attrdefs[j].dobj.name = pg_strdup(tbinfo->dobj.name);
-				attrdefs[j].dobj.namespace = tbinfo->dobj.namespace;
-
-				attrdefs[j].dobj.dump = tbinfo->dobj.dump;
-
-				/*
-				 * Figure out whether the default/generation expression should
-				 * be dumped as part of the main CREATE TABLE (or similar)
-				 * command or as a separate ALTER TABLE (or similar) command.
-				 * The preference is to put it into the CREATE command, but in
-				 * some cases that's not possible.
-				 */
-				if (tbinfo->attgenerated[adnum - 1])
-				{
-					/*
-					 * Column generation expressions cannot be dumped
-					 * separately, because there is no syntax for it.  The
-					 * !shouldPrintColumn case below will be tempted to set
-					 * them to separate if they are attached to an inherited
-					 * column without a local definition, but that would be
-					 * wrong and unnecessary, because generation expressions
-					 * are always inherited, so there is no need to set them
-					 * again in child tables, and there is no syntax for it
-					 * either.  By setting separate to false here we prevent
-					 * the "default" from being processed as its own dumpable
-					 * object, and flagInhAttrs() will remove it from the
-					 * table when it detects that it belongs to an inherited
-					 * column.
-					 */
-					attrdefs[j].separate = false;
-				}
-				else if (tbinfo->relkind == RELKIND_VIEW)
-				{
-					/*
-					 * Defaults on a VIEW must always be dumped as separate
-					 * ALTER TABLE commands.
-					 */
-					attrdefs[j].separate = true;
-				}
-				else if (!shouldPrintColumn(dopt, tbinfo, adnum - 1))
-				{
-					/* column will be suppressed, print default separately */
-					attrdefs[j].separate = true;
-				}
-				else
-				{
-					attrdefs[j].separate = false;
-				}
-
-				if (!attrdefs[j].separate)
-				{
-					/*
-					 * Mark the default as needing to appear before the table,
-					 * so that any dependencies it has must be emitted before
-					 * the CREATE TABLE.  If this is not possible, we'll
-					 * change to "separate" mode while sorting dependencies.
-					 */
-					addObjectDependency(&tbinfo->dobj,
-										attrdefs[j].dobj.dumpId);
-				}
-
-				tbinfo->attrdefs[adnum - 1] = &attrdefs[j];
+				attrdef->separate = false;
+			}
+			else if (tbinfo->relkind == RELKIND_VIEW)
+			{
+				/*
+				 * Defaults on a VIEW must always be dumped as separate
+				 * ALTER TABLE commands.
+				 */
+				attrdef->separate = true;
+			}
+			else if (!shouldPrintColumn(dopt, tbinfo, natt))
+			{
+				/* column will be suppressed, print default separately */
+				attrdef->separate = true;
+			}
+			else
+			{
+				attrdef->separate = false;
+			}
+
+			if (!attrdef->separate)
+			{
+				/*
+				 * Mark the default as needing to appear before the table,
+				 * so that any dependencies it has must be emitted before
+				 * the CREATE TABLE.  If this is not possible, we'll
+				 * change to "separate" mode while sorting dependencies.
+				 */
+				addObjectDependency(&tbinfo->dobj,
+									attrdef->dobj.dumpId);
 			}
-			PQclear(res);
 		}
+	}
+
+	for (int i = 0; i < numTables; i++)
+	{
+		TableInfo  *tbinfo = &tblinfo[i];
+		PGresult   *res;
+
+		/* Don't bother to collect info for sequences */
+		if (tbinfo->relkind == RELKIND_SEQUENCE)
+			continue;
+
+		/* Don't bother with uninteresting tables, either */
+		if (!tbinfo->interesting)
+			continue;
 
 		/*
 		 * Get info about table CHECK constraints.  This is skipped for a
@@ -8851,6 +8950,8 @@ getTableAttrs(Archive *fout, TableInfo *tblinfo, int numTables)
 	}
 
 	destroyPQExpBuffer(q);
+	destroyPQExpBuffer(tblidx);
+	destroyPQExpBuffer(tbloid);
 }
 
 /*
#10Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andres Freund (#9)
Re: Experimenting with hash tables inside pg_dump

Andres Freund <andres@anarazel.de> writes:

On 2021-10-21 22:13:22 -0400, Tom Lane wrote:

I've thought about doing something like
SELECT unsafe-functions FROM pg_class WHERE oid IN (someoid, someoid, ...)
but in cases with tens of thousands of tables, it seems unlikely that
that's going to behave all that nicely.

That's kinda what I'm doing in the quick hack. But instead of using IN(...) I
made it unnest('{oid, oid, ...}'), that scales much better.

I'm skeptical of that, mainly because it doesn't work in old servers,
and I really don't want to maintain two fundamentally different
versions of getTableAttrs(). I don't think you actually need the
multi-array form of unnest() here --- we know the TableInfo array
is in OID order --- but even the single-array form only works
back to 8.4.

However ... looking through getTableAttrs' main query, it seems
like the only thing there that's potentially unsafe is the
"format_type(t.oid, a.atttypmod)" call. I wonder if it could be
sane to convert it into a single query that just scans all of
pg_attribute, and then deal with creating the formatted type names
separately, perhaps with an improved version of getFormattedTypeName
that could cache the results for non-default typmods. The main
knock on this approach is the temptation for somebody to stick some
unsafe function into the query in future. We could stick a big fat
warning comment into the code, but lately I despair of people reading
comments.
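
As a rough illustration of the caching idea (the cache field and the function
name below are hypothetical, not existing pg_dump code, and a full version
would presumably key on both type OID and typmod rather than just the default
typmod):

static const char *
getFormattedTypeNameCached(Archive *fout, Oid typeoid, int32 typmod)
{
	TypeInfo   *typ = findTypeByOid(typeoid);
	PQExpBuffer query;
	PGresult   *res;
	char	   *result;

	/* hypothetical per-TypeInfo cache, covering only the default typmod */
	if (typmod == -1 && typ && typ->cached_typname)
		return typ->cached_typname;

	query = createPQExpBuffer();
	appendPQExpBuffer(query,
					  "SELECT pg_catalog.format_type('%u'::pg_catalog.oid, %d)",
					  typeoid, typmod);
	res = ExecuteSqlQueryForSingleRow(fout, query->data);
	result = pg_strdup(PQgetvalue(res, 0, 0));
	PQclear(res);
	destroyPQExpBuffer(query);

	if (typmod == -1 && typ)
		typ->cached_typname = result;
	return result;
}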

To see where it's worth putting in time it'd be useful if getSchemaData() in
verbose mode printed timing information...

I've been running test cases with log_min_duration_statement = 0,
which serves well enough.

regards, tom lane

#11Andres Freund
andres@anarazel.de
In reply to: Tom Lane (#10)
Re: Experimenting with hash tables inside pg_dump

Hi,

On 2021-10-22 10:53:31 -0400, Tom Lane wrote:

Andres Freund <andres@anarazel.de> writes:

On 2021-10-21 22:13:22 -0400, Tom Lane wrote:

I've thought about doing something like
SELECT unsafe-functions FROM pg_class WHERE oid IN (someoid, someoid, ...)
but in cases with tens of thousands of tables, it seems unlikely that
that's going to behave all that nicely.

That's kinda what I'm doing in the quick hack. But instead of using IN(...) I
made it unnest('{oid, oid, ...}'), that scales much better.

I'm skeptical of that, mainly because it doesn't work in old servers,
and I really don't want to maintain two fundamentally different
versions of getTableAttrs(). I don't think you actually need the
multi-array form of unnest() here --- we know the TableInfo array
is in OID order --- but even the single-array form only works
back to 8.4.

I think we can address that, if we think it's overall a promising approach to
pursue. E.g. if we don't need the indexes, we can make it = ANY().
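
For concreteness, a rough sketch of that shape, built the way getTableAttrs
already builds its queries (fragment only; it assumes the surrounding
function's locals such as q, tbloid, tblinfo and fout, and the column list
is abbreviated):

/* collect the OIDs of all interesting tables into one array literal */
appendPQExpBufferChar(tbloid, '{');
for (int i = 0; i < numTables; i++)
{
	TableInfo  *tbinfo = &tblinfo[i];

	if (tbinfo->relkind == RELKIND_SEQUENCE || !tbinfo->interesting)
		continue;
	appendPQExpBuffer(tbloid, "%s%u",
					  tbloid->len > 1 ? "," : "",
					  tbinfo->dobj.catId.oid);
}
appendPQExpBufferChar(tbloid, '}');

/* one query for the whole batch; = ANY(array) works on old servers too */
printfPQExpBuffer(q,
				  "SELECT a.attrelid, a.attnum, a.attname, a.atttypmod, "
				  "a.attnotnull, a.atthasdef\n"
				  "FROM pg_catalog.pg_attribute a\n"
				  "WHERE a.attrelid = ANY('%s'::pg_catalog.oid[])\n"
				  "ORDER BY a.attrelid, a.attnum",
				  tbloid->data);
res = ExecuteSqlQuery(fout, q->data, PGRES_TUPLES_OK);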

However ... looking through getTableAttrs' main query, it seems
like the only thing there that's potentially unsafe is the
"format_type(t.oid, a.atttypmod)" call.

I assume the default expression bit would also be unsafe?

Greetings,

Andres Freund

#12Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andres Freund (#11)
Re: Experimenting with hash tables inside pg_dump

Andres Freund <andres@anarazel.de> writes:

On 2021-10-22 10:53:31 -0400, Tom Lane wrote:

I'm skeptical of that, mainly because it doesn't work in old servers,

I think we can address that, if we think it's overall a promising approach to
pursue. E.g. if we don't need the indexes, we can make it = ANY().

Hmm ... yeah, I guess we could get away with that. It might not scale
as nicely to a huge database, but probably dumping a huge database
from an ancient server isn't all that interesting.

I'm inclined to think that it could be sane to make getTableAttrs
and getIndexes use this style, but we probably still want functions
and such to use per-object queries. In those other catalogs there
are many built-in objects that we don't really care about. The
prepared-queries hack I was working on last night is probably plenty
good enough there, and it's a much less invasive patch.
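
That style amounts to preparing the per-object query once and then executing
it once per object; a minimal sketch (the statement name, column list, and
lazy-prepare flag are illustrative, not taken from that patch):

static bool prepared_func_query = false;

if (!prepared_func_query)
{
	ExecuteSqlStatement(fout,
						"PREPARE getFuncDetails(pg_catalog.oid) AS\n"
						"SELECT proretset, prosrc, probin, provolatile\n"
						"FROM pg_catalog.pg_proc WHERE oid = $1");
	prepared_func_query = true;
}

printfPQExpBuffer(query, "EXECUTE getFuncDetails('%u')",
				  finfo->dobj.catId.oid);
res = ExecuteSqlQueryForSingleRow(fout, query->data);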

Were you planning to pursue this further, or did you want me to?
I'd want to layer it on top of the work I did at [1], else there's
going to be lots of merge conflicts.

regards, tom lane

[1]: /messages/by-id/2273648.1634764485@sss.pgh.pa.us

#13Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andres Freund (#6)
Re: Experimenting with hash tables inside pg_dump

Andres Freund <andres@anarazel.de> writes:

On 2021-10-21 18:27:25 -0400, Tom Lane wrote:

(a) the executable size increases by a few KB --- apparently, even
the minimum subset of simplehash.h's functionality is code-wasteful.

If I prevent the compiler from inlining findObjectByCatalogId() in all the
find*ByOid() routines, your version is smaller than master even without other
changes.

Hmm ... seems to depend a lot on which compiler you use.

I was originally looking at it with gcc 8.4.1 (RHEL8 default),
x86_64. On that, adding pg_noinline to findObjectByCatalogId
helps a little, but it's still 3.6K bigger than HEAD.

I then tried gcc 11.2.1/x86_64, finding that the patch adds
about 2K and pg_noinline makes no difference.

I also tried it on Apple's clang 13.0.0, both x86_64 and ARM
versions. On that, the change seems to be a wash or slightly
smaller, with maybe a little benefit from pg_noinline but not
much. It's hard to tell for sure because size(1) seems to be
rounding off to a page boundary on that platform.

Anyway, these are all sub-one-percent changes in the code
size, so probably we should not sweat that much about it.
I'm kind of leaning now towards pushing the patch, just
on the grounds that getting rid of all those single-purpose
index arrays (and likely future need for more of them)
is worth it from a maintenance perspective.

regards, tom lane

#14Andres Freund
andres@anarazel.de
In reply to: Tom Lane (#12)
Re: Experimenting with hash tables inside pg_dump

Hi,

On October 22, 2021 8:54:13 AM PDT, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Andres Freund <andres@anarazel.de> writes:

On 2021-10-22 10:53:31 -0400, Tom Lane wrote:

I'm skeptical of that, mainly because it doesn't work in old servers,

I think we can address that, if we think it's overall a promising approach to
pursue. E.g. if we don't need the indexes, we can make it = ANY().

Hmm ... yeah, I guess we could get away with that. It might not scale
as nicely to a huge database, but probably dumping a huge database
from an ancient server isn't all that interesting.

I think compared to the overhead of locking that many tables and sending O(N) queries it shouldn't be a huge factor.

One thing that looks like it might be worth doing, and not hard, is to use single-row mode. No need to materialize all that data twice in memory.

At a later stage it might be worth sending the array separately as a parameter. Perhaps even binary encoded.
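
At the libpq level those two ideas look roughly like this (illustrative
fragment; conn is the session pg_dump already has open, and oidarray is the
"{oid,oid,...}" string built beforehand):

const char *params[1] = {oidarray};

if (!PQsendQueryParams(conn,
					   "SELECT a.attrelid, a.attnum, a.attname "
					   "FROM pg_catalog.pg_attribute a "
					   "WHERE a.attrelid = ANY($1::pg_catalog.oid[]) "
					   "ORDER BY a.attrelid, a.attnum",
					   1, NULL, params, NULL, NULL, 0))
{
	pg_log_error("query dispatch failed: %s", PQerrorMessage(conn));
	exit(1);
}

PQsetSingleRowMode(conn);

for (PGresult *res = PQgetResult(conn); res != NULL; res = PQgetResult(conn))
{
	if (PQresultStatus(res) == PGRES_SINGLE_TUPLE)
	{
		/* process one pg_attribute row without materializing them all */
	}
	else if (PQresultStatus(res) != PGRES_TUPLES_OK)
	{
		pg_log_error("query failed: %s", PQerrorMessage(conn));
		exit(1);
	}
	PQclear(res);
}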

I'm inclined to think that it could be sane to make getTableAttrs
and getIndexes use this style, but we probably still want functions
and such to use per-object queries. In those other catalogs there
are many built-in objects that we don't really care about. The
prepared-queries hack I was working on last night is probably plenty
good enough there, and it's a much less invasive patch.

Yes, that seems reasonable. I think the triggers query would benefit from the batch approach though - I see that taking a long time in aggregate on a test database with many tables I had around (partially due to the self join), and we already materialize it.

Were you planning to pursue this further, or did you want me to?

It seems too nice an improvement to drop on the floor. That said, I don't really have the mental bandwidth to pursue this beyond the POC stage - it seemed complicated enough that a suggestion accompanied by a prototype was a good idea. So I'd be happy for you to incorporate this into your other changes.

I'd want to layer it on top of the work I did at [1], else there's
going to be lots of merge conflicts.

Makes sense. Even if nobody else were doing anything in the area I'd probably want to split it into one commit creating the query once, and then separately implement the batching.

Regards,

Andres
--
Sent from my Android device with K-9 Mail. Please excuse my brevity.

#15Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andres Freund (#14)
Re: Experimenting with hash tables inside pg_dump

Andres Freund <andres@anarazel.de> writes:

On October 22, 2021 8:54:13 AM PDT, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Were you planning to pursue this further, or did you want me to?

It seems too nice an improvement to drop on the floor. That said, I don't really have the mental bandwidth to pursue this beyond the POC stage - it seemed complicated enough that a suggestion accompanied by a prototype was a good idea. So I'd be happy for you to incorporate this into your other changes.

Cool, I'll see what I can do with it, as long as I'm poking around
in the area.

regards, tom lane

#16Andres Freund
andres@anarazel.de
In reply to: Tom Lane (#13)
Re: Experimenting with hash tables inside pg_dump

Hi,

On October 22, 2021 10:32:30 AM PDT, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Andres Freund <andres@anarazel.de> writes:

On 2021-10-21 18:27:25 -0400, Tom Lane wrote:

(a) the executable size increases by a few KB --- apparently, even
the minimum subset of simplehash.h's functionality is code-wasteful.

If I prevent the compiler from inlining findObjectByCatalogId() in all the
find*ByOid() routines, your version is smaller than master even without other
changes.

Hmm ... seems to depend a lot on which compiler you use.

Inline heuristics change a lot over time, so that'd make sense.

I see some win by marking pg_log_error cold. That might be useful more generally too.

Which made me look at the code invoking it from simplehash. I think the patch that made simplehash work in frontend code isn't quite right, because pg_log_error() returns...

Wonder if we should mark simplehash's grow as noinline? Even with a single caller it seems better to not inline it to remove register allocator pressure.

Anyway, these are all sub-one-percent changes in the code
size, so probably we should not sweat that much about it.
I'm kind of leaning now towards pushing the patch, just
on the grounds that getting rid of all those single-purpose
index arrays (and likely future need for more of them)
is worth it from a maintenance perspective.

+1

The only thought I had wrt the patch is that I'd always create the hash table. That way the related branches can be removed, which is a win code-size-wise (as well as speed presumably, but I think we're far away from that mattering).

This type of code is where I most wish for a language with proper generic data types/containers...

Andres
--
Sent from my Android device with K-9 Mail. Please excuse my brevity.

#17Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andres Freund (#16)
Re: Experimenting with hash tables inside pg_dump

Andres Freund <andres@anarazel.de> writes:

Which made me look at the code invoking it from simplehash. I think the patch that made simplehash work in frontend code isn't quite right, because pg_log_error() returns...

Indeed, that's broken. I guess we want pg_log_fatal then exit(1).
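
I.e., assuming the frontend branch of simplehash.h's error macro, roughly:

#define sh_error(...) \
	do { pg_log_fatal(__VA_ARGS__); exit(1); } while (0)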

Wonder if we should mark simplehash's grow as noinline? Even with a single caller it seems better to not inline it to remove register allocator pressure.

Seems plausible --- you want me to go change that?

The only thought I had wrt the patch is that I'd always create the hash
table.

That'd require adding an explicit init function and figuring out where to
call it, which we could do but I didn't (and don't) think it's worth the
trouble. One more branch here isn't going to matter, especially given
that we can't even measure the presumed macro improvement.
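
The branch in question is just the usual lazy-create pattern, roughly (the
function names are whatever simplehash generates from the chosen SH_PREFIX,
and the initial size is illustrative):

	if (catalogIdHash == NULL)
		catalogIdHash = catalogid_create(CATALOGIDHASH_INITIAL_SIZE, NULL);
	entry = catalogid_insert(catalogIdHash, dobj->catId, &found);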

regards, tom lane

#18Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tom Lane (#17)
Re: Experimenting with hash tables inside pg_dump

I wrote:

Andres Freund <andres@anarazel.de> writes:

Wonder if we should mark simplehash's grow as noinline? Even with a single caller it seems better to not inline it to remove register allocator pressure.

Seems plausible --- you want me to go change that?

Hmm, harder than it sounds. If I remove "inline" from SH_SCOPE then
the compiler complains about unreferenced static functions, while
if I leave it there then adding pg_noinline causes a complaint about
conflicting options. Seems like we need a less quick-and-dirty
approach to dealing with unnecessary simplehash support functions.

regards, tom lane

#19Andres Freund
andres@anarazel.de
In reply to: Tom Lane (#18)
Re: Experimenting with hash tables inside pg_dump

Hi,

Thanks for pushing the error handling cleanup etc!

On 2021-10-22 16:32:39 -0400, Tom Lane wrote:

I wrote:

Andres Freund <andres@anarazel.de> writes:

Wonder if we should mark simplehash's grow as noinline? Even with a single caller it seems better to not inline it to remove register allocator pressure.

Seems plausible --- you want me to go change that?

Hmm, harder than it sounds. If I remove "inline" from SH_SCOPE then
the compiler complains about unreferenced static functions, while
if I leave it there then adding pg_noinline causes a complaint about
conflicting options.

The easy way out would be to not declare SH_GROW inside SH_DECLARE - that'd
currently work, because there aren't any calls to grow from outside of
simplehash.h. The comment says:
* ... But resizing to the exact input size can be advantageous
* performance-wise, when known at some point.

But perhaps that's sufficiently served by creating the table with the correct
size immediately?

If we were to go for that, we'd just put SH_GROW in the SH_DEFINE section and
declare it plain static instead of SH_SCOPE. That works here, and I have some
hope it'd not
cause warnings on other compilers either, because there'll be references from
the other inline functions. Even if there's a SH_SCOPE=static inline
simplehash use inside a header and there aren't any callers in a TU, there'd
still be static inline references to it.
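
For reference, the instantiation pattern under discussion looks roughly like
this (the element type, key-hash helper, and prefix here are illustrative
rather than exactly what the pg_dump patch uses):

#define SH_PREFIX		catalogid
#define SH_ELEMENT_TYPE	CatalogIdMapEntry
#define SH_KEY_TYPE		CatalogId
#define SH_KEY			catId
#define SH_HASH_KEY(tb, key)	hash_catalogid(key)
#define SH_EQUAL(tb, a, b)	((a).oid == (b).oid && (a).tableoid == (b).tableoid)
#define SH_SCOPE		static inline	/* today this also governs SH_GROW */
#define SH_RAW_ALLOCATOR	pg_malloc0
#define SH_DECLARE
#define SH_DEFINE
#include "lib/simplehash.h"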

Another alternative would be to use __attribute__((unused)) or such on
non-static-inline functions that might or might not be used.

Seems like we need a less quick-and-dirty approach to dealing with
unnecessary simplehash support functions.

I don't think the problem is unnecessary ones? It's "cold" functions we don't
want to have inlined into larger functions.

Greetings,

Andres Freund

#20Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andres Freund (#19)
Re: Experimenting with hash tables inside pg_dump

Andres Freund <andres@anarazel.de> writes:

On 2021-10-22 16:32:39 -0400, Tom Lane wrote:

Hmm, harder than it sounds. If I remove "inline" from SH_SCOPE then
the compiler complains about unreferenced static functions, while
if I leave it there then adding pg_noinline causes a complaint about
conflicting options.

The easy way out would be to not declare SH_GROW inside SH_DECLARE - that'd
currently work, because there aren't any calls to grow from outside of
simplehash.h.

Seems like a reasonable approach. If somebody wanted to call that
from outside, I'd personally feel they were getting way too friendly
with the implementation.

Seems like we need a less quick-and-dirty approach to dealing with
unnecessary simplehash support functions.

I don't think the problem is unnecessary ones?

I was thinking about the stuff like SH_ITERATE, which you might or
might not have use for in any particular file. In the case at hand
here, a file that doesn't call SH_INSERT would be at risk of getting
unused-function complaints about SH_GROW. But as you say, if we do
find that happening, __attribute__((unused)) would probably be
enough to silence it.

regards, tom lane

#21Andres Freund
andres@anarazel.de
In reply to: Tom Lane (#20)
Re: Experimenting with hash tables inside pg_dump

Hi,

On 2021-10-25 13:58:06 -0400, Tom Lane wrote:

Andres Freund <andres@anarazel.de> writes:

Seems like we need a less quick-and-dirty approach to dealing with
unnecessary simplehash support functions.

I don't think the problem is unnecessary ones?

I was thinking about the stuff like SH_ITERATE, which you might or
might not have use for in any particular file. In the case at hand
here, a file that doesn't call SH_INSERT would be at risk of getting
unused-function complaints about SH_GROW. But as you say, if we do
find that happening, __attribute__((unused)) would probably be
enough to silence it.

I was hoping that a reference from a static inline function ought to be
sufficient to prevent warning about an unused-static-not-inline function, even
if the referencing static inline function is unused... It does work that way
with at least the last few versions of gcc (tested 8-11) and clang (tested 6.0
to 13).
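
A minimal illustration of that (just the pattern, not simplehash itself):

static void helper(void);		/* not inline, no direct callers */

static inline void
wrapper(void)					/* itself unused in this TU */
{
	helper();
}

static void
helper(void)
{
}

With the gcc and clang versions above, compiling this with -Wall produces no
unused-function warning for helper(): the reference from the (unused) static
inline wrapper() counts as a use.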

Greetings,

Andres Freund