SP-GiST support for inet datatypes

Started by Emre Hasegelialmost 10 years ago15 messages

emre@hasegeli.com

almost 10 years ago

2 attachment(s)

Attached patches add SP-GiST support to the inet datatypes. The operator
class comes with a small change on the SP-GiST framework to allow fixed
number of child nodes.

The index is like prefix tree except that it doesn't bother to split the
addresses into parts as text is split. It also doesn't use labels to know
the part after the prefix, but relies on static node numbers.

The GiST index released with version 9.4 performs really bad with real
world data. SP-GiST works much better with the query posted to the
performance list [1]/messages/by-id/alpine.DEB.2.11.1508251504160.31004@pyrite a while ago:

hasegeli=# SELECT DISTINCT route INTO hmm FROM routes_view WHERE asn =

2914;

SELECT 732

hasegeli=# EXPLAIN ANALYZE SELECT routes.route FROM routes JOIN hmm ON

routes.route && hmm.route;

QUERY PLAN

----------------------------------------------------------------------------------------------------------------------------------------

Nested Loop (cost=0.41..571742.27 rows=2248 width=7) (actual

time=12.643..20474.813 rows=8127 loops=1)

-> Seq Scan on hmm (cost=0.00..11.32 rows=732 width=7) (actual

time=0.017..0.524 rows=732 loops=1)

-> Index Only Scan using route_gist on routes (cost=0.41..552.05

rows=22900 width=7) (actual time=4.851..27.948 rows=11 loops=732)

Index Cond: (route && (hmm.route)::inet)
Heap Fetches: 8127
Planning time: 1.507 ms
Execution time: 20475.605 ms
(7 rows)

hasegeli=# DROP INDEX route_gist;
DROP INDEX

hasegeli=# CREATE INDEX route_spgist ON routes USING spgist (route);
CREATE INDEX

hasegeli=# EXPLAIN ANALYZE SELECT routes.route FROM routes JOIN hmm ON

routes.route && hmm.route;

QUERY PLAN

-----------------------------------------------------------------------------------------------------------------------------------------

Nested Loop (cost=0.41..588634.27 rows=2248 width=7) (actual

time=0.081..16.961 rows=8127 loops=1)

-> Seq Scan on hmm (cost=0.00..11.32 rows=732 width=7) (actual

time=0.022..0.079 rows=732 loops=1)

-> Index Only Scan using route_spgist on routes (cost=0.41..575.13

rows=22900 width=7) (actual time=0.014..0.021 rows=11 loops=732)

Index Cond: (route && (hmm.route)::inet)
Heap Fetches: 8127
Planning time: 1.376 ms
Execution time: 15.936 ms

[1]: /messages/by-id/alpine.DEB.2.11.1508251504160.31004@pyrite
/messages/by-id/alpine.DEB.2.11.1508251504160.31004@pyrite

Attachments:

spgist-fixed-nnodes.patchapplication/octet-stream; name=spgist-fixed-nnodes.patchDownload

From 8c0c35635a49a8233d6b7238b71e84cbc9007ea9 Mon Sep 17 00:00:00 2001
From: Emre Hasegeli <emre@hasegeli.com>
Date: Sun, 14 Feb 2016 20:32:55 +0100
Subject: [PATCH] spgist fixed nnodes

---
 doc/src/sgml/spgist.sgml                | 14 +++++++++----
 src/backend/access/spgist/spgdoinsert.c | 36 ++++++++++++++++++++++++++-------
 src/backend/access/spgist/spgtextproc.c | 11 ++++++++--
 src/include/access/spgist.h             |  4 +++-
 4 files changed, 51 insertions(+), 14 deletions(-)

diff --git a/doc/src/sgml/spgist.sgml b/doc/src/sgml/spgist.sgml
index 56827e5..b6441f8 100644
--- a/doc/src/sgml/spgist.sgml
+++ b/doc/src/sgml/spgist.sgml
@@ -321,23 +321,25 @@ typedef struct spgChooseOut
         struct                  /* results for spgAddNode */
         {
             Datum       nodeLabel;  /* new node's label */
             int         nodeN;      /* where to insert it (index from 0) */
         }           addNode;
         struct                  /* results for spgSplitTuple */
         {
             /* Info to form new inner tuple with one node */
             bool        prefixHasPrefix;    /* tuple should have a prefix? */
             Datum       prefixPrefixDatum;  /* if so, its value */
-            Datum       nodeLabel;          /* node's label */
-
+            int         prefixNNodes;       /* number of nodes for new inner tuple */
+            Datum      *prefixNodeLabels;   /* their labels (or NULL for no labels) */
+            
             /* Info to form new lower-level inner tuple with all old nodes */
+            int         postfixNodeN;       /* where to insert it (index from 0) */
             bool        postfixHasPrefix;   /* tuple should have a prefix? */
             Datum       postfixPrefixDatum; /* if so, its value */
         }           splitTuple;
     }           result;
 } spgChooseOut;
 </programlisting>
 
        <structfield>datum</> is the original datum that was to be inserted
        into the index.
        <structfield>leafDatum</> is initially the same as
@@ -398,22 +400,26 @@ typedef struct spgChooseOut
        set <structfield>resultType</> to <literal>spgSplitTuple</>.
        This action moves all the existing nodes into a new lower-level
        inner tuple, and replaces the existing inner tuple with a tuple
        having a single node that links to the new lower-level inner tuple.
        Set <structfield>prefixHasPrefix</> to indicate whether the new
        upper tuple should have a prefix, and if so set
        <structfield>prefixPrefixDatum</> to the prefix value.  This new
        prefix value must be sufficiently less restrictive than the original
        to accept the new value to be indexed, and it should be no longer
        than the original prefix.
-       Set <structfield>nodeLabel</> to the label to be used for the
-       node that will point to the new lower-level inner tuple.
+       Set <structfield>prefixNNodes</> and
+       <structfield>prefixNodeLabels</> for the new prefix.
+       <structfield>prefixNNodes</></structfield> must be at least 1 to
+       put the new lower-level inner tuple.
+       Set <structfield>postfixNodeN</> that will point to the new
+       lower-level inner tuple.
        Set <structfield>postfixHasPrefix</> to indicate whether the new
        lower-level inner tuple should have a prefix, and if so set
        <structfield>postfixPrefixDatum</> to the prefix value.  The
        combination of these two prefixes and the additional label must
        have the same meaning as the original prefix, because there is
        no opportunity to alter the node labels that are moved to the new
        lower-level tuple, nor to change any child index entries.
        After the node has been split, the <function>choose</function>
        function will be called again with the replacement inner tuple.
        That call will usually result in an <literal>spgAddNode</> result,
diff --git a/src/backend/access/spgist/spgdoinsert.c b/src/backend/access/spgist/spgdoinsert.c
index f090ca5..8658f5d 100644
--- a/src/backend/access/spgist/spgdoinsert.c
+++ b/src/backend/access/spgist/spgdoinsert.c
@@ -1699,30 +1699,46 @@ spgSplitNodeAction(Relation index, SpGistState *state,
 	BlockNumber postfixBlkno;
 	OffsetNumber postfixOffset;
 	int			i;
 	spgxlogSplitTuple xlrec;
 	Buffer		newBuffer = InvalidBuffer;
 
 	/* Should not be applied to nulls */
 	Assert(!SpGistPageStoresNulls(current->page));
 
 	/*
-	 * Construct new prefix tuple, containing a single node with the specified
-	 * label.  (We'll update the node's downlink to point to the new postfix
-	 * tuple, below.)
+	 * Construct new prefix tuple with given number of nodes.  We are
+	 * going to update one of those node's downlink to point to the new
+	 * postfix tuple, below.
 	 */
-	node = spgFormNodeTuple(state, out->result.splitTuple.nodeLabel, false);
+	if (out->result.splitTuple.prefixNNodes == 0)
+		elog(ERROR,
+			 "prefix must have at least one node to point to the new postfix");
+	nodes = (SpGistNodeTuple *) palloc(sizeof(SpGistNodeTuple) *
+									   out->result.splitTuple.prefixNNodes);
+
+	for (i = 0; i < out->result.splitTuple.prefixNNodes; i++)
+	{
+		Datum		label = (Datum) 0;
+		bool		labelisnull =
+						(out->result.splitTuple.prefixNodeLabels == NULL);
+
+		if (!labelisnull)
+			label = out->result.splitTuple.prefixNodeLabels[i];
+		nodes[i] = spgFormNodeTuple(state, label, labelisnull);
+	}
 
 	prefixTuple = spgFormInnerTuple(state,
 									out->result.splitTuple.prefixHasPrefix,
 									out->result.splitTuple.prefixPrefixDatum,
-									1, &node);
+									out->result.splitTuple.prefixNNodes,
+									nodes);
 
 	/* it must fit in the space that innerTuple now occupies */
 	if (prefixTuple->size > innerTuple->size)
 		elog(ERROR, "SPGiST inner-tuple split must not produce longer prefix");
 
 	/*
 	 * Construct new postfix tuple, containing all nodes of innerTuple with
 	 * same node datums, but with the prefix specified by the picksplit
 	 * function.
 	 */
@@ -1800,24 +1816,30 @@ spgSplitNodeAction(Relation index, SpGistState *state,
 		xlrec.postfixBlkSame = false;
 	}
 
 	/*
 	 * And set downlink pointer in the prefix tuple to point to postfix tuple.
 	 * (We can't avoid this step by doing the above two steps in opposite
 	 * order, because there might not be enough space on the page to insert
 	 * the postfix tuple first.)  We have to update the local copy of the
 	 * prefixTuple too, because that's what will be written to WAL.
 	 */
-	spgUpdateNodeLink(prefixTuple, 0, postfixBlkno, postfixOffset);
+	if (out->result.splitTuple.postfixNodeN >=
+		out->result.splitTuple.prefixNNodes)
+		elog(ERROR,
+			 "prefix doesn't have enough nodes to point to the new postfix");
+	spgUpdateNodeLink(prefixTuple, out->result.splitTuple.postfixNodeN,
+					  postfixBlkno, postfixOffset);
 	prefixTuple = (SpGistInnerTuple) PageGetItem(current->page,
 							  PageGetItemId(current->page, current->offnum));
-	spgUpdateNodeLink(prefixTuple, 0, postfixBlkno, postfixOffset);
+	spgUpdateNodeLink(prefixTuple, out->result.splitTuple.postfixNodeN,
+					  postfixBlkno, postfixOffset);
 
 	MarkBufferDirty(current->buffer);
 
 	if (RelationNeedsWAL(index))
 	{
 		XLogRecPtr	recptr;
 
 		XLogBeginInsert();
 		XLogRegisterData((char *) &xlrec, sizeof(xlrec));
 		XLogRegisterData((char *) prefixTuple, prefixTuple->size);
diff --git a/src/backend/access/spgist/spgtextproc.c b/src/backend/access/spgist/spgtextproc.c
index e0d8f30..87edabe 100644
--- a/src/backend/access/spgist/spgtextproc.c
+++ b/src/backend/access/spgist/spgtextproc.c
@@ -205,23 +205,27 @@ spg_text_choose(PG_FUNCTION_ARGS)
 			if (commonLen == 0)
 			{
 				out->result.splitTuple.prefixHasPrefix = false;
 			}
 			else
 			{
 				out->result.splitTuple.prefixHasPrefix = true;
 				out->result.splitTuple.prefixPrefixDatum =
 					formTextDatum(prefixStr, commonLen);
 			}
-			out->result.splitTuple.nodeLabel =
+			out->result.splitTuple.prefixNNodes = 1;
+			out->result.splitTuple.prefixNodeLabels =
+				(Datum *) palloc(sizeof(Datum));
+			out->result.splitTuple.prefixNodeLabels[0] =
 				Int16GetDatum(*(unsigned char *) (prefixStr + commonLen));
 
+			out->result.splitTuple.postfixNodeN = 0;
 			if (prefixSize - commonLen == 1)
 			{
 				out->result.splitTuple.postfixHasPrefix = false;
 			}
 			else
 			{
 				out->result.splitTuple.postfixHasPrefix = true;
 				out->result.splitTuple.postfixPrefixDatum =
 					formTextDatum(prefixStr + commonLen + 1,
 								  prefixSize - commonLen - 1);
@@ -273,21 +277,24 @@ spg_text_choose(PG_FUNCTION_ARGS)
 		 * labels as the original tuple.
 		 *
 		 * Note: it might seem tempting to shorten the upper tuple's prefix,
 		 * if it has one, then use its last byte as label for the lower tuple.
 		 * But that doesn't win since we know the incoming value matches the
 		 * whole prefix: we'd just end up splitting the lower tuple again.
 		 */
 		out->resultType = spgSplitTuple;
 		out->result.splitTuple.prefixHasPrefix = in->hasPrefix;
 		out->result.splitTuple.prefixPrefixDatum = in->prefixDatum;
-		out->result.splitTuple.nodeLabel = Int16GetDatum(-2);
+		out->result.splitTuple.prefixNNodes = 1;
+		out->result.splitTuple.prefixNodeLabels = (Datum *) palloc(sizeof(Datum));
+		out->result.splitTuple.prefixNodeLabels[0] = Int16GetDatum(-2);
+		out->result.splitTuple.postfixNodeN = 0;
 		out->result.splitTuple.postfixHasPrefix = false;
 	}
 	else
 	{
 		/* Add a node for the not-previously-seen nodeChar value */
 		out->resultType = spgAddNode;
 		out->result.addNode.nodeLabel = Int16GetDatum(nodeChar);
 		out->result.addNode.nodeN = i;
 	}
 
diff --git a/src/include/access/spgist.h b/src/include/access/spgist.h
index 1994f71..8abc2b3 100644
--- a/src/include/access/spgist.h
+++ b/src/include/access/spgist.h
@@ -86,23 +86,25 @@ typedef struct spgChooseOut
 		struct					/* results for spgAddNode */
 		{
 			Datum		nodeLabel;		/* new node's label */
 			int			nodeN;	/* where to insert it (index from 0) */
 		}			addNode;
 		struct					/* results for spgSplitTuple */
 		{
 			/* Info to form new inner tuple with one node */
 			bool		prefixHasPrefix;		/* tuple should have a prefix? */
 			Datum		prefixPrefixDatum;		/* if so, its value */
-			Datum		nodeLabel;		/* node's label */
+			int			prefixNNodes;	/* number of nodes for new inner tuple */
+			Datum	   *prefixNodeLabels;	/* their labels (or NULL for no labels) */
 
 			/* Info to form new lower-level inner tuple with all old nodes */
+			int			postfixNodeN;	/* where to insert it (index from 0) */
 			bool		postfixHasPrefix;		/* tuple should have a prefix? */
 			Datum		postfixPrefixDatum;		/* if so, its value */
 		}			splitTuple;
 	}			result;
 } spgChooseOut;
 
 /*
  * Argument structs for spg_picksplit method
  */
 typedef struct spgPickSplitIn
-- 
2.5.4 (Apple Git-61)

inet-spgist-v1.patchapplication/octet-stream; name=inet-spgist-v1.patchDownload

From 549bfe4ca1d9db975e825f51c122bfb5da5d49d8 Mon Sep 17 00:00:00 2001
From: Emre Hasegeli <emre@hasegeli.com>
Date: Sun, 14 Feb 2016 16:44:53 +0100
Subject: [PATCH] inet-spgist v1

---
 doc/src/sgml/spgist.sgml                 |  17 +
 src/backend/utils/adt/Makefile           |   2 +-
 src/backend/utils/adt/network.c          |  92 +---
 src/backend/utils/adt/network_spgist.c   | 714 +++++++++++++++++++++++++++++++
 src/include/catalog/pg_amop.h            |  15 +
 src/include/catalog/pg_amproc.h          |   5 +
 src/include/catalog/pg_opclass.h         |   1 +
 src/include/catalog/pg_opfamily.h        |   1 +
 src/include/catalog/pg_proc.h            |  12 +
 src/include/utils/inet.h                 |  10 +
 src/test/regress/expected/inet.out       | 148 +++++++
 src/test/regress/expected/opr_sanity.out |  11 +-
 src/test/regress/sql/inet.sql            |  23 +
 13 files changed, 979 insertions(+), 72 deletions(-)
 create mode 100644 src/backend/utils/adt/network_spgist.c

diff --git a/doc/src/sgml/spgist.sgml b/doc/src/sgml/spgist.sgml
index b6441f8..da55d3c 100644
--- a/doc/src/sgml/spgist.sgml
+++ b/doc/src/sgml/spgist.sgml
@@ -120,20 +120,37 @@
        <literal>&lt;=</>
        <literal>=</>
        <literal>&gt;</>
        <literal>&gt;=</>
        <literal>~&lt;=~</>
        <literal>~&lt;~</>
        <literal>~&gt;=~</>
        <literal>~&gt;~</>
       </entry>
      </row>
+     <row>
+      <entry><literal>inet_ops</></entry>
+      <entry><type>inet</>, <type>cidr</></entry>
+      <entry>
+       <literal>&amp;&amp;</>
+       <literal>&gt;&gt;</>
+       <literal>&gt;&gt;=</>
+       <literal>&gt;</>
+       <literal>&gt;=</>
+       <literal>&lt;&gt;</>
+       <literal>&lt;&lt;</>
+       <literal>&lt;&lt;=</>
+       <literal>&lt;</>
+       <literal>&lt;=</>
+       <literal>=</>
+      </entry>
+     </row>
     </tbody>
    </tgroup>
   </table>
 
  <para>
   Of the two operator classes for type <type>point</>,
   <literal>quad_point_ops</> is the default.  <literal>kd_point_ops</>
   supports the same operators but uses a different index data structure which
   may offer better performance in some applications.
  </para>
diff --git a/src/backend/utils/adt/Makefile b/src/backend/utils/adt/Makefile
index 2cb7bab..ca7932a 100644
--- a/src/backend/utils/adt/Makefile
+++ b/src/backend/utils/adt/Makefile
@@ -10,21 +10,21 @@ include $(top_builddir)/src/Makefile.global
 
 # keep this list arranged alphabetically or it gets to be a mess
 OBJS = acl.o arrayfuncs.o array_expanded.o array_selfuncs.o \
 	array_typanalyze.o array_userfuncs.o arrayutils.o ascii.o \
 	bool.o cash.o char.o date.o datetime.o datum.o dbsize.o domains.o \
 	encode.o enum.o expandeddatum.o \
 	float.o format_type.o formatting.o genfile.o \
 	geo_ops.o geo_selfuncs.o inet_cidr_ntop.o inet_net_pton.o int.o \
 	int8.o json.o jsonb.o jsonb_gin.o jsonb_op.o jsonb_util.o \
 	jsonfuncs.o like.o lockfuncs.o mac.o misc.o nabstime.o name.o \
-	network.o network_gist.o network_selfuncs.o \
+	network.o network_gist.o network_spgist.o network_selfuncs.o \
 	numeric.o numutils.o oid.o oracle_compat.o \
 	orderedsetaggs.o pg_locale.o pg_lsn.o pg_upgrade_support.o \
 	pgstatfuncs.o \
 	pseudotypes.o quote.o rangetypes.o rangetypes_gist.o \
 	rangetypes_selfuncs.o rangetypes_spgist.o rangetypes_typanalyze.o \
 	regexp.o regproc.o ri_triggers.o rowtypes.o ruleutils.o \
 	selfuncs.o tid.o timestamp.o trigfuncs.o \
 	tsginidx.o tsgistidx.o tsquery.o tsquery_cleanup.o tsquery_gist.o \
 	tsquery_op.o tsquery_rewrite.o tsquery_util.o tsrank.o \
 	tsvector.o tsvector_op.o tsvector_parser.o \
diff --git a/src/backend/utils/adt/network.c b/src/backend/utils/adt/network.c
index 1f8469a..bf8a388 100644
--- a/src/backend/utils/adt/network.c
+++ b/src/backend/utils/adt/network.c
@@ -261,55 +261,29 @@ cidr_send(PG_FUNCTION_ARGS)
 	inet	   *addr = PG_GETARG_INET_PP(0);
 
 	PG_RETURN_BYTEA_P(network_send(addr, true));
 }
 
 
 Datum
 inet_to_cidr(PG_FUNCTION_ARGS)
 {
 	inet	   *src = PG_GETARG_INET_PP(0);
-	inet	   *dst;
 	int			bits;
-	int			byte;
-	int			nbits;
-	int			maxbytes;
 
 	bits = ip_bits(src);
 
 	/* safety check */
 	if ((bits < 0) || (bits > ip_maxbits(src)))
 		elog(ERROR, "invalid inet bit length: %d", bits);
 
-	/* clone the original data */
-	dst = (inet *) palloc(VARSIZE_ANY(src));
-	memcpy(dst, src, VARSIZE_ANY(src));
-
-	/* zero out any bits to the right of the netmask */
-	byte = bits / 8;
-
-	nbits = bits % 8;
-	/* clear the first byte, this might be a partial byte */
-	if (nbits != 0)
-	{
-		ip_addr(dst)[byte] &= ~(0xFF >> nbits);
-		byte++;
-	}
-	/* clear remaining bytes */
-	maxbytes = ip_addrsize(dst);
-	while (byte < maxbytes)
-	{
-		ip_addr(dst)[byte] = 0;
-		byte++;
-	}
-
-	PG_RETURN_INET_P(dst);
+	PG_RETURN_INET_P(cidr_set_masklen_internal(src, bits));
 }
 
 Datum
 inet_set_masklen(PG_FUNCTION_ARGS)
 {
 	inet	   *src = PG_GETARG_INET_PP(0);
 	int			bits = PG_GETARG_INT32(1);
 	inet	   *dst;
 
 	if (bits == -1)
@@ -327,58 +301,54 @@ inet_set_masklen(PG_FUNCTION_ARGS)
 	ip_bits(dst) = bits;
 
 	PG_RETURN_INET_P(dst);
 }
 
 Datum
 cidr_set_masklen(PG_FUNCTION_ARGS)
 {
 	inet	   *src = PG_GETARG_INET_PP(0);
 	int			bits = PG_GETARG_INT32(1);
-	inet	   *dst;
-	int			byte;
-	int			nbits;
-	int			maxbytes;
 
 	if (bits == -1)
 		bits = ip_maxbits(src);
 
 	if ((bits < 0) || (bits > ip_maxbits(src)))
 		ereport(ERROR,
 				(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
 				 errmsg("invalid mask length: %d", bits)));
 
-	/* clone the original data */
-	dst = (inet *) palloc(VARSIZE_ANY(src));
-	memcpy(dst, src, VARSIZE_ANY(src));
+	PG_RETURN_INET_P(cidr_set_masklen_internal(src, bits));
+}
 
+inet *
+cidr_set_masklen_internal(inet *src, int bits)
+{
+	inet	   *dst = (inet *) palloc0(sizeof(inet));
+
+	ip_family(dst) = ip_family(src);
 	ip_bits(dst) = bits;
 
-	/* zero out any bits to the right of the new netmask */
-	byte = bits / 8;
+	if (bits > 0)
+	{
+		/* Clone appropriate bytes of the address */
+		memcpy(ip_addr(dst), ip_addr(src), (bits + 7) / 8);
 
-	nbits = bits % 8;
-	/* clear the first byte, this might be a partial byte */
-	if (nbits != 0)
-	{
-		ip_addr(dst)[byte] &= ~(0xFF >> nbits);
-		byte++;
-	}
-	/* clear remaining bytes */
-	maxbytes = ip_addrsize(dst);
-	while (byte < maxbytes)
-	{
-		ip_addr(dst)[byte] = 0;
-		byte++;
+		/* Clear any unwanted bits in the last partial byte */
+		if (bits % 8 > 0)
+			ip_addr(dst)[bits / 8] &= ~(0xFF >> (bits % 8));
 	}
 
-	PG_RETURN_INET_P(dst);
+	/* Set varlena header correctly */
+	SET_INET_VARSIZE(dst);
+
+	return dst;
 }
 
 /*
  *	Basic comparison function for sorting and inet/cidr comparisons.
  *
  * Comparison is first on the common bits of the network part, then on
  * the length of the network part, and then on the whole unmasked address.
  * The effect is that the network part is the major sort key, and for
  * equal network parts we sort on the host part.  Note this is only sane
  * for CIDR if address bits to the right of the mask are guaranteed zero;
@@ -900,50 +870,32 @@ inet_same_family(PG_FUNCTION_ARGS)
 	PG_RETURN_BOOL(ip_family(a1) == ip_family(a2));
 }
 
 /*
  * Returns the smallest CIDR which contains both of the inputs.
  */
 Datum
 inet_merge(PG_FUNCTION_ARGS)
 {
 	inet	   *a1 = PG_GETARG_INET_PP(0),
-			   *a2 = PG_GETARG_INET_PP(1),
-			   *result;
+			   *a2 = PG_GETARG_INET_PP(1);
 	int			commonbits;
 
 	if (ip_family(a1) != ip_family(a2))
 		ereport(ERROR,
 				(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
 				 errmsg("cannot merge addresses from different families")));
 
 	commonbits = bitncommon(ip_addr(a1), ip_addr(a2),
 							Min(ip_bits(a1), ip_bits(a2)));
 
-	/* Make sure any unused bits are zeroed. */
-	result = (inet *) palloc0(sizeof(inet));
-
-	ip_family(result) = ip_family(a1);
-	ip_bits(result) = commonbits;
-
-	/* Clone appropriate bytes of the address. */
-	if (commonbits > 0)
-		memcpy(ip_addr(result), ip_addr(a1), (commonbits + 7) / 8);
-
-	/* Clean any unwanted bits in the last partial byte. */
-	if (commonbits % 8 != 0)
-		ip_addr(result)[commonbits / 8] &= ~(0xFF >> (commonbits % 8));
-
-	/* Set varlena header correctly. */
-	SET_INET_VARSIZE(result);
-
-	PG_RETURN_INET_P(result);
+	PG_RETURN_INET_P(cidr_set_masklen_internal(a1, commonbits));
 }
 
 /*
  * Convert a value of a network datatype to an approximate scalar value.
  * This is used for estimating selectivities of inequality operators
  * involving network types.
  */
 double
 convert_network_to_scalar(Datum value, Oid typid)
 {
diff --git a/src/backend/utils/adt/network_spgist.c b/src/backend/utils/adt/network_spgist.c
new file mode 100644
index 0000000..eb78651
--- /dev/null
+++ b/src/backend/utils/adt/network_spgist.c
@@ -0,0 +1,714 @@
+/*-------------------------------------------------------------------------
+ *
+ * network_spgist.c
+ *	  SP-GiST support for network types.
+ *
+ * The index used cidr data type on the inner nodes as the prefix.  All
+ * of the inner nodes has static number of sub-nodes.  It is 2 for
+ * the ones which splits different IP families, and 4 for all others.
+ * 2 for the different IP families are one for version 4 and one for
+ * version 6 addresses.
+ *
+ * 4 nodes for all others are more interesting.  The node numbers 0 and
+ * 1 are for the addresses which has the same masklen as the prefix.
+ * Node numbers 2 and 3 are for the addresses with bigger masklen.  That
+ * makes them smaller networks.  We cannot place bigger networks under
+ * the smaller ones.  Nodes number 0 and 1 are split by the next host
+ * bit of the addresses.  Nodes number 2 and 3 are split by the next
+ * network bit of the addresses.  The ones without any more bits are
+ * naturally placed under node 0.
+ *
+ * This design does not all the addresses to be indexed further after
+ * first host bits of them.  It is not possible to do use, because cidr
+ * data type is used as the prefix.  We would need an additional number
+ * to know which host bit of the address, we have split the tree.
+ *
+ * Portions Copyright (c) 1996-2016, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * IDENTIFICATION
+ *			src/backend/utils/adt/network_spgist.c
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#include "postgres.h"
+
+#include "access/stratnum.h"
+#include "access/spgist.h"
+#include "catalog/pg_type.h"
+#include "utils/inet.h"
+
+static int inet_spg_node_number(inet *orig, int16 commonbits);
+static unsigned char inet_spg_consistent_bitmap(inet *prefix, int nkeys,
+												ScanKey scankeys, bool leaf);
+
+/*
+ * The SP-GiST configuration function
+ */
+Datum
+inet_spg_config(PG_FUNCTION_ARGS)
+{
+	/* spgConfigIn *cfgin = (spgConfigIn *) PG_GETARG_POINTER(0); */
+	spgConfigOut *cfg = (spgConfigOut *) PG_GETARG_POINTER(1);
+
+	cfg->prefixType = CIDROID;
+	cfg->labelType = VOIDOID;
+	cfg->canReturnData = true;
+	cfg->longValuesOK = false;
+
+	PG_RETURN_VOID();
+}
+
+/*
+ * The SP-GiST choose function
+ */
+Datum
+inet_spg_choose(PG_FUNCTION_ARGS)
+{
+	spgChooseIn *in = (spgChooseIn *) PG_GETARG_POINTER(0);
+	spgChooseOut *out = (spgChooseOut *) PG_GETARG_POINTER(1);
+	inet	   *orig = DatumGetInetPP(in->datum),
+			   *prefix;
+	int			commonbits;
+
+	/*
+	 * When there are addresses from the different families, we divide
+	 * them purely on their families.  It can only happen on the top
+	 * level node of the index without a prefix.
+	 */
+	if (!in->hasPrefix)
+	{
+		/*
+		 * It is not okay to choose spgMatchNode when the tuples are
+		 * "all the same".  We rely on the knowledge that the picksplit
+		 * function splits the items based on their families only when
+		 * there are addresses from multiple families.
+		 */
+		Assert(!in->allTheSame);
+
+		out->resultType = spgMatchNode;
+		out->result.matchNode.nodeN = ip_family(orig) == PGSQL_AF_INET ? 0 : 1;
+		out->result.matchNode.restDatum = InetPGetDatum(orig);
+
+		PG_RETURN_VOID();
+	}
+
+	prefix = DatumGetInetPP(in->prefixDatum);
+	commonbits = ip_bits(prefix);
+
+	/*
+	 * We cannot put addresses from different families under the same
+	 * inner node, so we have to split.
+	 */
+	if (ip_family(orig) != ip_family(prefix))
+	{
+		out->resultType = spgSplitTuple;
+		out->result.splitTuple.prefixHasPrefix = false;
+		out->result.splitTuple.prefixNNodes = 2;
+		out->result.splitTuple.prefixNodeLabels = NULL;
+
+		out->result.splitTuple.postfixNodeN =
+								ip_family(prefix) == PGSQL_AF_INET ? 0 : 1;
+		out->result.splitTuple.postfixHasPrefix = true;
+		out->result.splitTuple.postfixPrefixDatum = InetPGetDatum(prefix);
+
+		PG_RETURN_VOID();
+	}
+
+	if (in->allTheSame)
+	{
+		out->resultType = spgMatchNode;
+		/* nodeN will be set by core */
+		out->result.matchNode.restDatum = InetPGetDatum(orig);
+
+		PG_RETURN_VOID();
+	}
+
+	/*
+	 * We cannot put addresses of a bigger network under under an inner
+	 * node of a smaller network, so we have to split.
+	 */
+	if (ip_bits(orig) < commonbits ||
+		bitncmp(ip_addr(prefix), ip_addr(orig), commonbits) != 0)
+	{
+		commonbits = bitncommon(ip_addr(prefix), ip_addr(orig), ip_bits(orig));
+
+		out->resultType = spgSplitTuple;
+		out->result.splitTuple.prefixHasPrefix = true;
+		out->result.splitTuple.prefixPrefixDatum =
+					InetPGetDatum(cidr_set_masklen_internal(orig, commonbits));
+		out->result.splitTuple.prefixNNodes = 4;
+		out->result.splitTuple.prefixNodeLabels = NULL;
+
+		/* We need a new node number for the existing prefix. */
+		out->result.splitTuple.postfixNodeN =
+									inet_spg_node_number(prefix, commonbits);
+		out->result.splitTuple.postfixHasPrefix = true;
+		out->result.splitTuple.postfixPrefixDatum = InetPGetDatum(prefix);
+
+		PG_RETURN_VOID();
+	}
+
+	out->resultType = spgMatchNode;
+	out->result.matchNode.nodeN = inet_spg_node_number(orig, commonbits);
+	out->result.matchNode.restDatum = InetPGetDatum(orig);
+
+	PG_RETURN_VOID();
+}
+
+/*
+ * The GiST PickSplit method
+ *
+ * There are two ways to split.  First one is to split by address
+ * families, if there are multiple families appearing in the input.
+ *
+ * The second and more common way is to split by addresses.  To
+ * achieve this, we determine the number of leading bits shared by all
+ * the keys, then split on the next bit.  We limit those bits to
+ * the minimum masklen of the input addresses, and put the keys with
+ * the same netmask under the first two nodes.
+ */
+Datum
+inet_spg_picksplit(PG_FUNCTION_ARGS)
+{
+	spgPickSplitIn *in = (spgPickSplitIn *) PG_GETARG_POINTER(0);
+	spgPickSplitOut *out = (spgPickSplitOut *) PG_GETARG_POINTER(1);
+	inet	   *prefix,
+			   *tmp;
+	int			i,
+				commonbits;
+	bool		differentFamilies = false;
+
+	/* Initialize the prefix with the first element */
+	prefix = DatumGetInetPP(in->datums[0]);
+	commonbits = ip_bits(prefix);
+
+	for (i = 1; i < in->nTuples; i++)
+	{
+		tmp = DatumGetInetPP(in->datums[i]);
+
+		if (ip_family(tmp) != ip_family(prefix))
+		{
+			differentFamilies = true;
+			break;
+		}
+
+		if (ip_bits(tmp) < commonbits)
+			commonbits = ip_bits(tmp);
+
+		if (commonbits == 0)
+			break;
+
+		/* Find minimum number of bits in common. */
+		commonbits = bitncommon(ip_addr(prefix), ip_addr(tmp), commonbits);
+	}
+
+	out->nodeLabels = NULL;
+	out->mapTuplesToNodes = palloc(sizeof(int) * in->nTuples);
+	out->leafTupleDatums = palloc(sizeof(Datum) * in->nTuples);
+
+	if (differentFamilies)
+	{
+		out->hasPrefix = false;
+		out->nNodes = 2;
+
+		for (i = 0; i < in->nTuples; i++)
+		{
+			tmp = DatumGetInetPP(in->datums[i]);
+
+			out->mapTuplesToNodes[i] = ip_family(tmp) == PGSQL_AF_INET ? 0 : 1;
+			out->leafTupleDatums[i] = InetPGetDatum(tmp);
+		}
+	}
+	else
+	{
+		out->hasPrefix = true;
+		out->prefixDatum =
+				InetPGetDatum(cidr_set_masklen_internal(prefix, commonbits));
+		out->nNodes = 4;
+
+		for (i = 0; i < in->nTuples; i++)
+		{
+			tmp = DatumGetInetPP(in->datums[i]);
+
+			out->mapTuplesToNodes[i] = inet_spg_node_number(tmp, commonbits);
+			out->leafTupleDatums[i] = InetPGetDatum(tmp);
+		}
+	}
+
+	PG_RETURN_VOID();
+}
+
+/*
+ * The SP-GiST query consistency check
+ */
+Datum
+inet_spg_inner_consistent(PG_FUNCTION_ARGS)
+{
+	spgInnerConsistentIn *in = (spgInnerConsistentIn *) PG_GETARG_POINTER(0);
+	spgInnerConsistentOut *out = (spgInnerConsistentOut *) PG_GETARG_POINTER(1);
+	int			i;
+	unsigned char which;
+
+	if (!in->hasPrefix)
+	{
+		Assert(in->nNodes == 2);
+
+		which = 1 | 1 << 1;
+
+		for (i = 0; i < in->nkeys; i++)
+		{
+			StrategyNumber strategy = in->scankeys[i].sk_strategy;
+			inet	   *argument = DatumGetInetPP(in->scankeys[i].sk_argument);
+
+			switch (strategy)
+			{
+				case RTLessStrategyNumber:
+				case RTLessEqualStrategyNumber:
+					if (ip_family(argument) == PGSQL_AF_INET)
+						which &= 1;
+					break;
+
+				case RTGreaterEqualStrategyNumber:
+				case RTGreaterStrategyNumber:
+					if (ip_family(argument) == PGSQL_AF_INET6)
+						which &= 1 << 1;
+					break;
+
+				case RTNotEqualStrategyNumber:
+					break;
+
+				default:
+					if (ip_family(argument) == PGSQL_AF_INET)
+						which &= 1;
+					else
+						which &= 1 << 1;
+			}
+		}
+	}
+	else if (!in->allTheSame)
+	{
+		Assert(in->nNodes == 4);
+
+		which = inet_spg_consistent_bitmap(DatumGetInetPP(in->prefixDatum),
+										   in->nkeys, in->scankeys, false);
+	}
+	else
+		which = 255;
+
+	out->nNodes = 0;
+
+	if (which)
+	{
+		out->nodeNumbers = (int *) palloc(sizeof(int) * in->nNodes);
+
+		for (i = 0; i < in->nNodes; i++)
+		{
+			if (which & 1 << i)
+			{
+				out->nodeNumbers[out->nNodes] = i;
+				out->nNodes++;
+			}
+		}
+	}
+
+	PG_RETURN_VOID();
+}
+
+/*
+ * The SP-GiST leaf consistency check
+ */
+Datum
+inet_spg_leaf_consistent(PG_FUNCTION_ARGS)
+{
+	spgLeafConsistentIn *in = (spgLeafConsistentIn *) PG_GETARG_POINTER(0);
+	spgLeafConsistentOut *out = (spgLeafConsistentOut *) PG_GETARG_POINTER(1);
+	inet	   *leaf = DatumGetInetPP(in->leafDatum);
+
+	/* All tests are exact. */
+	out->recheck = false;
+
+	/* Leaf is what it is... */
+	out->leafValue = InetPGetDatum(leaf);
+
+	PG_RETURN_BOOL(inet_spg_consistent_bitmap(leaf, in->nkeys, in->scankeys,
+											  true));
+}
+
+/*
+ * Calculate node number
+ *
+ * This function returns the node number for the given inet for any
+ * inner node except the one without the prefix which splits the nodes
+ * based on their families.
+ */
+static int
+inet_spg_node_number(inet *orig, int16 commonbits)
+{
+	int		nodeN = 0;
+
+	if (commonbits < ip_maxbits(orig) &&
+		ip_addr(orig)[commonbits / 8] & 1 << (7 - commonbits % 8))
+		nodeN += 1;
+	if (commonbits < ip_bits(orig))
+		nodeN += 2;
+
+	return nodeN;
+}
+
+/*
+ * Calculate bitmap of consistent nodes
+ *
+ * This function returns the bitmap of the selected nodes except the one
+ * without the prefix which splits the nodes based on their families.
+ * It works for the leaf nodes using a single bit of the bitmap.  In
+ * this case the result would be 0 or 1.
+ *
+ * The checks for the inner and leaf nodes are mostly common which is
+ * a good reason to merge them in a same function.  Using the same
+ * function makes it easier to catch inconsistencies.
+ */
+static unsigned char
+inet_spg_consistent_bitmap(inet *prefix, int nkeys, ScanKey scankeys, bool leaf)
+{
+	unsigned char bitmap;
+	int			commonbits,
+				order,
+				i;
+
+	if (leaf)
+		bitmap = 1;
+	else
+		bitmap = 1 | 1 << 1 | 1 << 2 | 1 << 3;
+
+	commonbits = ip_bits(prefix);
+
+	for (i = 0; i < nkeys; i++)
+	{
+		inet	   *argument = DatumGetInetPP(scankeys[i].sk_argument);
+		StrategyNumber strategy = scankeys[i].sk_strategy;
+
+		/*
+		 * Check 0: different families
+		 *
+		 * Matching families do not help any of the strategies.
+		 */
+		if (ip_family(argument) != ip_family(prefix))
+		{
+			switch (strategy)
+			{
+				case RTLessStrategyNumber:
+				case RTLessEqualStrategyNumber:
+					if (ip_family(argument) < ip_family(prefix))
+						bitmap = 0;
+					break;
+
+				case RTGreaterEqualStrategyNumber:
+				case RTGreaterStrategyNumber:
+					if (ip_family(argument) > ip_family(prefix))
+						bitmap = 0;
+					break;
+
+				case RTNotEqualStrategyNumber:
+					break;
+
+				default:
+					/*
+					 * For all other cases, we can be sure there is
+					 * no match.
+					 */
+					bitmap = 0;
+			}
+
+			if (!bitmap)
+				break;
+
+			/* Other checks makes no sense with different families. */
+			continue;
+		}
+
+		/*
+		 * Check 1: network bit count
+		 *
+		 * Network bit count (ip_bits) helps to check leaves for sub
+		 * network and sup network operators.  At non-leaf nodes, we
+		 * know every child value has greater ip_bits, so we can avoid
+		 * descending in some cases too.
+		 *
+		 * This check is less expensive than checking the addresses, so
+		 * we are doing this before, but it has to be done after for
+		 * the basic comparison strategies, because ip_bits only affect
+		 * their results when the common network bits are the same.
+		 */
+		switch (strategy)
+		{
+			case RTSubStrategyNumber:
+				if (commonbits <= ip_bits(argument))
+					bitmap &= 1 << 2 | 1 << 3;
+				break;
+
+			case RTSubEqualStrategyNumber:
+				if (commonbits < ip_bits(argument))
+					bitmap &= 1 << 2 | 1 << 3;
+				break;
+
+			case RTSuperStrategyNumber:
+				if (commonbits == ip_bits(argument) - 1)
+					bitmap &= 1 | 1 << 1;
+				else if (commonbits >= ip_bits(argument))
+					bitmap = 0;
+				break;
+
+			case RTSuperEqualStrategyNumber:
+				if (commonbits == ip_bits(argument))
+					bitmap &= 1 | 1 << 1;
+				else if (commonbits > ip_bits(argument))
+					bitmap = 0;
+				break;
+
+			case RTEqualStrategyNumber:
+				if (commonbits < ip_bits(argument))
+					bitmap &= 1 << 2 | 1 << 3;
+				else if (commonbits == ip_bits(argument))
+					bitmap &= 1 | 1 << 1;
+				else if (commonbits > ip_bits(argument))
+					bitmap = 0;
+				break;
+		}
+
+		if (!bitmap)
+			break;
+
+		/*
+		 * Check 2: common network bits
+		 *
+		 * Compare available common prefix bits to the query, but not
+		 * beyond either the query's netmask or the minimum netmask
+		 * among the represented values.  If these bits don't match
+		 * the query, we have our answer (and may or may not need to
+		 * descend, depending on the operator).
+		 */
+		order = bitncmp(ip_addr(prefix), ip_addr(argument),
+						Min(commonbits, ip_bits(argument)));
+
+		if (order != 0)
+		{
+			switch (strategy)
+			{
+				case RTLessStrategyNumber:
+				case RTLessEqualStrategyNumber:
+					if (order > 0)
+						bitmap = 0;
+					break;
+
+				case RTGreaterEqualStrategyNumber:
+				case RTGreaterStrategyNumber:
+					if (order < 0)
+						bitmap = 0;
+					break;
+
+				case RTNotEqualStrategyNumber:
+					break;
+
+				default:
+					/*
+					 * For all other cases, we can be sure there is
+					 * no match.
+					 */
+					bitmap = 0;
+			}
+
+			if (!bitmap)
+				break;
+
+			/*
+			 * Remaining checks makes no sense, when common bits don't
+			 * match.
+			 */
+			continue;
+		}
+
+		/*
+		 * Check 3: next network bit
+		 *
+		 * We can filter out one branch of the tree, using the next
+		 * network bit of the argument, if it is available.
+		 *
+		 * This check matters for the performance of the search.
+		 * The results would be correct without it.
+		 */
+		if (bitmap & (1 << 2 | 1 << 3) &&
+			commonbits < ip_bits(argument))
+		{
+			unsigned char nextbit;
+
+			nextbit = ip_addr(argument)[commonbits / 8] &
+					  1 << (7 - commonbits % 8);
+
+			switch (strategy)
+			{
+				case RTLessStrategyNumber:
+				case RTLessEqualStrategyNumber:
+					if (!nextbit)
+						bitmap &= 1 | 1 << 1 | 1 << 2;
+					break;
+
+				case RTGreaterEqualStrategyNumber:
+				case RTGreaterStrategyNumber:
+					if (nextbit)
+						bitmap &= 1 | 1 << 1 | 1 << 3;
+					break;
+
+				case RTNotEqualStrategyNumber:
+					break;
+
+				default:
+					if (!nextbit)
+						bitmap &= 1 | 1 << 1 | 1 << 2;
+					else
+						bitmap &= 1 | 1 << 1 | 1 << 3;
+			}
+
+			if (!bitmap)
+				break;
+		}
+
+		/*
+		 * Remaining checks are only for the basic comparison
+		 * strategies.  We are relying on the strategy numbers defined
+		 * on stratnum.h.
+		 */
+		if (strategy < RTEqualStrategyNumber ||
+			strategy > RTGreaterEqualStrategyNumber)
+			continue;
+
+		/*
+		 * Check 4: network bit count
+		 *
+		 * At this point, we know that the common network bits of
+		 * the prefix and the argument are the same, so we can go
+		 * forward and check the ip_bits.
+		 */
+		switch (strategy)
+		{
+			case RTLessStrategyNumber:
+			case RTLessEqualStrategyNumber:
+				if (commonbits == ip_bits(argument))
+					bitmap &= 1 | 1 << 1;
+				else if (commonbits > ip_bits(argument))
+					bitmap = 0;
+				break;
+
+			case RTGreaterEqualStrategyNumber:
+			case RTGreaterStrategyNumber:
+				if (commonbits < ip_bits(argument))
+					bitmap &= 1 << 2 | 1 << 3;
+				break;
+		}
+
+		if (!bitmap)
+			break;
+
+		/* Remaining check doesn't make sense with different ip_bits. */
+		if (commonbits != ip_bits(argument))
+			continue;
+
+		/*
+		 * Check 5: next host bit
+		 *
+		 * We can filter out one branch of the tree, using the next
+		 * host bit of the argument, if it is available.
+		 *
+		 * This check matters for the performance of the search.
+		 * The results would be correct without it.  There is no point
+		 * of running it for the leafs as we have to check the whole
+		 * address on the next step.
+		 */
+		if (!leaf && bitmap & (1 | 1 << 1) &&
+			commonbits < ip_maxbits(argument))
+		{
+			unsigned char nextbit;
+
+			nextbit = ip_addr(argument)[commonbits / 8] &
+					  1 << (7 - commonbits % 8);
+
+			switch (strategy)
+			{
+				case RTLessStrategyNumber:
+				case RTLessEqualStrategyNumber:
+					if (!nextbit)
+						bitmap &= 1 | 1 << 2 | 1 << 3;
+					break;
+
+				case RTGreaterEqualStrategyNumber:
+				case RTGreaterStrategyNumber:
+					if (nextbit)
+						bitmap &= 1 << 1 | 1 << 2 | 1 << 3;
+					break;
+
+				case RTNotEqualStrategyNumber:
+					break;
+
+				default:
+					if (!nextbit)
+						bitmap &= 1 | 1 << 2 | 1 << 3;
+					else
+						bitmap &= 1 << 1 | 1 << 2 | 1 << 3;
+			}
+
+			if (!bitmap)
+				break;
+		}
+
+		/*
+		 * Check 6: whole address
+		 *
+		 * This is the last check for correctness of the basic
+		 * comparison strategies.
+		 */
+		if (leaf)
+		{
+			order = bitncmp(ip_addr(prefix), ip_addr(argument),
+							ip_maxbits(prefix));
+
+			switch (strategy)
+			{
+				case RTLessStrategyNumber:
+					if (order >= 0)
+						bitmap = 0;
+					break;
+
+				case RTLessEqualStrategyNumber:
+					if (order > 0)
+						bitmap = 0;
+					break;
+
+				case RTEqualStrategyNumber:
+					if (order != 0)
+						bitmap = 0;
+					break;
+
+				case RTGreaterEqualStrategyNumber:
+					if (order < 0)
+						bitmap = 0;
+					break;
+
+				case RTGreaterStrategyNumber:
+					if (order <= 0)
+						bitmap = 0;
+					break;
+
+				case RTNotEqualStrategyNumber:
+					if (order == 0)
+						bitmap = 0;
+					break;
+			}
+
+			if (!bitmap)
+				break;
+		}
+	}
+
+	return bitmap;
+}
diff --git a/src/include/catalog/pg_amop.h b/src/include/catalog/pg_amop.h
index c642c39..4a8e097 100644
--- a/src/include/catalog/pg_amop.h
+++ b/src/include/catalog/pg_amop.h
@@ -840,20 +840,35 @@ DATA(insert (	3550	869 869 18 s	1201 783 0 ));
 DATA(insert (	3550	869 869 19 s	1202 783 0 ));
 DATA(insert (	3550	869 869 20 s	1203 783 0 ));
 DATA(insert (	3550	869 869 21 s	1204 783 0 ));
 DATA(insert (	3550	869 869 22 s	1205 783 0 ));
 DATA(insert (	3550	869 869 23 s	1206 783 0 ));
 DATA(insert (	3550	869 869 24 s	931 783 0 ));
 DATA(insert (	3550	869 869 25 s	932 783 0 ));
 DATA(insert (	3550	869 869 26 s	933 783 0 ));
 DATA(insert (	3550	869 869 27 s	934 783 0 ));
 
+/*
+ * SP-GiST inet_ops
+ */
+DATA(insert (	4573	869 869 3 s		3552 4000 0 ));
+DATA(insert (	4573	869 869 18 s	1201 4000 0 ));
+DATA(insert (	4573	869 869 19 s	1202 4000 0 ));
+DATA(insert (	4573	869 869 20 s	1203 4000 0 ));
+DATA(insert (	4573	869 869 21 s	1204 4000 0 ));
+DATA(insert (	4573	869 869 22 s	1205 4000 0 ));
+DATA(insert (	4573	869 869 23 s	1206 4000 0 ));
+DATA(insert (	4573	869 869 24 s	931 4000 0 ));
+DATA(insert (	4573	869 869 25 s	932 4000 0 ));
+DATA(insert (	4573	869 869 26 s	933 4000 0 ));
+DATA(insert (	4573	869 869 27 s	934 4000 0 ));
+
 /* BRIN opclasses */
 /* minmax bytea */
 DATA(insert (	4064	 17   17 1 s	  1957	  3580 0 ));
 DATA(insert (	4064	 17   17 2 s	  1958	  3580 0 ));
 DATA(insert (	4064	 17   17 3 s	  1955	  3580 0 ));
 DATA(insert (	4064	 17   17 4 s	  1960	  3580 0 ));
 DATA(insert (	4064	 17   17 5 s	  1959	  3580 0 ));
 /* minmax "char" */
 DATA(insert (	4062	 18   18 1 s	   631	  3580 0 ));
 DATA(insert (	4062	 18   18 2 s	   632	  3580 0 ));
diff --git a/src/include/catalog/pg_amproc.h b/src/include/catalog/pg_amproc.h
index f0ae008..f543a99 100644
--- a/src/include/catalog/pg_amproc.h
+++ b/src/include/catalog/pg_amproc.h
@@ -436,20 +436,25 @@ DATA(insert (	4015   600 600 5 4022 ));
 DATA(insert (	4016   600 600 1 4023 ));
 DATA(insert (	4016   600 600 2 4024 ));
 DATA(insert (	4016   600 600 3 4025 ));
 DATA(insert (	4016   600 600 4 4026 ));
 DATA(insert (	4016   600 600 5 4022 ));
 DATA(insert (	4017   25 25 1 4027 ));
 DATA(insert (	4017   25 25 2 4028 ));
 DATA(insert (	4017   25 25 3 4029 ));
 DATA(insert (	4017   25 25 4 4030 ));
 DATA(insert (	4017   25 25 5 4031 ));
+DATA(insert (	4573   869	869  1 4574 ));
+DATA(insert (	4573   869	869  2 4575 ));
+DATA(insert (	4573   869	869  3 4576 ));
+DATA(insert (	4573   869	869  4 3577 ));
+DATA(insert (	4573   869	869  5 3578 ));
 
 /* BRIN opclasses */
 /* minmax bytea */
 DATA(insert (	4064	17	  17  1  3383 ));
 DATA(insert (	4064	17	  17  2  3384 ));
 DATA(insert (	4064	17	  17  3  3385 ));
 DATA(insert (	4064	17	  17  4  3386 ));
 /* minmax "char" */
 DATA(insert (	4062	18	  18  1  3383 ));
 DATA(insert (	4062	18	  18  2  3384 ));
diff --git a/src/include/catalog/pg_opclass.h b/src/include/catalog/pg_opclass.h
index a9446b7..49d73fe 100644
--- a/src/include/catalog/pg_opclass.h
+++ b/src/include/catalog/pg_opclass.h
@@ -106,20 +106,21 @@ DATA(insert OID = 3122 ( 403	date_ops	PGNSP PGUID  434 1082 t 0 ));
 #define DATE_BTREE_OPS_OID 3122
 DATA(insert (	405		date_ops			PGNSP PGUID  435 1082 t 0 ));
 DATA(insert (	403		float4_ops			PGNSP PGUID 1970  700 t 0 ));
 DATA(insert (	405		float4_ops			PGNSP PGUID 1971  700 t 0 ));
 DATA(insert OID = 3123 ( 403	float8_ops	PGNSP PGUID 1970  701 t 0 ));
 #define FLOAT8_BTREE_OPS_OID 3123
 DATA(insert (	405		float8_ops			PGNSP PGUID 1971  701 t 0 ));
 DATA(insert (	403		inet_ops			PGNSP PGUID 1974  869 t 0 ));
 DATA(insert (	405		inet_ops			PGNSP PGUID 1975  869 t 0 ));
 DATA(insert (	783		inet_ops			PGNSP PGUID 3550  869 f 0 ));
+DATA(insert (	4000	inet_ops			PGNSP PGUID 4573  869 t 0 ));
 DATA(insert OID = 1979 ( 403	int2_ops	PGNSP PGUID 1976   21 t 0 ));
 #define INT2_BTREE_OPS_OID 1979
 DATA(insert (	405		int2_ops			PGNSP PGUID 1977   21 t 0 ));
 DATA(insert OID = 1978 ( 403	int4_ops	PGNSP PGUID 1976   23 t 0 ));
 #define INT4_BTREE_OPS_OID 1978
 DATA(insert (	405		int4_ops			PGNSP PGUID 1977   23 t 0 ));
 DATA(insert OID = 3124 ( 403	int8_ops	PGNSP PGUID 1976   20 t 0 ));
 #define INT8_BTREE_OPS_OID 3124
 DATA(insert (	405		int8_ops			PGNSP PGUID 1977   20 t 0 ));
 DATA(insert (	403		interval_ops		PGNSP PGUID 1982 1186 t 0 ));
diff --git a/src/include/catalog/pg_opfamily.h b/src/include/catalog/pg_opfamily.h
index fa44446..12e012e 100644
--- a/src/include/catalog/pg_opfamily.h
+++ b/src/include/catalog/pg_opfamily.h
@@ -72,20 +72,21 @@ DATA(insert OID =  428 (	403		bytea_ops		PGNSP PGUID ));
 DATA(insert OID =  429 (	403		char_ops		PGNSP PGUID ));
 DATA(insert OID =  431 (	405		char_ops		PGNSP PGUID ));
 DATA(insert OID =  434 (	403		datetime_ops	PGNSP PGUID ));
 DATA(insert OID =  435 (	405		date_ops		PGNSP PGUID ));
 DATA(insert OID = 1970 (	403		float_ops		PGNSP PGUID ));
 DATA(insert OID = 1971 (	405		float_ops		PGNSP PGUID ));
 DATA(insert OID = 1974 (	403		network_ops		PGNSP PGUID ));
 #define NETWORK_BTREE_FAM_OID 1974
 DATA(insert OID = 1975 (	405		network_ops		PGNSP PGUID ));
 DATA(insert OID = 3550 (	783		network_ops		PGNSP PGUID ));
+DATA(insert OID = 4573 (	4000	network_ops		PGNSP PGUID ));
 DATA(insert OID = 1976 (	403		integer_ops		PGNSP PGUID ));
 #define INTEGER_BTREE_FAM_OID 1976
 DATA(insert OID = 1977 (	405		integer_ops		PGNSP PGUID ));
 DATA(insert OID = 1982 (	403		interval_ops	PGNSP PGUID ));
 DATA(insert OID = 1983 (	405		interval_ops	PGNSP PGUID ));
 DATA(insert OID = 1984 (	403		macaddr_ops		PGNSP PGUID ));
 DATA(insert OID = 1985 (	405		macaddr_ops		PGNSP PGUID ));
 DATA(insert OID = 1986 (	403		name_ops		PGNSP PGUID ));
 #define NAME_BTREE_FAM_OID 1986
 DATA(insert OID = 1987 (	405		name_ops		PGNSP PGUID ));
diff --git a/src/include/catalog/pg_proc.h b/src/include/catalog/pg_proc.h
index 62b9125..7767996 100644
--- a/src/include/catalog/pg_proc.h
+++ b/src/include/catalog/pg_proc.h
@@ -2192,20 +2192,32 @@ DATA(insert OID = 3556 (  inet_gist_decompress	PGNSP PGUID 12 1 0 0 0 f f f f t
 DESCR("GiST support");
 DATA(insert OID = 3573 (  inet_gist_fetch		PGNSP PGUID 12 1 0 0 0 f f f f t f i s 1 0 2281 "2281" _null_ _null_ _null_ _null_ _null_ inet_gist_fetch _null_ _null_ _null_ ));
 DESCR("GiST support");
 DATA(insert OID = 3557 (  inet_gist_penalty		PGNSP PGUID 12 1 0 0 0 f f f f t f i s 3 0 2281 "2281 2281 2281" _null_ _null_ _null_ _null_ _null_ inet_gist_penalty _null_ _null_ _null_ ));
 DESCR("GiST support");
 DATA(insert OID = 3558 (  inet_gist_picksplit	PGNSP PGUID 12 1 0 0 0 f f f f t f i s 2 0 2281 "2281 2281" _null_ _null_ _null_ _null_ _null_ inet_gist_picksplit _null_ _null_ _null_ ));
 DESCR("GiST support");
 DATA(insert OID = 3559 (  inet_gist_same		PGNSP PGUID 12 1 0 0 0 f f f f t f i s 3 0 2281 "869 869 2281" _null_ _null_ _null_ _null_ _null_ inet_gist_same _null_ _null_ _null_ ));
 DESCR("GiST support");
 
+/* SP-GiST support for inet and cidr */
+DATA(insert OID = 4574 (  inet_spg_config		PGNSP PGUID 12 1 0 0 0 f f f f t f i s 2 0 2278 "2281 2281" _null_ _null_ _null_ _null_ _null_ inet_spg_config _null_ _null_ _null_ ));
+DESCR("SP-GiST support");
+DATA(insert OID = 4575 (  inet_spg_choose		PGNSP PGUID 12 1 0 0 0 f f f f t f i s 2 0 2278 "2281 2281" _null_ _null_ _null_ _null_ _null_ inet_spg_choose _null_ _null_ _null_ ));
+DESCR("SP-GiST support");
+DATA(insert OID = 4576 (  inet_spg_picksplit	PGNSP PGUID 12 1 0 0 0 f f f f t f i s 2 0 2278 "2281 2281" _null_ _null_ _null_ _null_ _null_ inet_spg_picksplit _null_ _null_ _null_ ));
+DESCR("SP-GiST support");
+DATA(insert OID = 3577 (  inet_spg_inner_consistent	PGNSP PGUID 12 1 0 0 0 f f f f t f i s 2 0 2278 "2281 2281" _null_ _null_ _null_ _null_ _null_ inet_spg_inner_consistent _null_ _null_ _null_ ));
+DESCR("SP-GiST support");
+DATA(insert OID = 3578 (  inet_spg_leaf_consistent	PGNSP PGUID 12 1 0 0 0 f f f f t f i s 2 0 16 "2281 2281" _null_ _null_ _null_ _null_ _null_ inet_spg_leaf_consistent _null_ _null_ _null_ ));
+DESCR("SP-GiST support");
+
 /* Selectivity estimation for inet and cidr */
 DATA(insert OID = 3560 (  networksel		   PGNSP PGUID 12 1 0 0 0 f f f f t f s s 4 0 701 "2281 26 2281 23" _null_ _null_ _null_ _null_ _null_	networksel _null_ _null_ _null_ ));
 DESCR("restriction selectivity for network operators");
 DATA(insert OID = 3561 (  networkjoinsel	   PGNSP PGUID 12 1 0 0 0 f f f f t f s s 5 0 701 "2281 26 2281 21 2281" _null_ _null_ _null_ _null_ _null_	networkjoinsel _null_ _null_ _null_ ));
 DESCR("join selectivity for network operators");
 
 DATA(insert OID = 1690 ( time_mi_time		PGNSP PGUID 12 1 0 0 0 f f f f t f i s 2 0 1186 "1083 1083" _null_ _null_ _null_ _null_ _null_	time_mi_time _null_ _null_ _null_ ));
 
 DATA(insert OID = 1691 (  boolle			PGNSP PGUID 12 1 0 0 0 f f f t t f i s 2 0 16 "16 16" _null_ _null_ _null_ _null_ _null_ boolle _null_ _null_ _null_ ));
 DATA(insert OID = 1692 (  boolge			PGNSP PGUID 12 1 0 0 0 f f f t t f i s 2 0 16 "16 16" _null_ _null_ _null_ _null_ _null_ boolge _null_ _null_ _null_ ));
diff --git a/src/include/utils/inet.h b/src/include/utils/inet.h
index 2fe3ca8..1293e5c 100644
--- a/src/include/utils/inet.h
+++ b/src/include/utils/inet.h
@@ -110,32 +110,42 @@ typedef struct macaddr
 #define PG_RETURN_INET_P(x) return InetPGetDatum(x)
 /* macaddr is a fixed-length pass-by-reference datatype */
 #define DatumGetMacaddrP(X)    ((macaddr *) DatumGetPointer(X))
 #define MacaddrPGetDatum(X)    PointerGetDatum(X)
 #define PG_GETARG_MACADDR_P(n) DatumGetMacaddrP(PG_GETARG_DATUM(n))
 #define PG_RETURN_MACADDR_P(x) return MacaddrPGetDatum(x)
 
 /*
  * Support functions in network.c
  */
+extern inet *cidr_set_masklen_internal(inet *src, int bits);
 extern int	bitncmp(const unsigned char *l, const unsigned char *r, int n);
 extern int	bitncommon(const unsigned char *l, const unsigned char *r, int n);
 
 /*
  * GiST support functions in network_gist.c
  */
 extern Datum inet_gist_fetch(PG_FUNCTION_ARGS);
 extern Datum inet_gist_consistent(PG_FUNCTION_ARGS);
 extern Datum inet_gist_union(PG_FUNCTION_ARGS);
 extern Datum inet_gist_compress(PG_FUNCTION_ARGS);
 extern Datum inet_gist_decompress(PG_FUNCTION_ARGS);
 extern Datum inet_gist_penalty(PG_FUNCTION_ARGS);
 extern Datum inet_gist_picksplit(PG_FUNCTION_ARGS);
 extern Datum inet_gist_same(PG_FUNCTION_ARGS);
 
 /*
+ * SP-GiST support functions in network_gist.c
+ */
+extern Datum inet_spg_config(PG_FUNCTION_ARGS);
+extern Datum inet_spg_choose(PG_FUNCTION_ARGS);
+extern Datum inet_spg_picksplit(PG_FUNCTION_ARGS);
+extern Datum inet_spg_inner_consistent(PG_FUNCTION_ARGS);
+extern Datum inet_spg_leaf_consistent(PG_FUNCTION_ARGS);
+
+/*
  * Estimation functions in network_selfuncs.c
  */
 extern Datum networksel(PG_FUNCTION_ARGS);
 extern Datum networkjoinsel(PG_FUNCTION_ARGS);
 
 #endif   /* INET_H */
diff --git a/src/test/regress/expected/inet.out b/src/test/regress/expected/inet.out
index 9447e03..be9427e 100644
--- a/src/test/regress/expected/inet.out
+++ b/src/test/regress/expected/inet.out
@@ -404,20 +404,168 @@ SELECT i FROM inet_tbl WHERE i << '192.168.1.0/24'::cidr ORDER BY i;
 SELECT i FROM inet_tbl WHERE i << '192.168.1.0/24'::cidr ORDER BY i;
         i         
 ------------------
  192.168.1.0/25
  192.168.1.255/25
  192.168.1.226
 (3 rows)
 
 SET enable_seqscan TO on;
 DROP INDEX inet_idx2;
+-- check that spgist index works correctly
+CREATE INDEX inet_idx3 ON inet_tbl using spgist (i);
+SET enable_seqscan TO off;
+SELECT * FROM inet_tbl WHERE i << '192.168.1.0/24'::cidr ORDER BY i;
+       c        |        i         
+----------------+------------------
+ 192.168.1.0/24 | 192.168.1.0/25
+ 192.168.1.0/24 | 192.168.1.255/25
+ 192.168.1.0/26 | 192.168.1.226
+(3 rows)
+
+SELECT * FROM inet_tbl WHERE i <<= '192.168.1.0/24'::cidr ORDER BY i;
+       c        |        i         
+----------------+------------------
+ 192.168.1.0/24 | 192.168.1.0/24
+ 192.168.1.0/24 | 192.168.1.226/24
+ 192.168.1.0/24 | 192.168.1.255/24
+ 192.168.1.0/24 | 192.168.1.0/25
+ 192.168.1.0/24 | 192.168.1.255/25
+ 192.168.1.0/26 | 192.168.1.226
+(6 rows)
+
+SELECT * FROM inet_tbl WHERE i && '192.168.1.0/24'::cidr ORDER BY i;
+       c        |        i         
+----------------+------------------
+ 192.168.1.0/24 | 192.168.1.0/24
+ 192.168.1.0/24 | 192.168.1.226/24
+ 192.168.1.0/24 | 192.168.1.255/24
+ 192.168.1.0/24 | 192.168.1.0/25
+ 192.168.1.0/24 | 192.168.1.255/25
+ 192.168.1.0/26 | 192.168.1.226
+(6 rows)
+
+SELECT * FROM inet_tbl WHERE i >>= '192.168.1.0/24'::cidr ORDER BY i;
+       c        |        i         
+----------------+------------------
+ 192.168.1.0/24 | 192.168.1.0/24
+ 192.168.1.0/24 | 192.168.1.226/24
+ 192.168.1.0/24 | 192.168.1.255/24
+(3 rows)
+
+SELECT * FROM inet_tbl WHERE i >> '192.168.1.0/24'::cidr ORDER BY i;
+ c | i 
+---+---
+(0 rows)
+
+SELECT * FROM inet_tbl WHERE i < '192.168.1.0/24'::cidr ORDER BY i;
+      c      |      i      
+-------------+-------------
+ 10.0.0.0/8  | 9.1.2.3/8
+ 10.0.0.0/32 | 10.1.2.3/8
+ 10.0.0.0/8  | 10.1.2.3/8
+ 10.0.0.0/8  | 10.1.2.3/8
+ 10.1.0.0/16 | 10.1.2.3/16
+ 10.1.2.0/24 | 10.1.2.3/24
+ 10.1.2.3/32 | 10.1.2.3
+ 10.0.0.0/8  | 11.1.2.3/8
+(8 rows)
+
+SELECT * FROM inet_tbl WHERE i <= '192.168.1.0/24'::cidr ORDER BY i;
+       c        |       i        
+----------------+----------------
+ 10.0.0.0/8     | 9.1.2.3/8
+ 10.0.0.0/8     | 10.1.2.3/8
+ 10.0.0.0/32    | 10.1.2.3/8
+ 10.0.0.0/8     | 10.1.2.3/8
+ 10.1.0.0/16    | 10.1.2.3/16
+ 10.1.2.0/24    | 10.1.2.3/24
+ 10.1.2.3/32    | 10.1.2.3
+ 10.0.0.0/8     | 11.1.2.3/8
+ 192.168.1.0/24 | 192.168.1.0/24
+(9 rows)
+
+SELECT * FROM inet_tbl WHERE i = '192.168.1.0/24'::cidr ORDER BY i;
+       c        |       i        
+----------------+----------------
+ 192.168.1.0/24 | 192.168.1.0/24
+(1 row)
+
+SELECT * FROM inet_tbl WHERE i >= '192.168.1.0/24'::cidr ORDER BY i;
+         c          |        i         
+--------------------+------------------
+ 192.168.1.0/24     | 192.168.1.0/24
+ 192.168.1.0/24     | 192.168.1.226/24
+ 192.168.1.0/24     | 192.168.1.255/24
+ 192.168.1.0/24     | 192.168.1.0/25
+ 192.168.1.0/24     | 192.168.1.255/25
+ 192.168.1.0/26     | 192.168.1.226
+ ::ffff:1.2.3.4/128 | ::4.3.2.1/24
+ 10:23::f1/128      | 10:23::f1/64
+ 10:23::8000/113    | 10:23::ffff
+(9 rows)
+
+SELECT * FROM inet_tbl WHERE i > '192.168.1.0/24'::cidr ORDER BY i;
+         c          |        i         
+--------------------+------------------
+ 192.168.1.0/24     | 192.168.1.226/24
+ 192.168.1.0/24     | 192.168.1.255/24
+ 192.168.1.0/24     | 192.168.1.0/25
+ 192.168.1.0/24     | 192.168.1.255/25
+ 192.168.1.0/26     | 192.168.1.226
+ ::ffff:1.2.3.4/128 | ::4.3.2.1/24
+ 10:23::f1/128      | 10:23::f1/64
+ 10:23::8000/113    | 10:23::ffff
+(8 rows)
+
+SELECT * FROM inet_tbl WHERE i <> '192.168.1.0/24'::cidr ORDER BY i;
+         c          |        i         
+--------------------+------------------
+ 10.0.0.0/8         | 9.1.2.3/8
+ 10.0.0.0/8         | 10.1.2.3/8
+ 10.0.0.0/32        | 10.1.2.3/8
+ 10.0.0.0/8         | 10.1.2.3/8
+ 10.1.0.0/16        | 10.1.2.3/16
+ 10.1.2.0/24        | 10.1.2.3/24
+ 10.1.2.3/32        | 10.1.2.3
+ 10.0.0.0/8         | 11.1.2.3/8
+ 192.168.1.0/24     | 192.168.1.226/24
+ 192.168.1.0/24     | 192.168.1.255/24
+ 192.168.1.0/24     | 192.168.1.0/25
+ 192.168.1.0/24     | 192.168.1.255/25
+ 192.168.1.0/26     | 192.168.1.226
+ ::ffff:1.2.3.4/128 | ::4.3.2.1/24
+ 10:23::f1/128      | 10:23::f1/64
+ 10:23::8000/113    | 10:23::ffff
+(16 rows)
+
+-- test index-only scans
+EXPLAIN (COSTS OFF)
+SELECT i FROM inet_tbl WHERE i << '192.168.1.0/24'::cidr ORDER BY i;
+                    QUERY PLAN                     
+---------------------------------------------------
+ Sort
+   Sort Key: i
+   ->  Index Only Scan using inet_idx3 on inet_tbl
+         Index Cond: (i << '192.168.1.0/24'::inet)
+(4 rows)
+
+SELECT i FROM inet_tbl WHERE i << '192.168.1.0/24'::cidr ORDER BY i;
+        i         
+------------------
+ 192.168.1.0/25
+ 192.168.1.255/25
+ 192.168.1.226
+(3 rows)
+
+SET enable_seqscan TO on;
+DROP INDEX inet_idx3;
 -- simple tests of inet boolean and arithmetic operators
 SELECT i, ~i AS "~i" FROM inet_tbl;
         i         |                     ~i                     
 ------------------+--------------------------------------------
  192.168.1.226/24 | 63.87.254.29/24
  192.168.1.226    | 63.87.254.29
  192.168.1.0/24   | 63.87.254.255/24
  192.168.1.0/25   | 63.87.254.255/25
  192.168.1.255/24 | 63.87.254.0/24
  192.168.1.255/25 | 63.87.254.0/25
diff --git a/src/test/regress/expected/opr_sanity.out b/src/test/regress/expected/opr_sanity.out
index 7c09fa3..aff0a87 100644
--- a/src/test/regress/expected/opr_sanity.out
+++ b/src/test/regress/expected/opr_sanity.out
@@ -1729,21 +1729,30 @@ ORDER BY 1, 2, 3;
        4000 |            7 | @>
        4000 |            8 | <@
        4000 |           10 | <^
        4000 |           11 | <
        4000 |           11 | >^
        4000 |           12 | <=
        4000 |           14 | >=
        4000 |           15 | >
        4000 |           16 | @>
        4000 |           18 | =
-(108 rows)
+       4000 |           19 | <>
+       4000 |           20 | <
+       4000 |           21 | <=
+       4000 |           22 | >
+       4000 |           23 | >=
+       4000 |           24 | <<
+       4000 |           25 | <<=
+       4000 |           26 | >>
+       4000 |           27 | >>=
+(117 rows)
 
 -- Check that all opclass search operators have selectivity estimators.
 -- This is not absolutely required, but it seems a reasonable thing
 -- to insist on for all standard datatypes.
 SELECT p1.amopfamily, p1.amopopr, p2.oid, p2.oprname
 FROM pg_amop AS p1, pg_operator AS p2
 WHERE p1.amopopr = p2.oid AND p1.amoppurpose = 's' AND
     (p2.oprrest = 0 OR p2.oprjoin = 0);
  amopfamily | amopopr | oid | oprname 
 ------------+---------+-----+---------
diff --git a/src/test/regress/sql/inet.sql b/src/test/regress/sql/inet.sql
index 007741e..880e115 100644
--- a/src/test/regress/sql/inet.sql
+++ b/src/test/regress/sql/inet.sql
@@ -86,20 +86,43 @@ SELECT * FROM inet_tbl WHERE i > '192.168.1.0/24'::cidr ORDER BY i;
 SELECT * FROM inet_tbl WHERE i <> '192.168.1.0/24'::cidr ORDER BY i;
 
 -- test index-only scans
 EXPLAIN (COSTS OFF)
 SELECT i FROM inet_tbl WHERE i << '192.168.1.0/24'::cidr ORDER BY i;
 SELECT i FROM inet_tbl WHERE i << '192.168.1.0/24'::cidr ORDER BY i;
 
 SET enable_seqscan TO on;
 DROP INDEX inet_idx2;
 
+-- check that spgist index works correctly
+CREATE INDEX inet_idx3 ON inet_tbl using spgist (i);
+SET enable_seqscan TO off;
+SELECT * FROM inet_tbl WHERE i << '192.168.1.0/24'::cidr ORDER BY i;
+SELECT * FROM inet_tbl WHERE i <<= '192.168.1.0/24'::cidr ORDER BY i;
+SELECT * FROM inet_tbl WHERE i && '192.168.1.0/24'::cidr ORDER BY i;
+SELECT * FROM inet_tbl WHERE i >>= '192.168.1.0/24'::cidr ORDER BY i;
+SELECT * FROM inet_tbl WHERE i >> '192.168.1.0/24'::cidr ORDER BY i;
+SELECT * FROM inet_tbl WHERE i < '192.168.1.0/24'::cidr ORDER BY i;
+SELECT * FROM inet_tbl WHERE i <= '192.168.1.0/24'::cidr ORDER BY i;
+SELECT * FROM inet_tbl WHERE i = '192.168.1.0/24'::cidr ORDER BY i;
+SELECT * FROM inet_tbl WHERE i >= '192.168.1.0/24'::cidr ORDER BY i;
+SELECT * FROM inet_tbl WHERE i > '192.168.1.0/24'::cidr ORDER BY i;
+SELECT * FROM inet_tbl WHERE i <> '192.168.1.0/24'::cidr ORDER BY i;
+
+-- test index-only scans
+EXPLAIN (COSTS OFF)
+SELECT i FROM inet_tbl WHERE i << '192.168.1.0/24'::cidr ORDER BY i;
+SELECT i FROM inet_tbl WHERE i << '192.168.1.0/24'::cidr ORDER BY i;
+
+SET enable_seqscan TO on;
+DROP INDEX inet_idx3;
+
 -- simple tests of inet boolean and arithmetic operators
 SELECT i, ~i AS "~i" FROM inet_tbl;
 SELECT i, c, i & c AS "and" FROM inet_tbl;
 SELECT i, c, i | c AS "or" FROM inet_tbl;
 SELECT i, i + 500 AS "i+500" FROM inet_tbl;
 SELECT i, i - 500 AS "i-500" FROM inet_tbl;
 SELECT i, c, i - c AS "minus" FROM inet_tbl;
 SELECT '127.0.0.1'::inet + 257;
 SELECT ('127.0.0.1'::inet + 257) - 257;
 SELECT '127::1'::inet + 257;
-- 
2.5.4 (Apple Git-61)

Oleg Bartunov

obartunov@gmail.com

almost 10 years ago

In reply to: Emre Hasegeli (#1)

Re: SP-GiST support for inet datatypes

On Wed, Mar 2, 2016 at 11:56 PM, Emre Hasegeli <emre@hasegeli.com> wrote:

Attached patches add SP-GiST support to the inet datatypes. The operator
class comes with a small change on the SP-GiST framework to allow fixed
number of child nodes.

The index is like prefix tree except that it doesn't bother to split the
addresses into parts as text is split. It also doesn't use labels to know
the part after the prefix, but relies on static node numbers.

Thanks, Emre for interesting spgist. We are bit busy and will take a look
on your patches when come to our spgist patch.

Show quoted text

The GiST index released with version 9.4 performs really bad with real
world data. SP-GiST works much better with the query posted to the
performance list [1] a while ago:

hasegeli=# SELECT DISTINCT route INTO hmm FROM routes_view WHERE asn =

2914;

SELECT 732

hasegeli=# EXPLAIN ANALYZE SELECT routes.route FROM routes JOIN hmm ON

routes.route && hmm.route;

QUERY PLAN

----------------------------------------------------------------------------------------------------------------------------------------

Nested Loop (cost=0.41..571742.27 rows=2248 width=7) (actual

time=12.643..20474.813 rows=8127 loops=1)

-> Seq Scan on hmm (cost=0.00..11.32 rows=732 width=7) (actual

time=0.017..0.524 rows=732 loops=1)

-> Index Only Scan using route_gist on routes (cost=0.41..552.05

rows=22900 width=7) (actual time=4.851..27.948 rows=11 loops=732)

Index Cond: (route && (hmm.route)::inet)
Heap Fetches: 8127
Planning time: 1.507 ms
Execution time: 20475.605 ms
(7 rows)

hasegeli=# DROP INDEX route_gist;
DROP INDEX

hasegeli=# CREATE INDEX route_spgist ON routes USING spgist (route);
CREATE INDEX

hasegeli=# EXPLAIN ANALYZE SELECT routes.route FROM routes JOIN hmm ON

routes.route && hmm.route;

QUERY PLAN

-----------------------------------------------------------------------------------------------------------------------------------------

Nested Loop (cost=0.41..588634.27 rows=2248 width=7) (actual

time=0.081..16.961 rows=8127 loops=1)

-> Seq Scan on hmm (cost=0.00..11.32 rows=732 width=7) (actual

time=0.022..0.079 rows=732 loops=1)

-> Index Only Scan using route_spgist on routes (cost=0.41..575.13

rows=22900 width=7) (actual time=0.014..0.021 rows=11 loops=732)

Index Cond: (route && (hmm.route)::inet)
Heap Fetches: 8127
Planning time: 1.376 ms
Execution time: 15.936 ms

[1]
/messages/by-id/alpine.DEB.2.11.1508251504160.31004@pyrite

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Oleg Bartunov

obartunov@gmail.com

almost 10 years ago

In reply to: Oleg Bartunov (#2)

Re: SP-GiST support for inet datatypes

On Thu, Mar 3, 2016 at 8:51 AM, Oleg Bartunov <obartunov@gmail.com> wrote:

On Wed, Mar 2, 2016 at 11:56 PM, Emre Hasegeli <emre@hasegeli.com> wrote:

Attached patches add SP-GiST support to the inet datatypes. The
operator class comes with a small change on the SP-GiST framework to allow
fixed number of child nodes.

The index is like prefix tree except that it doesn't bother to split the
addresses into parts as text is split. It also doesn't use labels to know
the part after the prefix, but relies on static node numbers.

Thanks, Emre for interesting spgist. We are bit busy and will take a look
on your patches when come to our spgist patch.

Emre, I checked original thread and didn't find sample data. Could you
provide them for testing ?

Show quoted text

The GiST index released with version 9.4 performs really bad with real

world data. SP-GiST works much better with the query posted to the
performance list [1] a while ago:

hasegeli=# SELECT DISTINCT route INTO hmm FROM routes_view WHERE asn =

2914;

SELECT 732

hasegeli=# EXPLAIN ANALYZE SELECT routes.route FROM routes JOIN hmm ON

routes.route && hmm.route;

QUERY

PLAN

----------------------------------------------------------------------------------------------------------------------------------------

Nested Loop (cost=0.41..571742.27 rows=2248 width=7) (actual

time=12.643..20474.813 rows=8127 loops=1)

-> Seq Scan on hmm (cost=0.00..11.32 rows=732 width=7) (actual

time=0.017..0.524 rows=732 loops=1)

-> Index Only Scan using route_gist on routes (cost=0.41..552.05

rows=22900 width=7) (actual time=4.851..27.948 rows=11 loops=732)

Index Cond: (route && (hmm.route)::inet)
Heap Fetches: 8127
Planning time: 1.507 ms
Execution time: 20475.605 ms
(7 rows)

hasegeli=# DROP INDEX route_gist;
DROP INDEX

hasegeli=# CREATE INDEX route_spgist ON routes USING spgist (route);
CREATE INDEX

hasegeli=# EXPLAIN ANALYZE SELECT routes.route FROM routes JOIN hmm ON

routes.route && hmm.route;

QUERY PLAN

-----------------------------------------------------------------------------------------------------------------------------------------

Nested Loop (cost=0.41..588634.27 rows=2248 width=7) (actual

time=0.081..16.961 rows=8127 loops=1)

-> Seq Scan on hmm (cost=0.00..11.32 rows=732 width=7) (actual

time=0.022..0.079 rows=732 loops=1)

-> Index Only Scan using route_spgist on routes (cost=0.41..575.13

rows=22900 width=7) (actual time=0.014..0.021 rows=11 loops=732)

Index Cond: (route && (hmm.route)::inet)
Heap Fetches: 8127
Planning time: 1.376 ms
Execution time: 15.936 ms

[1]
/messages/by-id/alpine.DEB.2.11.1508251504160.31004@pyrite

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Emre Hasegeli

emre@hasegeli.com

almost 10 years ago

In reply to: Oleg Bartunov (#3)

Re: SP-GiST support for inet datatypes

Emre, I checked original thread and didn't find sample data. Could you provide them for testing ?

I found it on the Git history:

https://github.com/job/irrexplorer/blob/9e8b5330d7ef0022abbe1af18291257e044eb24b/data/irrexplorer_dump.sql.gz?raw=true

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Oleg Bartunov

obartunov@gmail.com

almost 10 years ago

In reply to: Emre Hasegeli (#4)

Re: SP-GiST support for inet datatypes

On Thu, Mar 3, 2016 at 11:45 AM, Emre Hasegeli <emre@hasegeli.com> wrote:

Emre, I checked original thread and didn't find sample data. Could you

provide them for testing ?

I found it on the Git history:

https://github.com/job/irrexplorer/blob/9e8b5330d7ef0022abbe1af18291257e044eb24b/data/irrexplorer_dump.sql.gz?raw=true

Thanks !

spgist index creates 2 times faster than gist, but index size is
noticeably bugger

Spgist index tree is much better than gist - 12149 pages vs 1334760 !

EXPLAIN (ANALYZE, buffers) SELECT routes.route FROM routes JOIN hmm ON
routes.route && hmm.route;
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------
Nested Loop (cost=0.41..570430.27 rows=2338 width=7) (actual
time=5.730..12085.747 rows=8127 loops=1)
Buffers: shared hit=1334760
-> Seq Scan on hmm (cost=0.00..11.32 rows=732 width=7) (actual
time=0.013..0.528 rows=732 loops=1)
Buffers: shared hit=4
-> Index Only Scan using route_gist on routes (cost=0.41..550.26
rows=22900 width=7) (actual time=2.491..16.503 rows=11 loops=732)
Index Cond: (route && (hmm.route)::inet)
Heap Fetches: 8127
Buffers: shared hit=1334756
Planning time: 0.827 ms
Execution time: 12086.513 ms
(10 rows)

EXPLAIN (ANALYZE, buffers) SELECT routes.route FROM routes JOIN hmm ON
routes.route && hmm.route;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------
Nested Loop (cost=0.41..588634.27 rows=2338 width=7) (actual
time=0.043..12.150 rows=8127 loops=1)
Buffers: shared hit=12149
-> Seq Scan on hmm (cost=0.00..11.32 rows=732 width=7) (actual
time=0.013..0.075 rows=732 loops=1)
Buffers: shared hit=4
-> Index Only Scan using route_spgist on routes (cost=0.41..575.13
rows=22900 width=7) (actual time=0.011..0.015 rows=11 loops=732)
Index Cond: (route && (hmm.route)::inet)
Heap Fetches: 8127
Buffers: shared hit=12145
Planning time: 0.779 ms
Execution time: 12.603 ms
(10 rows)

Oleg Bartunov

obartunov@gmail.com

almost 10 years ago

In reply to: Oleg Bartunov (#5)

Re: SP-GiST support for inet datatypes

On Tue, Mar 8, 2016 at 11:17 PM, Oleg Bartunov <obartunov@gmail.com> wrote:

On Thu, Mar 3, 2016 at 11:45 AM, Emre Hasegeli <emre@hasegeli.com> wrote:

Emre, I checked original thread and didn't find sample data. Could you

provide them for testing ?

I found it on the Git history:

https://github.com/job/irrexplorer/blob/9e8b5330d7ef0022abbe1af18291257e044eb24b/data/irrexplorer_dump.sql.gz?raw=true

Thanks !

spgist index creates 2 times faster than gist, but index size is
noticeably bugger

\di+ route_*
List of relations
Schema | Name | Type | Owner | Table | Size | Description
--------+--------------+-------+----------+--------+--------+-------------
public | route_gist | index | postgres | routes | 96 MB |
public | route_spgist | index | postgres | routes | 132 MB |
(2 rows)

Spgist index tree is much better than gist - 12149 pages vs 1334760 !

I also noticed, that spgist is much faster than gist for other inet
operators. I'd like to see in 9.6.

Show quoted text

EXPLAIN (ANALYZE, buffers) SELECT routes.route FROM routes JOIN hmm ON
routes.route && hmm.route;
QUERY PLAN

----------------------------------------------------------------------------------------------------------------------------------------
Nested Loop (cost=0.41..570430.27 rows=2338 width=7) (actual
time=5.730..12085.747 rows=8127 loops=1)
Buffers: shared hit=1334760
-> Seq Scan on hmm (cost=0.00..11.32 rows=732 width=7) (actual
time=0.013..0.528 rows=732 loops=1)
Buffers: shared hit=4
-> Index Only Scan using route_gist on routes (cost=0.41..550.26
rows=22900 width=7) (actual time=2.491..16.503 rows=11 loops=732)
Index Cond: (route && (hmm.route)::inet)
Heap Fetches: 8127
Buffers: shared hit=1334756
Planning time: 0.827 ms
Execution time: 12086.513 ms
(10 rows)

EXPLAIN (ANALYZE, buffers) SELECT routes.route FROM routes JOIN hmm ON
routes.route && hmm.route;
QUERY PLAN

-----------------------------------------------------------------------------------------------------------------------------------------
Nested Loop (cost=0.41..588634.27 rows=2338 width=7) (actual
time=0.043..12.150 rows=8127 loops=1)
Buffers: shared hit=12149
-> Seq Scan on hmm (cost=0.00..11.32 rows=732 width=7) (actual
time=0.013..0.075 rows=732 loops=1)
Buffers: shared hit=4
-> Index Only Scan using route_spgist on routes (cost=0.41..575.13
rows=22900 width=7) (actual time=0.011..0.015 rows=11 loops=732)
Index Cond: (route && (hmm.route)::inet)
Heap Fetches: 8127
Buffers: shared hit=12145
Planning time: 0.779 ms
Execution time: 12.603 ms
(10 rows)

Emre Hasegeli

emre@hasegeli.com

almost 10 years ago

In reply to: Oleg Bartunov (#6)

Re: SP-GiST support for inet datatypes

Spgist index tree is much better than gist - 12149 pages vs 1334760 !

I assume this is the reason why it is bigger. IP addresses are very
well suited to SP-GiST. They naturally do not overlap.

I also noticed, that spgist is much faster than gist for other inet
operators. I'd like to see in 9.6.

Unfortunately, I missed the deadline of the last commitfest. It is on
the next one:

https://commitfest.postgresql.org/10/571/

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Tom Lane

tgl@sss.pgh.pa.us

over 9 years ago

In reply to: Emre Hasegeli (#1)

Re: SP-GiST support for inet datatypes

Emre Hasegeli <emre@hasegeli.com> writes:

Attached patches add SP-GiST support to the inet datatypes.

I started to look at this patch. The reported speedup is pretty nice,
but ...

The operator
class comes with a small change on the SP-GiST framework to allow fixed
number of child nodes.

... this part of the patch breaks the existing API for SP-GiST opclasses.
That is a hard sell. It may only matter for one existing opclass in core,
but unless we have reason to think nobody is using any custom SP-GiST
opclasses, that is not a pleasant thing to do. How important is it really
for this opclass? Several of the existing opclasses use fixed numbers of
child nodes, so why does this need something they don't?

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Emre Hasegeli

emre@hasegeli.com

over 9 years ago

In reply to: Tom Lane (#8)

Re: SP-GiST support for inet datatypes

... this part of the patch breaks the existing API for SP-GiST opclasses.
That is a hard sell. It may only matter for one existing opclass in core,
but unless we have reason to think nobody is using any custom SP-GiST
opclasses, that is not a pleasant thing to do. How important is it really
for this opclass? Several of the existing opclasses use fixed numbers of
child nodes, so why does this need something they don't?

This addition is required to implement the operator class cleanly. We
could store the id of the child node as a "label", and add nodes when
labels are missing, but this complicates all operator class functions.
Other designs are also possible using the labels, but a simple design
with fixed number of nodes worked best for me.

Currently, SP-GiST framework supports fixed number of child nodes, if
the index is growing by page splits, not by prefix splits. This is
not a fair API. I assume it is designed by only having the prefix
tree in mind. The change makes SP-GiST more reusable.

SP-GiST is not widely adopted in my experience. Breaking this part of
the API would affect prefix tree implementations. I don't think there
are much of them. Supporting the new interface is easy for them. We
can try to support the old interface by side of the new one, but this
wouldn't be very clean either.

I thought we were breaking the C interface in similar ways on major releases.

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#10

Tom Lane

tgl@sss.pgh.pa.us

over 9 years ago

In reply to: Emre Hasegeli (#9)

Re: SP-GiST support for inet datatypes

Emre Hasegeli <emre@hasegeli.com> writes:

... Several of the existing opclasses use fixed numbers of
child nodes, so why does this need something they don't?

Currently, SP-GiST framework supports fixed number of child nodes, if
the index is growing by page splits, not by prefix splits. This is
not a fair API.

Hm, I see your point: the spgSplitTuple action only supports forming a
new upper tuple with a single, labeled child node. That is pretty bogus.
You could imagine doing repeated spgAddNode actions to enlarge the new
upper tuple; but that's ridiculously inefficient, and it's also unclear
how you'd avoid confusing searches that descend through a partially-split
upper tuple (either concurrently, or after a crash that prevented
finishing the split).

SP-GiST is not widely adopted in my experience. Breaking this part of
the API would affect prefix tree implementations. I don't think there
are much of them. Supporting the new interface is easy for them. We
can try to support the old interface by side of the new one, but this
wouldn't be very clean either.

We could imagine defining a "spgSplitTupleMulti" action alongside the
existing one, but I agree it's a bit of a wart --- nobody would have
designed it that way in a green field. On the other hand, index AM-to-
opclass APIs are things we don't like to break. We went out of our way
to make past GiST and GIN API changes upward-compatible.

Oleg, Teodor, do you have any idea how many people are using custom
SPGiST opclasses that might be affected by an API break here?

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#11

Andrew Gierth

andrew@tao11.riddles.org.uk

over 9 years ago

In reply to: Tom Lane (#8)

Re: SP-GiST support for inet datatypes

"Tom" == Tom Lane <tgl@sss.pgh.pa.us> writes:

Emre Hasegeli <emre@hasegeli.com> writes:

Attached patches add SP-GiST support to the inet datatypes.

Tom> I started to look at this patch. The reported speedup is pretty
Tom> nice, but ...

The builtin gist support for inet seems quite surprisingly slow; ip4r
beats it into the ground without difficulty. (I'd be curious how well
this sp-gist version stacks up against ip4r.)

--
Andrew (irc:RhodiumToad)

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#12

Tom Lane

tgl@sss.pgh.pa.us

over 9 years ago

In reply to: Andrew Gierth (#11)

Re: SP-GiST support for inet datatypes

Andrew Gierth <andrew@tao11.riddles.org.uk> writes:

"Tom" == Tom Lane <tgl@sss.pgh.pa.us> writes:
Tom> I started to look at this patch. The reported speedup is pretty
Tom> nice, but ...

The builtin gist support for inet seems quite surprisingly slow; ip4r
beats it into the ground without difficulty. (I'd be curious how well
this sp-gist version stacks up against ip4r.)

Yeah ... what I find interesting about this patch is that the opclass's
data splitting rules don't seem all that much different from what the
existing gist opclass knows how to do. I suspect that the speed
differential arises because the gist opclass's penalty and picksplit
algorithms are somehow extremely stupid, which hints that maybe they
could be made better. In particular, IIRC the test cases that we looked
at when making the gist opclass tended to have only one or a few netmask
lengths, whereas in this data set the netmasks are all over the place.
I theorize that that's what's breaking the gist algorithm. Not sure
exactly how to make it better though.

This example also seems to be showing up some lack of intelligence in
SPGiST itself, in that there are an awful lot of pages with a lot of
empty space on them. Not sure why.

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#13

Tom Lane

tgl@sss.pgh.pa.us

over 9 years ago

In reply to: Tom Lane (#12)

Re: SP-GiST support for inet datatypes

I've pushed this patch with mostly (not entirely) cosmetic adjustments.
I still think it'd be worth looking into why the produced index is larger
than the GiST equivalent, and for that matter exactly why the GiST
equivalent is so much slower to search.

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#14

Tom Lane

tgl@sss.pgh.pa.us

over 9 years ago

In reply to: Tom Lane (#13)

Re: SP-GiST support for inet datatypes

I wrote:

I've pushed this patch with mostly (not entirely) cosmetic adjustments.
I still think it'd be worth looking into why the produced index is larger
than the GiST equivalent, and for that matter exactly why the GiST
equivalent is so much slower to search.

I spent some time poking at this, and noticed that although the picksplit
rule is guaranteed to produce a non-degenerate split unless the inputs
are all identical, it's entirely capable of producing a very bad split.
Here's some instrumentation printout showing the numbers of items assigned
to the four subnodes, for the test case we've been working with:

58 inet picksplit (227 tuples): 0 0 127 100
65 inet picksplit (227 tuples): 0 0 223 4
69 inet picksplit (225 tuples): 2 0 126 97
69 inet picksplit (227 tuples): 1 0 226 0
72 inet picksplit (227 tuples): 0 0 224 3
72 inet picksplit (227 tuples): 1 0 132 94
82 inet picksplit (226 tuples): 1 0 127 98
86 inet picksplit (227 tuples): 0 0 130 97
95 inet picksplit (227 tuples): 0 0 2 225
99 inet picksplit (227 tuples): 0 0 225 2
117 inet picksplit (227 tuples): 2 0 126 99
118 inet picksplit (227 tuples): 0 0 128 99
137 inet picksplit (227 tuples): 0 0 226 1
151 inet picksplit (227 tuples): 0 0 1 226
270 inet picksplit (227 tuples): 1 0 127 99
499 inet picksplit (122 tuples): 0 0 64 58

(This is from "sort | uniq -c" output, so the first column is the number
of occurrences of identical split counts.)

Aside from the disturbing frequency of 100-to-1 split ratios, it also
looks like the inclusion of the masklen bit is hardly pulling its weight,
though that might be a artifact of this data set.

The bad splits seem to be a direct contributor to the index's relatively
poor space efficiency; they lead to SPGiST having to move nearly all of
a long leaf chain to another page, and then soon having to split it again,
resulting in another mostly-empty page, lather rinse repeat. They can't
be very helpful for search speed either.

I think that it'd be worth investigating changing to a split rule that
uses the next two or three or four bits of the address, not just the
single next bit, to index into more than four subnodes. It's pretty clear
how to adjust the insertion functions to support that, and a crude hack in
that line suggested that the index does get noticeably smaller. However,
I didn't quite grok how to adjust the search (consistent) functions.
Obviously we could apply the same rules in a loop, considering each
successive address bit, but there may be a better way.

Even though I already committed the code, we're a very long way from
v10 release, so I see no reason not to consider incompatible changes
in the way this opclass works.

I also noticed that the index build wastes some space because SPGIST
remembers only one partly-full page within each of its page categories.
A prototype patch to enlarge the size of that cache helped a good bit
too. I'll clean that up and post it later, but I was hoping you'd
work on improving this opclass's split rules.

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#15

Emre Hasegeli

emre@hasegeli.com

over 9 years ago

In reply to: Tom Lane (#14)

Re: SP-GiST support for inet datatypes

Aside from the disturbing frequency of 100-to-1 split ratios, it also
looks like the inclusion of the masklen bit is hardly pulling its weight,
though that might be a artifact of this data set.

I was expecting this. Including masklen bit to decision was something
we could easily do. It doesn't make the index more complicated, even
more simpler. I think it would be useful, when host addresses with
masklen are indexed. IRRExplorer dataset is just networks.

I think that it'd be worth investigating changing to a split rule that
uses the next two or three or four bits of the address, not just the
single next bit, to index into more than four subnodes. It's pretty clear
how to adjust the insertion functions to support that, and a crude hack in
that line suggested that the index does get noticeably smaller. However,
I didn't quite grok how to adjust the search (consistent) functions.
Obviously we could apply the same rules in a loop, considering each
successive address bit, but there may be a better way.

I have experimented with such designs before posting this patch, but
couldn't make any of them work. It gets very complicated when more
than one bit is used. When only 2 bits are used, there would be 12
child nodes:

* network bits 00
* network bits 01
* network bits 10
* network bits 11
* network bit 0 and host bit 0
* network bit 0 and host bit 1
* network bit 1 and host bit 0
* network bit 1 and host bit 1
* host bits 00
* host bits 01
* host bits 10
* host bits 11

Another design would be a prefix tree. We could split by using a
byte, and store that byte as label. Then there wouldn't be static
number of nodes. We would need to iterate trough the labels and
re-execute the condition on all of them. Surely this would perform
better for some working sets, but I don't think on all them. I will
experiment with this, if I get the chance.

I belive the current index is useful as it is. The wasted space
depends on the distribution of the keys:

# create table a3 as select (trunc(random() * 256)::text || '.' || trunc(random() * 8)::text || '.0.0')::inet from generate_series(1, 1000000) as i;
SELECT 1000000
# create table a4 as select (trunc(random() * 256)::text || '.' || trunc(random() * 16)::text || '.0.0')::inet from generate_series(1, 1000000) as i;
SELECT 1000000
# create table a5 as select (trunc(random() * 256)::text || '.' || trunc(random() * 32)::text || '.0.0')::inet from generate_series(1, 1000000) as i;
SELECT 1000000
# create table a6 as select (trunc(random() * 256)::text || '.' || trunc(random() * 64)::text || '.0.0')::inet from generate_series(1, 1000000) as i;
SELECT 1000000
# create index on a3 using spgist (inet);
CREATE INDEX
# create index on a4 using spgist (inet);
CREATE INDEX
# create index on a5 using spgist (inet);
CREATE INDEX
# create index on a6 using spgist (inet);
CREATE INDEX
# \di+
List of relations
Schema | Name | Type | Owner | Table | Size | Description
-------+-------------+-------+----------+-------+-------+-------------
public | a3_inet_idx | index | hasegeli | a3 | 42 MB |
public | a4_inet_idx | index | hasegeli | a4 | 46 MB |
public | a5_inet_idx | index | hasegeli | a5 | 55 MB |
public | a6_inet_idx | index | hasegeli | a6 | 56 MB |
(4 rows)

It also gets better with the number of keys:

# create table a6 as select (trunc(random() * 256)::text || '.' || trunc(random() * 64)::text || '.0.0')::inet from generate_series(1, 1000000) as i;
SELECT 1000000
# create table b6 as select (trunc(random() * 256)::text || '.' || trunc(random() * 64)::text || '.0.0')::inet from generate_series(1, 2000000) as i;
SELECT 2000000
# create table c6 as select (trunc(random() * 256)::text || '.' || trunc(random() * 64)::text || '.0.0')::inet from generate_series(1, 4000000) as i;
SELECT 4000000
# create table d6 as select (trunc(random() * 256)::text || '.' || trunc(random() * 64)::text || '.0.0')::inet from generate_series(1, 8000000) as i;
SELECT 8000000
# create index on a6 using spgist (inet);
CREATE INDEX
# create index on b6 using spgist (inet);
CREATE INDEX
# create index on c6 using spgist (inet);
CREATE INDEX
# create index on d6 using spgist (inet);
CREATE INDEX
# \di+
List of relations
Schema | Name | Type | Owner | Table | Size | Description
-------+-------------+-------+----------+-------+--------+-------------
public | a6_inet_idx | index | hasegeli | a6 | 56 MB |
public | b6_inet_idx | index | hasegeli | b6 | 109 MB |
public | c6_inet_idx | index | hasegeli | c6 | 181 MB |
public | d6_inet_idx | index | hasegeli | a6 | 336 MB |
(4 rows)

Thank you for committing this.

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers