CIDR/INET improvements

Started by Joachim Wielandabout 20 years ago8 messages
#1Joachim Wieland
joe@mcknight.de

The TODO list contains some items concerning the CIDR/INET datatype.

* %Prevent INET cast to CIDR if the unmasked bits are not zero, or
zero the bits

I added a function for this cast (which zeroes the bits) but then the
opr_sanity regression test failed because there is a cast from INET -> CIDR
but the other way round is considered to be binary compatible.

Actually both types are not binary compatible, since they have a
type component that is either 0 or 1, depending on whether it is of type
INET or CIDR.

If nobody objects, I'll send in a patch that includes a cast function for
the other way round as well.

* Allow INET + INT4 to increment the host part of the address, or
throw an error on overflow

Once at it I wonder how much arithmetic these types need?
What about functions to get/set a specific byte, for example:

inet_get_byte('192.168.1.1'::inet, 0) returns 1
inet_get_byte('192.168.1.1'::inet, 1) returns 1
inet_get_byte('192.168.1.1'::inet, 2) returns 168
inet_get_byte('192.168.1.1'::inet, 3) returns 192

and inet_set_byte('192.168.1.1'::inet, 3, 128) returns '128.168.1.1'

Which other functions have been missing here in the past?

Joachim

#2Tom Lane
tgl@sss.pgh.pa.us
In reply to: Joachim Wieland (#1)
Re: CIDR/INET improvements

Joachim Wieland <joe@mcknight.de> writes:

Actually both types are not binary compatible, since they have a
type component that is either 0 or 1, depending on whether it is of type
INET or CIDR.

The whole question of the relationship of those types really needs to be
looked at more carefully. We've got this schizophrenic idea that they
sometimes are the same type and sometimes are not. ISTM that either
they are the same type (and having a bit within the data is reasonable)
or they are distinct types (in which case the bit within the data should
be redundant). I'm not sure which is better.

I think the reason why things are as they are right now is to avoid
needing a pile of redundant-seeming pg_proc entries, eg you'd need
both abbrev(inet) and abbrev(cidr) if you were taking a hard line
about them being different types.

You can *not* just throw in a cast that removes the bit without breaking
many of those functions for the CIDR case. For instance abbrev behaves
differently depending on the state of the bit:

regression=# select abbrev(cidr '10.1.0.0/16');
abbrev
---------
10.1/16
(1 row)

regression=# select abbrev(inet '10.1.0.0/16');
abbrev
-------------
10.1.0.0/16
(1 row)

What about functions to get/set a specific byte, for example:

I would vote against adding any such thing in the absence of any strong
demand for it. I think functions that expose the underlying data just
encourage people to write IPv4-only code. If you can't define and use
the function in a way that handles both IPv4 and IPv6, you probably
shouldn't have it.

regards, tom lane

#3Joachim Wieland
joe@mcknight.de
In reply to: Tom Lane (#2)
Re: CIDR/INET improvements

On Sat, Jan 07, 2006 at 12:50:23PM -0500, Tom Lane wrote:

Joachim Wieland <joe@mcknight.de> writes:

Actually both types are not binary compatible, since they have a
type component that is either 0 or 1, depending on whether it is of type
INET or CIDR.

The whole question of the relationship of those types really needs to be
looked at more carefully. We've got this schizophrenic idea that they
sometimes are the same type and sometimes are not. ISTM that either
they are the same type (and having a bit within the data is reasonable)
or they are distinct types (in which case the bit within the data should
be redundant). I'm not sure which is better.

What about doing both? ;-)

We could create a few wrapper functions that call the functions that are
there right now. That way there is no need to duplicate the code with the
actual functionality. The outside world sees different types and the
function can distinguish between both if it needs to.

Joachim

#4Bruce Momjian
pgman@candle.pha.pa.us
In reply to: Tom Lane (#2)
1 attachment(s)
Re: [HACKERS] CIDR/INET improvements

I looked into this, and it seems the easiest solution is to just call
network() if a cidr-cast value is output and the value is actually an
inet value internally.

Patch for testing attached. Passes regression tests. By affecting only
the output you can internally cast back and forth and only output is
affected. However, if you load in a dump that was interally inet but
was dumped out as cidr-cast, you lose the unmasked bits.

---------------------------------------------------------------------------

Tom Lane wrote:

Joachim Wieland <joe@mcknight.de> writes:

Actually both types are not binary compatible, since they have a
type component that is either 0 or 1, depending on whether it is of type
INET or CIDR.

The whole question of the relationship of those types really needs to be
looked at more carefully. We've got this schizophrenic idea that they
sometimes are the same type and sometimes are not. ISTM that either
they are the same type (and having a bit within the data is reasonable)
or they are distinct types (in which case the bit within the data should
be redundant). I'm not sure which is better.

I think the reason why things are as they are right now is to avoid
needing a pile of redundant-seeming pg_proc entries, eg you'd need
both abbrev(inet) and abbrev(cidr) if you were taking a hard line
about them being different types.

You can *not* just throw in a cast that removes the bit without breaking
many of those functions for the CIDR case. For instance abbrev behaves
differently depending on the state of the bit:

regression=# select abbrev(cidr '10.1.0.0/16');
abbrev
---------
10.1/16
(1 row)

regression=# select abbrev(inet '10.1.0.0/16');
abbrev
-------------
10.1.0.0/16
(1 row)

What about functions to get/set a specific byte, for example:

I would vote against adding any such thing in the absence of any strong
demand for it. I think functions that expose the underlying data just
encourage people to write IPv4-only code. If you can't define and use
the function in a way that handles both IPv4 and IPv6, you probably
shouldn't have it.

regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 1: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to majordomo@postgresql.org so that your
message can get through to the mailing list cleanly

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073

Attachments:

/pgpatches/inettext/plainDownload
Index: src/backend/utils/adt/network.c
===================================================================
RCS file: /cvsroot/pgsql/src/backend/utils/adt/network.c,v
retrieving revision 1.60
diff -c -c -r1.60 network.c
*** src/backend/utils/adt/network.c	23 Jan 2006 21:49:39 -0000	1.60
--- src/backend/utils/adt/network.c	24 Jan 2006 04:05:52 -0000
***************
*** 166,171 ****
--- 166,181 ----
  Datum
  cidr_out(PG_FUNCTION_ARGS)
  {
+ 	inet	   *src = PG_GETARG_INET_P(0);
+ 
+ 	/* If this is an INET, zero any unmasked bits */
+ 	if (!ip_is_cidr(src))
+ 	{
+ 		Datum		src2;
+ 	
+ 		src2 = DirectFunctionCall1(network_network, PG_GETARG_DATUM(0));
+ 		fcinfo->arg[0] = src2;
+ 	}
  	return inet_out(fcinfo);
  }
  
#5Tom Lane
tgl@sss.pgh.pa.us
In reply to: Bruce Momjian (#4)
Re: [HACKERS] CIDR/INET improvements

Bruce Momjian <pgman@candle.pha.pa.us> writes:

Patch for testing attached.

This is an utterly bad idea, because it not only doesn't address the
problem (ie, confusion about whether inet and cidr are distinct types
or not), but it masks mistakes in that realm by hiding data on output.
It'll be almost impossible to debug situations where x is different
from y but they display the same.

regards, tom lane

#6Joachim Wieland
joe@mcknight.de
In reply to: Tom Lane (#5)
1 attachment(s)
Re: [HACKERS] CIDR/INET improvements

On Mon, Jan 23, 2006 at 11:30:58PM -0500, Tom Lane wrote:

Bruce Momjian <pgman@candle.pha.pa.us> writes:

Patch for testing attached.

This is an utterly bad idea, because it not only doesn't address the
problem (ie, confusion about whether inet and cidr are distinct types
or not), but it masks mistakes in that realm by hiding data on output.
It'll be almost impossible to debug situations where x is different
from y but they display the same.

FWIW, I append the patch I've done a few weeks ago. It adds an inettocidr
cast function.
I updated it to comply to Bruce's recent "ip_type" -> "ip_is_cidr" change.

Joachim

--
Joachim Wieland joe@mcknight.de
C/ Usandizaga 12 1°B ICQ: 37225940
20002 Donostia / San Sebastian (Spain) GPG key available

Attachments:

pg_inettocidr.3.difftext/plain; charset=us-asciiDownload
diff -cr cvs/pgsql/src/backend/utils/adt/network.c cvs.build/pgsql/src/backend/utils/adt/network.c
*** cvs/pgsql/src/backend/utils/adt/network.c	2006-01-23 23:52:36.000000000 +0100
--- cvs.build/pgsql/src/backend/utils/adt/network.c	2006-01-24 09:53:37.000000000 +0100
***************
*** 325,330 ****
--- 325,387 ----
  	PG_RETURN_INET_P(dst);
  }
  
+ Datum
+ inettocidr(PG_FUNCTION_ARGS)
+ {
+ 	inet	   *src = PG_GETARG_INET_P(0);
+ 	inet	   *dst;
+ 	int			byte;
+ 	int			nbits;
+ 	int			maxbits;
+ 	int			maxbytes;
+ 	unsigned char mask;
+ 
+ 	/* make sure any unused bits are zeroed */
+ 	dst = (inet *) palloc0(VARHDRSZ + sizeof(inet_struct));
+ 
+ 	if (ip_family(src) == PGSQL_AF_INET)
+ 	{
+ 		maxbits = 32;
+ 		maxbytes = 4;
+ 	}
+ 	else
+ 	{
+ 		maxbits = 128;
+ 		maxbytes = 16;
+ 	}
+ 	Assert(ip_bits(dst) <= maxbits);
+ 
+ 	/* copy over */
+ 	ip_family(dst) = ip_family(src);
+ 	ip_bits(dst) = ip_bits(src);
+ 	ip_is_cidr(dst) = 1;  /* 1 represents CIDR */
+ 	memcpy(ip_addr(dst), ip_addr(src), maxbytes);
+ 
+ 	/* zero out the bits that are covered by the netmask */
+ 	if (ip_bits(dst) < maxbits)
+ 	{
+ 		byte = ip_bits(dst) / 8;
+ 		nbits = ip_bits(dst) % 8;
+ 		/* reset the first byte, this might be a partial byte */
+ 		mask = 0xff;
+ 		if (ip_bits(dst) != 0)
+ 		{
+ 			mask >>= nbits;
+ 			ip_addr(dst)[byte] &= ~mask;
+ 			byte++;
+ 		}
+ 
+ 		/* from now on we reset only complete bytes */
+ 		while (byte < maxbytes)
+ 		{
+ 			ip_addr(dst)[byte] = 0;
+ 			byte++;
+ 		}
+ 	}
+ 
+ 	PG_RETURN_INET_P(dst);
+ }
+ 
  /*
   *	Basic comparison function for sorting and inet/cidr comparisons.
   *
diff -cr cvs/pgsql/src/include/catalog/pg_cast.h cvs.build/pgsql/src/include/catalog/pg_cast.h
*** cvs/pgsql/src/include/catalog/pg_cast.h	2005-10-21 17:45:06.000000000 +0200
--- cvs.build/pgsql/src/include/catalog/pg_cast.h	2006-01-24 09:38:15.000000000 +0100
***************
*** 249,255 ****
   * INET category
   */
  DATA(insert (  650	869    0 i ));
! DATA(insert (  869	650    0 i ));
  
  /*
   * BitString category
--- 249,255 ----
   * INET category
   */
  DATA(insert (  650	869    0 i ));
! DATA(insert (  869	650 1265 a ));
  
  /*
   * BitString category
diff -cr cvs/pgsql/src/include/catalog/pg_proc.h cvs.build/pgsql/src/include/catalog/pg_proc.h
*** cvs/pgsql/src/include/catalog/pg_proc.h	2006-01-20 13:52:12.000000000 +0100
--- cvs.build/pgsql/src/include/catalog/pg_proc.h	2006-01-24 09:38:15.000000000 +0100
***************
*** 2359,2364 ****
--- 2359,2366 ----
  DESCR("I/O");
  DATA(insert OID = 911 (  inet_out			PGNSP PGUID 12 f f t f i 1 2275 "869" _null_ _null_ _null_	inet_out - _null_ ));
  DESCR("I/O");
+ DATA(insert OID = 1265 (  inettocidr			PGNSP PGUID 12 f f t f i 1 650 "869" _null_ _null_ _null_	inettocidr - _null_ ));
+ DESCR("convert inet to cidr");
  
  /* for cidr type support */
  DATA(insert OID = 1267 (  cidr_in			PGNSP PGUID 12 f f t f i 1 650 "2275" _null_ _null_ _null_	cidr_in - _null_ ));
diff -cr cvs/pgsql/src/include/utils/builtins.h cvs.build/pgsql/src/include/utils/builtins.h
*** cvs/pgsql/src/include/utils/builtins.h	2006-01-20 13:52:13.000000000 +0100
--- cvs.build/pgsql/src/include/utils/builtins.h	2006-01-24 09:38:15.000000000 +0100
***************
*** 696,701 ****
--- 696,702 ----
  extern Datum inet_out(PG_FUNCTION_ARGS);
  extern Datum inet_recv(PG_FUNCTION_ARGS);
  extern Datum inet_send(PG_FUNCTION_ARGS);
+ extern Datum inettocidr(PG_FUNCTION_ARGS);
  extern Datum cidr_in(PG_FUNCTION_ARGS);
  extern Datum cidr_out(PG_FUNCTION_ARGS);
  extern Datum cidr_recv(PG_FUNCTION_ARGS);
diff -cr cvs/pgsql/src/test/regress/expected/inet.out cvs.build/pgsql/src/test/regress/expected/inet.out
*** cvs/pgsql/src/test/regress/expected/inet.out	2004-10-08 03:45:37.000000000 +0200
--- cvs.build/pgsql/src/test/regress/expected/inet.out	2006-01-24 09:38:15.000000000 +0100
***************
*** 242,244 ****
--- 242,289 ----
  
  SET enable_seqscan TO on;
  DROP INDEX inet_idx1;
+ -- check conversion from inet to cidr (this should zero the unmasked bits)
+ -- no change expected
+ SELECT '192.168.255.255/32'::inet::cidr;
+         cidr        
+ --------------------
+  192.168.255.255/32
+ (1 row)
+ 
+ -- should give 192.168.240.0/20.
+ SELECT '192.168.255.255/20'::inet::cidr;
+        cidr       
+ ------------------
+  192.168.240.0/20
+ (1 row)
+ 
+ -- cast every entry from the table
+ SELECT i, i::cidr FROM INET_TBL;
+         i         |        i         
+ ------------------+------------------
+  192.168.1.226/24 | 192.168.1.0/24
+  192.168.1.226    | 192.168.1.226/32
+  192.168.1.0/24   | 192.168.1.0/24
+  192.168.1.0/25   | 192.168.1.0/25
+  192.168.1.255/24 | 192.168.1.0/24
+  192.168.1.255/25 | 192.168.1.128/25
+  10.1.2.3/8       | 10.0.0.0/8
+  10.1.2.3/8       | 10.0.0.0/8
+  10.1.2.3         | 10.1.2.3/32
+  10.1.2.3/24      | 10.1.2.0/24
+  10.1.2.3/16      | 10.1.0.0/16
+  10.1.2.3/8       | 10.0.0.0/8
+  11.1.2.3/8       | 11.0.0.0/8
+  9.1.2.3/8        | 9.0.0.0/8
+  10:23::f1/64     | 10:23::/64
+  10:23::ffff      | 10:23::ffff/128
+  ::4.3.2.1/24     | ::/24
+ (17 rows)
+ 
+ -- this lost the /32 once...
+ SELECT '192.168.1.1'::inet::cidr;
+       cidr      
+ ----------------
+  192.168.1.1/32
+ (1 row)
+ 
diff -cr cvs/pgsql/src/test/regress/expected/opr_sanity.out cvs.build/pgsql/src/test/regress/expected/opr_sanity.out
*** cvs/pgsql/src/test/regress/expected/opr_sanity.out	2005-11-07 18:36:47.000000000 +0100
--- cvs.build/pgsql/src/test/regress/expected/opr_sanity.out	2006-01-24 09:38:15.000000000 +0100
***************
*** 274,279 ****
--- 274,283 ----
  -- also binary compatible.  This is legal, but usually not intended.
  -- As of 7.4, this finds the casts from text and varchar to bpchar, because
  -- those are binary-compatible while the reverse way goes through rtrim().
+ -- As of 8.2, this finds the cast from CIDR to INET. Both share the same
+ -- internal representation, but CIDR doesn't allow the unmasked bits to be
+ -- non-zero. That's why <INET>::CIDR will be cast by means of a function, the
+ -- other way round is regarded to be a legal binary compatible cast.
  SELECT *
  FROM pg_cast c
  WHERE c.castfunc = 0 AND
***************
*** 285,291 ****
  ------------+------------+----------+-------------
           25 |       1042 |        0 | i
         1043 |       1042 |        0 | i
! (2 rows)
  
  -- **************** pg_operator ****************
  -- Look for illegal values in pg_operator fields.
--- 289,296 ----
  ------------+------------+----------+-------------
           25 |       1042 |        0 | i
         1043 |       1042 |        0 | i
!         650 |        869 |        0 | i
! (3 rows)
  
  -- **************** pg_operator ****************
  -- Look for illegal values in pg_operator fields.
diff -cr cvs/pgsql/src/test/regress/sql/inet.sql cvs.build/pgsql/src/test/regress/sql/inet.sql
*** cvs/pgsql/src/test/regress/sql/inet.sql	2004-10-08 03:45:37.000000000 +0200
--- cvs.build/pgsql/src/test/regress/sql/inet.sql	2006-01-24 09:38:15.000000000 +0100
***************
*** 65,67 ****
--- 65,78 ----
  SET enable_seqscan TO on;
  DROP INDEX inet_idx1;
  
+ -- check conversion from inet to cidr (this should zero the unmasked bits)
+ -- no change expected
+ SELECT '192.168.255.255/32'::inet::cidr;
+ -- should give 192.168.240.0/20.
+ SELECT '192.168.255.255/20'::inet::cidr;
+ -- cast every entry from the table
+ SELECT i, i::cidr FROM INET_TBL;
+ 
+ -- this lost the /32 once...
+ SELECT '192.168.1.1'::inet::cidr;
+ 
diff -cr cvs/pgsql/src/test/regress/sql/opr_sanity.sql cvs.build/pgsql/src/test/regress/sql/opr_sanity.sql
*** cvs/pgsql/src/test/regress/sql/opr_sanity.sql	2005-07-01 21:19:05.000000000 +0200
--- cvs.build/pgsql/src/test/regress/sql/opr_sanity.sql	2006-01-24 09:38:15.000000000 +0100
***************
*** 223,228 ****
--- 223,233 ----
  -- As of 7.4, this finds the casts from text and varchar to bpchar, because
  -- those are binary-compatible while the reverse way goes through rtrim().
  
+ -- As of 8.2, this finds the cast from CIDR to INET. Both share the same
+ -- internal representation, but CIDR doesn't allow the unmasked bits to be
+ -- non-zero. That's why <INET>::CIDR will be cast by means of a function, the
+ -- other way round is regarded to be a legal binary compatible cast.
+ 
  SELECT *
  FROM pg_cast c
  WHERE c.castfunc = 0 AND
#7Tom Lane
tgl@sss.pgh.pa.us
In reply to: Joachim Wieland (#6)
Re: [HACKERS] CIDR/INET improvements

Joachim Wieland <joe@mcknight.de> writes:

FWIW, I append the patch I've done a few weeks ago. It adds an inettocidr
cast function.

I think we need to take two steps back and look at the larger picture:
the INET/CIDR situation is conceptually a mess and it's going to take
more than a localized change to clean it up.

I have some ideas about this and will try to post a proposal on -hackers
later today.

regards, tom lane

#8Tom Lane
tgl@sss.pgh.pa.us
In reply to: Joachim Wieland (#6)
Re: [HACKERS] CIDR/INET improvements

Joachim Wieland <joe@mcknight.de> writes:

FWIW, I append the patch I've done a few weeks ago. It adds an inettocidr
cast function.

I've incorporated this code into the INET cleanup patch just committed.
Thanks!

regards, tom lane