Tightening binary receive functions

Started by Heikki Linnakangasover 16 years ago13 messages

heikki.linnakangas@enterprisedb.com

over 16 years ago

1 attachment(s)

After the thread started by Andrew McNamara a while ago:

http://archives.postgresql.org/message-id/E419F08D-B908-446D-9B1E-F3520163CE9C@object-craft.com.au

the question was left hanging if other binary recv functions are missing
sanity checks that the corresponding text input functions have, and
whether they might pose a security issue. Tom Lane went through all the
recv functions after that, and found a few missing checks but nothing
that would present a security issue, but posted them to the
pgsql-security list for a double check. I just went through them again,
and found no security issues either.

We should nevertheless fix the recv functions so that they don't accept
any values that the corresponding text functions reject. While the
server itself handles them, such values will throw an error on
pg_dump/restore, for example. Attached patch adds the required sanity
checks. I'm thinking of applying this to CVS HEAD, but not back-patch,
just like Tom did with the original problem reported with time_recv()
and timetz_recv().

The most notable of these is the change to "char" datatype. The patch
tightens it so that it no longer accepts values >127 with a multi-byte
database encoding. Also, the handling of encoding in xml_recv() was bogus.

Here's the list of issues found:

Tom Lane wrote:

array_recv:

Allows zero-length dimensions (because ArrayGetNItems doesn't complain);
whereas array_in does not. Not sure if this is an issue. We have had
-hackers discussions about the behavior of zero-length arrays, so I
think there may be other ways to get them into the system anyway.

Doesn't check lower bounds at all, so it's possible that lowerbound plus
length overflows an int. Not sure if this is a security problem either,
but it seems like something that should be rejected.

Yes, that seems dangerous.

I note that the handling of large bounds isn't very rigid in the text
input function either:

postgres=# SELECT '[9999999999999999:9999999999999] = {1}'::integer[];
int4
-----------------------------
[2147483647:2147483647]={1}
(1 row)

We use plain atoi() to convert the subscripts to integers, which returns
INT_MAX for an overflow. I didn't do anything about that now, but the
patch adds a check for the upper bound overflow in array_recv().

charrecv

Doesn't bother its pretty little head with maintaining encoding validity.
Of course the entire datatype doesn't, so this is hardly the fault of
the recv routine in particular. It might be that the type is fine but we
ought to constrain char_text() to fail on high-bit-set char values unless
DB encoding is single-byte.

Constraining char_text() seems like a good idea. The current behavior of
char_text() with a high-bit-set byte is not useful, while using the full
range of char can be. However, we have to constrain charout() as well,
or we're back to square one with charout(c)::text. And if we constrain
charout(), then we should constrain charin() as well. Which brings us
back to forbidding such values altogether.

The patch constrains all the functions that you can use to get a "char"
into the system: charin(), charrecv(), i4tochar() and text_char(). They
now reject values with high bit set when using a multi-byte database
encoding

date_recv

Fails to do any range check, so the value might cause odd behavior later.
Should probably limit to the values date_in would accept.

Yep.

float4recv, float8recv, and geometric and other types depending on
pq_getmsgfloatN

These all accept any bit pattern whatever for a float. Now I know of no
machine where float doesn't consider all bitpatterns "valid", but
nonetheless there are some issues here:

* We don't currently allow any other way to inject an IEEE signaling NaN
into the database. As far as I can think, this could only lead to float
traps in places where one might perhaps not have expected one, so I
don't think this is a real problem.

Agreed. This is frankly the first time I even hear about signaling NaNs,
and after reading up on that a bit, I get the impression that even on
platforms that support them, you need a #define or a compiler flag to
enable them.

* Some of these types probably aren't expecting NaNs or Infinities at all.
Can anything really bad happen if they get one?

Geometric types seem to handle NaNs and Infs gracefully, although I
wonder what it means to have e.g a box with the X-coordinate of one
corner being NaN. I think it's ok from a security point of view.

timestamp_recv (with float timestamps) accepts NaN, while timestamp_in
does not. I'm not sure what happens if you pass a NaN timestamp to the
system, but we should forbid that anyway.

interval_recv

Fails to do any range check, but I think it's okay since we don't have any
a-priori restrictions on the range of intervals.

Should disallow Infs and NaNs.

numeric_recv

Allows any weight or dscale value. Can this cause problems?

It seems OK to me. make_result() normalizes and checks for overflow of
those.

tintervalrecv

Bogus --- fails to verify consistency of status with values or proper
ordering of the two values. Probably nothing very bad can happen, but
should be fixed.

tintervalin doesn't check the ordering of the two values either. Status
should indeed be checked.

xml_recv

Encoding handling seems pretty questionable. In particular I think it's
going to treat a document without explicit encoding marking as being in
the database encoding --- shouldn't it assume client encoding?
Converting the encoding twice seems pretty icky as well.

Fixed this so that the input is treated as UTF-8 if no encoding is
specified in the XML header. client_encoding does not affect xml_recv()
at all. Per Peter's description.

Here's two extra ones:

oidvectorrecv
int2vectorrecv

Don't constrain the number of elements to FUNC_MAX_ARGS, like the text
input functions do. They also don't force lbound to 0, like the input
function. We assume in array_ref() and probably elsewhere too that
fixed-size array types are 0-based.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

Attachments:

tighten-recv-funcs-1.patchtext/x-diff; name=tighten-recv-funcs-1.patchDownload

*** a/src/backend/utils/adt/arrayfuncs.c
--- b/src/backend/utils/adt/arrayfuncs.c
***************
*** 1207,1214 **** array_recv(PG_FUNCTION_ARGS)
--- 1207,1223 ----
  
  	for (i = 0; i < ndim; i++)
  	{
+ 		int ub;
+ 
  		dim[i] = pq_getmsgint(buf, 4);
  		lBound[i] = pq_getmsgint(buf, 4);
+ 
+ 		ub = lBound[i] + dim[i] - 1;
+ 		/* overflow? */
+ 		if (lBound[i] > ub)
+ 			ereport(ERROR,
+ 					(errcode(ERRCODE_NUMERIC_VALUE_OUT_OF_RANGE),
+ 					 errmsg("integer out of range")));
  	}
  
  	/* This checks for overflow of array dimensions */
*** a/src/backend/utils/adt/char.c
--- b/src/backend/utils/adt/char.c
***************
*** 18,23 ****
--- 18,24 ----
  #include <limits.h>
  
  #include "libpq/pqformat.h"
+ #include "mb/pg_wchar.h"
  #include "utils/builtins.h"
  
  /*****************************************************************************
***************
*** 34,39 **** charin(PG_FUNCTION_ARGS)
--- 35,52 ----
  {
  	char	   *ch = PG_GETARG_CSTRING(0);
  
+ 	/*
+ 	 * Only allow characters that fit in one byte. The "char" data type
+ 	 * could handle any bit pattern, but it's not clear what charout() would
+ 	 * return for such a character, so it's better to stop such values from
+ 	 * entering the database in the first place.
+ 	 */
+ 	if (pg_database_encoding_max_length() != 1 && IS_HIGHBIT_SET(ch[0]))
+ 		ereport(ERROR,
+ 				(errcode(ERRCODE_NUMERIC_VALUE_OUT_OF_RANGE),
+ 				 errmsg("\"char\" out of range"),
+ 				 errdetail("Character must be representable as a single byte in current database encoding")));
+ 
  	PG_RETURN_CHAR(ch[0]);
  }
  
***************
*** 66,73 **** Datum
  charrecv(PG_FUNCTION_ARGS)
  {
  	StringInfo	buf = (StringInfo) PG_GETARG_POINTER(0);
  
! 	PG_RETURN_CHAR(pq_getmsgbyte(buf));
  }
  
  /*
--- 79,96 ----
  charrecv(PG_FUNCTION_ARGS)
  {
  	StringInfo	buf = (StringInfo) PG_GETARG_POINTER(0);
+ 	char ch;
+ 
+ 	ch = pq_getmsgbyte(buf);
  
! 	/* Only allow characters that fit in one byte, see charin(). */
! 	if (pg_database_encoding_max_length() != 1 && IS_HIGHBIT_SET(ch))
! 		ereport(ERROR,
! 				(errcode(ERRCODE_NUMERIC_VALUE_OUT_OF_RANGE),
! 				 errmsg("\"char\" out of range"),
! 				 errdetail("Character must be representable as a single byte in current database encoding")));
! 
! 	PG_RETURN_CHAR(ch);
  }
  
  /*
***************
*** 162,174 **** Datum
  i4tochar(PG_FUNCTION_ARGS)
  {
  	int32		arg1 = PG_GETARG_INT32(0);
  
  	if (arg1 < SCHAR_MIN || arg1 > SCHAR_MAX)
  		ereport(ERROR,
  				(errcode(ERRCODE_NUMERIC_VALUE_OUT_OF_RANGE),
  				 errmsg("\"char\" out of range")));
  
! 	PG_RETURN_CHAR((int8) arg1);
  }
  
  
--- 185,206 ----
  i4tochar(PG_FUNCTION_ARGS)
  {
  	int32		arg1 = PG_GETARG_INT32(0);
+ 	int8		ch;
  
  	if (arg1 < SCHAR_MIN || arg1 > SCHAR_MAX)
  		ereport(ERROR,
  				(errcode(ERRCODE_NUMERIC_VALUE_OUT_OF_RANGE),
  				 errmsg("\"char\" out of range")));
+ 	ch = (int8) arg1;
+ 
+ 	/* Only allow characters that fit in one byte, see charin(). */
+ 	if (pg_database_encoding_max_length() != 1 && IS_HIGHBIT_SET(ch))
+ 		ereport(ERROR,
+ 				(errcode(ERRCODE_NUMERIC_VALUE_OUT_OF_RANGE),
+ 				 errmsg("\"char\" out of range"),
+ 				 errdetail("Character must be representable as a single byte in current database encoding")));
  
! 	PG_RETURN_CHAR(ch);
  }
  
  
***************
*** 184,190 **** text_char(PG_FUNCTION_ARGS)
--- 216,231 ----
  	 * discarded.
  	 */
  	if (VARSIZE(arg1) > VARHDRSZ)
+ 	{
  		result = *(VARDATA(arg1));
+ 
+ 		/* Only allow characters that fit in one byte, see charin(). */
+ 		if (pg_database_encoding_max_length() != 1 && IS_HIGHBIT_SET(result))
+ 			ereport(ERROR,
+ 					(errcode(ERRCODE_NUMERIC_VALUE_OUT_OF_RANGE),
+ 					 errmsg("\"char\" out of range"),
+ 					 errdetail("Character must be representable as a single byte in current database encoding")));
+ 	}
  	else
  		result = '\0';
  
*** a/src/backend/utils/adt/date.c
--- b/src/backend/utils/adt/date.c
***************
*** 203,210 **** Datum
  date_recv(PG_FUNCTION_ARGS)
  {
  	StringInfo	buf = (StringInfo) PG_GETARG_POINTER(0);
  
! 	PG_RETURN_DATEADT((DateADT) pq_getmsgint(buf, sizeof(DateADT)));
  }
  
  /*
--- 203,219 ----
  date_recv(PG_FUNCTION_ARGS)
  {
  	StringInfo	buf = (StringInfo) PG_GETARG_POINTER(0);
+ 	DateADT result;
  
! 	result = (DateADT) pq_getmsgint(buf, sizeof(DateADT));
! 
! 	/* Limit to the same range that date_in() accepts. */
! 	if (result < 0 || result > JULIAN_MAX)
! 		ereport(ERROR,
! 				(errcode(ERRCODE_DATETIME_VALUE_OUT_OF_RANGE),
! 				 errmsg("date out of range")));
! 
! 	PG_RETURN_DATEADT(result);
  }
  
  /*
*** a/src/backend/utils/adt/int.c
--- b/src/backend/utils/adt/int.c
***************
*** 225,237 **** int2vectorrecv(PG_FUNCTION_ARGS)
  
  	Assert(!locfcinfo.isnull);
  
! 	/* sanity checks: int2vector must be 1-D, no nulls */
  	if (ARR_NDIM(result) != 1 ||
  		ARR_HASNULL(result) ||
! 		ARR_ELEMTYPE(result) != INT2OID)
  		ereport(ERROR,
  				(errcode(ERRCODE_INVALID_BINARY_REPRESENTATION),
  				 errmsg("invalid int2vector data")));
  	PG_RETURN_POINTER(result);
  }
  
--- 225,245 ----
  
  	Assert(!locfcinfo.isnull);
  
! 	/* sanity checks: int2vector must be 1-D, 0-based, no nulls */
  	if (ARR_NDIM(result) != 1 ||
  		ARR_HASNULL(result) ||
! 		ARR_ELEMTYPE(result) != INT2OID ||
! 		ARR_LBOUND(result)[0] != 0)
  		ereport(ERROR,
  				(errcode(ERRCODE_INVALID_BINARY_REPRESENTATION),
  				 errmsg("invalid int2vector data")));
+ 
+ 	/* check length for consistency with int2vectorin() */
+ 	if (ARR_DIMS(result)[0] > FUNC_MAX_ARGS)
+ 		ereport(ERROR,
+ 				(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ 				 errmsg("oidvector has too many elements")));
+ 
  	PG_RETURN_POINTER(result);
  }
  
*** a/src/backend/utils/adt/nabstime.c
--- b/src/backend/utils/adt/nabstime.c
***************
*** 786,805 **** tintervalrecv(PG_FUNCTION_ARGS)
  {
  	StringInfo	buf = (StringInfo) PG_GETARG_POINTER(0);
  	TimeInterval tinterval;
  
  	tinterval = (TimeInterval) palloc(sizeof(TimeIntervalData));
  
  	tinterval->status = pq_getmsgint(buf, sizeof(tinterval->status));
  
! 	if (!(tinterval->status == T_INTERVAL_INVAL ||
! 		  tinterval->status == T_INTERVAL_VALID))
  		ereport(ERROR,
  				(errcode(ERRCODE_INVALID_BINARY_REPRESENTATION),
  				 errmsg("invalid status in external \"tinterval\" value")));
  
- 	tinterval->data[0] = pq_getmsgint(buf, sizeof(tinterval->data[0]));
- 	tinterval->data[1] = pq_getmsgint(buf, sizeof(tinterval->data[1]));
- 
  	PG_RETURN_TIMEINTERVAL(tinterval);
  }
  
--- 786,810 ----
  {
  	StringInfo	buf = (StringInfo) PG_GETARG_POINTER(0);
  	TimeInterval tinterval;
+ 	int32 status;
  
  	tinterval = (TimeInterval) palloc(sizeof(TimeIntervalData));
  
  	tinterval->status = pq_getmsgint(buf, sizeof(tinterval->status));
+ 	tinterval->data[0] = pq_getmsgint(buf, sizeof(tinterval->data[0]));
+ 	tinterval->data[1] = pq_getmsgint(buf, sizeof(tinterval->data[1]));
+ 
+ 	if (tinterval->data[0] == INVALID_ABSTIME ||
+ 		tinterval->data[1] == INVALID_ABSTIME)
+ 		status = T_INTERVAL_INVAL;	/* undefined  */
+ 	else
+ 		status = T_INTERVAL_VALID;
  
! 	if (status != tinterval->status)
  		ereport(ERROR,
  				(errcode(ERRCODE_INVALID_BINARY_REPRESENTATION),
  				 errmsg("invalid status in external \"tinterval\" value")));
  
  	PG_RETURN_TIMEINTERVAL(tinterval);
  }
  
*** a/src/backend/utils/adt/oid.c
--- b/src/backend/utils/adt/oid.c
***************
*** 276,288 **** oidvectorrecv(PG_FUNCTION_ARGS)
  
  	Assert(!locfcinfo.isnull);
  
! 	/* sanity checks: oidvector must be 1-D, no nulls */
  	if (ARR_NDIM(result) != 1 ||
  		ARR_HASNULL(result) ||
! 		ARR_ELEMTYPE(result) != OIDOID)
  		ereport(ERROR,
  				(errcode(ERRCODE_INVALID_BINARY_REPRESENTATION),
  				 errmsg("invalid oidvector data")));
  	PG_RETURN_POINTER(result);
  }
  
--- 276,296 ----
  
  	Assert(!locfcinfo.isnull);
  
! 	/* sanity checks: oidvector must be 1-D, 0-based, no nulls */
  	if (ARR_NDIM(result) != 1 ||
  		ARR_HASNULL(result) ||
! 		ARR_ELEMTYPE(result) != OIDOID ||
! 		ARR_LBOUND(result)[0] != 0)
  		ereport(ERROR,
  				(errcode(ERRCODE_INVALID_BINARY_REPRESENTATION),
  				 errmsg("invalid oidvector data")));
+ 
+ 	/* check length for consistency with oidvectorin() */
+ 	if (ARR_DIMS(result)[0] > FUNC_MAX_ARGS)
+ 		ereport(ERROR,
+ 				(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ 				 errmsg("oidvector has too many elements")));
+ 
  	PG_RETURN_POINTER(result);
  }
  
*** a/src/backend/utils/adt/timestamp.c
--- b/src/backend/utils/adt/timestamp.c
***************
*** 253,258 **** timestamp_recv(PG_FUNCTION_ARGS)
--- 253,263 ----
  	timestamp = (Timestamp) pq_getmsgint64(buf);
  #else
  	timestamp = (Timestamp) pq_getmsgfloat8(buf);
+ 
+ 	if (isnan(timestamp))
+ 		ereport(ERROR,
+ 				(errcode(ERRCODE_DATETIME_VALUE_OUT_OF_RANGE),
+ 				 errmsg("timestamp cannot be NaN")));
  #endif
  
  	/* rangecheck: see if timestamp_out would like it */
*** a/src/backend/utils/adt/xml.c
--- b/src/backend/utils/adt/xml.c
***************
*** 109,115 **** static int parse_xml_decl(const xmlChar *str, size_t *lenp,
  static bool print_xml_decl(StringInfo buf, const xmlChar *version,
  			   pg_enc encoding, int standalone);
  static xmlDocPtr xml_parse(text *data, XmlOptionType xmloption_arg,
! 		  bool preserve_whitespace, xmlChar *encoding);
  static text *xml_xmlnodetoxmltype(xmlNodePtr cur);
  #endif   /* USE_LIBXML */
  
--- 109,115 ----
  static bool print_xml_decl(StringInfo buf, const xmlChar *version,
  			   pg_enc encoding, int standalone);
  static xmlDocPtr xml_parse(text *data, XmlOptionType xmloption_arg,
! 		  bool preserve_whitespace, int encoding);
  static text *xml_xmlnodetoxmltype(xmlNodePtr cur);
  #endif   /* USE_LIBXML */
  
***************
*** 183,189 **** xml_in(PG_FUNCTION_ARGS)
  	 * Parse the data to check if it is well-formed XML data.  Assume that
  	 * ERROR occurred if parsing failed.
  	 */
! 	doc = xml_parse(vardata, xmloption, true, NULL);
  	xmlFreeDoc(doc);
  
  	PG_RETURN_XML_P(vardata);
--- 183,189 ----
  	 * Parse the data to check if it is well-formed XML data.  Assume that
  	 * ERROR occurred if parsing failed.
  	 */
! 	doc = xml_parse(vardata, xmloption, true, GetDatabaseEncoding());
  	xmlFreeDoc(doc);
  
  	PG_RETURN_XML_P(vardata);
***************
*** 272,278 **** xml_recv(PG_FUNCTION_ARGS)
  	char	   *newstr;
  	int			nbytes;
  	xmlDocPtr	doc;
! 	xmlChar    *encoding = NULL;
  
  	/*
  	 * Read the data in raw format. We don't know yet what the encoding is, as
--- 272,279 ----
  	char	   *newstr;
  	int			nbytes;
  	xmlDocPtr	doc;
! 	xmlChar    *encodingStr = NULL;
! 	int			encoding;
  
  	/*
  	 * Read the data in raw format. We don't know yet what the encoding is, as
***************
*** 293,299 **** xml_recv(PG_FUNCTION_ARGS)
  	str = VARDATA(result);
  	str[nbytes] = '\0';
  
! 	parse_xml_decl((xmlChar *) str, NULL, NULL, &encoding, NULL);
  
  	/*
  	 * Parse the data to check if it is well-formed XML data.  Assume that
--- 294,308 ----
  	str = VARDATA(result);
  	str[nbytes] = '\0';
  
! 	parse_xml_decl((xmlChar *) str, NULL, NULL, &encodingStr, NULL);
! 
! 	/*
! 	 * If encoding was explicitly specified in the XML header, treat it as
! 	 * UTF-8, as that's the default in XML. This is different from xml_in(),
! 	 * where the input has to go through the normal client to server encoding
! 	 * conversion.
! 	 */
! 	encoding = encodingStr ? xmlChar_to_encoding(encodingStr) : PG_UTF8;
  
  	/*
  	 * Parse the data to check if it is well-formed XML data.  Assume that
***************
*** 305,313 **** xml_recv(PG_FUNCTION_ARGS)
  	/* Now that we know what we're dealing with, convert to server encoding */
  	newstr = (char *) pg_do_encoding_conversion((unsigned char *) str,
  												nbytes,
! 												encoding ?
! 											  xmlChar_to_encoding(encoding) :
! 												PG_UTF8,
  												GetDatabaseEncoding());
  
  	if (newstr != str)
--- 314,320 ----
  	/* Now that we know what we're dealing with, convert to server encoding */
  	newstr = (char *) pg_do_encoding_conversion((unsigned char *) str,
  												nbytes,
! 												encoding,
  												GetDatabaseEncoding());
  
  	if (newstr != str)
***************
*** 659,665 **** xmlparse(text *data, XmlOptionType xmloption_arg, bool preserve_whitespace)
  #ifdef USE_LIBXML
  	xmlDocPtr	doc;
  
! 	doc = xml_parse(data, xmloption_arg, preserve_whitespace, NULL);
  	xmlFreeDoc(doc);
  
  	return (xmltype *) data;
--- 666,673 ----
  #ifdef USE_LIBXML
  	xmlDocPtr	doc;
  
! 	doc = xml_parse(data, xmloption_arg, preserve_whitespace,
! 					GetDatabaseEncoding());
  	xmlFreeDoc(doc);
  
  	return (xmltype *) data;
***************
*** 799,805 **** xml_is_document(xmltype *arg)
  	/* We want to catch ereport(INVALID_XML_DOCUMENT) and return false */
  	PG_TRY();
  	{
! 		doc = xml_parse((text *) arg, XMLOPTION_DOCUMENT, true, NULL);
  		result = true;
  	}
  	PG_CATCH();
--- 807,814 ----
  	/* We want to catch ereport(INVALID_XML_DOCUMENT) and return false */
  	PG_TRY();
  	{
! 		doc = xml_parse((text *) arg, XMLOPTION_DOCUMENT, true,
! 						GetDatabaseEncoding());
  		result = true;
  	}
  	PG_CATCH();
***************
*** 1152,1158 **** print_xml_decl(StringInfo buf, const xmlChar *version,
   */
  static xmlDocPtr
  xml_parse(text *data, XmlOptionType xmloption_arg, bool preserve_whitespace,
! 		  xmlChar *encoding)
  {
  	int32		len;
  	xmlChar    *string;
--- 1161,1167 ----
   */
  static xmlDocPtr
  xml_parse(text *data, XmlOptionType xmloption_arg, bool preserve_whitespace,
! 		  int encoding)
  {
  	int32		len;
  	xmlChar    *string;
***************
*** 1165,1173 **** xml_parse(text *data, XmlOptionType xmloption_arg, bool preserve_whitespace,
  
  	utf8string = pg_do_encoding_conversion(string,
  										   len,
! 										   encoding ?
! 										   xmlChar_to_encoding(encoding) :
! 										   GetDatabaseEncoding(),
  										   PG_UTF8);
  
  	/* Start up libxml and its parser (no-ops if already done) */
--- 1174,1180 ----
  
  	utf8string = pg_do_encoding_conversion(string,
  										   len,
! 										   encoding,
  										   PG_UTF8);
  
  	/* Start up libxml and its parser (no-ops if already done) */
*** a/src/include/utils/datetime.h
--- b/src/include/utils/datetime.h
***************
*** 262,267 **** extern const int day_tab[2][13];
--- 262,269 ----
    || (((m) == JULIAN_MINMONTH) && ((d) >= JULIAN_MINDAY))))) \
   && ((y) < JULIAN_MAXYEAR))
  
+ #define JULIAN_MAX (2145031948) /* == date2j(JULIAN_MAXYEAR, 1 ,1) */
+ 
  /* Julian-date equivalents of Day 0 in Unix and Postgres reckoning */
  #define UNIX_EPOCH_JDATE		2440588 /* == date2j(1970, 1, 1) */
  #define POSTGRES_EPOCH_JDATE	2451545 /* == date2j(2000, 1, 1) */

Greg Stark

gsstark@mit.edu

over 16 years ago

In reply to: Heikki Linnakangas (#1)

Re: Tightening binary receive functions

On Mon, Aug 31, 2009 at 9:12 AM, Heikki
Linnakangas<heikki.linnakangas@enterprisedb.com> wrote:

The most notable of these is the change to "char" datatype. The patch
tightens it so that it no longer accepts values >127 with a multi-byte
database encoding.

That doesn't sound right to me. We accept casts from integer to "char"
for all values in range (-128..127). The question should be what the
text representation should be since the raw bytes aren't valid mb
encodings.

--
greg
http://mit.edu/~gsstark/resume.pdf

Heikki Linnakangas

heikki.linnakangas@enterprisedb.com

over 16 years ago

In reply to: Greg Stark (#2)

Re: Tightening binary receive functions

Greg Stark wrote:

On Mon, Aug 31, 2009 at 9:12 AM, Heikki
Linnakangas<heikki.linnakangas@enterprisedb.com> wrote:

The most notable of these is the change to "char" datatype. The patch
tightens it so that it no longer accepts values >127 with a multi-byte
database encoding.

That doesn't sound right to me. We accept casts from integer to "char"
for all values in range (-128..127).

The patch limits that range to 0..127, with multibyte encodings.

The question should be what the
text representation should be since the raw bytes aren't valid mb
encodings.

Hmm, perhaps we should follow what we did to chr() and ascii(): map the
integer to unicode code points if the database encoding is UTF-8, and
restrict the range to 0..127 for other multi-byte encodings.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

Greg Stark

gsstark@mit.edu

over 16 years ago

In reply to: Heikki Linnakangas (#3)

Re: Tightening binary receive functions

On Mon, Aug 31, 2009 at 12:01 PM, Heikki
Linnakangas<heikki.linnakangas@enterprisedb.com> wrote:

Hmm, perhaps we should follow what we did to chr() and ascii(): map the
integer to unicode code points if the database encoding is UTF-8, and
restrict the range to 0..127 for other multi-byte encodings.

I don't think we even have to worry about the database's encoding.
Just make the textual representation of "char" be \xxx (or perhaps we
could switch to \xHH now) if the value isn't a printable ascii
character. As long as "char" reads that in properly it doesn't matter
if it's not a reasonable multibyte character.

That allows people to treat it as a 1-byte integer type which happens
to allow input or output as a single ascii character which is
convenient sometimes.

--
greg
http://mit.edu/~gsstark/resume.pdf

Heikki Linnakangas

heikki.linnakangas@enterprisedb.com

over 16 years ago

In reply to: Greg Stark (#4)

Re: Tightening binary receive functions

Greg Stark wrote:

On Mon, Aug 31, 2009 at 12:01 PM, Heikki
Linnakangas<heikki.linnakangas@enterprisedb.com> wrote:

Hmm, perhaps we should follow what we did to chr() and ascii(): map the
integer to unicode code points if the database encoding is UTF-8, and
restrict the range to 0..127 for other multi-byte encodings.

I don't think we even have to worry about the database's encoding.
Just make the textual representation of "char" be \xxx (or perhaps we
could switch to \xHH now) if the value isn't a printable ascii
character. As long as "char" reads that in properly it doesn't matter
if it's not a reasonable multibyte character.

That allows people to treat it as a 1-byte integer type which happens
to allow input or output as a single ascii character which is
convenient sometimes.

Hmm, seems reasonable. However, it would mean that "\123" would be
interpreted differently in the new version than the old. In previous
versions the extra characters are truncated and the value becomes '\',
whereas the new interpretation would be 'S'.

(I committed the rest of the patch, without the "char" changes)

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

Tom Lane

tgl@sss.pgh.pa.us

over 16 years ago

In reply to: Heikki Linnakangas (#5)

Re: Tightening binary receive functions

Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes:

Greg Stark wrote:

Just make the textual representation of "char" be \xxx (or perhaps we
could switch to \xHH now) if the value isn't a printable ascii
character. As long as "char" reads that in properly it doesn't matter
if it's not a reasonable multibyte character.

Hmm, seems reasonable. However, it would mean that "\123" would be
interpreted differently in the new version than the old. In previous
versions the extra characters are truncated and the value becomes '\',
whereas the new interpretation would be 'S'.

I think that Greg's proposal might be the sanest solution. Aside from
avoiding the encoding problem, it would give us a more reasonable text
representation for byte value 0 (ie, '\000' not ''). We could probably
address the compatibility risk sufficiently by having charin throw error
whenever the input is more than one byte long and does NOT have the form
'\nnn'. This would also respond to the discussion of bug #5028 about
how charin ought not silently accept multicharacter input.

regards, tom lane

James Pye

lists@jwp.name

about 16 years ago

In reply to: Heikki Linnakangas (#1)

Re: Tightening binary receive functions

On Aug 31, 2009, at 1:12 AM, Heikki Linnakangas wrote:

...

Is the new date_recv() constraint actually correct?

[looking at the "result < 0" part, at least]

src/backend/utils/adt/date.c
...
+       /* Limit to the same range that date_in() accepts. */
+       if (result < 0 || result > JULIAN_MAX)
+               ereport(ERROR,
+                               (errcode 
(ERRCODE_DATETIME_VALUE_OUT_OF_RANGE),
+                                errmsg("date out of range")));
+
+       PG_RETURN_DATEADT(result);
  }

postgres=# SELECT date_send('500-01-01'::date);
date_send
------------
\xfff7a3e9
(1 row)

...

struct.unpack("!l", b'\xff\xf7\xa3\xe9')

(-547863,)

Perhaps 'result' needs to be adjusted by the postgres epoch for the
comparison?

Andrew Gierth

andrew@tao11.riddles.org.uk

about 16 years ago

In reply to: James Pye (#7)

Re: Tightening binary receive functions

"James" == James Pye <lists@jwp.name> writes:

James> Is the new date_recv() constraint actually correct?

No, it's not:

regression=# create table x (a date);
CREATE TABLE
regression=# insert into x values ('1999-01-01');
INSERT 0 1
regression=# copy x to '/tmp/tst.dmp' binary;
COPY 1
regression=# copy x from '/tmp/tst.dmp' binary;
ERROR: date out of range

--
Andrew (irc:RhodiumToad)

Heikki Linnakangas

heikki.linnakangas@enterprisedb.com

about 16 years ago

In reply to: Andrew Gierth (#8)

Re: Tightening binary receive functions

Andrew Gierth wrote:

"James" == James Pye <lists@jwp.name> writes:

James> Is the new date_recv() constraint actually correct?

No, it's not:

Oops, you're right. The check is indeed confusing julian day numbers,
with epoch at 23th of Nov 4714 BC, with postgres-reckoning day numbers,
with epoch at 1th of Jan 2000. Thanks, will fix.

BTW, I just noticed that to_date() doesn't respect those limits either:

postgres=# create table x (a date);
CREATE TABLE
postgres=# insert into x values (to_date('-4713 11 23', 'YYYY MM DD'));
INSERT 0 1
postgres=# select * from x;
a
---------------
4714-11-23 BC
(1 row)

postgres=# copy x to '/tmp/tst.dmp'; -- text mode
COPY 1
postgres=# copy x from '/tmp/tst.dmp';
ERROR: date out of range: "4714-11-23 BC"
CONTEXT: COPY x, line 1, column a: "4714-11-23 BC"

The date arithmetic operators + and - also allow you to create such
dates. I also note that they don't check for overflow.

I'm thinking that we should fix all that by adding range checks to all
those functions (or maybe just in date2j() and the operators).

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

#10

Andrew Gierth

andrew@tao11.riddles.org.uk

about 16 years ago

In reply to: Heikki Linnakangas (#9)

Re: Tightening binary receive functions

"Heikki" == Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes:

Heikki> Oops, you're right. The check is indeed confusing julian day
Heikki> numbers, with epoch at 23th of Nov 4714 BC, with
Heikki> postgres-reckoning day numbers, with epoch at 1th of Jan
Heikki> 2000. Thanks, will fix.

Which reminds me: why isn't there an extract(jday from ...) function
or similar?

--
Andrew.

#11

Bruce Momjian

bruce@momjian.us

about 16 years ago

In reply to: Andrew Gierth (#10)

Re: Tightening binary receive functions

FYI, Heikki has fixed this bug and the fix will appear in Postgres 8.5.

---------------------------------------------------------------------------

Andrew Gierth wrote:

"Heikki" == Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes:

Heikki> Oops, you're right. The check is indeed confusing julian day
Heikki> numbers, with epoch at 23th of Nov 4714 BC, with
Heikki> postgres-reckoning day numbers, with epoch at 1th of Jan
Heikki> 2000. Thanks, will fix.

Which reminds me: why isn't there an extract(jday from ...) function
or similar?

--
Andrew.

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +

#12

James William Pye

lists@jwp.name

almost 16 years ago

In reply to: Bruce Momjian (#11)

Re: Tightening binary receive functions

On Nov 10, 2009, at 9:54 AM, Bruce Momjian wrote:

FYI, Heikki has fixed this bug and the fix will appear in Postgres 8.5.

Heikki> Oops, you're right. The check is indeed confusing julian day
Heikki> numbers, with epoch at 23th of Nov 4714 BC, with
Heikki> postgres-reckoning day numbers, with epoch at 1th of Jan
Heikki> 2000. Thanks, will fix.

Need a special case for the infinities as well?

postgres=# create table foo (d date);
CREATE TABLE
postgres=# INSERT INTO foo VALUES ('infinity');
INSERT 0 1
postgres=# COPY foo TO '/Users/jwp/foo.copy' WITH BINARY;
COPY 1
postgres=# COPY foo FROM '/Users/jwp/foo.copy' WITH BINARY;
ERROR: date out of range
CONTEXT: COPY foo, line 1, column d
postgres=# DELETE FROM foo;
DELETE 1
postgres=# INSERT INTO foo VALUES ('-infinity');
INSERT 0 1
postgres=# COPY foo TO '/Users/jwp/foo.copy' WITH BINARY;
COPY 1
postgres=# COPY foo FROM '/Users/jwp/foo.copy' WITH BINARY;
ERROR: date out of range
CONTEXT: COPY foo, line 1, column d

postgres=# SELECT version();
version
---------------------------------------------------------------------------------------------------------------------------------------------------
PostgreSQL 8.5devel on i386-apple-darwin10.2.0, compiled by GCC i686-apple-darwin10-gcc-4.2.1 (GCC) 4.2.1 (Apple Inc. build 5646) (dot 1), 64-bit
(1 row)

#13

Takahiro Itagaki

itagaki.takahiro@oss.ntt.co.jp

almost 16 years ago

In reply to: James William Pye (#12)

1 attachment(s)

Re: Tightening binary receive functions

James William Pye <lists@jwp.name> wrote:

Need a special case for the infinities as well?

postgres=# create table foo (d date);
postgres=# INSERT INTO foo VALUES ('infinity');
postgres=# COPY foo TO '/Users/jwp/foo.copy' WITH BINARY;
postgres=# COPY foo FROM '/Users/jwp/foo.copy' WITH BINARY;
ERROR: date out of range

Exactly. Patch attached.

We have special treatments of infinity in timestamp_recv,
but don't have in date_recv.

Regards,
---
Takahiro Itagaki
NTT Open Source Software Center

Attachments:

date_recv_infinity_20100218.patchapplication/octet-stream; name=date_recv_infinity_20100218.patchDownload

diff -cprN head/src/backend/utils/adt/date.c work/src/backend/utils/adt/date.c
*** head/src/backend/utils/adt/date.c	2010-01-04 09:10:26.638773000 +0900
--- work/src/backend/utils/adt/date.c	2010-02-18 12:23:36.150736064 +0900
*************** date_recv(PG_FUNCTION_ARGS)
*** 208,215 ****
  	result = (DateADT) pq_getmsgint(buf, sizeof(DateADT));
  
  	/* Limit to the same range that date_in() accepts. */
! 	if (result < -POSTGRES_EPOCH_JDATE ||
! 		result >= JULIAN_MAX - POSTGRES_EPOCH_JDATE)
  		ereport(ERROR,
  				(errcode(ERRCODE_DATETIME_VALUE_OUT_OF_RANGE),
  				 errmsg("date out of range")));
--- 208,217 ----
  	result = (DateADT) pq_getmsgint(buf, sizeof(DateADT));
  
  	/* Limit to the same range that date_in() accepts. */
! 	if (DATE_NOT_FINITE(result))
! 		 /* ok */ ;
! 	else if (result < -POSTGRES_EPOCH_JDATE ||
! 			 result >= JULIAN_MAX - POSTGRES_EPOCH_JDATE)
  		ereport(ERROR,
  				(errcode(ERRCODE_DATETIME_VALUE_OUT_OF_RANGE),
  				 errmsg("date out of range")));