information_schema.columns changes needed for OLEDB
As we discussed at pgcon2009 there are some changes/fixes necessary in
information_schema.columns to allow correct work of applications and
services via OLEDB on Windows. Here are some:
1. data_type field contains types names that are not recognized by MS apps.
Code around: rename types on the fly, e.g.
integer -> int
character varying -> varchar
character -> char
timestamp without time zone -> datetime
bytea -> image
2. character_maximum_length field
Code around: change value for text abd bytea types
[text] 1073741823
[bytea] 2147483647
3. character_octet_length should always be double of
character_maximum_length (due to Unicode character size on Windows which is
2).
4. datetime_precision field is not always correct
Code around: change value of the fly, e.g. if value is not null then
[numeric] keep the value (ok)
[bigint] set value to 19
all other set to 10
5. numeric_precision_radix field should always be equal to 10
6. datetime_precision field, minor changes
Code around: change value on the fly, e.g.
[date] set value to zero
[datetime] set value to 3
On Friday 22 May 2009 18:27:01 Konstantin Izmailov wrote:
As we discussed at pgcon2009 there are some changes/fixes necessary in
information_schema.columns to allow correct work of applications and
services via OLEDB on Windows. Here are some:1. data_type field contains types names that are not recognized by MS apps.
Code around: rename types on the fly, e.g.
integer -> int
character varying -> varchar
character -> char
The spelling of these types in the information schema is fixed by the SQL
standard. We can't change that.
timestamp without time zone -> datetime
And that would certainly be wrong for other applications, because PostgreSQL
doesn't have a datetime type.
bytea -> image
And that we certainly can't do either.
2. character_maximum_length field
Code around: change value for text abd bytea types
[text] 1073741823
(see next item)
[bytea] 2147483647
But bytea is not a character type in the first place, so this value is
meaningless.
3. character_octet_length should always be double of
character_maximum_length (due to Unicode character size on Windows which is
2).
We could do something like that if we exposed the maximum octet length of a
character per encoding. But what I wonder is whether this should reflect the
server or the client encoding. How do your applications use this value?
4. datetime_precision field is not always correct
Code around: change value of the fly, e.g. if value is not null then
[numeric] keep the value (ok)
[bigint] set value to 19
all other set to 10
Why would numeric and bigint affect *datetime*_precision at all?
5. numeric_precision_radix field should always be equal to 10
Why?
6. datetime_precision field, minor changes
Code around: change value on the fly, e.g.
[date] set value to zero
Makes sense. I think this is not correct at the moment.
[datetime] set value to 3
Well, it really depends on what you set it to when you declared the column,
no?
Number 4 is actually numeric_precision (I typed incorrectly). My
recollection is that numeric_precision sometimes expressed in radix 2 and it
caused issues for Windows apps.
I agree on other issues. I was curious if database can help OLEDB driver (to
make it simpler). Anyway it can emulate values for specific Windows apps on
the fly. Thank you!
On Sat, May 23, 2009 at 2:57 PM, Peter Eisentraut <peter_e@gmx.net> wrote:
Show quoted text
On Friday 22 May 2009 18:27:01 Konstantin Izmailov wrote:
As we discussed at pgcon2009 there are some changes/fixes necessary in
information_schema.columns to allow correct work of applications and
services via OLEDB on Windows. Here are some:1. data_type field contains types names that are not recognized by MS
apps.
Code around: rename types on the fly, e.g.
integer -> int
character varying -> varchar
character -> char
The spelling of these types in the information schema is fixed by the SQL
standard. We can't change that.timestamp without time zone -> datetime
And that would certainly be wrong for other applications, because
PostgreSQL
doesn't have a datetime type.bytea -> image
And that we certainly can't do either.
2. character_maximum_length field
Code around: change value for text abd bytea types
[text] 1073741823
(see next item)
[bytea] 2147483647
But bytea is not a character type in the first place, so this value is
meaningless.3. character_octet_length should always be double of
character_maximum_length (due to Unicode character size on Windows whichis
2).
We could do something like that if we exposed the maximum octet length of a
character per encoding. But what I wonder is whether this should reflect
the
server or the client encoding. How do your applications use this value?4. datetime_precision field is not always correct
Code around: change value of the fly, e.g. if value is not null then
[numeric] keep the value (ok)
[bigint] set value to 19
all other set to 10
Why would numeric and bigint affect *datetime*_precision at all?
5. numeric_precision_radix field should always be equal to 10
Why?
6. datetime_precision field, minor changes
Code around: change value on the fly, e.g.
[date] set value to zero
Makes sense. I think this is not correct at the moment.
[datetime] set value to 3
Well, it really depends on what you set it to when you declared the column,
no?
On 5/23/09 7:37 PM, Konstantin Izmailov wrote:
Number 4 is actually numeric_precision (I typed incorrectly). My
recollection is that numeric_precision sometimes expressed in radix 2
and it caused issues for Windows apps.I agree on other issues. I was curious if database can help OLEDB driver
(to make it simpler). Anyway it can emulate values for specific Windows
apps on the fly. Thank you!
You could, of course, create your own ms_information_schema which had
ms_friendly views of the IS.
--
Josh Berkus
PostgreSQL Experts Inc.
www.pgexperts.com
Josh Berkus wrote:
On 5/23/09 7:37 PM, Konstantin Izmailov wrote:
Number 4 is actually numeric_precision (I typed incorrectly). My
recollection is that numeric_precision sometimes expressed in radix 2
and it caused issues for Windows apps.I agree on other issues. I was curious if database can help OLEDB driver
(to make it simpler). Anyway it can emulate values for specific Windows
apps on the fly. Thank you!You could, of course, create your own ms_information_schema which had
ms_friendly views of the IS.
This is what I have done for a past project. I do not think we should
part from the standard SQL schema in order to satisfy a certain third
party component.
If the information_schema does not provide all the information, one
could always query the pg_* tables for needed data.
--
Regards,
Gevik
On Sunday 24 May 2009 03:37:28 Konstantin Izmailov wrote:
Number 4 is actually numeric_precision (I typed incorrectly). My
recollection is that numeric_precision sometimes expressed in radix 2 and
it caused issues for Windows apps.
It is measured in radix 2 for floating-point types and in radix 10 for fixed-
point types.
Hi,
I'm not sure that it is related to information_schema but I wanted to let
you know that some Postgres functions are listed in pg_proc while others are
not. For example, all Data Type Formatting function are in pg_proc (to_char,
to_hex, ...). While several of the Date/Time Functions are not there
(extract, localtime, ...).
Why different Postgres function are not equal???
This causes issues to Windows integration as well.
Konstantin
Konstantin Izmailov <pgfizm@gmail.com> writes:
you know that some Postgres functions are listed in pg_proc while others are
not. For example, all Data Type Formatting function are in pg_proc (to_char,
to_hex, ...). While several of the Date/Time Functions are not there
(extract, localtime, ...).
The ones that appear not to be there are ones that the SQL standard
demands special weird syntax for. The grammar translates such calls
to standard function calls to underlying functions, which usually are
named a bit differently to avoid confusion. For instance
extract(field from some_expr) becomes date_part('field', some_expr).
If you want to know what all of these are, see the func_expr production
in parser/gram.y.
This causes issues to Windows integration as well.
Complain to the SQL standards committee, especially to those members
who seem to think COBOL represented the apex of programming language
syntax design :-(
regards, tom lane
Tom,
this is very helpful - thank you so much!
I had to discover those 'missing' functions one by one, usually after users'
complaints.
Konstantin
On Fri, May 29, 2009 at 11:35 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Show quoted text
Konstantin Izmailov <pgfizm@gmail.com> writes:
you know that some Postgres functions are listed in pg_proc while others
are
not. For example, all Data Type Formatting function are in pg_proc
(to_char,
to_hex, ...). While several of the Date/Time Functions are not there
(extract, localtime, ...).The ones that appear not to be there are ones that the SQL standard
demands special weird syntax for. The grammar translates such calls
to standard function calls to underlying functions, which usually are
named a bit differently to avoid confusion. For instance
extract(field from some_expr) becomes date_part('field', some_expr).If you want to know what all of these are, see the func_expr production
in parser/gram.y.This causes issues to Windows integration as well.
Complain to the SQL standards committee, especially to those members
who seem to think COBOL represented the apex of programming language
syntax design :-(regards, tom lane
Peter Eisentraut <peter_e@gmx.net> writes:
On Friday 22 May 2009 18:27:01 Konstantin Izmailov wrote:
[ discussion of some details of information_schema results ]
Are we going to do anything about any of these issues?
In particular,
6. datetime_precision field, minor changes
Code around: change value on the fly, e.g.
[date] set value to zero
Makes sense. I think this is not correct at the moment.
AFAICS, the SQL standard demands that precision and scale fields be
non-null all the time for those data types where they make sense
(this is encoded in the CHECK CONSTRAINTs that are declared for the
various information-schema tables, see particularly 21.15
DATA_TYPE_DESCRIPTOR base table in SQL99). DATE is clearly wrong
per spec, but it's not the only problem.
Our interpretation has been to set these values to null if the typmod
is defaulted, which is reasonable in the abstract but it's still a
violation of spec. I wonder whether we should be inserting some large
limit value instead.
For the datetime types we actually have a perfectly good default
precision value, namely six digits, if the DB is using integer datetimes
--- and I don't think there's a strong argument not to use this value
for float timestamps too.
I'd prefer to avoid a catversion bump at this stage of the cycle,
but it looks like any changes here would merely involve the bodies of
some functions in information_schema.sql. I think we could just change
them without a catversion bump. Any beta testers who actually care
could easily insert the new definitions without an initdb.
regards, tom lane
On Sun, May 31, 2009 at 11:41 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
I'd prefer to avoid a catversion bump at this stage of the cycle,
but it looks like any changes here would merely involve the bodies of
some functions in information_schema.sql. I think we could just change
them without a catversion bump. Any beta testers who actually care
could easily insert the new definitions without an initdb.
Is this a regression relative to 8.3? If not, why not leave it for
8.5? It seems a little bit late in the game to be messing with this.
I admit I don't have any particular reason to think it will break
anything, but there are other minor patches floating around that meet
that criteria too, so if this one then why not those?
...Robert
Robert Haas <robertmhaas@gmail.com> writes:
Is this a regression relative to 8.3? If not, why not leave it for
8.5?
No. it's not a regression. What I was concerned about was mainly that
it seemed likely to drop off the radar screen without being addressed at
all, since most of the other things Konstantin complained about seemed
to be we-won't-change-this items. I have no objection to a conscious
decision to leave it alone for 8.4, so long as it gets memorialized as
either a pending patch or a TODO item.
regards, tom lane
On Sunday 31 May 2009 18:41:55 Tom Lane wrote:
AFAICS, the SQL standard demands that precision and scale fields be
non-null all the time for those data types where they make sense
(this is encoded in the CHECK CONSTRAINTs that are declared for the
various information-schema tables, see particularly 21.15
DATA_TYPE_DESCRIPTOR base table in SQL99). DATE is clearly wrong
per spec, but it's not the only problem.
The DATE change is the only thing I'd be prepared to make right now.
Our interpretation has been to set these values to null if the typmod
is defaulted, which is reasonable in the abstract but it's still a
violation of spec. I wonder whether we should be inserting some large
limit value instead.
That is something to think about, but it needs more time. We also have some
inconsistency there; for example we produce a large limit value for octet
length. Needs more thought. And if we go down that route, it should also
require less hardcoding of numbers into information_schema.sql.
Peter Eisentraut <peter_e@gmx.net> writes:
On Sunday 31 May 2009 18:41:55 Tom Lane wrote:
AFAICS, the SQL standard demands that precision and scale fields be
non-null all the time for those data types where they make sense
(this is encoded in the CHECK CONSTRAINTs that are declared for the
various information-schema tables, see particularly 21.15
DATA_TYPE_DESCRIPTOR base table in SQL99). DATE is clearly wrong
per spec, but it's not the only problem.
The DATE change is the only thing I'd be prepared to make right now.
At this point I think the clear decision is "we're not changing anything
for 8.4". I've put the issue on the TODO list for future development
cycles.
regards, tom lane
On Monday 08 June 2009 07:12:33 Tom Lane wrote:
Peter Eisentraut <peter_e@gmx.net> writes:
On Sunday 31 May 2009 18:41:55 Tom Lane wrote:
AFAICS, the SQL standard demands that precision and scale fields be
non-null all the time for those data types where they make sense
(this is encoded in the CHECK CONSTRAINTs that are declared for the
various information-schema tables, see particularly 21.15
DATA_TYPE_DESCRIPTOR base table in SQL99). DATE is clearly wrong
per spec, but it's not the only problem.The DATE change is the only thing I'd be prepared to make right now.
At this point I think the clear decision is "we're not changing anything
for 8.4". I've put the issue on the TODO list for future development
cycles.
After gathering that there will probably be some other changes before
release that will require an initdb (even without catversion bump), and
after reexamining the issue, I think it's trivial and uncontroversial to
fix the datetime issues:
diff --git a/src/backend/catalog/information_schema.sql b/src/backend/catalog/information_schema.sql
index 9c5672f..cb0296a 100644
--- a/src/backend/catalog/information_schema.sql
+++ b/src/backend/catalog/information_schema.sql
@@ -160,12 +160,12 @@ CREATE FUNCTION _pg_datetime_precision(typid oid, typmod int4) RETURNS integer
RETURNS NULL ON NULL INPUT
AS
$$SELECT
- CASE WHEN $2 = -1 /* default typmod */
- THEN null
+ CASE WHEN $1 IN (1082) /* date */
+ THEN 0
WHEN $1 IN (1083, 1114, 1184, 1266) /* time, timestamp, same + tz */
- THEN $2
+ THEN CASE WHEN $2 = -1 THEN 6 ELSE $2 END
WHEN $1 IN (1186) /* interval */
- THEN $2 & 65535
+ THEN CASE WHEN $2 = -1 THEN 6 ELSE $2 & 65535 END
ELSE null
END$$;
I have also prepared a patch that creates more realistic values for
character_octet_length based on encoding information, which I will propose
for 8.5. The issue of whether to report null or some large value for
"unlimited" length data types needs some more thought.
Peter Eisentraut <peter_e@gmx.net> writes:
diff --git a/src/backend/catalog/information_schema.sql b/src/backend/catalog/information_schema.sql index 9c5672f..cb0296a 100644 --- a/src/backend/catalog/information_schema.sql +++ b/src/backend/catalog/information_schema.sql @@ -160,12 +160,12 @@ CREATE FUNCTION _pg_datetime_precision(typid oid, typmod int4) RETURNS integer RETURNS NULL ON NULL INPUT AS $$SELECT - CASE WHEN $2 = -1 /* default typmod */ - THEN null + CASE WHEN $1 IN (1082) /* date */ + THEN 0 WHEN $1 IN (1083, 1114, 1184, 1266) /* time, timestamp, same + tz */ - THEN $2 + THEN CASE WHEN $2 = -1 THEN 6 ELSE $2 END WHEN $1 IN (1186) /* interval */ - THEN $2 & 65535 + THEN CASE WHEN $2 = -1 THEN 6 ELSE $2 & 65535 END ELSE null END$$;
Just for the record, this is a perfect example of why -u format sucks.
How many people think they can look at this and know exactly what the
change does?
I'll be back to weigh in on the merits of the patch after I've converted
it to -c format so I can understand it.
regards, tom lane
Peter Eisentraut <peter_e@gmx.net> writes:
After gathering that there will probably be some other changes before
release that will require an initdb (even without catversion bump), and
after reexamining the issue, I think it's trivial and uncontroversial to
fix the datetime issues:
For the benefit of anyone else finding this hard to read, I've
reformatted as
*** src/backend/catalog/information_schema.sql.orig Tue Feb 24 11:10:16 2009
--- src/backend/catalog/information_schema.sql Tue Jun 9 14:21:37 2009
***************
*** 160,171 ****
RETURNS NULL ON NULL INPUT
AS
$$SELECT
! CASE WHEN $2 = -1 /* default typmod */
! THEN null
WHEN $1 IN (1083, 1114, 1184, 1266) /* time, timestamp, same + tz */
! THEN $2
WHEN $1 IN (1186) /* interval */
! THEN $2 & 65535
ELSE null
END$$;
--- 160,171 ----
RETURNS NULL ON NULL INPUT
AS
$$SELECT
! CASE WHEN $1 IN (1082) /* date */
! THEN 0
WHEN $1 IN (1083, 1114, 1184, 1266) /* time, timestamp, same + tz */
! THEN CASE WHEN $2 = -1 THEN 6 ELSE $2 END
WHEN $1 IN (1186) /* interval */
! THEN CASE WHEN $2 = -1 THEN 6 ELSE $2 & 65535 END
ELSE null
END$$;
I would suggest coding the inner cases as
CASE WHEN $2 < 0 THEN ...default...
since the general practice in the C code is to treat any negative
value as meaning "unspecified". Otherwise, +1.
regards, tom lane
On Friday 22 May 2009 18:27:01 Konstantin Izmailov wrote:
3. character_octet_length should always be double of
character_maximum_length (due to Unicode character size on Windows which is
2).
I have the attached patch that would make character_octet_length the product
of character_octet_length and the maximum octet length of a single character
in the selected server encoding. So for UTF-8, this would be factor 4. This
doesn't exactly correspond to the behavior that you expect, but I think it's
more correct overall anyway.
Attachments:
infoschema-character-octet-length.patchtext/x-patch; charset=UTF-8; name=infoschema-character-octet-length.patchDownload
diff --git a/doc/src/sgml/information_schema.sgml b/doc/src/sgml/information_schema.sgml
index 8e145d7..6460862 100644
--- a/doc/src/sgml/information_schema.sgml
+++ b/doc/src/sgml/information_schema.sgml
@@ -343,10 +343,10 @@
<entry><type>cardinal_number</type></entry>
<entry>
If <literal>data_type</literal> identifies a character type,
- the maximum possible length in octets (bytes) of a datum (this
- should not be of concern to
- <productname>PostgreSQL</productname> users); null for all
- other data types.
+ the maximum possible length in octets (bytes) of a datum; null
+ for all other data types. The maximum octet length depends on
+ the declared character maximum length (see above) and the
+ server encoding.
</entry>
</row>
@@ -947,9 +947,10 @@
<entry><type>cardinal_number</type></entry>
<entry>
If <literal>data_type</literal> identifies a character type,
- the maximum possible length in octets (bytes) of a datum (this
- should not be of concern to <productname>PostgreSQL</productname> users); null for all
- other data types.
+ the maximum possible length in octets (bytes) of a datum; null
+ for all other data types. The maximum octet length depends on
+ the declared character maximum length (see above) and the
+ server encoding.
</entry>
</row>
@@ -1688,9 +1689,9 @@
<entry><type>cardinal_number</type></entry>
<entry>
If the domain has a character type, the maximum possible length
- in octets (bytes) of a datum (this should not be of concern to
- <productname>PostgreSQL</productname> users); null for all
- other data types.
+ in octets (bytes) of a datum; null for all other data types.
+ The maximum octet length depends on the declared character
+ maximum length (see above) and the server encoding.
</entry>
</row>
diff --git a/src/backend/catalog/information_schema.sql b/src/backend/catalog/information_schema.sql
index fe75322..cd6258b 100644
--- a/src/backend/catalog/information_schema.sql
+++ b/src/backend/catalog/information_schema.sql
@@ -102,11 +102,7 @@ CREATE FUNCTION _pg_char_octet_length(typid oid, typmod int4) RETURNS integer
IMMUTABLE
RETURNS NULL ON NULL INPUT
AS
-$$SELECT
- CASE WHEN $1 IN (25, 1042, 1043) /* text, char, varchar */
- THEN CAST(2^30 AS integer)
- ELSE null
- END$$;
+$$SELECT information_schema._pg_char_max_length($1, $2) * pg_encoding_max_length((SELECT encoding FROM pg_database WHERE datname = current_database()))$$;
CREATE FUNCTION _pg_numeric_precision(typid oid, typmod int4) RETURNS integer
LANGUAGE sql
diff --git a/src/backend/utils/mb/mbutils.c b/src/backend/utils/mb/mbutils.c
index 753c927..058493c 100644
--- a/src/backend/utils/mb/mbutils.c
+++ b/src/backend/utils/mb/mbutils.c
@@ -482,6 +482,17 @@ length_in_encoding(PG_FUNCTION_ARGS)
}
+Datum
+pg_encoding_max_length_sql(PG_FUNCTION_ARGS)
+{
+ int encoding = PG_GETARG_INT32(0);
+
+ if (PG_VALID_ENCODING(encoding))
+ return pg_wchar_table[encoding].maxmblen;
+ else
+ PG_RETURN_NULL();
+}
+
/*
* convert client encoding to server encoding.
*/
diff --git a/src/include/catalog/pg_proc.h b/src/include/catalog/pg_proc.h
index 0285acd..e194d6a 100644
--- a/src/include/catalog/pg_proc.h
+++ b/src/include/catalog/pg_proc.h
@@ -2278,6 +2278,9 @@ DESCR("convert encoding name to encoding id");
DATA(insert OID = 1597 ( pg_encoding_to_char PGNSP PGUID 12 1 0 0 f f f t f s 1 0 19 "23" _null_ _null_ _null_ _null_ PG_encoding_to_char _null_ _null_ _null_ ));
DESCR("convert encoding id to encoding name");
+DATA(insert OID = 2319 ( pg_encoding_max_length PGNSP PGUID 12 1 0 0 f f f t f i 1 0 23 "23" _null_ _null_ _null_ _null_ pg_encoding_max_length_sql _null_ _null_ _null_ ));
+DESCR("maximum octet length of a character in an eocidng");
+
DATA(insert OID = 1638 ( oidgt PGNSP PGUID 12 1 0 0 f f f t f i 2 0 16 "26 26" _null_ _null_ _null_ _null_ oidgt _null_ _null_ _null_ ));
DESCR("greater-than");
DATA(insert OID = 1639 ( oidge PGNSP PGUID 12 1 0 0 f f f t f i 2 0 16 "26 26" _null_ _null_ _null_ _null_ oidge _null_ _null_ _null_ ));
diff --git a/src/include/utils/builtins.h b/src/include/utils/builtins.h
index c1b9393..13fd41a 100644
--- a/src/include/utils/builtins.h
+++ b/src/include/utils/builtins.h
@@ -948,6 +948,7 @@ extern Datum pg_convert(PG_FUNCTION_ARGS);
extern Datum pg_convert_to(PG_FUNCTION_ARGS);
extern Datum pg_convert_from(PG_FUNCTION_ARGS);
extern Datum length_in_encoding(PG_FUNCTION_ARGS);
+extern Datum pg_encoding_max_length_sql(PG_FUNCTION_ARGS);
/* format_type.c */
extern Datum format_type(PG_FUNCTION_ARGS);
Peter Eisentraut <peter_e@gmx.net> writes:
I have the attached patch that would make character_octet_length the product
of character_octet_length and the maximum octet length of a single character
in the selected server encoding. So for UTF-8, this would be factor 4. This
doesn't exactly correspond to the behavior that you expect, but I think it's
more correct overall anyway.
+1, but that new query isn't very schema-safe ... I think it needs a few
"pg_catalog." qualifications.
regards, tom lane
On Monday 06 July 2009 22:16:12 Tom Lane wrote:
Peter Eisentraut <peter_e@gmx.net> writes:
I have the attached patch that would make character_octet_length the
product of character_octet_length and the maximum octet length of a
single character in the selected server encoding. So for UTF-8, this
would be factor 4. This doesn't exactly correspond to the behavior that
you expect, but I think it's more correct overall anyway.+1, but that new query isn't very schema-safe ... I think it needs a few
"pg_catalog." qualifications.
Applied with fixes.