numerics lose scale and precision in views of unions
It appears that when you create a view of a union, numeric data types
lose their scale and precision.
db=> create table t1 (c1 numeric(4,2));
CREATE TABLE
db=> create view v1 as select c1 from t1;
CREATE VIEW
db=> \d v1
View "brian.v1"
Column | Type | Modifiers
--------+--------------+-----------
c1 | numeric(4,2) |
View definition:
SELECT t1.c1
FROM t1;
db=> create view v2 as select c1 from t1 union select c1 from t1;
CREATE VIEW
db=> \d v2
View "brian.v2"
Column | Type | Modifiers
--------+---------+-----------
c1 | numeric |
View definition:
SELECT t1.c1
FROM t1
UNION
SELECT t1.c1
FROM t1;
db=> create view v3 as select c1::numeric(4,2) from t1 union select c1::numeric(4,2) from t1;
CREATE VIEW
db=> \d v3
View "brian.v3"
Column | Type | Modifiers
--------+---------+-----------
c1 | numeric |
View definition:
SELECT t1.c1
FROM t1
UNION
SELECT t1.c1
FROM t1;
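For what it's worth, a possible workaround (an untested sketch; the view
name v4 and the subquery alias ss are made up here) is to apply the cast
above the UNION instead of inside each arm. In v3 the per-arm casts were
presumably dropped as no-ops, since c1 is already numeric(4,2), and the
union node then discarded the typmod; a cast applied to the union's
output is not redundant and carries its own typmod:
create view v4 as
  select ss.c1::numeric(4,2) as c1
  from (select c1 from t1
        union
        select c1 from t1) as ss;
With that, \d v4 should report c1 as numeric(4,2).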
--
Brian C. DeRocher @ Mitretek Systems
"Brian C. DeRocher" <brian.derocher@mitretek.org> writes:
It appears that when you create a view of a union, numeric data types
lose their scale and precision.
I think this is the same issue discussed here:
http://archives.postgresql.org/pgsql-hackers/2004-12/msg00408.php
That was just before the 8.0 release, so the change got postponed and then
seems to have slipped through the cracks :-(. Any objections to
fixing it for 8.2?
regards, tom lane
* Tom Lane (tgl@sss.pgh.pa.us) wrote:
"Brian C. DeRocher" <brian.derocher@mitretek.org> writes:
It appears that when you create a view of a union, numeric data types
lose their scale and precision.
I think this is the same issue discussed here:
http://archives.postgresql.org/pgsql-hackers/2004-12/msg00408.php
That was just before the 8.0 release, so the change got postponed and then
seems to have slipped through the cracks :-(. Any objections to
fixing it for 8.2?
Sounds good to me. I'd like to talk a bit about the expected behavior
of a numeric hash function. This is the current behavior:
abc=# select * from test1;
a1
--------
1.00
1.0000
1.0
(3 rows)
abc=# select * from test1 group by a1;
a1
------
1.00
(1 row)
abc=# select distinct a1 from test1;
a1
------
1.00
(1 row)
I just want to double-check that this is the correct/expected behavior
(hopefully from the SQL spec?) and that a hash function need not concern
itself with the scale?
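(For reference, a setup that reproduces the session above would be
something along these lines; the table name and values are inferred from
the output:
create table test1 (a1 numeric);
insert into test1 values (1.00);
insert into test1 values (1.0000);
insert into test1 values (1.0);
An unconstrained numeric column keeps each literal's written scale, which
is why the three rows display differently.)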
I'm guessing it's too late for 8.2, or would such a relatively
isolated/simple change be acceptable during the beta period (adding hash
functions for numeric types)? I can't swear I'll actually get to them, but
I'd like to, and so I'd like to hash out (hah!) the correct semantics.
Thanks!
Stephen
Stephen Frost <sfrost@snowman.net> writes:
Sounds good to me. I'd like to talk a bit about the expected behavior
of a numeric hash function. This is the current behavior:
You're hijacking the thread, tsk tsk.
abc=# select * from test1;
a1
--------
1.00
1.0000
1.0
(3 rows)
abc=# select * from test1 group by a1;
a1
------
1.00
(1 row)
abc=# select distinct a1 from test1;
a1
------
1.00
(1 row)
Yeah, because numeric_cmp says that 1.0 and 1.00 are equal (what else
could it say? "less" and "greater" are surely wrong). So you need to
ensure that dscale is not included in the hash calculation. The
comments in numeric.h claim that you should not need to worry about
leading or trailing zeroes, but it sounds like you do need to worry
about differing weights for zero. Might be easiest and safest to use
only nonzero digit words in the hash calculation.
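A concrete illustration of the hazard (hashtext on the display text is
shown only as a counterexample, not as a proposed numeric hash):
select 1.0::numeric = 1.00::numeric as equal,
       hashtext('1.0') as h1,
       hashtext('1.00') as h2;
equal comes back true while h1 and h2 differ, so anything derived from
the textual form (where dscale shows up) would break the rule that
values comparing equal must hash alike.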
regards, tom lane
* Tom Lane (tgl@sss.pgh.pa.us) wrote:
Stephen Frost <sfrost@snowman.net> writes:
Sounds good to me. I'd like to talk a bit about the expected behavior
of a numeric hash function. This is the current behavior:
You're hijacking the thread, tsk tsk.
Yeah, but Brian sits across the hall from me at work... :)
Yeah, because numeric_cmp says that 1.0 and 1.00 are equal (what else
could it say? "less" and "greater" are surely wrong). So you need to
It could say "not equal" pretty reasonably as the scale is
different. With all due respect, it seems to me that we track the scale
for the very reason that it may be different for different numbers.
Unless the SQL spec says differently or we get complaints from people,
I'm all for keeping the current semantics, though.
ensure that dscale is not included in the hash calculation. The
comments in numeric.h claim that you should not need to worry about
leading or trailing zeroes, but it sounds like you do need to worry
about differing weights for zero. Might be easiest and safest to use
only nonzero digit words in the hash calculation.
Makes sense to me.
Thanks!
Stephen
Stephen Frost <sfrost@snowman.net> writes:
* Tom Lane (tgl@sss.pgh.pa.us) wrote:
Yeah, because numeric_cmp says that 1.0 and 1.00 are equal (what else
could it say? "less" and "greater" are surely wrong). So you need to
It could say "not equal" pretty reasonably as the scale is
different.
Nope, there are exactly three options: equal, less, greater.
btree doesn't understand anything else.
Unless the SQL spec says differently or we get complaints from people
I'm all for keeping the current semantics though.
The SQL spec? Oh, that old thing ... I can't find anything very
specific about it in SQL99, but certainly there is nothing mandating
a different treatment than we are using. The closest material I can
find is
5.3 <literal>
3) The numeric value of an <exact numeric literal> is determined
by the normal mathematical interpretation of positional decimal
notation.
8.2 <comparison predicate>
2) Numbers are compared with respect to their algebraic value.
There's certainly not anything in 8.2 <comparison predicate> contemplating the idea that two
non-nulls could have any other comparison result than less, equal, or
greater.
regards, tom lane
On 2006-08-10, Stephen Frost <sfrost@snowman.net> wrote:
Yeah, because numeric_cmp says that 1.0 and 1.00 are equal (what else
could it say? "less" and "greater" are surely wrong). So you need to
It could say "not equal" pretty reasonably as the scale is
different.
The comparison function must be trichotomous, that is the possible results
are "equal", "greater", and "less". There is no other "not equal" result.
--
Andrew, Supernews
* Tom Lane (tgl@sss.pgh.pa.us) wrote:
Stephen Frost <sfrost@snowman.net> writes:
It could say "not equal" pretty reasonably as the scale is
different.
Nope, there are exactly three options: equal, less, greater.
btree doesn't understand anything else.
Ah, yeah, I can see how something else would cause some difficulties. :)
There's certainly not anything in 8.2 contemplating the idea that two
non-nulls could have any other comparison result than less, equal, or
greater.
Makes me curious if it really makes sense to keep trailing zeros...
Having them there but not treating them any differently except for
display comes across as inconsistent to me. Either 1.0 and 1.00 are
the same thing (and thus should be displayed the same way), or they
aren't (in which case they should be treated distinctly in, eg, a
'select distinct' clause).
Was there a particular motivation for the current way things are being
done? I seriously doubt anything I'd propose for the hash functions
would have any impact on it either way but it seems like it might be a
change worth considering in some future release (probably post-8.3).
Thanks,
Stephen
Stephen Frost <sfrost@snowman.net> writes:
Makes me curious if it really makes sense to keep trailing zeros...
AFAIR we consider them mainly as a display artifact. An application
that's declared a column as numeric(7,2) is likely to expect to see
exactly two digits after the decimal point.
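For instance (illustrative output; numeric casts round half away from
zero to the declared scale):
select 1::numeric(7,2) as a, 1.005::numeric(7,2) as b;
  a   |  b
------+------
 1.00 | 1.01
(1 row)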
Either 1.0 and 1.00 are
the same thing (and thus should be displayed the same way), or they
aren't (in which case they should be treated distinctly in, eg, a
'select distinct' clause).
Consistency has never been SQL's strong point ;-)
regards, tom lane
* Tom Lane (tgl@sss.pgh.pa.us) wrote:
Stephen Frost <sfrost@snowman.net> writes:
Makes me curious if it really makes sense to keep trailing zeros...
AFAIR we consider them mainly as a display artifact. An application
that's declared a column as numeric(7,2) is likely to expect to see
exactly two digits after the decimal point.
Hmm. I should have mentioned this previously (since I was thinking
about it at the time...) but this display artifact is unfortunately not
without consequences. I'm about 80% sure that having the scale too
large (as in, larger than about 6 or 7 decimal places) breaks MS Access
using ODBC. It complains about not being able to represent the number
(even though it's just trailing zeros). It might be possible to handle
that in the ODBC driver but I don't think it'd be very clean
(considering you would want to fail cases where it's not just trailing
zeros).
This was using just a straight-up 'numeric' data type though. Perhaps
for that case we could drop the unnecessary zeros? I can understand
having them there when a specific scale is specified (I suppose...) but
when there isn't a specific scale given any application would have to
deal with the variability in the number of digits after the decimal
point anyway.
Either 1.0 and 1.00 are
the same thing (and thus should be displayed the same way), or they
aren't (in which case they should be treated distinctly in, eg, a
'select distinct' clause).
Consistency has never been SQL's strong point ;-)
Indeed. I think my suggestion above would be at least a half-step
towards consistency without breaking things. I *think* this would also
mean that we'd always have either a fixed number of decimal places
(scale specified), or no trailing zeros.
This would, in fact, be awfully nice for me since I wouldn't have to
deal with things like:
78.4
2.3625
4.1666675000000000
16.6666675000000000
0.83333250000000000000
where I then get to do some *very* ugly magic to find the records with
the extra zeros off on the end and truncate them (think cast to text and
then use a regexp, not fun at all...). Unfortunately round(a,b) <> a
doesn't work so hot in these cases where you do want the precision just
not the extra zeros off on the end.
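For the record, the "ugly magic" looks roughly like this (a sketch only;
a1/test1 stand in for the real column and table, the pattern assumes a
plain decimal rendering, and the replacement string is written assuming
standard_conforming_strings behavior):
select a1,
       regexp_replace(a1::text,
                      '([.][0-9]*[1-9])0+$|[.]0+$',
                      '\1')::numeric as trimmed
from test1;
The first alternative trims zeros after the last significant fractional
digit, the second drops an all-zero fraction entirely, and casting the
trimmed text back to numeric keeps the reduced scale.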
Thanks,
Stephen
On Wed, Aug 09, 2006 at 11:35:48PM -0400, Tom Lane wrote:
Stephen Frost <sfrost@snowman.net> writes:
* Tom Lane (tgl@sss.pgh.pa.us) wrote:
Yeah, because numeric_cmp says that 1.0 and 1.00 are equal (what else
could it say? "less" and "greater" are surely wrong). So you need to
It could say "not equal" pretty reasonably as the scale is
different.
Nope, there are exactly three options: equal, less, greater.
btree doesn't understand anything else.
Mathematically, 1.0 could be read as equal to anything from 0.9500... up to 1.05000...
In theory, B-Tree could be fine with this. As long as the operators
for =, <>, <, and > are made to consistently understand this principle.
For example:
1.0 = 0.95
1.0 = 1.05
1.0 <> 0.94
1.0 <> 1.06
0.94 < 1.0
1.06 > 1.0
I'm not sure that it warrants being changed, though. Storing mixed
precision values in a column and expecting the database to understand
rounding rules does not appeal to me. Too magical.
Cheers,
mark
On Thu, Aug 10, 2006 at 03:40:11AM -0000, Andrew - Supernews wrote:
On 2006-08-10, Stephen Frost <sfrost@snowman.net> wrote:
Yeah, because numeric_cmp says that 1.0 and 1.00 are equal (what else
could it say? "less" and "greater" are surely wrong). So you need to
It could say "not equal" pretty reasonably as the scale is
different.
The comparison function must be trichotomous, that is the possible results
are "equal", "greater", and "less". There is no other "not equal" result.
And a "not equal" result would be wrong, because they are equal. But 1.0 is also equal to 1.01.
We shouldn't go there. :-)
Cheers,
mark
Stephen Frost <sfrost@snowman.net> writes:
This was using just a straight-up 'numeric' data type though. Perhaps
for that case we could drop the unnecessary zeros?
That would make numeric useless for the common scientific/engineering
usage where you write the number of decimal places you think are
significant in your measurement. In that usage, "1.0" and "1.000"
do have different meanings.
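For example (illustrative; the column aliases are made up, and with no
typmod each literal's written scale sticks):
select '1.0'::numeric as measured, '1.000'::numeric as more_precise;
 measured | more_precise
----------+--------------
      1.0 |        1.000
(1 row)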
regards, tom lane
On Thu, Aug 10, 2006 at 09:20:09AM -0400, mark@mark.mielke.cc wrote:
Mathematically, 1.0 could be read as equal to anything from 0.9500... up to 1.05000...
In theory, B-Tree could be fine with this. As long as the operators
for =, <>, <, and > are made to consistently understand this principle.
For example:
1.0 = 0.95
1.0 = 1.05
1.0 <> 0.94
1.0 <> 1.06
0.94 < 1.0
1.06 > 1.0
I'm not sure that it warrants being changed, though. Storing mixed
precision values in a column and expecting the database to understand
rounding rules does not appeal to me. Too magical.
I take this back. It's magical enough to be impossible.
If 1.0 is inserted, and then 1.05, it could consider them 'equal' and
put them together. If 1.1 and 1.14 were inserted, they could be 'equal'
and put together. Then an insertion of '1' would have to re-group them
all together, making the B-Tree ineffective.
So I'm taking back what I said above.
Cheers,
mark