numerics lose scale and precision in views of unions
It appears that when you create a view of a union, numeric data types
lose their scale and precision.
db=> create table t1 (c1 numeric(4,2));
CREATE TABLE
db=> create view v1 as select c1 from t1;
CREATE VIEW
db=> \d v1
View "brian.v1"
Column | Type | Modifiers
--------+--------------+-----------
c1 | numeric(4,2) |
View definition:
SELECT t1.c1
FROM t1;
db=> create view v2 as select c1 from t1 union select c1 from t1;
CREATE VIEW
db=> \d v2
View "brian.v2"
Column | Type | Modifiers
--------+---------+-----------
c1 | numeric |
View definition:
SELECT t1.c1
FROM t1
UNION
SELECT t1.c1
FROM t1;
db=> create view v3 as select c1::numeric(4,2) from t1 union select c1::numeric(4,2) from t1;
CREATE VIEW
db=> \d v3
View "brian.v3"
Column | Type | Modifiers
--------+---------+-----------
c1 | numeric |
View definition:
SELECT t1.c1
FROM t1
UNION
SELECT t1.c1
FROM t1;
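For what it's worth, a possible workaround (an untested sketch; the view
name v4 and the subquery alias ss are made up here) is to apply the cast
above the UNION instead of inside each arm. In v3 the per-arm casts were
presumably dropped as no-ops, since c1 is already numeric(4,2), and the
union node then discarded the typmod; a cast applied to the union's
output is not redundant and carries its own typmod:
create view v4 as
  select ss.c1::numeric(4,2) as c1
  from (select c1 from t1
        union
        select c1 from t1) as ss;
With that, \d v4 should report c1 as numeric(4,2).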
--
Brian C. DeRocher @ Mitretek Systems
"Brian C. DeRocher" <brian.derocher@mitretek.org> writes:
It appears that when you create a view of a union, numeric data types
lose their scale and precision.
I think this is the same issue discussed here:
http://archives.postgresql.org/pgsql-hackers/2004-12/msg00408.php
That was just before the 8.0 release, so the change got postponed and then
seems to have slipped through the cracks :-(. Any objections to
fixing it for 8.2?
regards, tom lane
* Tom Lane (tgl@sss.pgh.pa.us) wrote:
"Brian C. DeRocher" <brian.derocher@mitretek.org> writes:
It appears that when you create a view of a union, numeric data types
lose their scale and precision.
I think this is the same issue discussed here:
http://archives.postgresql.org/pgsql-hackers/2004-12/msg00408.php
That was just before the 8.0 release, so the change got postponed and then
seems to have slipped through the cracks :-(. Any objections to
fixing it for 8.2?
Sounds good to me. I'd like to talk a bit about the expected behavior
of a numeric hash function. This is the current behavior:
abc=# select * from test1;
a1
--------
1.00
1.0000
1.0
(3 rows)
abc=# select * from test1 group by a1;
a1
------
1.00
(1 row)
abc=# select distinct a1 from test1;
a1
------
1.00
(1 row)
I just want to double-check that this is the correct/expected behavior
(hopefully from the SQL spec?) and that a hash function need not concern
itself with the scale?
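(For reference, a setup that reproduces the session above would be
something along these lines; the table name and values are inferred from
the output:
create table test1 (a1 numeric);
insert into test1 values (1.00);
insert into test1 values (1.0000);
insert into test1 values (1.0);
An unconstrained numeric column keeps each literal's written scale, which
is why the three rows display differently.)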
I'm guessing it's too late for 8.2, or would such a relatively
isolated/simple change be acceptable during the beta period (adding hash
functions for numeric types)? I can't swear I'll actually get to them, but
I'd like to, and so I'd like to hash out (hah!) the correct semantics.
Thanks!
Stephen
Stephen Frost <sfrost@snowman.net> writes:
Sounds good to me. I'd like to talk a bit about the expected behavior
of a numeric hash function. This is the current behavior:
You're hijacking the thread, tsk tsk.
abc=# select * from test1;
a1
--------
1.00
1.0000
1.0
(3 rows)
abc=# select * from test1 group by a1;
a1
------
1.00
(1 row)
abc=# select distinct a1 from test1;
a1
------
1.00
(1 row)
Yeah, because numeric_cmp says that 1.0 and 1.00 are equal (what else
could it say? "less" and "greater" are surely wrong). So you need to
ensure that dscale is not included in the hash calculation. The
comments in numeric.h claim that you should not need to worry about
leading or trailing zeroes, but it sounds like you do need to worry
about differing weights for zero. Might be easiest and safest to use
only nonzero digit words in the hash calculation.
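A concrete illustration of the hazard (hashtext on the display text is
shown only as a counterexample, not as a proposed numeric hash):
select 1.0::numeric = 1.00::numeric as equal,
       hashtext('1.0') as h1,
       hashtext('1.00') as h2;
equal comes back true while h1 and h2 differ, so anything derived from
the textual form (where dscale shows up) would break the rule that
values comparing equal must hash alike.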
regards, tom lane
* Tom Lane (tgl@sss.pgh.pa.us) wrote:
Stephen Frost <sfrost@snowman.net> writes:
Sounds good to me. I'd like to talk a bit about the expected behavior
of a numeric hash function. This is the current behavior:
You're hijacking the thread, tsk tsk.
Yeah, but Brian sits across the hall from me at work... :)
Yeah, because numeric_cmp says that 1.0 and 1.00 are equal (what else
could it say? "less" and "greater" are surely wrong). So you need to
It could say "not equal" pretty reasonably as the scale is
different. With all due respect, it seems to me that we track the scale
for the very reason that it may be different for different numbers.
Unless the SQL spec says differently or we get complaints from people,
I'm all for keeping the current semantics, though.
ensure that dscale is not included in the hash calculation. The
comments in numeric.h claim that you should not need to worry about
leading or trailing zeroes, but it sounds like you do need to worry
about differing weights for zero. Might be easiest and safest to use
only nonzero digit words in the hash calculation.
Makes sense to me.
Thanks!
Stephen
Stephen Frost <sfrost@snowman.net> writes:
* Tom Lane (tgl@sss.pgh.pa.us) wrote:
Yeah, because numeric_cmp says that 1.0 and 1.00 are equal (what else
could it say? "less" and "greater" are surely wrong). So you need to
It could say "not equal" pretty reasonably as the scale is
different.
Nope, there are exactly three options: equal, less, greater.
btree doesn't understand anything else.
Unless the SQL spec says differently or we get complaints from people
I'm all for keeping the current semantics though.
The SQL spec? Oh, that old thing ... I can't find anything very
specific about it in SQL99, but certainly there is nothing mandating
a different treatment than we are using. The closest material I can
find is
5.3 <literal>
3) The numeric value of an <exact numeric literal> is determined
by the normal mathematical interpretation of positional decimal
notation.
8.2 <comparison predicate>
2) Numbers are compared with respect to their algebraic value.
There's certainly not anything in 8.2 <comparison predicate> contemplating the idea that two
non-nulls could have any other comparison result than less, equal, or
greater.
regards, tom lane
On 2006-08-10, Stephen Frost <sfrost@snowman.net> wrote:
Yeah, because numeric_cmp says that 1.0 and 1.00 are equal (what else
could it say? "less" and "greater" are surely wrong). So you need to
It could say "not equal" pretty reasonably as the scale is
different.
The comparison function must be trichotomous, that is the possible results
are "equal", "greater", and "less". There is no other "not equal" result.
--
Andrew, Supernews
* Tom Lane (tgl@sss.pgh.pa.us) wrote:
Stephen Frost <sfrost@snowman.net> writes:
It could say "not equal" pretty reasonably as the scale is
different.
Nope, there are exactly three options: equal, less, greater.
btree doesn't understand anything else.
Ah, yeah, I can see how something else would cause some difficulties. :)
There's certainly not anything in 8.2 contemplating the idea that two
non-nulls could have any other comparison result than less, equal, or
greater.
Makes me curious if it really makes sense to keep trailing zeros...
Having them there but not treating them any differently except for
display comes across as inconsistent to me. Either 1.0 and 1.00 are
the same thing (and thus should be displayed the same way), or they
aren't (in which case they should be treated distinctly in, eg, a
'select distinct' clause).
Was there a particular motivation for the current way things are being
done? I seriously doubt anything I'd propose for the hash functions
would have any impact on it either way but it seems like it might be a
change worth considering in some future release (probably post-8.3).
Thanks,
Stephen
Stephen Frost <sfrost@snowman.net> writes:
Makes me curious if it really makes sense to keep trailing zeros...
AFAIR we consider them mainly as a display artifact. An application
that's declared a column as numeric(7,2) is likely to expect to see
exactly two digits after the decimal point.
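For instance (illustrative output; numeric casts round half away from
zero to the declared scale):
select 1::numeric(7,2) as a, 1.005::numeric(7,2) as b;
  a   |  b
------+------
 1.00 | 1.01
(1 row)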
Either 1.0 and 1.00 are
the same thing (and thus should be displayed the same way), or they
aren't (in which case they should be treated distinctly in, eg, a
'select distinct' clause).
Consistency has never been SQL's strong point ;-)
regards, tom lane
* Tom Lane (tgl@sss.pgh.pa.us) wrote:
Stephen Frost <sfrost@snowman.net> writes:
Makes me curious if it really makes sense to keep trailing zeros...
AFAIR we consider them mainly as a display artifact. An application
that's declared a column as numeric(7,2) is likely to expect to see
exactly two digits after the decimal point.
Hmm. I should have mentioned this previously (since I was thinking
about it at the time...) but this display artifact is unfortunately not
without consequences. I'm about 80% sure that having the scale too
large (as in, larger than about 6 or 7 decimal places) breaks MS Access
using ODBC. It complains about not being able to represent the number
(even though it's just trailing zeros). It might be possible to handle
that in the ODBC driver but I don't think it'd be very clean
(considering you would want to fail cases where it's not just trailing
zeros).
This was using just a straight-up 'numeric' data type though. Perhaps
for that case we could drop the unnecessary zeros? I can understand
having them there when a specific scale is specified (I suppose...) but
when there isn't a specific scale given any application would have to
deal with the variability in the number of digits after the decimal
point anyway.
Either 1.0 and 1.00 are
the same thing (and thus should be displayed the same way), or they
aren't (in which case they should be treated distinctly in, eg, a
'select distinct' clause).
Consistency has never been SQL's strong point ;-)
Indeed. I think my suggestion above would be at least a half-step
towards consistency without breaking things. I *think* this would also
mean that we'd always have either a fixed number of decimal places
(scale specified), or no trailing zeros.
This would, in fact, be awfully nice for me since I wouldn't have to
deal with things like:
78.4
2.3625
4.1666675000000000
16.6666675000000000
0.83333250000000000000
where I then get to do some *very* ugly magic to find the records with
the extra zeros off on the end and truncate them (think cast to text and
then use a regexp, not fun at all...). Unfortunately round(a,b) <> a
doesn't work so hot in these cases where you do want the precision just
not the extra zeros off on the end.
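For the record, the "ugly magic" looks roughly like this (a sketch only;
a1/test1 stand in for the real column and table, the pattern assumes a
plain decimal rendering, and the replacement string is written assuming
standard_conforming_strings behavior):
select a1,
       regexp_replace(a1::text,
                      '([.][0-9]*[1-9])0+$|[.]0+$',
                      '\1')::numeric as trimmed
from test1;
The first alternative trims zeros after the last significant fractional
digit, the second drops an all-zero fraction entirely, and casting the
trimmed text back to numeric keeps the reduced scale.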
Thanks,
Stephen
On Wed, Aug 09, 2006 at 11:35:48PM -0400, Tom Lane wrote:
Stephen Frost <sfrost@snowman.net> writes:
* Tom Lane (tgl@sss.pgh.pa.us) wrote:
Yeah, because numeric_cmp says that 1.0 and 1.00 are equal (what else
could it say? "less" and "greater" are surely wrong). So you need to
It could say "not equal" pretty reasonably as the scale is
different.
Nope, there are exactly three options: equal, less, greater.
btree doesn't understand anything else.
Mathematically, 1.0 could be read as equal to anything from 0.9500... up to 1.05000...
In theory, B-Tree could be fine with this. As long as the operators
for =, <>, <, and > are made to consistently understand this principle.
For example:
1.0 = 0.95
1.0 = 1.05
1.0 <> 0.94
1.0 <> 1.06
0.94 < 1.0
1.06 > 1.0
I'm not sure that it warrants being changed, though. Storing mixed
precision values in a column and expecting the database to understand
rounding rules does not appeal to me. Too magical.
Cheers,
mark
On Thu, Aug 10, 2006 at 03:40:11AM -0000, Andrew - Supernews wrote:
On 2006-08-10, Stephen Frost <sfrost@snowman.net> wrote:
Yeah, because numeric_cmp says that 1.0 and 1.00 are equal (what else
could it say? "less" and "greater" are surely wrong). So you need to
It could say "not equal" pretty reasonably as the scale is
different.
The comparison function must be trichotomous, that is the possible results
are "equal", "greater", and "less". There is no other "not equal" result.
And a "not equal" result would be wrong, because they are equal. But 1.0 is also equal to 1.01.
We shouldn't go there. :-)
Cheers,
mark
Stephen Frost <sfrost@snowman.net> writes:
This was using just a straight-up 'numeric' data type though. Perhaps
for that case we could drop the unnecessary zeros?
That would make numeric useless for the common scientific/engineering
usage where you write the number of decimal places you think are
significant in your measurement. In that usage, "1.0" and "1.000"
do have different meanings.
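For example (illustrative; the column aliases are made up, and with no
typmod each literal's written scale sticks):
select '1.0'::numeric as measured, '1.000'::numeric as more_precise;
 measured | more_precise
----------+--------------
      1.0 |        1.000
(1 row)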
regards, tom lane
On Thu, Aug 10, 2006 at 09:20:09AM -0400, mark@mark.mielke.cc wrote:
Mathematically, 1.0 could be read as equal to anything from 0.9500... up to 1.05000...
In theory, B-Tree could be fine with this. As long as the operators
for =, <>, <, and > are made to consistently understand this principle.
For example:
1.0 = 0.95
1.0 = 1.05
1.0 <> 0.94
1.0 <> 1.06
0.94 < 1.0
1.06 > 1.0
I'm not sure that it warrants being changed, though. Storing mixed
precision values in a column and expecting the database to understand
rounding rules does not appeal to me. Too magical.
I take this back. It's magical enough to be impossible.
If 1.0 is inserted, and then 1.05, it could consider them 'equal' and
put them together. If 1.1 and 1.14 were inserted, they could be 'equal'
and put together. Then an insertion of '1' would have to re-group them
all together, making the B-Tree ineffective.
So I'm taking back what I said above.
Cheers,
mark