numeric/decimal docs bug?
In datatype.sgml:
The type numeric can store numbers of practically
unlimited size and precision,...
I think this is simply wrong, since the current implementation of the
numeric and decimal data types limits the precision to at most 1000:
#define NUMERIC_MAX_PRECISION 1000
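For illustration, declaring a column above that limit is rejected at
parse time (a quick sketch; the exact error wording may differ by
version):
-- hypothetical session, just to show where the #define bites
CREATE TABLE too_precise (x numeric(2000, 0));
-- ERROR:  NUMERIC precision 2000 must be between 1 and 1000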
Comments?
--
Tatsuo Ishii
Tatsuo Ishii <t-ishii@sra.co.jp> writes:
In datatype.sgml:
The type numeric can store numbers of practically
unlimited size and precision,...
I think this is simply wrong, since the current implementation of the
numeric and decimal data types limits the precision to at most 1000:
#define NUMERIC_MAX_PRECISION 1000
I was thinking just the other day that there's no reason for that
limit to be so low. Jan, couldn't we bump it up to 8 or 16K or so?
(Not that I'd care to do heavy arithmetic on such numbers, or that
I believe there's any practical use for them ... but why set the
limit lower than we must?)
regards, tom lane
Are there other cases where the pgsql docs may say unlimited where it might
not be?
I remember when the FAQ stated unlimited columns per table (it's been
corrected now so that's good).
I'm not asking for every limit to be documented, but while documentation is
being written, if one does not yet know (or remember) the actual (or even
rough/estimated) limit, it's better to skip it for later than to falsely say
"unlimited". Better to have no signal than noise in this case.
Regards,
Link.
At 11:14 PM 02-03-2002 +0900, Tatsuo Ishii wrote:
In datatype.sgml:
The type numeric can store numbers of practically
unlimited size and precision,...
I think this is simply wrong, since the current implementation of the
numeric and decimal data types limits the precision to at most 1000:
#define NUMERIC_MAX_PRECISION 1000
Comments?
Tom Lane writes:
#define NUMERIC_MAX_PRECISION 1000
I was thinking just the other day that there's no reason for that
limit to be so low. Jan, couldn't we bump it up to 8 or 16K or so?
Why have an arbitrary limit at all? Set it to INT_MAX, or whatever type
the index variables have.
--
Peter Eisentraut peter_e@gmx.net
Peter Eisentraut <peter_e@gmx.net> writes:
Tom Lane writes:
#define NUMERIC_MAX_PRECISION 1000
I was thinking just the other day that there's no reason for that
limit to be so low. Jan, couldn't we bump it up to 8 or 16K or so?
Why have an arbitrary limit at all? Set it to INT_MAX,
The hard limit is certainly no more than 64K, since we store these
numbers in half of an atttypmod. In practice I suspect the limit may
be less; Jan would be more likely to remember...
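For reference, the declared precision and scale can be recovered from
atttypmod with a little arithmetic (a sketch, assuming the usual
((precision << 16) | scale) + 4 packing; table and column names here
are made up):
CREATE TABLE typmod_demo (n numeric(12, 3));
SELECT atttypmod,
       (atttypmod - 4) / 65536 AS precision,
       (atttypmod - 4) % 65536 AS scale
  FROM pg_attribute
 WHERE attrelid = (SELECT oid FROM pg_class WHERE relname = 'typmod_demo')
   AND attname = 'n';
-- should show precision = 12, scale = 3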
regards, tom lane
Tom Lane wrote:
Peter Eisentraut <peter_e@gmx.net> writes:
Tom Lane writes:
#define NUMERIC_MAX_PRECISION 1000
I was thinking just the other day that there's no reason for that
limit to be so low. Jan, couldn't we bump it up to 8 or 16K or so?
Why have an arbitrary limit at all? Set it to INT_MAX,
The hard limit is certainly no more than 64K, since we store these
numbers in half of an atttypmod. In practice I suspect the limit may
be less; Jan would be more likely to remember...
It is arbitrary of course. I don't recall completely, have to
dig into the code, but there might be some side effect when
mucking with it.
The NUMERIC code increases the actual internal precision when
doing multiply and divide, which happens a gazillion times
when doing higher functions like trigonometry. I think there
was some connection between the max precision and how high
this internal precision can grow, so increasing the precision
might affect the computational performance of such higher
functions significantly.
Jan
--
#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me. #
#================================================== JanWieck@Yahoo.com #
Jan Wieck wrote:
The hard limit is certainly no more than 64K, since we store these
numbers in half of an atttypmod. In practice I suspect the limit may
be less; Jan would be more likely to remember...
It is arbitrary of course. I don't recall completely, have to
dig into the code, but there might be some side effect when
mucking with it.
The NUMERIC code increases the actual internal precision when
doing multiply and divide, which happens a gazillion times
when doing higher functions like trigonometry. I think there
was some connection between the max precision and how high
this internal precision can grow, so increasing the precision
might affect the computational performance of such higher
functions significantly.
Oh, interesting, maybe we should just leave it alone.
--
Bruce Momjian | http://candle.pha.pa.us
pgman@candle.pha.pa.us | (610) 853-3000
+ If your life is a hard drive, | 830 Blythe Avenue
+ Christ can be your backup. | Drexel Hill, Pennsylvania 19026
Bruce Momjian wrote:
Jan Wieck wrote:
The hard limit is certainly no more than 64K, since we store these
numbers in half of an atttypmod. In practice I suspect the limit may
be less; Jan would be more likely to remember...
It is arbitrary of course. I don't recall completely, have to
dig into the code, but there might be some side effect when
mucking with it.
The NUMERIC code increases the actual internal precision when
doing multiply and divide, which happens a gazillion times
when doing higher functions like trigonometry. I think there
was some connection between the max precision and how high
this internal precision can grow, so increasing the precision
might affect the computational performance of such higher
functions significantly.
Oh, interesting, maybe we should just leave it alone.
As said, I have to look at the code. I'm pretty sure that it
currently will not use hundreds of digits internally if you
use only a few digits in your schema. So changing it isn't
that dangerous.
But who's going to write and run a regression test ensuring
that the new high limit can really be supported? I didn't
even run the numeric_big test lately, which tests with 500
digits of precision at least ... and therefore takes some time
(yawn). To increase the number of digits used, you first have
to have some other tool to generate the test data (I
originally used bc(1) with some scripts). Based on that we
still claim that our system deals correctly with up to 1,000
digits of precision.
I don't like the idea of bumping up that number to some
higher nonsense, claiming we support 32K digits of precision on
exact numeric, when no one ever tested whether natural log really
returns its result in that precision instead of a 30,000
digit precise approximation.
I missed some of the discussion, because I considered the
1,000 digits to be complete nonsense already and dropped the
thread. So could someone please enlighten me what the real
reason for increasing our precision is? AFAIR it had
something to do with the docs. If it's just because the docs
and the code aren't in sync, I'd vote for changing the docs.
Jan
--
#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me. #
#================================================== JanWieck@Yahoo.com #
Jan Wieck wrote:
The hard limit is certainly no more than 64K, since we store these
numbers in half of an atttypmod. In practice I suspect the limit may
be less; Jan would be more likely to remember...
It is arbitrary of course. I don't recall completely, have to
dig into the code, but there might be some side effect when
mucking with it.
The NUMERIC code increases the actual internal precision when
doing multiply and divide, which happens a gazillion times
when doing higher functions like trigonometry. I think there
was some connection between the max precision and how high
this internal precision can grow, so increasing the precision
might affect the computational performance of such higher
functions significantly.
Oh, interesting, maybe we should just leave it alone.
So are we going to just fix the docs?
--
Tatsuo Ishii
Jan Wieck wrote:
Bruce Momjian wrote:
Jan Wieck wrote:
The hard limit is certainly no more than 64K, since we store these
numbers in half of an atttypmod. In practice I suspect the limit may
be less; Jan would be more likely to remember...
It is arbitrary of course. I don't recall completely, have to
dig into the code, but there might be some side effect when
mucking with it.
The NUMERIC code increases the actual internal precision when
doing multiply and divide, which happens a gazillion times
when doing higher functions like trigonometry. I think there
was some connection between the max precision and how high
this internal precision can grow, so increasing the precision
might affect the computational performance of such higher
functions significantly.
Oh, interesting, maybe we should just leave it alone.
As said, I have to look at the code. I'm pretty sure that it
currently will not use hundreds of digits internally if you
use only a few digits in your schema. So changing it isn't
that dangerous.
But who's going to write and run a regression test ensuring
that the new high limit can really be supported? I didn't
even run the numeric_big test lately, which tests with 500
digits of precision at least ... and therefore takes some time
(yawn). To increase the number of digits used, you first have
to have some other tool to generate the test data (I
originally used bc(1) with some scripts). Based on that we
still claim that our system deals correctly with up to 1,000
digits of precision.
I don't like the idea of bumping up that number to some
higher nonsense, claiming we support 32K digits of precision on
exact numeric, when no one ever tested whether natural log really
returns its result in that precision instead of a 30,000
digit precise approximation.
I missed some of the discussion, because I considered the
1,000 digits to be complete nonsense already and dropped the
thread. So could someone please enlighten me what the real
reason for increasing our precision is? AFAIR it had
something to do with the docs. If it's just because the docs
and the code aren't in sync, I'd vote for changing the docs.
I have done a little more research on this. If you create a numeric
with no precision:
CREATE TABLE test (x numeric);
You can insert numerics that are longer than 1000 digits:
INSERT INTO test values ('1111(continues 1010 times)');
You can even do computations on it:
SELECT x+1 FROM test;
1000 is pretty arbitrary. If we can handle 1000 digits, I can't see how larger
values could somehow fail.
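One way to reproduce that without typing a thousand digits by hand (a
sketch, assuming the repeat() function is available in your version;
otherwise generate the literal with any script):
INSERT INTO test VALUES (repeat('1', 1010)::numeric);
SELECT length(x::text) AS digits, x + 1 FROM test;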
Also, the numeric regression test takes much longer than the other
tests. I don't see why a test of that length is required, compared to
the other tests. Probably time to pare it back a little.
--
Bruce Momjian | http://candle.pha.pa.us
pgman@candle.pha.pa.us | (610) 853-3000
+ If your life is a hard drive, | 830 Blythe Avenue
+ Christ can be your backup. | Drexel Hill, Pennsylvania 19026
...
Also, the numeric regression test takes much longer than the other
tests. I don't see why a test of that length is required, compared to
the other tests. Probably time to pare it back a little.
The numeric types are inherently slow. You might look at what effect you
can achieve by restructuring that regression test to more closely
resemble the other tests. In particular, it defines several source
tables, each of which contains similar initial values. And it
defines a results table, into which intermediate results are placed,
which are then immediately queried for display and comparison to obtain
a test result. If handling the values is slow, we could certainly remove
these intermediate steps and still get most of the test coverage.
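A sketch of what that could look like (table and column names follow
the style of the existing numeric test and may not match it exactly):
rather than materializing each result and querying it back, compare in
a single pass and expect zero rows.
-- current style, roughly:
--   INSERT INTO num_result SELECT t1.id, t2.id, t1.val + t2.val
--       FROM num_data t1, num_data t2;
--   then SELECT from num_result joined against num_exp_add ...
-- leaner style: one query, zero rows returned on success
SELECT t1.id AS id1, t2.id AS id2, t1.val + t2.val AS got, e.expected
  FROM num_data t1, num_data t2, num_exp_add e
 WHERE e.id1 = t1.id AND e.id2 = t2.id
   AND t1.val + t2.val <> e.expected;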
On another related topic:
I've been wanting to ask: we have in a few cases moved aggregate
calculations from small, fast data types to using numeric as the
accumulator. It would be nice imho to allow, say, an int8 accumulator
for an int4 data type, rather than requiring numeric.
But not all platforms (I assume) have an int8 data type. So we would
need to be able to fall back to numeric for those platforms which need
to use it. What would it take to make some of the catalogs configurable
or sensitive to configuration results?
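As a rough sketch of the idea (made-up names, modern CREATE AGGREGATE
syntax, not actual catalog entries): an avg over int4 that accumulates
the running sum and count in an int8 array rather than a numeric.
CREATE FUNCTION int4_avg_accum8(state int8[], val int4) RETURNS int8[]
    AS 'SELECT ARRAY[state[1] + val, state[2] + 1]'
    LANGUAGE sql STRICT;
CREATE FUNCTION int4_avg_final8(state int8[]) RETURNS numeric
    AS 'SELECT CASE WHEN state[2] = 0 THEN NULL
                    ELSE state[1]::numeric / state[2] END'
    LANGUAGE sql;
CREATE AGGREGATE avg8(int4) (
    sfunc     = int4_avg_accum8,
    stype     = int8[],
    initcond  = '{0,0}',
    finalfunc = int4_avg_final8
);
-- e.g. SELECT avg8(i) FROM generate_series(1, 1000000) AS s(i);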
- Thomas
Bruce Momjian wrote:
Jan Wieck wrote:
I missed some of the discussion, because I considered the
1,000 digits to be complete nonsense already and dropped the
thread. So could someone please enlighten me what the real
reason for increasing our precision is? AFAIR it had
something to do with the docs. If it's just because the docs
and the code aren't in sync, I'd vote for changing the docs.
I have done a little more research on this. If you create a numeric
with no precision:
CREATE TABLE test (x numeric);
You can insert numerics that are longer than 1000 digits:
INSERT INTO test values ('1111(continues 1010 times)');
You can even do computations on it:
SELECT x+1 FROM test;
1000 is pretty arbitrary. If we can handle 1000 digits, I can't see how larger
values could somehow fail.
And I can't see what more than 1,000 digits would be good
for. Bruce, your research is neat, but IMHO wasted time.
Why do we need to change it now? Is the more important issue
(doing the internal storage representation in base 10,000)
done yet? If not, we can open up for unlimited precision at
that time.
Please, adjust the docs for now, drop the issue and let's do
something useful.
Also, the numeric regression test takes much longer than the other
tests. I don't see why a test of that length is required, compared to
the other tests. Probably time to pare it back a little.
What exactly do you mean by "pare it back"? Shrinking the
precision of the test or reducing its coverage of
functionality?
For the former, it only uses 10 of the possible 1,000 digits
after the decimal point. Run the numeric_big test (which
uses 800) at least once and you'll see what kind of
difference precision makes.
And on functionality, it is absolutely insufficient, for
numerical functionality that has possible carry, rounding,
etc. issues, to check a function for just one single known
value and, if it computes that result correctly, consider it
OK for everything.
I thought the current test was sloppy already ... but it's
still too much for you ... hmmmm.
Jan
--
#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me. #
#================================================== JanWieck@Yahoo.com #
Thomas Lockhart <lockhart@fourpalms.org> writes:
I've been wanting to ask: we have in a few cases moved aggregate
calculations from small, fast data types to using numeric as the
accumulator.
Which ones are you concerned about? As of 7.2, the only ones that use
numeric accumulators for non-numeric input types are
 aggname  |  basetype   |     aggtransfn      |  transtype
----------+-------------+---------------------+-------------
 avg      | int8        | int8_accum          | _numeric
 sum      | int8        | int8_sum            | numeric
 stddev   | int2        | int2_accum          | _numeric
 stddev   | int4        | int4_accum          | _numeric
 stddev   | int8        | int8_accum          | _numeric
 variance | int2        | int2_accum          | _numeric
 variance | int4        | int4_accum          | _numeric
 variance | int8        | int8_accum          | _numeric
All of these seem to have good precision/range arguments for using
numeric accumulators, or to be enough off the beaten track that it's
not worth much angst to optimize them.
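(Roughly the query behind that listing, assuming the 7.2-era catalog
layout where pg_aggregate still carries aggname, aggbasetype and
aggtranstype; later releases key pg_aggregate by aggfnoid instead.)
SELECT a.aggname, tb.typname AS basetype,
       a.aggtransfn, tt.typname AS transtype
  FROM pg_aggregate a, pg_type tb, pg_type tt
 WHERE tb.oid = a.aggbasetype
   AND tt.oid = a.aggtranstype
   AND tt.typname IN ('numeric', '_numeric')
   AND tb.typname NOT IN ('numeric', '_numeric');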
regards, tom lane
Which ones are you concerned about? As of 7.2, the only ones that use
numeric accumulators for non-numeric input types are
...
OK, I did imply that I've been wanting to ask this for some time. I
should have asked during the 7.1 era, when this was true for more cases.
:)
All of these seem to have good precision/range arguments for using
numeric accumulators, or to be enough off the beaten track that it's
not worth much angst to optimize them.
Well, they *are* on the beaten track for someone, just not you! ;)
I'd think that things like stddev might be OK with 52 bits of
accumulation, so could be done with doubles. Were they implemented that
way at one time? Do we have a need to provide precision greater than
that, or to guard against the (unlikely) case of having so many values
that a double-based accumulator overflows its ability to see the next
value?
I'll point out that for the case of accumulating so many integers that
a double can no longer handle them, the alternative implementation of
using numeric may approach infinite computation time.
But in any case, I can ask the same question, only reversed:
We now have some aggregate functions which use, say, int4 to accumulate
int4 values, if the target platform does *not* support int8. What would
it take to make the catalogs configurable or able to respond to
configuration results so that, for example, platforms without int8
support could instead use numeric or double values as a substitute?
- Thomas
Thomas Lockhart <lockhart@fourpalms.org> writes:
All of these seem to have good precision/range arguments for using
numeric accumulators, or to be enough off the beaten track that it's
not worth much angst to optimize them.
Well, they *are* on the beaten track for someone, just not you! ;)
I'd think that things like stddev might be OK with 52 bits of
accumulation, so could be done with doubles.
ISTM that people who are willing to have it done in a double can simply
write stddev(x::float8). Of course you will rejoin that if they want
it done in a numeric, they can write stddev(x::numeric) ... but since
we are talking about exact inputs, I would prefer that the default
behavior be to carry out the summation without loss of precision.
The stddev calculation *is* subject to problems if you don't do the
summation as accurately as you can.
Do we have a need to provide precision greater than
that, or to guard against the (unlikely) case of having so many values
that a double-based accumulator overflows its ability to see the next
value?
You don't see the cancellation problems inherent in N*sum(x^2) - sum(x)^2?
You're likely to be subtracting bignums even with not all that many
input values; they just have to be large input values.
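A quick way to see the effect (a sketch; the float8 result will vary
by platform, but it will not be the exact answer): three values whose
true standard deviation is 1, shifted up by a billion so that the
sums of squares no longer fit exactly in a double's mantissa.
SELECT stddev(x::float8) AS stddev_float8,
       stddev(x::numeric) AS stddev_numeric
  FROM (SELECT 1000000001.0 AS x
        UNION ALL SELECT 1000000002.0
        UNION ALL SELECT 1000000003.0) AS t;
-- the numeric column comes back as exactly 1; the float8 column drifts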
But in any case, I can ask the same question, only reversed:
We now have some aggregate functions which use, say, int4 to accumulate
int4 values, if the target platform does *not* support int8. What would
it take to make the catalogs configurable or able to respond to
configuration results so that, for example, platforms without int8
support could instead use numeric or double values as a substitute?
Haven't thought hard about it. I will say that I don't like the idea
of changing the declared output type of the aggregates across platforms.
Changing the internal implementation (ie, transtype) would be acceptable
--- but I doubt it's worth the trouble. In most other arguments that
touch on this point, I seem to be one of the few holdouts for insisting
that we worry about int8-less platforms anymore at all ;-). For those
few old platforms, the 7.2 behavior of avg(int) and sum(int) is no worse
than it was for everyone in all pre-7.1 versions; I am not excited about
expending significant effort to make it better.
regards, tom lane
Jan Wieck wrote:
Bruce Momjian wrote:
Jan Wieck wrote:
I missed some of the discussion, because I considered the
1,000 digits to be complete nonsense already and dropped the
thread. So could someone please enlighten me what the real
reason for increasing our precision is? AFAIR it had
something to do with the docs. If it's just because the docs
and the code aren't in sync, I'd vote for changing the docs.
I have done a little more research on this. If you create a numeric
with no precision:
CREATE TABLE test (x numeric);
You can insert numerics that are longer than 1000 digits:
INSERT INTO test values ('1111(continues 1010 times)');
You can even do computations on it:
SELECT x+1 FROM test;
1000 is pretty arbitrary. If we can handle 1000 digits, I can't see how larger
values could somehow fail.
And I can't see what more than 1,000 digits would be good
for. Bruce, your research is neat, but IMHO wasted time.
Why do we need to change it now? Is the more important issue
(doing the internal storage representation in base 10,000)
done yet? If not, we can open up for unlimited precision at
that time.
I certainly would like the 10,000 change done, but few of us are
capable of doing it. :-(
Please, adjust the docs for now, drop the issue and let's do
something useful.
That's how I got started. The problem is that the limit isn't 1,000.
Looking at NUMERIC_MAX_PRECISION, I see it used in gram.y to prevent
creation of NUMERIC columns that exceed the maximum length, and I see it
used in numeric.c to prevent exponents that exceed the maximum length,
but I don't see anything that would actually enforce the limit during
INSERT and in other cases.
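To make the inconsistency concrete (a sketch; exact error wording may
vary), the limit bites only where a precision is actually declared:
CREATE TABLE t1 (x numeric(1001, 0));               -- rejected by the parser
CREATE TABLE t2 (x numeric);                        -- accepted
INSERT INTO t2 VALUES (repeat('9', 1010)::numeric); -- also accepted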
Remember how people complained when I said "unlimited" in the FAQ for
some items that actually had a limit. Well, in this case, we have a
limit that is only enforced in some places. I would like to see this
cleared up one way or the other so the docs would be correct.
Jan, any chance on doing the 10,000 change in your spare time? ;-)
Also, the numeric regression test takes much longer than the other
tests. I don't see why a test of that length is required, compared to
the other tests. Probably time to pare it back a little.
What exactly do you mean by "pare it back"? Shrinking the
precision of the test or reducing its coverage of
functionality?
For the former, it only uses 10 of the possible 1,000 digits
after the decimal point. Run the numeric_big test (which
uses 800) at least once and you'll see what kind of
difference precision makes.
And on functionality, it is absolutely insufficient, for
numerical functionality that has possible carry, rounding,
etc. issues, to check a function for just one single known
value and, if it computes that result correctly, consider it
OK for everything.
I thought the current test was sloppy already ... but it's
still too much for you ... hmmmm.
Well, our regression tests are not intended to test every possible
NUMERIC combination, just a reasonable subset. As it is now, I often
think the regression tests have hung because numeric takes so much
longer than any of the other tests. We have had this code in there for
a while now, and it is not OS-specific stuff, so I think we should just
pare it back so we know it is working. We already have numeric_big for a
larger test.
--
Bruce Momjian | http://candle.pha.pa.us
pgman@candle.pha.pa.us | (610) 853-3000
+ If your life is a hard drive, | 830 Blythe Avenue
+ Christ can be your backup. | Drexel Hill, Pennsylvania 19026
Bruce Momjian wrote:
Well, our regression tests are not intended to test every possible
NUMERIC combination, just a reasonable subset. As it is now, I often
think the regression tests have hung because numeric takes so much
longer than any of the other tests. We have had this code in there for
a while now, and it is not OS-specific stuff, so I think we should just
pare it back so we know it is working. We already have numeric_big for a
larger test.
Bruce,
have you even taken one single look at the test? It does 100
each of add, sub, mul and div; these are the fast operations
that don't really take much time.
Then it does 10 each of sqrt(), ln(), log10(), pow10() and 10
combined power(ln()). These are the time-consuming
operations, working iteratively à la Newton, Taylor and
Maclaurin. All that is done with only 10 digits after the
decimal point!
So again, WHAT exactly do you mean by "pare it back"?
Sorry, I don't get it. Do you want to remove the entire test?
Reduce it to an INSERT, one SELECT (so that we know the
input- and output functions work) and the four basic
operators used once? Well, that's a hell of a test, makes me
really feel comfortable. Like the mechanic kicking against
the tire then saying "I ain't see noth'n wrong with the
brakes, ya sure can make a trip in the mountains". Yeah, at
least once!
Jan
--
#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me. #
#================================================== JanWieck@Yahoo.com #
Jan Wieck wrote:
Bruce Momjian wrote:
Well, our regression tests are not intended to test every possible
NUMERIC combination, just a reasonable subset. As it is now, I often
think the regression tests have hung because numeric takes so much
longer than any of the other tests. We have had this code in there for
a while now, and it is not OS-specific stuff, so I think we should just
pare it back so we know it is working. We already have numeric_big for a
larger test.
Bruce,
have you even taken one single look at the test? It does 100
each of add, sub, mul and div; these are the fast operations
that don't really take much time.
Then it does 10 each of sqrt(), ln(), log10(), pow10() and 10
combined power(ln()). These are the time-consuming
operations, working iteratively à la Newton, Taylor and
Maclaurin. All that is done with only 10 digits after the
decimal point!
So again, WHAT exactly do you mean by "pare it back"?
Sorry, I don't get it. Do you want to remove the entire test?
Reduce it to an INSERT, one SELECT (so that we know the
input- and output functions work) and the four basic
operators used once? Well, that's a hell of a test, makes me
really feel comfortable. Like the mechanic kicking against
the tire then saying "I ain't see noth'n wrong with the
brakes, ya sure can make a trip in the mountains". Yeah, at
least once!
Jan, regression is not a test of the level a developer would use to make
sure his code works. It is merely to make sure the install works on a
limited number of cases. Having seen zero reports of any numeric
failures since we installed it, and seeing that it takes >10x longer
than the other tests, I think it should be pared back. Do we really
need 10 tests of each complex function? I think one would do the trick.
--
Bruce Momjian | http://candle.pha.pa.us
pgman@candle.pha.pa.us | (610) 853-3000
+ If your life is a hard drive, | 830 Blythe Avenue
+ Christ can be your backup. | Drexel Hill, Pennsylvania 19026
Bruce Momjian wrote:
Jan Wieck wrote:
Bruce Momjian wrote:
Well, our regression tests are not intended to test every possible
NUMERIC combination, just a reasonable subset. As it is now, I often
think the regression tests have hung because numeric takes so much
longer than any of the other tests. We have had this code in there for
a while now, and it is not OS-specific stuff, so I think we should just
pare it back so we know it is working. We already have numeric_big for a
larger test.
Bruce,
have you even taken one single look at the test? It does 100
each of add, sub, mul and div; these are the fast operations
that don't really take much time.
Then it does 10 each of sqrt(), ln(), log10(), pow10() and 10
combined power(ln()). These are the time-consuming
operations, working iteratively à la Newton, Taylor and
Maclaurin. All that is done with only 10 digits after the
decimal point!
So again, WHAT exactly do you mean by "pare it back"?
Sorry, I don't get it. Do you want to remove the entire test?
Reduce it to an INSERT, one SELECT (so that we know the
input- and output functions work) and the four basic
operators used once? Well, that's a hell of a test, makes me
really feel comfortable. Like the mechanic kicking against
the tire then saying "I ain't see noth'n wrong with the
brakes, ya sure can make a trip in the mountains". Yeah, at
least once!
Jan, regression is not a test of the level a developer would use to make
sure his code works. It is merely to make sure the install works on a
limited number of cases. Having seen zero reports of any numeric
failures since we installed it, and seeing that it takes >10x longer
than the other tests, I think it should be pared back. Do we really
need 10 tests of each complex function? I think one would do the trick.
You forgot who wrote that code originally. I feel a lot
better WITH the tests in place :-)
And if it's merely to make sure the install worked, man, who
is doing source installations these days and running the
regression tests anyway? Most people throw in an RPM or the
like; only a few serious users install from sources, and only
a fistful of them then run the regression tests.
Isn't it mostly developers and distro maintainers who use
that directory? I think your entire point isn't just weak;
IMNSVHO you don't really have a point.
Jan
--
#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me. #
#================================================== JanWieck@Yahoo.com #
Jan Wieck wrote:
You forgot who wrote that code originally. I feel a lot
better WITH the tests in place :-)
And if it's merely to make sure the install worked, man, who
is doing source installations these days and running the
regression tests anyway? Most people throw in an RPM or the
like; only a few serious users install from sources, and only
a fistful of them then run the regression tests.
Isn't it mostly developers and distro maintainers who use
that directory? I think your entire point isn't just weak;
IMNSVHO you don't really have a point.
It is my understanding that RPM does run that test. My main issue is
why does numeric have to be so much larger than the other tests? I have
not heard that explained.
--
Bruce Momjian | http://candle.pha.pa.us
pgman@candle.pha.pa.us | (610) 853-3000
+ If your life is a hard drive, | 830 Blythe Avenue
+ Christ can be your backup. | Drexel Hill, Pennsylvania 19026
...
Jan, regression is not a test of the level a developer would use to make
sure his code works. It is merely to make sure the install works on a
limited number of cases. Having seen zero reports of any numeric
failures since we installed it, and seeing that it takes >10x longer
than the other tests, I think it should be pared back. Do we really
need 10 tests of each complex function? I think one would do the trick.
Whoops. We rely on the regression tests to make sure that previous
behaviors continue to be valid behaviors. Another use is to verify that
a particular installation can reproduce this same test. But regression
testing is a fundamental and essential development tool, precisely
because it covers cases outside the range you might be thinking of
testing as you do development.
As a group, we might tend to underestimate the value of this, which
could be evidenced by the fact that our regression test suite has not
grown substantially over the years. It could have many
more tests within each module, and bug reports *could* be fed back into
regression updates to make sure that failures do not reappear.
All imho of course ;)
- Thomas
...
It is my understanding that RPM does run that test. My main issue is
why does numeric have to be so much larger than the other tests? I have
not heard that explained.
afaict it is not larger. It *does* take more time, but the number of
tests is relatively small, or at least comparable to the number of
tests which appear, or should appear, in other tests of data types
covering a large problem space (e.g. date/time).
It does illustrate that BCD-like encodings are expensive, and that
machine-supported math is usually a win. If it is a big deal, jump in
and widen the internal math operations!
- Thomas
Bruce Momjian wrote:
Jan Wieck wrote:
You forgot who wrote that code originally. I feel a lot
better WITH the tests in place :-)
And if it's merely to make sure the install worked, man, who
is doing source installations these days and running the
regression tests anyway? Most people throw in an RPM or the
like; only a few serious users install from sources, and only
a fistful of them then run the regression tests.
Isn't it mostly developers and distro maintainers who use
that directory? I think your entire point isn't just weak;
IMNSVHO you don't really have a point.
It is my understanding that RPM does run that test. My main issue is
why does numeric have to be so much larger than the other tests? I have
not heard that explained.
Well, I heard Thomas commenting that the implementation is
horribly slow (or something like that; I don't recall his exact
wording). But he's right.
I think the same test done with float8 would run in less than
a tenth of that time. But this is only an explanation of "why it
takes so long"; it is no argument pro or con the test itself.
I think I made my point clear enough, that calling
these functions just once is plain sloppy. But that's just
my opinion. What do others think?
Jan
--
#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me. #
#================================================== JanWieck@Yahoo.com #
Jan Wieck <janwieck@yahoo.com> writes:
I think I made my point clear enough, that calling
these functions just once is plain sloppy. But that's just
my opinion. What do others think?
I don't have a problem with the current length of the numeric test.
The original form of it (now shoved over to bigtests) did seem
excessively slow to me ... but I can live with this one.
I do agree that someone ought to reimplement numeric using base10k
arithmetic ... but it's not bugging me so much that I'm likely
to get around to it anytime soon myself ...
Bruce, why is there no TODO item for that project?
regards, tom lane
Thomas Lockhart wrote:
...
It is my understanding that RPM does run that test. My main issue is
why does numeric have to be so much larger than the other tests? I have
not heard that explained.
afaict it is not larger. It *does* take more time, but the number of
tests is relatively small, or at least comparable to the number of
tests which appear, or should appear, in other tests of data types
covering a large problem space (e.g. date/time).
It does illustrate that BCD-like encodings are expensive, and that
machine-supported math is usually a win. If it is a big deal, jump in
and widen the internal math operations!
OK, as long as everyone else is fine with the tests, we can leave it
alone. The concept that the number of tests is realistic, and that
they are just slower than other data types, makes sense.
--
Bruce Momjian | http://candle.pha.pa.us
pgman@candle.pha.pa.us | (610) 853-3000
+ If your life is a hard drive, | 830 Blythe Avenue
+ Christ can be your backup. | Drexel Hill, Pennsylvania 19026
Tom Lane wrote:
Jan Wieck <janwieck@yahoo.com> writes:
I think I made my point clear enough, that calling
these functions just once is plain sloppy. But that's just
my opinion. What do others think?
I don't have a problem with the current length of the numeric test.
The original form of it (now shoved over to bigtests) did seem
excessively slow to me ... but I can live with this one.
I do agree that someone ought to reimplement numeric using base10k
arithmetic ... but it's not bugging me so much that I'm likely
to get around to it anytime soon myself ...
Bruce, why is there no TODO item for that project?
Not sure. I was aware of it for a while. Added:
* Change NUMERIC data type to use base 10,000 internally
--
Bruce Momjian | http://candle.pha.pa.us
pgman@candle.pha.pa.us | (610) 853-3000
+ If your life is a hard drive, | 830 Blythe Avenue
+ Christ can be your backup. | Drexel Hill, Pennsylvania 19026
Tatsuo Ishii wrote:
Jan Wieck wrote:
The hard limit is certainly no more than 64K, since we store these
numbers in half of an atttypmod. In practice I suspect the limit may
be less; Jan would be more likely to remember...
It is arbitrary of course. I don't recall completely, have to
dig into the code, but there might be some side effect when
mucking with it.
The NUMERIC code increases the actual internal precision when
doing multiply and divide, which happens a gazillion times
when doing higher functions like trigonometry. I think there
was some connection between the max precision and how high
this internal precision can grow, so increasing the precision
might affect the computational performance of such higher
functions significantly.
Oh, interesting, maybe we should just leave it alone.
So are we going to just fix the docs?
OK, I have updated the docs. Patch attached.
I have also added this to the TODO list:
* Change NUMERIC to enforce the maximum precision, and increase it
--
Bruce Momjian | http://candle.pha.pa.us
pgman@candle.pha.pa.us | (610) 853-3000
+ If your life is a hard drive, | 830 Blythe Avenue
+ Christ can be your backup. | Drexel Hill, Pennsylvania 19026
Attachment: /bjm/diff (text/plain)
Index: datatype.sgml
===================================================================
RCS file: /cvsroot/pgsql/doc/src/sgml/datatype.sgml,v
retrieving revision 1.87
diff -c -r1.87 datatype.sgml
*** datatype.sgml 3 Apr 2002 05:39:27 -0000 1.87
--- datatype.sgml 13 Apr 2002 01:26:54 -0000
***************
*** 506,518 ****
<title>Arbitrary Precision Numbers</title>
<para>
! The type <type>numeric</type> can store numbers of practically
! unlimited size and precision, while being able to store all
! numbers and carry out all calculations exactly. It is especially
! recommended for storing monetary amounts and other quantities
! where exactness is required. However, the <type>numeric</type>
! type is very slow compared to the floating-point types described
! in the next section.
</para>
<para>
--- 506,517 ----
<title>Arbitrary Precision Numbers</title>
<para>
! The type <type>numeric</type> can store numbers with up to 1,000
! digits of precision and perform calculations exactly. It is
! especially recommended for storing monetary amounts and other
! quantities where exactness is required. However, the
! <type>numeric</type> type is very slow compared to the
! floating-point types described in the next section.
</para>
<para>
Jan Wieck wrote:
Oh, interesting, maybe we should just leave it alone.
As said, I have to look at the code. I'm pretty sure that it
currently will not use hundreds of digits internally if you
use only a few digits in your schema. So changing it isn't
that dangerous.
But who's going to write and run a regression test ensuring
that the new high limit can really be supported? I didn't
even run the numeric_big test lately, which tests with 500
digits of precision at least ... and therefore takes some time
(yawn). To increase the number of digits used, you first have
to have some other tool to generate the test data (I
originally used bc(1) with some scripts). Based on that we
still claim that our system deals correctly with up to 1,000
digits of precision.
I don't like the idea of bumping up that number to some
higher nonsense, claiming we support 32K digits of precision on
exact numeric, when no one ever tested whether natural log really
returns its result in that precision instead of a 30,000
digit precise approximation.
I missed some of the discussion, because I considered the
1,000 digits to be complete nonsense already and dropped the
thread. So could someone please enlighten me what the real
reason for increasing our precision is? AFAIR it had
something to do with the docs. If it's just because the docs
and the code aren't in sync, I'd vote for changing the docs.
Jan, if the numeric code works on 100 or 500 digits, could it break with
10,000 digits? Is there a reason to believe more digits could cause
problems not present in shorter tests?
--
Bruce Momjian | http://candle.pha.pa.us
pgman@candle.pha.pa.us | (610) 853-3000
+ If your life is a hard drive, | 830 Blythe Avenue
+ Christ can be your backup. | Drexel Hill, Pennsylvania 19026
Jan, regression is not a test of the level a developer would use to make
sure his code works. It is merely to make sure the install works on a
limited number of cases.
News to me! If anything, I don't think a lot of the current regression
tests are comprehensive enough! For the SET/DROP NOT NULL patch I
submitted, I included a regression test that tests every one of the
preconditions in my code - that way if anything gets changed or broken,
we'll find out very quickly.
I personally don't have a problem with the time taken to regression test -
and I think that trimming the numeric test _might_ be a false economy. Who
knows what's going to turn around and bite us one day?
Having seen zero reports of any numeric
failures since we installed it, and seeing that it takes >10x longer
than the other tests, I think it should be pared back. Do we really
need 10 tests of each complex function? I think one would do the trick.
A good point tho, I didn't submit a regression test that tries to ALTER 3
different non-existent tables to check for failures - one test was enough...
Chris
Christopher Kings-Lynne wrote:
Having seen zero reports of any numeric
failures since we installed it, and seeing that it takes >10x longer
than the other tests, I think it should be pared back. Do we really
need 10 tests of each complex function? I think one would do the trick.
A good point tho, I didn't submit a regression test that tries to ALTER 3
different non-existent tables to check for failures - one test was enough...
That was my point. Is there much value in testing each function ten
times? Anyway, it seems only I care, so I will drop it.
--
Bruce Momjian | http://candle.pha.pa.us
pgman@candle.pha.pa.us | (610) 853-3000
+ If your life is a hard drive, | 830 Blythe Avenue
+ Christ can be your backup. | Drexel Hill, Pennsylvania 19026
Bruce Momjian wrote:
Christopher Kings-Lynne wrote:
Having seen zero reports of any numeric
failures since we installed it, and seeing that it takes >10x longer
than the other tests, I think it should be pared back. Do we really
need 10 tests of each complex function? I think one would do the trick.
A good point tho, I didn't submit a regression test that tries to ALTER 3
different non-existent tables to check for failures - one test was enough...
That was my point. Is there much value in testing each function ten
times? Anyway, it seems only I care, so I will drop it.
Yes, there is value in it. There is conditional code in it
that depends on the values. I wrote that before (I said there
are possible carry, rounding, etc. issues), and it looked to
me like you simply ignored these facts.
Jan
--
#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me. #
#================================================== JanWieck@Yahoo.com #