regression bigtest needs a very long time
Hi,
I always run 'regression test' and 'regression bigtest'
when PostgreSQL is enhanced. However, 'regression bigtest' takes
a very long time in PostgreSQL-6.5. On my computer,
it takes about 1 hour.
The reason the processing time is so long is that 1000-digit
values are calculated with the 'LOG' and 'POWER' functions.
The actual statement in "postgresql-6.5/src/test/regress/sql/
numeric_big.sql" is the following:
INSERT INTO num_result SELECT id, 0, POWER('10'::numeric,
LN(ABS(round(val,1000)))) FROM num_data WHERE val != '0.0';
But the processing finishes in a few minutes when
"LN(ABS(round(val,1000)))" is changed to "LN(ABS(round(val,30)))".
INSERT and SELECT must be tested with 1000-digit values,
because the NUMERIC and DECIMAL data types can handle up to
1000 digits.
However, I do not think it is necessary to calculate
1000-digit values in the 'LOG' function.
Comments?
--
Regards.
SAKAIDA Masaaki <sakaida@psn.co.jp>
Personal Software, Inc. Osaka Japan
Hi,
I always run 'regression test' and 'regression bigtest'
when PostgreSQL is enhanced. However, 'regression bigtest' takes
a very long time in PostgreSQL-6.5. On my computer,
it takes about 1 hour.
The reason the processing time is so long is that 1000-digit
values are calculated with the 'LOG' and 'POWER' functions.
The actual statement in "postgresql-6.5/src/test/regress/sql/
numeric_big.sql" is the following:
INSERT INTO num_result SELECT id, 0, POWER('10'::numeric,
LN(ABS(round(val,1000)))) FROM num_data WHERE val != '0.0';
But the processing finishes in a few minutes when
"LN(ABS(round(val,1000)))" is changed to "LN(ABS(round(val,30)))".
INSERT and SELECT must be tested with 1000-digit values,
because the NUMERIC and DECIMAL data types can handle up to
1000 digits.
However, I do not think it is necessary to calculate
1000-digit values in the 'LOG' function.
numeric/decimal is a new type for this release. I assume this extra
processing will be removed once we are sure it works.
--
Bruce Momjian | http://www.op.net/~candle
maillist@candle.pha.pa.us | (610) 853-3000
+ If your life is a hard drive, | 830 Blythe Avenue
+ Christ can be your backup. | Drexel Hill, Pennsylvania 19026
Bruce Momjian <maillist@candle.pha.pa.us> wrote:
SAKAIDA wrote:
However, I do not think it is necessary to calculate
1000-digit values in the 'LOG' function.
numeric/decimal is a new type for this release. I assume this extra
processing will be removed once we are sure it works.
Thank you for your reply. I hope that in the next version
'regression test/bigtest' will finish in a short time.
The following is an example patch I considered.
If this patch is applied, the processing that currently requires
1.5 hours finishes in 5 minutes.
--
Regards.
SAKAIDA Masaaki <sakaida@psn.co.jp>
Osaka, Japan
*** postgresql-6.5/src/test/regress/sql/numeric.sql.orig Fri Jun 11 02:49:31 1999
--- postgresql-6.5/src/test/regress/sql/numeric.sql Wed Jun 16 13:46:41 1999
***************
*** 626,632 ****
-- * POWER(10, LN(value)) check
-- ******************************
DELETE FROM num_result;
! INSERT INTO num_result SELECT id, 0, POWER('10'::numeric, LN(ABS(round(val,300))))
FROM num_data
WHERE val != '0.0';
SELECT t1.id1, t1.result, t2.expected
--- 626,632 ----
-- * POWER(10, LN(value)) check
-- ******************************
DELETE FROM num_result;
! INSERT INTO num_result SELECT id, 0, POWER('10'::numeric, LN(ABS(round(val,30))))
FROM num_data
WHERE val != '0.0';
SELECT t1.id1, t1.result, t2.expected
*** postgresql-6.5/src/test/regress/sql/numeric_big.sql.orig Thu Jun 17 19:22:53 1999
--- postgresql-6.5/src/test/regress/sql/numeric_big.sql Thu Jun 17 19:27:36 1999
***************
*** 602,608 ****
-- * Natural logarithm check
-- ******************************
DELETE FROM num_result;
! INSERT INTO num_result SELECT id, 0, LN(ABS(val))
FROM num_data
WHERE val != '0.0';
SELECT t1.id1, t1.result, t2.expected
--- 602,608 ----
-- * Natural logarithm check
-- ******************************
DELETE FROM num_result;
! INSERT INTO num_result SELECT id, 0, LN(round(ABS(val),30))
FROM num_data
WHERE val != '0.0';
SELECT t1.id1, t1.result, t2.expected
***************
*** 614,620 ****
-- * Logarithm base 10 check
-- ******************************
DELETE FROM num_result;
! INSERT INTO num_result SELECT id, 0, LOG('10'::numeric, ABS(val))
FROM num_data
WHERE val != '0.0';
SELECT t1.id1, t1.result, t2.expected
--- 614,620 ----
-- * Logarithm base 10 check
-- ******************************
DELETE FROM num_result;
! INSERT INTO num_result SELECT id, 0, LOG('10'::numeric, round(ABS(val),30))
FROM num_data
WHERE val != '0.0';
SELECT t1.id1, t1.result, t2.expected
***************
*** 626,632 ****
-- * POWER(10, LN(value)) check
-- ******************************
DELETE FROM num_result;
! INSERT INTO num_result SELECT id, 0, POWER('10'::numeric, LN(ABS(round(val,1000))))
FROM num_data
WHERE val != '0.0';
SELECT t1.id1, t1.result, t2.expected
--- 626,632 ----
-- * POWER(10, LN(value)) check
-- ******************************
DELETE FROM num_result;
! INSERT INTO num_result SELECT id, 0, POWER('10'::numeric, LN(ABS(round(val,30))))
FROM num_data
WHERE val != '0.0';
SELECT t1.id1, t1.result, t2.expected
Bruce Momjian <maillist@candle.pha.pa.us> wrote:
SAKAIDA wrote:
However, I do not think it is necessary to calculate
1000-digit values in the 'LOG' function.
numeric/decimal is a new type for this release. I assume this extra
processing will be removed once we are sure it works.
Thank you for your reply. I hope that in the next version
'regression test/bigtest' will finish in a short time.
The following is an example patch I considered.
If this patch is applied, the processing that currently requires
1.5 hours finishes in 5 minutes.
Just don't run bigtest. It is only for people who are having trouble
with the new numeric type.
--
Bruce Momjian | http://www.op.net/~candle
maillist@candle.pha.pa.us | (610) 853-3000
+ If your life is a hard drive, | 830 Blythe Avenue
+ Christ can be your backup. | Drexel Hill, Pennsylvania 19026
Bruce Momjian <maillist@candle.pha.pa.us> writes:
Just don't run bigtest. It is only for people who are having trouble
with the new numeric type.
I don't mind too much that bigtest takes forever --- as you say,
it shouldn't be run except by people who want a thorough test.
But I *am* unhappy that the regular numeric test takes much longer than
all the other regression tests put together. That's an unreasonable
amount of effort spent on one feature, and it gets really annoying for
someone like me who's in the habit of running the regress tests after
any update. Is there anything this test is likely to catch that
wouldn't get caught with a much narrower field width (say 10 digits
instead of 30)?
regards, tom lane
Bruce Momjian <maillist@candle.pha.pa.us> writes:
Just don't run bigtest. It is only for people who are having trouble
with the new numeric type.
I don't mind too much that bigtest takes forever --- as you say,
it shouldn't be run except by people who want a thorough test.
But I *am* unhappy that the regular numeric test takes much longer than
all the other regression tests put together. That's an unreasonable
amount of effort spent on one feature, and it gets really annoying for
someone like me who's in the habit of running the regress tests after
any update. Is there anything this test is likely to catch that
wouldn't get caught with a much narrower field width (say 10 digits
instead of 30)?
Oh, I didn't realize this. We certainly should think about reducing the
time spent on it, though it is kind of lame to be testing numeric in a
precision that is less than the standard int4 type.
--
Bruce Momjian | http://www.op.net/~candle
maillist@candle.pha.pa.us | (610) 853-3000
+ If your life is a hard drive, | 830 Blythe Avenue
+ Christ can be your backup. | Drexel Hill, Pennsylvania 19026
SAKAIDA wrote:
Bruce Momjian <maillist@candle.pha.pa.us> wrote:
SAKAIDA wrote:
However, I do not think it is necessary to calculate
1000-digit values in the 'LOG' function.
numeric/decimal is a new type for this release. I assume this extra
processing will be removed once we are sure it works.
Thank you for your reply. I hope that in the next version
'regression test/bigtest' will finish in a short time.
The following is an example patch I considered.
If this patch is applied, the processing that currently requires
1.5 hours finishes in 5 minutes.
The test was intended to check the internal low-level
functions of the NUMERIC datatype against MANY possible
values. That's the reason for the high precision resulting in
this runtime. That was an intended side effect, not a bug!
Jan
--
#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me. #
#========================================= wieck@debis.com (Jan Wieck) #
Hi,
Bruce Momjian wrote:
Just don't run bigtest. It is only for people who are having trouble
with the new numeric type.
Tom Lane wrote:
I don't mind too much that bigtest takes forever --- as you say,
it shouldn't be run except by people who want a thorough test.
At the end of the normal regression test, the following message
is displayed:
"To run the optional huge test(s) too type 'make bigtest'"
Many users, especially those who install PostgreSQL for the
first time, may type 'make bigtest' and get the impression that
PostgreSQL is unstable, because bigtest prints no messages for a
long time and appears to hang while the numeric test is running.
Therefore, if it is not necessary for general users to
run "regression bigtest", I think that the 'make bigtest' message
should either be removed or changed to something like
"it takes several hours ....".
But I *am* unhappy that the regular numeric test takes much longer than
all the other regression tests put together. That's an unreasonable
amount of effort spent on one feature, and it gets really annoying for
someone like me who's in the habit of running the regress tests after
any update.
I think so too.
Is there anything this test is likely to catch that
wouldn't get caught with a much narrower field width (say 10 digits
instead of 30)?
Bruce Momjian wrote:
Oh, I didn't realize this. We certainly should think about reducing the
time spent on it, though it is kind of lame to be testing numeric in a
precision that is less than the standard int4 type.
--
Regards.
SAKAIDA Masaaki <sakaida@psn.co.jp>
Osaka, Japan
Bruce Momjian wrote:
Oh, I didn't realize this. We certainly should think about reducing the
time spent on it, though it is kind of lame to be testing numeric in a
precision that is less than the standard int4 type.
We certainly should think about a general speedup of NUMERIC.
Jan
--
#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me. #
#========================================= wieck@debis.com (Jan Wieck) #
Bruce Momjian wrote:
Oh, I didn't realize this. We certainly should think about reducing the
time spent on it, though it is kind of lame to be testing numeric in a
precision that is less than the standard int4 type.
We certainly should think about a general speedup of NUMERIC.
How would we do that? I assumed it was already pretty optimized.
--
Bruce Momjian | http://www.op.net/~candle
maillist@candle.pha.pa.us | (610) 853-3000
+ If your life is a hard drive, | 830 Blythe Avenue
+ Christ can be your backup. | Drexel Hill, Pennsylvania 19026
At the end of the normal regression test, the following message
is displayed:
"To run the optional huge test(s) too type 'make bigtest'"
Many users, especially those who install PostgreSQL for the
first time, may type 'make bigtest' and get the impression that
PostgreSQL is unstable, because bigtest prints no messages for a
long time and appears to hang while the numeric test is running.
Therefore, if it is not necessary for general users to
run "regression bigtest", I think that the 'make bigtest' message
should either be removed or changed to something like
"it takes several hours ....".
Warning added:
These big tests can take over an hour to complete
--
Bruce Momjian | http://www.op.net/~candle
maillist@candle.pha.pa.us | (610) 853-3000
+ If your life is a hard drive, | 830 Blythe Avenue
+ Christ can be your backup. | Drexel Hill, Pennsylvania 19026
Bruce Momjian wrote:
Oh, I didn't realize this. We certainly should think about reducing the
time spent on it, though it is kind of lame to be testing numeric in a
precision that is less than the standard int4 type.
We certainly should think about a general speedup of NUMERIC.
How would we do that? I assumed it was already pretty optimized.
By reimplementing the entire internals from scratch again :-)
Right now the db storage format is something like packed
decimal. Two digits fit into one byte. Sign, scale and
precision are stored in a header. For computations, this gets
unpacked so that every digit is stored in one byte and all the
computations are performed at the digit level in base 10.
Computers are good at performing computations in other bases
(hex, octal etc.). And we can assume that any architecture
where PostgreSQL can be installed supports 32 bit integers.
Thus, a good choice for an internal base would be 10000, with
the digits(10000) stored in small integers.
1. Converting between decimal (base 10) and base 10000 is
relatively simple. One digit(10000) holds 4 digits(10).
2. Computations using a 32 bit integer for carry/borrow are
safe because the biggest result of a one digit(10000)
add/subtract/multiply cannot exceed 32 bits
(9999 * 9999 plus a carry stays below 10^8, far under 2^31).
The speedup (I expect) results from the fact that the inner
loops of add, subtract and multiply will then handle 4
decimal digits per cycle instead of one! Doing a
1234.5678 + 2345.6789
then needs 2 internal cycles instead of 8. And
100.123 + 12030.12345
needs 4 cycles instead of 10 (because the decimal point has
the same meaning in base 10000 the last value is stored
internally as short ints 1, 2030, 1234, 5000). This is the
worst case and it still saved 60% of the innermost cycles!
Rounding and checking for overflow will get a little more
difficult, but I think it's worth the effort.
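For illustration, here is a minimal sketch of the inner loop described above,
assuming both operands are already decimal-point-aligned and padded to the same
number of base-10000 digits; the function name, NBASE constant and digit layout
are invented for the example and are not the actual PostgreSQL code:

#include <stdint.h>
#include <stddef.h>

#define NBASE 10000                          /* one int16 "digit" holds 0..9999 */

/*
 * Add two aligned base-10000 numbers stored most significant digit first.
 * A 32-bit accumulator is always safe here: 9999 + 9999 + 1 is far below 2^31.
 * Returns the final carry (0 or 1).
 */
static int
add_base10000(const int16_t *a, const int16_t *b, int16_t *result, size_t ndigits)
{
    int32_t carry = 0;

    for (size_t i = ndigits; i-- > 0; )      /* walk from the least significant digit */
    {
        int32_t sum = (int32_t) a[i] + (int32_t) b[i] + carry;

        carry = sum / NBASE;                 /* 0 or 1 */
        result[i] = (int16_t) (sum % NBASE);
    }
    return (int) carry;
}

With the worst-case example above, 100.123 + 12030.12345 aligned to two integer
and two fraction digits gives a = {0, 100, 1230, 0} and b = {1, 2030, 1234, 5000};
the loop runs 4 times and produces {1, 2130, 2464, 5000}, i.e. 12130.24645.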
Jan
--
#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me. #
#========================================= wieck@debis.com (Jan Wieck) #
needs 4 cycles instead of 10 (because the decimal point has
the same meaning in base 10000 the last value is stored
internally as short ints 1, 2030, 1234, 5000). This is the
worst case and it still saved 60% of the innermost cycles!
Interesting. How do other Db's do it internally? Anyone know?
--
Bruce Momjian | http://www.op.net/~candle
maillist@candle.pha.pa.us | (610) 853-3000
+ If your life is a hard drive, | 830 Blythe Avenue
+ Christ can be your backup. | Drexel Hill, Pennsylvania 19026
Bruce Momjian <maillist@candle.pha.pa.us> writes:
needs 4 cycles instead of 10 (because the decimal point has
the same meaning in base 10000 the last value is stored
internally as short ints 1, 2030, 1234, 5000). This is the
worst case and it still saved 60% of the innermost cycles!
Interesting. How do other Db's do it internally? Anyone know?
Probably the same way, if they want to be portable. What Jan is
describing is a *real* standard technique (it's recommended in Knuth).
AFAIK the only other way to speed up a digit-at-a-time implementation
is to drop down to the assembly level and use packed-decimal
instructions ... if your machine has any ...
One thing worth thinking about is whether the storage format shouldn't
be made the same as the calculation format, so as to eliminate the
conversion costs. At four decimal digits per int2, it wouldn't cost
us anything to do so.
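As a rough sketch of what such a combined storage/calculation format could look
like (the struct and field names below are illustrative guesses, not PostgreSQL's
actual on-disk layout):

#include <stdint.h>

typedef int16_t NumericDigit;          /* one base-10000 digit, 0..9999 */

typedef struct
{
    int16_t      weight;               /* weight of the first digit, in base-10000 units */
    int16_t      sign;                 /* positive or negative */
    int16_t      dscale;               /* display scale: decimal digits after the point */
    int16_t      ndigits;              /* number of base-10000 digits that follow */
    NumericDigit digits[];             /* the digits, most significant first */
} NumericSketch;

Because the digits are already int16 values, the same buffer could be handed
straight to the arithmetic routines without any unpack step.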
regards, tom lane
PS: BTW, Jan, if you do not have a copy of Knuth's volume 2, I'd
definitely recommend laying your hands on it for this project.
His description of multiprecision arithmetic is the best I've seen
anywhere.
If we thought that the math functions (sqrt, exp, etc) for numerics
were really getting used for anything, it might also be fun to try
to put in some better algorithms for them. I've got a copy of Cody
and Waite, which has been the bible for such things for twenty years.
But my guess is that it wouldn't be worth the trouble, except to the
extent that it speeds up the regression tests ;-)
Tom Lane wrote:
One thing worth thinking about is whether the storage format shouldn't
be made the same as the calculation format, so as to eliminate the
conversion costs. At four decimal digits per int2, it wouldn't cost
us anything to do so.
That's an extra bonus point from the described internal
format.
regards, tom lane
PS: BTW, Jan, if you do not have a copy of Knuth's volume 2, I'd
definitely recommend laying your hands on it for this project.
His description of multiprecision arithmetic is the best I've seen
anywhere.
I don't have a copy so far - thanks for the hint.
If we thought that the math functions (sqrt, exp, etc) for numerics
were really getting used for anything, it might also be fun to try
to put in some better algorithms for them. I've got a copy of Cody
and Waite, which has been the bible for such things for twenty years.
But my guess is that it wouldn't be worth the trouble, except to the
extent that it speeds up the regression tests ;-)
They are based on the standard Taylor/Maclaurin definitions
of those functions.
Most of the time when I need trigonometric functions or the like, one of
my slide rules still has enough precision, because I'm unable
to draw more precisely than 0.1mm with a pencil on paper. YES,
I love to USE slide rules (I have a dozen now, some regular
ones, some circular ones, some pocket sized and one circular
pocket sized one that looks more like a stopwatch than a
slide rule).
Thus, usually the precision of float8 should be more than
enough for those calculations. Making NUMERIC able to handle
these functions in its extreme precision shouldn't really be
that time critical.
Remember: The lack of mathematical knowledge never shows up
better than in inappropriate precision of numerical
calculations.
C. F. Gauss
(Sorry for the poor translation)
What Gauss (born 1777 and the first who knew how to take the
square root of negative numbers) meant by that is: it is
stupid to calculate with a precision of 10 or more digits after
the decimal point if you're only able to measure with 4 digits.
Jan
--
#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me. #
#========================================= wieck@debis.com (Jan Wieck) #
wieck@debis.com (Jan Wieck) writes:
We certainly should think about a general speedup of NUMERIC.
How would we do that? I assumed it was already pretty optimized.
The speedup (I expect) results from the fact that the inner
loops of add, subtract and multiply will then handle 4
decimal digits per cycle instead of one! Doing a
1234.5678 + 2345.6789
then needs 2 internal cycles instead of 8. And
100.123 + 12030.12345
needs 4 cycles instead of 10 (because the decimal point has
the same meaning in base 10000 the last value is stored
internally as short ints 1, 2030, 1234, 5000). This is the
worst case and it still saved 60% of the innermost cycles!
The question, though, becomes what percentage of operations on a
NUMERIC field are arithmetic, and what percentage are storage/retrieval.
For databases that simply store/retrieve data, your "optimization" will have
the effect of significantly increasing format conversion overhead. With a
512-byte table, four packed-decimal digits can be converted in two
primitive operations, but base-10000 will require three divisions,
three subtractions, four additions, plus miscellaneous data shuffling.
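To make the cost comparison concrete, here is a hedged sketch of the two
per-digit text-conversion paths; the helper names are invented for illustration
and neither function is taken from PostgreSQL:

#include <stdint.h>
#include <stddef.h>

/* 256 x 2 chars = the 512-byte lookup table mentioned above, filled once. */
static char bcd_to_ascii[256][2];

static void
init_bcd_table(void)
{
    for (int b = 0; b < 256; b++)
    {
        bcd_to_ascii[b][0] = (char) ('0' + (b >> 4));    /* high packed digit */
        bcd_to_ascii[b][1] = (char) ('0' + (b & 0x0F));  /* low packed digit */
    }
}

/* Packed decimal: one table lookup per byte, i.e. two lookups for four digits. */
static void
packed_to_text(const uint8_t *bytes, size_t nbytes, char *out)
{
    for (size_t i = 0; i < nbytes; i++)
    {
        out[2 * i]     = bcd_to_ascii[bytes[i]][0];
        out[2 * i + 1] = bcd_to_ascii[bytes[i]][1];
    }
}

/* Base 10000: each int16 digit needs division/modulo work to yield four chars. */
static void
base10000_to_text(const int16_t *digits, size_t ndigits, char *out)
{
    for (size_t i = 0; i < ndigits; i++)
    {
        int32_t d = digits[i];

        out[4 * i]     = (char) ('0' + d / 1000);
        out[4 * i + 1] = (char) ('0' + (d / 100) % 10);
        out[4 * i + 2] = (char) ('0' + (d / 10) % 10);
        out[4 * i + 3] = (char) ('0' + d % 10);
    }
}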
-Michael Robinson
Hi,
wieck@debis.com (Jan Wieck) wrote:
Tom Lane wrote:
If we thought that the math functions (sqrt, exp, etc) for numerics
were really getting used for anything, it might also be fun to try
to put in some better algorithms for them. I've got a copy of Cody
and Waite, which has been the bible for such things for twenty years.
But my guess is that it wouldn't be worth the trouble, except to the
extent that it speeds up the regression tests ;-)
(snip)
Thus, usually the precision of float8 should be more than
enough for those calculations. Making NUMERIC able to handle
these functions in it's extreme precision shouldn't really be
that time critical.
There is no problem concerning the NUMERIC tests of INSERT/
SELECT and add/subtract/multiply/division. The only problem is
the processing time.
One solution to this problem is to change the argument
into *float8*. If the following change is made, the processing
becomes about 10 times faster than before.
File :"src/regress/sql/numeric.sql"
Statement:"INSERT INTO num_result SELECT id, 0,
POWER('10'::numeric,LN(ABS(round(val,300))) ..."
Change: "LN(ABS(round(val,300))))"
to: "LN(float8(ABS(round(val,300))))"
# Another solution is to automatically convert the argument of the
LOG function into the double precision data type *internally*.
(But I do not know what kind of side effects this solution
would cause.)
--
Regards.
SAKAIDA Masaaki <sakaida@psn.co.jp>
Osaka, Japan
SAKAIDA Masaaki wrote:
There is no problem concerning the NUMERIC tests of INSERT/
SELECT and add/subtract/multiply/division. The only problem is
the processing time.
One solution to this problem is to change the argument
into *float8*. If the following change is made, the processing
becomes about 10 times faster than before.
File :"src/regress/sql/numeric.sql"
Statement:"INSERT INTO num_result SELECT id, 0,
POWER('10'::numeric,LN(ABS(round(val,300))) ..."
Change: "LN(ABS(round(val,300))))"
to: "LN(float8(ABS(round(val,300))))"
# Another solution is to automatically convert the argument of the
LOG function into the double precision data type *internally*.
(But I do not know what kind of side effects this solution
would cause.)
The complex functions (LN, LOG, EXP, etc.) were added to
NUMERIC for the case that someone really needs higher precision
than float8. The numeric_big test simply ensures that
someone really gets the CORRECT result when computing a
logarithm up to hundreds of digits. All the expected results
fed into the tables are computed by scripts using bc(1) with
a precision 200 digits higher than that used in the test
itself. So I'm pretty sure NUMERIC returns a VERY GOOD
approximation if I ask for the square root of 2 with 1000
digits.
One thing in mathematics that is silently forbidden is to
present a result with digits that aren't significant! But it
is up to the user to decide where the significance of his INPUT
ends, not the database. So it is up to the user to decide
when to lose precision by switching to float.
Jan
--
#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me. #
#========================================= wieck@debis.com (Jan Wieck) #
Michael Robinson <robinson@netrinsics.com> writes:
The question, though, becomes what percentage of operations on a
NUMERIC field are arithmetic, and what percentage are storage/retrieval.
Good point.
For databases that simply store/retrieve data, your "optimization" will have
the effect of significantly increasing format conversion overhead. With a
512-byte table, four packed-decimal digits can be converted in two
primitive operations, but base-10000 will require three divisions,
three subtractions, four additions, plus miscellaneous data shuffling.
That is something to worry about, but I think the present implementation
unpacks the storage format into calculation format before converting
to text. Getting rid of the unpack step by making storage and calc
formats the same would probably buy enough speed to pay for the extra
conversion arithmetic.
regards, tom lane
The question, though, becomes what percentage of operations on a
NUMERIC field are arithmetic, and what percentage are storage/retrieval.
Good point.
We assume that most data stays inside the database on every query.
That is, one should optimize for comparison/calculation speed, not
formatting speed. If you are comparing a bunch of rows to return one,
you will be much happier if the comparison happens quickly, as opposed
to doing that slowly but formatting the single output value quickly.
An RDBMS can't really try to optimize for the opposite case, since
that isn't how it is usually used...
- Thomas
--
Thomas Lockhart lockhart@alumni.caltech.edu
South Pasadena, California
Tom Lane wrote:
Michael Robinson <robinson@netrinsics.com> writes:
The question, though, becomes what percentage of operations on a
NUMERIC field are arithmetic, and what percentage are storage/retrieval.
Good point.
For databases that simply store/retrieve data, your "optimization" will have
the effect of significantly increasing format conversion overhead. With a
512-byte table, four packed-decimal digits can be converted in two
primitive operations, but base-10000 will require three divisions,
three subtractions, four additions, plus miscellaneous data shuffling.
That is something to worry about, but I think the present implementation
unpacks the storage format into calculation format before converting
to text. Getting rid of the unpack step by making storage and calc
formats the same would probably buy enough speed to pay for the extra
conversion arithmetic.
What I'm actually wondering about is why the hell anyone
would use the NUMERIC data type for fields the database shouldn't
calculate on. Why not use TEXT in that case?
OTOH, I don't think that the format conversion base 10000->10
overhead will be that significant compared against what in
summary must happen until one tuple is ready to get sent to
the frontend. Then, ALL our output functions allocate memory
for the string representation and at least copy the result
there. How many arithmetic operations are performed
internally to create the output of an int4 or float8 via
sprintf()?
Jan
--
#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me. #
#========================================= wieck@debis.com (Jan Wieck) #
wieck@debis.com (Jan Wieck) writes:
What I'm actually wondering about is why the hell anyone
would use the NUMERIC data type for fields the database shouldn't
calculate on. Why not use TEXT in that case?
He didn't say his application would be *all* I/O; he was just concerned
about whether the change would be a net loss if he did more I/O than
calculation. Seems like a reasonable concern to me.
OTOH, I don't think that the format conversion base 10000->10
overhead will be that significant compared against what in
summary must happen until one tuple is ready to get sent to
the frontend.
I agree, but it's still good if you can avoid slowing it down.
Meanwhile, I'd still like to see the runtime of the 'numeric'
regression test brought down to something comparable to one
of the other regression tests. How about cutting the precision
it uses from (300,100) down to something sane, like say (30,10)?
I do not believe for a moment that there are any portability bugs
that will be uncovered by the 300-digit case but not by a 30-digit
case.
regards, tom lane
Thomas Lockhart <lockhart@alumni.caltech.edu> writes:
The question, though, becomes what percentage of operations on a
NUMERIC field are arithmetic, and what percentage are storage/retrieval.
Good point.
We assume that most data stays inside the database on every query.
That is, one should optimize for comparison/calculation speed, not
formatting speed. If you are comparing a bunch of rows to return one,
you will be much happier if the comparison happens quickly, as opposed
to doing that slowly but formatting the single output value quickly.
An RDBMS can't really try to optimize for the opposite case, since
that isn't how it is usually used...
The optimizations under discussion will not significantly affect comparison
speed one way or the other, so comparison speed is a moot issue.
The question, really, is how often do you do this:
select bignum from table where key = condition
versus this:
select bignum1/bignum2 from table where key = condition
or this:
select * from table where bignum1/bignum2 = condition
-Michael Robinson
I do not believe for a moment that there are any portability bugs
that will be uncovered by the 300-digit case but not by a 30-digit
case.
Yeah, just gratuitous showmanship ;)
And think about those poor 486 machines. Maybe Jan is trying to burn
them out so they get replaced with something reasonable...
- Thomas
--
Thomas Lockhart lockhart@alumni.caltech.edu
South Pasadena, California
Hi,
wieck@debis.com (Jan Wieck) wrote:
The complex functions (LN, LOG, EXP, etc.) were added to
NUMERIC for the case that someone really needs higher precision
than float8. The numeric_big test simply ensures that
someone really gets the CORRECT result when computing a
logarithm up to hundreds of digits. All the expected results
fed into the tables are computed by scripts using bc(1) with
a precision 200 digits higher than that used in the test
itself. So I'm pretty sure NUMERIC returns a VERY GOOD
approximation if I ask for the square root of 2 with 1000
digits.
I was able to understand the specification of the NUMERIC
data type. But I cannot yet understand the specification of
the normal regression test.
File :"src/regress/sql/numeric.sql"
Function : LN(ABS(round(val,300)))
----> LN(ABS(round(val,30))) <---- My hope
Please tell me:
Is there a difference in the calculation algorithm between 30
and 300 digits?
Is there a difference in something like CPU dependence or
compiler dependence between 30 and 300 digits?
# If the answer is "NO", I think that the 300-digit case is
not necessary once you are sure that it works, because
1. the 30-digit case is equivalent to the 300-digit case.
2. the 300-digit case is slow.
3. 30 digits is already a sufficiently large precision.
--
Regards.
SAKAIDA Masaaki <sakaida@psn.co.jp>
Osaka, Japan
Michael Robinson <robinson@netrinsics.com> writes:
Thomas Lockhart <lockhart@alumni.caltech.edu> writes:
We assume that most data stays inside the database on every query.
That is, one should optimize for comparison/calculation speed, not
formatting speed. If you are comparing a bunch of rows to return one,
you will be much happier if the comparison happens quickly, as opposed
to doing that slowly but formatting the single output value quickly.
The optimizations under discussion will not significantly affect comparison
speed one way or the other, so comparison speed is a moot issue.
On what do you base that assertion? I'd expect comparisons to be sped
up significantly: no need to unpack the storage format, and the inner
loop handles four digits per iteration instead of one.
regards, tom lane
Tom Lane <tgl@sss.pgh.pa.us> writes:
The optimizations under discussion will not significantly affect comparison
speed one way or the other, so comparison speed is a moot issue.
On what do you base that assertion? I'd expect comparisons to be sped
up significantly: no need to unpack the storage format, and the inner
loop handles four digits per iteration instead of one.
The overwhelming majority of comparisons can be resolved just by looking
at the number of significant digits. Ninety percent of the remainder can
be resolved after looking at the most significant digit, and so on, except
in the case of distributions that vary only in the least significant digits.
Furthermore, on big-endian architectures, four digits of packed representation
can be compared in one iteration as well.
So, I conclude the optimizations under discussion will not significantly
affect comparison speed one way or the other.
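A hedged sketch of the kind of short-circuit comparison described above, for
two positive values stored most significant digit first with trailing zero
digits stripped; the struct and function names are invented for the example:

#include <stdint.h>

typedef struct
{
    int            weight;     /* magnitude: position of the most significant digit */
    int            ndigits;    /* number of stored digits */
    const int16_t *digits;     /* digits, most significant first, no trailing zeros */
} NumSketch;

/* Compare two positive numbers; returns -1, 0 or 1. */
static int
cmp_positive(const NumSketch *a, const NumSketch *b)
{
    /* Most comparisons are decided right here, by magnitude alone. */
    if (a->weight != b->weight)
        return (a->weight > b->weight) ? 1 : -1;

    /* Same magnitude: scan until the first differing digit. */
    int n = (a->ndigits < b->ndigits) ? a->ndigits : b->ndigits;

    for (int i = 0; i < n; i++)
        if (a->digits[i] != b->digits[i])
            return (a->digits[i] > b->digits[i]) ? 1 : -1;

    /* All shared digits equal: any extra (nonzero) digits decide it. */
    if (a->ndigits != b->ndigits)
        return (a->ndigits > b->ndigits) ? 1 : -1;
    return 0;
}

The shape of this loop is the same whether a "digit" is one decimal digit or
one base-10000 digit, which is the point being made: the choice of
representation rarely changes how quickly a comparison resolves.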
-Michael Robinson