Calculated values

Started by Camm Maguire · about 25 years ago · 10 messages · general
#1Camm Maguire
camm@enhanced.com

Greetings! What is the 'best way' or 'guiding philosophy' if there is
one for dealing with calculated values in a pg database? For example,
say you have a table with a column of floats, and you want to make a
view showing this column as a percent of the total across the column.
Should you

a) create another table with the total, and put on triggers on insert,
update, and delete to modify the right total or

b) create the view with a specific subselect to recalculate the total
at select time. This has the disadvantage that the total seems to be
recalculated for each row. Is there any sql syntax which can merge a
dynamically generated aggregate, *calculated only once*, into each
output row?
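(For reference: window aggregates, which PostgreSQL only gained later in 8.4, do exactly this -- the total is computed once over the result set and attached to each row. A minimal sketch, using SQLite via Python purely for illustration; the `items` table and its data are invented:)

```python
import sqlite3

# A window aggregate attaches the grand total to every output row without
# re-running the sum per row. Table and data are invented for illustration;
# requires SQLite >= 3.25 for window-function support.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER, item REAL)")
conn.executemany("INSERT INTO items VALUES (?, ?)",
                 [(1, 10.0), (2, 30.0), (3, 60.0)])

rows = conn.execute("""
    SELECT id,
           item,
           100.0 * item / SUM(item) OVER () AS pct_of_total
    FROM items
    ORDER BY id
""").fetchall()
print(rows)
```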

Thanks!

--
Camm Maguire camm@enhanced.com
==========================================================================
"The earth is but one country, and mankind its citizens." -- Baha'u'llah

#2Emmanuel Charpentier
charpent@bacbuc.dyndns.org
In reply to: Camm Maguire (#1)
Re: Calculated values

Camm Maguire wrote:

Greetings! What is the 'best way' or 'guiding philosophy' if there is
one for dealing with calculated values in a pg database? For example,
say you have a table with a column of floats, and you want to make a
view showing this column as a percent of the total across the column.
Should you

a) create another table with the total, and put on triggers on insert,
update, and delete to modify the right total or

b) create the view with a specific subselect to recalculate the total
at select time. This has the disadvantage that the total seems to be
recalculated for each row. Is there any sql syntax which can merge a
dynamically generated aggregate, *calculated only once*, into each
output row?

Hmmm ... You want to avoid "remerging", if I follow you ...

Could you try:

CREATE VIEW myview AS
SELECT id, partialsum, (partialsum/totalsum) AS percentage
FROM (SELECT id, SUM(item) AS partialsum GROUP BY id)
JOIN (SELECT SUM(item) AS totalsum); -- Note : *no* "ON" clause

That way, totalsum should be computed once.

"Caching" computations in a secondary table is a (somewhat hidden) form
of redundancy, and therefore a nice way to ask for trouble ...

Hope this helps ...

--
Emmanuel Charpentier

#3Camm Maguire
camm@enhanced.com
In reply to: Camm Maguire (#1)
Re: Calculated values

Greetings, and thanks for your reply!

Emmanuel Charpentier <charpent@bacbuc.dyndns.org> writes:

Camm Maguire wrote:

Greetings! What is the 'best way' or 'guiding philosophy' if there is
one for dealing with calculated values in a pg database? For example,
say you have a table with a column of floats, and you want to make a
view showing this column as a percent of the total across the column.
Should you

a) create another table with the total, and put on triggers on insert,
update, and delete to modify the right total or

b) create the view with a specific subselect to recalculate the total
at select time. This has the disadvantage that the total seems to be
recalculated for each row. Is there any sql syntax which can merge a
dynamically generated aggregate, *calculated only once*, into each
output row?

Hmmm ... You want to avoid "remerging", if I follow you ...

Could you try:

CREATE VIEW myview AS
SELECT id, partialsum, (partialsum/totalsum) AS percentage
FROM (SELECT id, SUM(item) AS partialsum GROUP BY id)
JOIN (SELECT SUM(item) AS totalsum); -- Note : *no* "ON" clause

I can't seem to get this syntax to work with pg. No subselects seem
to be accepted in the from clause, and join doesn't seem to be a
reserved word at all. But this kind of thing is exactly what I'm
looking for.

Another idea is to write a plpgsql function which returns tuples of a
given table. The only problem there appears to be that, to access the
columns of the result, one has to write something like 'select
id(function),percentage(function),...', which then calls the function
once for every column selected!

Maybe this is better done outside the db.

Any advice most appreciated!

That way, totalsum should be computed once.

"Caching" computations in a secondary table is a (somewhat hidden) form
of redundancy, and therefore a nice way to ask for trouble ...

Hope this helps ...

--
Emmanuel Charpentier

--
Camm Maguire camm@enhanced.com
==========================================================================
"The earth is but one country, and mankind its citizens." -- Baha'u'llah

#4Tom Lane
tgl@sss.pgh.pa.us
In reply to: Camm Maguire (#3)
Re: Calculated values

Camm Maguire <camm@enhanced.com> writes:

Emmanuel Charpentier <charpent@bacbuc.dyndns.org> writes:

CREATE VIEW myview AS
SELECT id, partialsum, (partialsum/totalsum) AS percentage
FROM (SELECT id, SUM(item) AS partialsum GROUP BY id)
JOIN (SELECT SUM(item) AS totalsum); -- Note : *no* "ON" clause

I can't seem to get this syntax to work with pg. No subselects seem
to be accepted in the from clause, and join doesn't seem to be a
reserved word at all.

Sounds like you are trying to do it in 7.0 or before. Emmanuel is
relying on 7.1 features --- and the example won't work as given anyway,
since (a) the subselects neglect to specify source tables; (b) you
have to write CROSS JOIN not JOIN if you want to omit ON/USING.
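(With those two fixes applied, the derived-table version runs. A sketch using SQLite via Python purely for illustration; the table name `t` and its data are invented:)

```python
import sqlite3

# The derived-table idea with the two fixes Tom names: each subselect gets
# its own FROM clause, and the join without ON/USING is spelled CROSS JOIN.
# Table name and data are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, item REAL)")
conn.executemany("INSERT INTO t VALUES (?, ?)",
                 [(1, 2.0), (1, 3.0), (2, 15.0)])

rows = conn.execute("""
    SELECT p.id, p.partialsum, p.partialsum / g.totalsum AS percentage
    FROM (SELECT id, SUM(item) AS partialsum FROM t GROUP BY id) AS p
    CROSS JOIN (SELECT SUM(item) AS totalsum FROM t) AS g
    ORDER BY p.id
""").fetchall()
print(rows)
```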

In 7.0 you could accomplish the same thing with temp tables, or more
straightforwardly by something like

SELECT id,
SUM(item) AS partialsum,
SUM(item) / (SELECT SUM(item) FROM table) AS percentage
FROM table
GROUP BY id

This relies for efficiency on the poorly-documented fact that the
sub-select will only be evaluated once, since it has no dependency
on the state of the outer select. (You can check this by seeing that
EXPLAIN shows the subselect as an InitPlan not a SubPlan.)
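(The same query, sketched runnably with SQLite via Python; the table `t` is invented, and the InitPlan-vs-SubPlan distinction is a PostgreSQL planner detail that SQLite does not expose:)

```python
import sqlite3

# The grand total comes from a scalar subselect that does not reference the
# outer query, so a planner is free to evaluate it just once. Schema and
# data are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, item REAL)")
conn.executemany("INSERT INTO t VALUES (?, ?)",
                 [(1, 1.0), (1, 3.0), (2, 12.0)])

rows = conn.execute("""
    SELECT id,
           SUM(item) AS partialsum,
           SUM(item) / (SELECT SUM(item) FROM t) AS percentage
    FROM t
    GROUP BY id
    ORDER BY id
""").fetchall()
print(rows)
```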

regards, tom lane

#5Camm Maguire
camm@enhanced.com
In reply to: Camm Maguire (#1)
Re: Calculated values

Greetings, and thanks so much for your reply!

Tom Lane <tgl@sss.pgh.pa.us> writes:

Camm Maguire <camm@enhanced.com> writes:

Emmanuel Charpentier <charpent@bacbuc.dyndns.org> writes:

CREATE VIEW myview AS
SELECT id, partialsum, (partialsum/totalsum) AS percentage
FROM (SELECT id, SUM(item) AS partialsum GROUP BY id)
JOIN (SELECT SUM(item) AS totalsum); -- Note : *no* "ON" clause

I can't seem to get this syntax to work with pg. No subselects seem
to be accepted in the from clause, and join doesn't seem to be a
reserved word at all.

Sounds like you are trying to do it in 7.0 or before. Emmanuel is
relying on 7.1 features --- and the example won't work as given anyway,
since (a) the subselects neglect to specify source tables; (b) you
have to write CROSS JOIN not JOIN if you want to omit ON/USING.

Thanks! Indeed, I'm using 7.0.3.

In 7.0 you could accomplish the same thing with temp tables, or more
straightforwardly by something like

SELECT id,
SUM(item) AS partialsum,
SUM(item) / (SELECT SUM(item) FROM table) AS percentage
FROM table
GROUP BY id

This relies for efficiency on the poorly-documented fact that the
sub-select will only be evaluated once, since it has no dependency
on the state of the outer select. (You can check this by seeing that
EXPLAIN shows the subselect as an InitPlan not a SubPlan.)

Thanks again. Alas, the 'explain' for me still shows the subplan, and
I can see why. Here is my view definition:

create view csp2 as
SELECT c1.asof, c1.ticker, c1.sp, c1.price AS p1,
(((100 * float8(c1.sp)) * c1.price) /
(select sum(price*sp) from sprices,sinfo
where sprices.sinfo = sinfo.sinfo
and sprices.asof = c1.asof)) AS wt,
c2.price AS p2, c2.split, c1.div,
(100 * c1.ret)
FROM csp1 c1, csp1 c2, dates
WHERE ((((c1.asof = dates.asof))
AND (c2.asof = dates.nasof))
AND (c1.id = c2.id));

What is obviously doing this is the 'sprices.asof = c1.asof'
dependency between the inner and outer select. Trouble is, my only
intention is to be able to use this view with a constant, albeit
'runtime-selectable', 'asof' or date, as in

select * from csp2 where asof = '20000103';

Any other suggestions? This dependency issue slows the above query
down significantly, resulting in the calculation of the same sum ~
1000 times. Do you also agree with the previous respondent that
trying to have a table of sums, updated dynamically with triggers, is
not a good idea? I'm trying to find the right philosophy to the
design of this db, and am ending up searching for a happy medium
between select speed and insert complexity.
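(For completeness, option (a) -- a cached total maintained by triggers -- can be sketched as follows, using SQLite via Python with an invented schema. This is exactly the redundancy Emmanuel warned about: the cached total silently drifts if any write path bypasses the triggers:)

```python
import sqlite3

# Hypothetical sketch of option (a): cache the total in a one-row table and
# keep it current with insert/update/delete triggers. All names invented.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE prices (id INTEGER, price REAL);
    CREATE TABLE totals (total REAL);
    INSERT INTO totals VALUES (0);

    CREATE TRIGGER prices_ins AFTER INSERT ON prices
    BEGIN
        UPDATE totals SET total = total + NEW.price;
    END;

    CREATE TRIGGER prices_del AFTER DELETE ON prices
    BEGIN
        UPDATE totals SET total = total - OLD.price;
    END;

    CREATE TRIGGER prices_upd AFTER UPDATE ON prices
    BEGIN
        UPDATE totals SET total = total - OLD.price + NEW.price;
    END;
""")
conn.executemany("INSERT INTO prices VALUES (?, ?)", [(1, 10.0), (2, 30.0)])
conn.execute("UPDATE prices SET price = 20.0 WHERE id = 1")

total = conn.execute("SELECT total FROM totals").fetchone()[0]
print(total)
```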

regards, tom lane

Thanks again,

--
Camm Maguire camm@enhanced.com
==========================================================================
"The earth is but one country, and mankind its citizens." -- Baha'u'llah

#6chris markiewicz
cmarkiew@commnav.com
In reply to: Camm Maguire (#5)
performance...

hello.

this might be as much of a general database question as it is a postgres
question...

i have a table with 5 columns...a primary key (integer), three small (10
character) text fields, and one semi-large (1400 characters) text field.
note that only a small percentage (5% ?) of the rows contain 1400 characters
in the 5th column...the other 95% have approx 10 characters. it has 1100
rows.

the problem is this - queries (command line) often take a very long time -
anywhere from 5-15 seconds - to execute. the queries use only the primary
key and nothing else in the where clause. no joins. a sample query is:

select * from weather where weatherid = 12372;

from the command line, it seems that the first query can take a very long
time but subsequent queries happen quickly ( < 1 sec). i'm guessing that
this is the result of caching or something.

do the long times make sense? what can i do to shorten them? would a
smaller text field help? i have no reason to think that this would be
faster or slower in another db, so it might be unrelated to postgres itself.

i greatly appreciate your help.

chris

#7Mitch Vincent
mvincent@venux.net
In reply to: chris markiewicz (#6)
Re: performance...

You have weatherid indexed, right? If not, index it and run VACUUM
ANALYZE -- if so, then make sure you've run VACUUM ANALYZE after you
created the index.

Also, you might want to EXPLAIN that query to get the query plan (just
type EXPLAIN <query>) if the above doesn't work. Post the output of
EXPLAIN along with the schema of your tables, and perhaps someone can help
some more.

-Mitch

On Mon, 29 Jan 2001, chris markiewicz wrote:

hello.

this might be as much of a general database question as it is a postgres
question...

i have a table with 5 columns...a primary key (integer), three small (10
character) text fields, and one semi-large (1400 characters) text field.
note that only a small percentage (5% ?) of the rows contain 1400 characters
in the 5th column...the other 95% have approx 10 characters. it has 1100
rows.

the problem is this - queries (command line) often take a very long time -
anywhere from 5-15 seconds - to execute. the queries use only the primary
key and nothing else in the where clause. no joins. a sample query is:

select * from weather where weatherid = 12372;

from the command line, it seems that the first query can take a very long
time but subsequent queries happen quickly ( < 1 sec). i'm guessing that
this is the result of caching or something.

do the long times make sense? what can i do to shorten them? would a
smaller text field help? i have no reason to think that this would be
faster or slower in another db, so it might be unrelated to postgres itself.

i greatly appreciate your help.

chris

#8adb
adb@Beast.COM
In reply to: chris markiewicz (#6)
Re: performance...

First thing to try is to do

explain select * from weather where weatherid = 12372;

and see if it's doing a sequential scan on your table.
If it is, the index and table stats may be out of date
and I think you fix those using vacuum analyze.
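(The analogous check, sketched with SQLite via Python purely for illustration -- EXPLAIN QUERY PLAN stands in for PostgreSQL's EXPLAIN here, and the schema is invented to mirror the table described in the question:)

```python
import sqlite3

# A primary-key lookup should use the index: the plan should report a
# SEARCH (index lookup) rather than a SCAN (full table scan).
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE weather (
        weatherid INTEGER PRIMARY KEY,
        a TEXT, b TEXT, c TEXT, report TEXT
    )
""")
conn.executemany("INSERT INTO weather VALUES (?, ?, ?, ?, ?)",
                 [(i, "x", "y", "z", "r" * 10) for i in range(1, 1101)])

plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM weather WHERE weatherid = 123"
).fetchall()
print(plan)
```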

One other thing is that if you only have 1100 rows of the sizes
you describe, even a table scan shouldn't take as long as you
say unless the machine is either under serious load or doesn't
have enough memory to run postgres without swapping. (or the disk is
freaking out with io errors)

Alex.

On Mon, 29 Jan 2001, chris markiewicz wrote:

hello.

this might be as much of a general database question as it is a postgres
question...

i have a table with 5 columns...a primary key (integer), three small (10
character) text fields, and one semi-large (1400 characters) text field.
note that only a small percentage (5% ?) of the rows contain 1400 characters
in the 5th column...the other 95% have approx 10 characters. it has 1100
rows.

the problem is this - queries (command line) often take a very long time -
anywhere from 5-15 seconds - to execute. the queries use only the primary
key and nothing else in the where clause. no joins. a sample query is:

select * from weather where weatherid = 12372;

from the command line, it seems that the first query can take a very long
time but subsequent queries happen quickly ( < 1 sec). i'm guessing that
this is the result of caching or something.

do the long times make sense? what can i do to shorten them? would a
smaller text field help? i have no reason to think that this would be
faster or slower in another db, so it might be unrelated to postgres itself.

i greatly appreciate your help.

chris

#9Alfred Perlstein
bright@wintelcom.net
In reply to: adb (#8)
Re: performance...

* adb <adb@Beast.COM> [010129 13:40] wrote:

One other thing is that if you only have 1100 rows of the sizes
you describe, even a table scan shouldn't take as long as you
say unless the machine is either under serious load or doesn't
have enough memory to run postgres without swapping. (or the disk is
freaking out with io errors)

Increasing the amount of shared memory for postgresql can help.

#10chris markiewicz
cmarkiew@commnav.com
In reply to: adb (#8)
RE: performance...

vacuum analyze seems to have done the trick...this is the sort of thing that
happens when a non-dba is doing dba work...

sincerest thanks to all that responded!

chris

-----Original Message-----
From: pgsql-general-owner@postgresql.org
[mailto:pgsql-general-owner@postgresql.org]On Behalf Of adb
Sent: Monday, January 29, 2001 4:36 PM
To: chris markiewicz
Cc: pgsql-general@postgresql.org
Subject: Re: performance...

First thing to try is to do

explain select * from weather where weatherid = 12372;

and see if it's doing a sequential scan on your table.
If it is, the index and table stats may be out of date
and I think you fix those using vacuum analyze.

One other thing is that if you only have 1100 rows of the sizes
you describe, even a table scan shouldn't take as long as you
say unless the machine is either under serious load or doesn't
have enough memory to run postgres without swapping. (or the disk is
freaking out with io errors)

Alex.

On Mon, 29 Jan 2001, chris markiewicz wrote:

hello.

this might be as much of a general database question as it is a postgres
question...

i have a table with 5 columns...a primary key (integer), three small (10
character) text fields, and one semi-large (1400 characters) text field.
note that only a small percentage (5% ?) of the rows contain 1400 characters
in the 5th column...the other 95% have approx 10 characters. it has 1100
rows.

the problem is this - queries (command line) often take a very long time -
anywhere from 5-15 seconds - to execute. the queries use only the primary
key and nothing else in the where clause. no joins. a sample query is:

select * from weather where weatherid = 12372;

from the command line, it seems that the first query can take a very long
time but subsequent queries happen quickly ( < 1 sec). i'm guessing that
this is the result of caching or something.

do the long times make sense? what can i do to shorten them? would a
smaller text field help? i have no reason to think that this would be
faster or slower in another db, so it might be unrelated to postgres itself.

i greatly appreciate your help.

chris