Question About Aggregate Functions
Greetings,
I'm a DB novice as well as a pgsql novice. I can manage to run a few basic
queries, but that's all I've really done so far. How do I create a query
that (1) evaluates each boolean field for TRUE/FALSE and (2) counts the
number of rows where each field is TRUE? Also, one field is an integer, so
I want the average from that field, rather than the count. Something along
the lines of:
SELECT COUNT (t1.fielda, t2.fielda, t2.fieldb) AVG(t2.fieldc) FROM t1, t2
WHERE t1.item_id=t2.item_id AND t1.fielda IS TRUE AND t2.fielda IS TRUE AND
t2.fieldb IS TRUE AND t2.fieldc IS NOT NULL
The result is intended to be something of a summary report. t1 contains
basic info about each item, with one field for whether the item is active or
not. The other table contains additional info about whether each item has
particular attributes. I want the query to tell me the average number of
years active items have been active, and the number of items where each
attribute is true. I then want to turn the raw attribute counts into
percentages, so I can say "for n% of the items these attributes are true".
Pointers to good examples/tutorials are welcome. Most I have seen are
rather simplistic, and what I am reading in the manual isn't coming together
very well.
Thanks,
Don
On 9/12/06, Brandon Aiken <BAiken@winemantech.com> wrote:
First, aggregate functions always have to have a GROUP BY clause. If you
want everything in a table or join, you use GROUP BY NULL.
Thanks. I did not realize that.
Next, IS TRUE statements will select anything that is not NULL, 0, or FALSE,
so I'm not sure what you're trying to get because you're getting nearly
everything, and count() already ignores NULL values.
I didn't see that in the manual's coverage, but could have overlooked it.
But count() will include the FALSE values along with the TRUE values -
ignoring only those that are NULL. At least, I think that's the case. So,
for each column I select, I need to be sure I am counting only the TRUE
values. I do have NULL, FALSE and TRUE values in each column, since I do
not always know for sure whether an attribute is TRUE or FALSE when I record
the item. That may be determined later, but not in all cases.
Next, count(x, y, z) isn't a valid function. Count() only has one
parameter, so you'll have to call it several times.
I knew my syntax was wrong - but wasn't sure about calling multiple
functions since I hadn't seen any examples of that in my hunting for info.
I was trying to make a little clearer what I wanted to do.
Depending on what you were hoping count(x, y, z) was returning, you do this:
SELECT count(t1.fielda), count(t2.fielda), count(t2.fieldb), AVG(t2.fieldc
)FROM t1 JOIN t2 ON ON t1.item_id = t2.item_id
GROUP BY NULL;
This one looks more like what I am attempting to do. However, I do need to
be sure my count() functions are counting the values that are TRUE. Is
this a case where I should run a query to select the records where the
values for the desired columns are true, insert that result into a temp
table, and then perform the count() function as above on just those
records? Sure seems like that would be the simple route, now that I think
about it.
<SNIP>
Show quoted text
Greetings,
I'm a DB novice as well as a pgsql novice. I can manage to run a few
basic queries, but that's all I've really done so far. How do I create a
query that (1) evaluates each boolean field for TRUE/FALSE and (2) counts
the number of rows where each field is TRUE? Also, one field is an integer,
so I want the average from that field, rather than the count. Something
along the lines of:SELECT COUNT (t1.fielda, t2.fielda, t2.fieldb) AVG(t2.fieldc) FROM t1, t2
WHERE t1.item_id=t2.item_id AND t1.fielda IS TRUE AND t2.fielda IS TRUE
AND t2.fieldb IS TRUE AND t2.fieldc IS NOT NULLThe result is intended to be something of a summary report. t1 contains
basic info about each item, with one field for whether the item is active or
not. The other table contains additional info about whether each item has
particular attributes. I want the query to tell me the average number of
years active items have been active, and the number of items where each
attribute is true. I then want to turn the raw attribute counts into
percentages, so I can say "for n% of the items these attributes are true".Pointers to good examples/tutorials are welcome. Most I have seen are
rather simplistic, and what I am reading in the manual isn't coming together
very well.Thanks,
Don
Import Notes
Reply to msg id not found: F8E84F0F56445B4CB39E019EF67DACBA2F1E54@exchsrvr.winemantech.com
El mar, 12-09-2006 a las 21:15 -0400, Don Parris escribi�:
I'm a DB novice as well as a pgsql novice. I can manage to
run a few basic queries, but that's all I've really done so
far. How do I create a query that (1) evaluates each boolean
field for TRUE/FALSE and (2) counts the number of rows where
each field is TRUE? Also, one field is an integer, so I want
the average from that field, rather than the count. Something
along the lines of:SELECT COUNT (t1.fielda, t2.fielda, t2.fieldb) AVG(t2.fieldc)
FROM t1, t2
WHERE t1.item_id=t2.item_id AND t1.fielda IS TRUE AND
t2.fielda IS TRUE AND t2.fieldb IS TRUE AND t2.fieldc IS NOT
NULLThe result is intended to be something of a summary report.
t1SELECT count(t1.fielda), count(t2.fielda), count(t2.fieldb),
AVG(t2.fieldc)
FROM t1 JOIN t2 ON t1.item_id = t2.item_id
GROUP BY NULL;
This one looks more like what I am attempting to do. However, I do
need to be sure my count() functions are counting the values that are
TRUE. Is this a case where I should run a query to select the records
where the values for the desired columns are true, insert that result
into a temp table, and then perform the count() function as above on
just those records? Sure seems like that would be the simple route,
now that I think about it.
You should then use WHERE clause:
SELECT count(t1.fielda) FROM t1 WHERE t1.fielda = TRUE;
this should report how many rows of t1 are TRUE.
BTW, I cannot understand your query's logic because you want to
aggregate both tables.
Joinning both tables you get a table like:
item_id, t1.fielda, t2.fielda, t2.fieldb, t2.fieldc
Then you want to aggregate fields _independently_ so you may need to do
four subqueries.
At last, if you use aggregate functions without group by clause,
postgres use data from the whole table and returns only one row with
that result.
A query (UNTESTED) that should solve your problem may be this one:
SELECT a.count AS "t1fielda",
b.count AS "t2fielda",
c.count AS "t2fieldb",
d.avg AS "t2avgc"
FROM (SELECT count(*) FROM t1 WHERE fielda = TRUE) AS a
JOIN (SELECT count(*) FROM t2 WHERE fielda = TRUE) AS b ON TRUE
JOIN (SELECT count(*) FROM t2 WHERE fieldb = TRUE) AS c ON TRUE
JOIN (SELECT avg(fieldc) FROM t2) AS d ON TRUE;
I am also a DB novice so this is correct.
Regards
Show quoted text
El mi�, 13-09-2006 a las 13:16 +0200, Oscar Rodriguez Fonseca escribi�:
A query (UNTESTED) that should solve your problem may be this one:
SELECT a.count AS "t1fielda",
b.count AS "t2fielda",
c.count AS "t2fieldb",
d.avg AS "t2avgc"
FROM (SELECT count(*) FROM t1 WHERE fielda = TRUE) AS a
JOIN (SELECT count(*) FROM t2 WHERE fielda = TRUE) AS b ON TRUE
JOIN (SELECT count(*) FROM t2 WHERE fieldb = TRUE) AS c ON TRUE
JOIN (SELECT avg(fieldc) FROM t2) AS d ON TRUE;I am also a DB novice so this is correct.
Sorry, I was trying to write:
"I am also a DB novice so I HOPE this is correct".
Best regards.
Ah, I did not know what was in your fields, so I did not assume they
were Boolean values. It looked to me like you were trying to use IS
TRUE to substitute for the lack of a GROUP BY, so I didn't know what to
do.
Yes, count() will include all non-NULL values. Sorry if I sounded
unclear there.
If you do typecasting the value zero is false (and non-zero is true).
NULL in an expression always returns NULL, and many programs will
interpret that result as false. So I'm not sure of what results you
might get with a Boolean test against a non-Boolean field, especially if
it's an integer field.
postgres=# select 0::boolean = FALSE;
?column?
----------
t
(1 row)
You should just be able to take the previous query and add in your WHERE
clauses:
SELECT count(t1.fielda), count(t2.fielda), count(t2.fieldb),
AVG(t2.fieldc)
FROM t1 JOIN t2 ON t1.item_id = t2.item_id
WHERE t1.fielda = TRUE AND t2.fielda = TRUE AND t2.fieldb = TRUE
GROUP BY NULL;
Now, the INNER JOIN you're using is only selecting fields where both
t1.item_id and t2.item_id exist and the respective fields are TRUE.
That is, it's only going to run the count and average functions against
the results of this query:
SELECT *
FROM t1 JOIN t2 ON ON t1.item_id = t2.item_id
WHERE t1.fielda = TRUE AND t2.fielda = TRUE AND t2.fieldb = TRUE;
If that's what you want, that's great.
However, you might want a count of each field where that field is TRUE.
In that case, I would use either temporary tables, compound queries and
derived tables, or multiple simple queries.
It's also possible that you might want a count of fields where
t1.item_id and t2.item_id exist, but where only each respective field is
TRUE. That is, you want a count of t1.fielda where it is TRUE no matter
what t2.fielda and t2.fieldb are as long as t1.item_id matches
t2.item_id. In that case you have to do even more joins, and that could
take a fair bit of time especially if you haven't indexed your item_id
fields.
You really have to look at your result sets. Sometimes it is better to
run multiple simple queries instead of one big complex query to be sure
you're getting the data you want and the query executes in a reasonable
amount of time.
Also, consider that NULL values are generally considered bad to
purposefully enter. Logically, It would be better to create one table
for each field and then create a record for each item_id as you need it
so you never have NULLs. The problem with that is one of performance if
you end up doing large number of JOINs. In that case, it might be
better to use integers instead of Boolean fields, since you have three
explicit states of TRUE, FALSE, and NOT YET DETERMINED.
--
Brandon Aiken
CS/IT Systems Engineer
________________________________
From: pgsql-novice-owner@postgresql.org
[mailto:pgsql-novice-owner@postgresql.org] On Behalf Of Don Parris
Sent: Tuesday, September 12, 2006 9:16 PM
To: pgsql-novice@postgresql.org
Subject: Re: [NOVICE] Question About Aggregate Functions
On 9/12/06, Brandon Aiken <BAiken@winemantech.com> wrote:
First, aggregate functions always have to have a GROUP BY
clause. If you want everything in a table or join, you use GROUP BY
NULL.
Thanks. I did not realize that.
Next, IS TRUE statements will select anything that is not NULL,
0, or FALSE, so I'm not sure what you're trying to get because you're
getting nearly everything, and count() already ignores NULL values.
I didn't see that in the manual's coverage, but could have overlooked
it. But count() will include the FALSE values along with the TRUE
values - ignoring only those that are NULL. At least, I think that's
the case. So, for each column I select, I need to be sure I am counting
only the TRUE values. I do have NULL, FALSE and TRUE values in each
column, since I do not always know for sure whether an attribute is TRUE
or FALSE when I record the item. That may be determined later, but not
in all cases.
Next, count(x, y, z) isn't a valid function. Count() only has
one parameter, so you'll have to call it several times.
I knew my syntax was wrong - but wasn't sure about calling multiple
functions since I hadn't seen any examples of that in my hunting for
info. I was trying to make a little clearer what I wanted to do.
Depending on what you were hoping count(x, y, z) was returning,
you do this:
SELECT count(t1.fielda), count(t2.fielda), count(t2.fieldb),
AVG(t2.fieldc)
FROM t1 JOIN t2 ON ON t1.item_id = t2.item_id
GROUP BY NULL;
This one looks more like what I am attempting to do. However, I do need
to be sure my count() functions are counting the values that are TRUE.
Is this a case where I should run a query to select the records where
the values for the desired columns are true, insert that result into a
temp table, and then perform the count() function as above on just those
records? Sure seems like that would be the simple route, now that I
think about it.
<SNIP>
Greetings,
I'm a DB novice as well as a pgsql novice. I can manage to run a few
basic queries, but that's all I've really done so far. How do I create
a query that (1) evaluates each boolean field for TRUE/FALSE and (2)
counts the number of rows where each field is TRUE? Also, one field is
an integer, so I want the average from that field, rather than the
count. Something along the lines of:
SELECT COUNT (t1.fielda, t2.fielda, t2.fieldb) AVG(t2.fieldc) FROM t1,
t2
WHERE t1.item_id=t2.item_id AND t1.fielda IS TRUE AND t2.fielda IS TRUE
AND t2.fieldb IS TRUE AND t2.fieldc IS NOT NULL
The result is intended to be something of a summary report. t1 contains
basic info about each item, with one field for whether the item is
active or not. The other table contains additional info about whether
each item has particular attributes. I want the query to tell me the
average number of years active items have been active, and the number of
items where each attribute is true. I then want to turn the raw
attribute counts into percentages, so I can say "for n% of the items
these attributes are true".
Pointers to good examples/tutorials are welcome. Most I have seen are
rather simplistic, and what I am reading in the manual isn't coming
together very well.
Thanks,
Don
Ah, I did not know what was in your fields, so I did not assume they
were Boolean values. It looked to me like you were trying to use IS
TRUE to substitute for the lack of a GROUP BY, so I didn't know what to
do.
Yes, count() will include all non-NULL values. Sorry if I sounded
unclear there.
If you do typecasting the value zero is false (and non-zero is true).
NULL in an expression always returns NULL, and many programs will
interpret that result as false. So I'm not sure of what results you
might get with a Boolean test against a non-Boolean field, especially if
it's an integer field.
postgres=# select 0::boolean = FALSE;
?column?
----------
t
(1 row)
You should just be able to take the previous query and add in your WHERE
clauses:
SELECT count(t1.fielda), count(t2.fielda), count(t2.fieldb),
AVG(t2.fieldc)
FROM t1 JOIN t2 ON t1.item_id = t2.item_id
WHERE t1.fielda = TRUE AND t2.fielda = TRUE AND t2.fieldb = TRUE
GROUP BY NULL;
Now, the INNER JOIN you're using is only selecting fields where both
t1.item_id and t2.item_id exist and the respective fields are TRUE.
That is, it's only going to run the count and average functions against
the results of this query:
SELECT *
FROM t1 JOIN t2 ON ON t1.item_id = t2.item_id
WHERE t1.fielda = TRUE AND t2.fielda = TRUE AND t2.fieldb = TRUE;
If that's what you want, that's great.
However, you might want a count of each field where that field is TRUE.
In that case, I would use either temporary tables, compound queries and
derived tables, or multiple simple queries.
It's also possible that you might want a count of fields where
t1.item_id and t2.item_id exist, but where only each respective field is
TRUE. That is, you want a count of t1.fielda where it is TRUE no matter
what t2.fielda and t2.fieldb are as long as t1.item_id matches
t2.item_id. In that case you have to do even more joins, and that could
take a fair bit of time especially if you haven't indexed your item_id
fields.
You really have to look at your result sets. Sometimes it is better to
run multiple simple queries instead of one big complex query to be sure
you're getting the data you want and the query executes in a reasonable
amount of time.
Also, consider that NULL values are generally considered bad to
purposefully enter. Logically, It would be better to create one table
for each field and then create a record for each item_id as you need it
so you never have NULLs. The problem with that is one of performance if
you end up doing large number of JOINs. In that case, it might be
better to use integers instead of Boolean fields, since you have three
explicit states of TRUE, FALSE, and NOT YET DETERMINED.
--
Brandon Aiken
CS/IT Systems Engineer
________________________________
From: pgsql-novice-owner@postgresql.org
[mailto:pgsql-novice-owner@postgresql.org] On Behalf Of Don Parris
Sent: Tuesday, September 12, 2006 9:16 PM
To: pgsql-novice@postgresql.org
Subject: Re: [NOVICE] Question About Aggregate Functions
On 9/12/06, Brandon Aiken <BAiken@winemantech.com> wrote:
First, aggregate functions always have to have a GROUP BY
clause. If you want everything in a table or join, you use GROUP BY
NULL.
Thanks. I did not realize that.
Next, IS TRUE statements will select anything that is not NULL,
0, or FALSE, so I'm not sure what you're trying to get because you're
getting nearly everything, and count() already ignores NULL values.
I didn't see that in the manual's coverage, but could have overlooked
it. But count() will include the FALSE values along with the TRUE
values - ignoring only those that are NULL. At least, I think that's
the case. So, for each column I select, I need to be sure I am counting
only the TRUE values. I do have NULL, FALSE and TRUE values in each
column, since I do not always know for sure whether an attribute is TRUE
or FALSE when I record the item. That may be determined later, but not
in all cases.
Next, count(x, y, z) isn't a valid function. Count() only has
one parameter, so you'll have to call it several times.
I knew my syntax was wrong - but wasn't sure about calling multiple
functions since I hadn't seen any examples of that in my hunting for
info. I was trying to make a little clearer what I wanted to do.
Depending on what you were hoping count(x, y, z) was returning,
you do this:
SELECT count(t1.fielda), count(t2.fielda), count(t2.fieldb),
AVG(t2.fieldc)
FROM t1 JOIN t2 ON ON t1.item_id = t2.item_id
GROUP BY NULL;
This one looks more like what I am attempting to do. However, I do need
to be sure my count() functions are counting the values that are TRUE.
Is this a case where I should run a query to select the records where
the values for the desired columns are true, insert that result into a
temp table, and then perform the count() function as above on just those
records? Sure seems like that would be the simple route, now that I
think about it.
<SNIP>
Greetings,
I'm a DB novice as well as a pgsql novice. I can manage to run a few
basic queries, but that's all I've really done so far. How do I create
a query that (1) evaluates each boolean field for TRUE/FALSE and (2)
counts the number of rows where each field is TRUE? Also, one field is
an integer, so I want the average from that field, rather than the
count. Something along the lines of:
SELECT COUNT (t1.fielda, t2.fielda, t2.fieldb) AVG(t2.fieldc) FROM t1,
t2
WHERE t1.item_id=t2.item_id AND t1.fielda IS TRUE AND t2.fielda IS TRUE
AND t2.fieldb IS TRUE AND t2.fieldc IS NOT NULL
The result is intended to be something of a summary report. t1 contains
basic info about each item, with one field for whether the item is
active or not. The other table contains additional info about whether
each item has particular attributes. I want the query to tell me the
average number of years active items have been active, and the number of
items where each attribute is true. I then want to turn the raw
attribute counts into percentages, so I can say "for n% of the items
these attributes are true".
Pointers to good examples/tutorials are welcome. Most I have seen are
rather simplistic, and what I am reading in the manual isn't coming
together very well.
Thanks,
Don
On 9/13/06, Brandon Aiken <BAiken@winemantech.com> wrote:
Ah, I did not know what was in your fields, so I did not assume they were
Boolean values. It looked to me like you were trying to use IS TRUE to
substitute for the lack of a GROUP BY, so I didn't know what to do.
That was in the first paragraph of my OP. "How do I create a query that
(1) evaluates each boolean field for TRUE/FALSE and (2) counts the number of
rows where each field is TRUE?" Maybe you just hadn't had your first cup of
coffee? ;-) Seriously, though, I really do appreciate your help.
Yes, count() will include all non-NULL values. Sorry if I sounded unclear
there.
If you do typecasting the value zero is false (and non-zero is true).
NULL in an expression always returns NULL, and many programs will interpret
that result as false. So I'm not sure of what results you might get with a
Boolean test against a non-Boolean field, especially if it's an integer
field.postgres=# select 0::boolean = FALSE;
?column?
----------
t
(1 row)
You should just be able to take the previous query and add in your WHERE
clauses:SELECT count(t1.fielda), count(t2.fielda), count(t2.fieldb), AVG(t2.fieldc
)FROM t1 JOIN t2 ON t1.item_id = t2.item_id
WHERE t1.fielda = TRUE AND t2.fielda = TRUE AND t2.fieldb = TRUE
GROUP BY NULL;
Now, the INNER JOIN you're using is only selecting fields where both
t1.item_id and t2.item_id exist and the respective fields are TRUE. That
is, it's only going to run the count and average functions against the
results of this query:SELECT *
FROM t1 JOIN t2 ON ON t1.item_id = t2.item_id
WHERE t1.fielda = TRUE AND t2.fielda = TRUE AND t2.fieldb = TRUE;
If that's what you want, that's great.
Can I use OR instead of AND here?
However, you might want a count of each field where that field is TRUE. In
that case, I would use either temporary tables, compound queries and derived
tables, or multiple simple queries.
It's also possible that you might want a count of fields where t1.item_idand
t2.item_id exist, but where only each respective field is TRUE. That is,
you want a count of t1.fielda where it is TRUE no matter what t2.fieldaand
t2.fieldb are as long as t1.item_id matches t2.item_id. In that case you
have to do even more joins, and that could take a fair bit of time
especially if you haven't indexed your item_id fields.
Well this sounds more like what I want. Given t2.fielda, t2.fieldb,
t2.fieldc, any one (or all three) could be true, but frequently at least one
of the fields is false. Initially, all of the fields might be unknown (thus
NULL) for a given item until I am able to investigate the items to determine
TRUE/FALSE. I frequently have items that are inactive, and thus unable to
determine any of attributes in t2.
My end result needs to be a count of all the values in each field where the
value is TRUE, as opposed to FALSE or NULL.
You really have to look at your result sets. Sometimes it is better to
run multiple simple queries instead of one big complex query to be sure
you're getting the data you want and the query executes in a reasonable
amount of time.Also, consider that NULL values are generally considered bad to
purposefully enter. Logically, It would be better to create one table for
each field and then create a record for each item_id as you need it so you
never have NULLs. The problem with that is one of performance if you end up
doing large number of JOINs. In that case, it might be better to use
integers instead of Boolean fields, since you have three explicit states of
TRUE, FALSE, and NOT YET DETERMINED.
Regarding the NULL Values:
I have been thinking that I might want to leave my NULLs as they are.
However, I will be concentrating mostly on the items that are active.
Inactive items are only counted as part of the total number of items. Their
attributes are a moot point, but technically FALSE. (If they are inactive,
the attributes are no longer TRUE in any case.)
I am counting only those fields that I know (through verification) to be
TRUE. I can use FALSE where the actual value is not known, and just change
the attribute to TRUE when I discover that to be the case. I just need to
be sure in my counts that I note the fact that FALSE values include the
unverified values (meaning some of those might actually be true). Does that
sound fairly logical to you?
Regarding the Table Layout:
The columns in t2 are ordered, essentially according to the category of
attributes involved. I had thought about using multiple tables, one for
each category of attributes. However, I chose a monolithic table for the
attributes to represent a single survey of each item. Each item might be
surveyed again in the future to determine any changes, which would introduce
a new version of the current table.
I'll tinker around with the queries a bit, and see what I come up with.
Thanks for the input.
I think I mistakenly sent this to General instead of Novice. Oops.
Yeah, I either skipped over or forgot the bit in the OP about bools.
Mea culpa.
You should be able to use OR instead of AND in any logical expression.
Well this sounds more like what I want. Given t2.fielda, t2.fieldb,
t2.fieldc, any one (or all three) could be true, but frequently at least
one of the fields is false. Initially, all of the fields might be
unknown (thus NULL) for a given item until I am able to investigate the
items to determine TRUE/FALSE. I frequently have items that are
inactive, and thus unable to determine any of attributes in t2.
My end result needs to be a count of all the values in each field where
the value is TRUE, as opposed to FALSE or NULL.
Yeah, I would probably run 4 separate, simple queries. That will get
you the best performance since you're doing no JOINs and no composite
queries.
If you need to enter the results into another table, try INSERT ... to
insert the defaults and any primary key you have (like timestamp), then
four UPDATE ... SELECT statements.
The real problem with NULLs is some of the (in my mind) nonsensical
results you get, especially with logical operators:
NULL AND TRUE => NULL
NULL OR TRUE => TRUE
NULL AND FALSE => FALSE
NULL OR FALSE => NULL
Plus you have to use IS instead of = since any NULL in an = expression
makes the result NULL (yes, this is an error in my previous queries).
NULL just has all these special cases. I find it much nicer to avoid it
wherever possible since it has somewhat unpredictable results.
________________________________
From: pgsql-general-owner@postgresql.org
[mailto:pgsql-general-owner@postgresql.org] On Behalf Of Don Parris
Sent: Wednesday, September 13, 2006 12:50 PM
To: pgsql-general@postgresql.org
Subject: Re: [GENERAL] [NOVICE] Question About Aggregate Functions
On 9/13/06, Brandon Aiken <BAiken@winemantech.com> wrote:
Ah, I did not know what was in your fields, so I did not assume
they were Boolean values. It looked to me like you were trying to use
IS TRUE to substitute for the lack of a GROUP BY, so I didn't know what
to do.
That was in the first paragraph of my OP. "How do I create a query
that (1) evaluates each boolean field for TRUE/FALSE and (2) counts the
number of rows where each field is TRUE?" Maybe you just hadn't had
your first cup of coffee? ;-) Seriously, though, I really do appreciate
your help.
Yes, count() will include all non-NULL values. Sorry if I
sounded unclear there.
If you do typecasting the value zero is false (and non-zero is
true). NULL in an expression always returns NULL, and many programs
will interpret that result as false. So I'm not sure of what results
you might get with a Boolean test against a non-Boolean field,
especially if it's an integer field.
postgres=# select 0::boolean = FALSE;
?column?
----------
t
(1 row)
You should just be able to take the previous query and add in
your WHERE clauses:
SELECT count(t1.fielda), count(t2.fielda), count(t2.fieldb),
AVG(t2.fieldc)
FROM t1 JOIN t2 ON t1.item_id = t2.item_id
WHERE t1.fielda = TRUE AND t2.fielda = TRUE AND t2.fieldb = TRUE
GROUP BY NULL;
Now, the INNER JOIN you're using is only selecting fields where
both t1.item_id and t2.item_id exist and the respective fields are TRUE.
That is, it's only going to run the count and average functions against
the results of this query:
SELECT *
FROM t1 JOIN t2 ON ON t1.item_id = t2.item_id
WHERE t1.fielda = TRUE AND t2.fielda = TRUE AND t2.fieldb =
TRUE;
If that's what you want, that's great.
Can I use OR instead of AND here?
However, you might want a count of each field where that field
is TRUE. In that case, I would use either temporary tables, compound
queries and derived tables, or multiple simple queries.
It's also possible that you might want a count of fields where
t1.item_id and t2.item_id exist, but where only each respective field is
TRUE. That is, you want a count of t1.fielda where it is TRUE no matter
what t2.fielda and t2.fieldb are as long as t1.item_id matches
t2.item_id. In that case you have to do even more joins, and that could
take a fair bit of time especially if you haven't indexed your item_id
fields.
Well this sounds more like what I want. Given t2.fielda, t2.fieldb,
t2.fieldc, any one (or all three) could be true, but frequently at least
one of the fields is false. Initially, all of the fields might be
unknown (thus NULL) for a given item until I am able to investigate the
items to determine TRUE/FALSE. I frequently have items that are
inactive, and thus unable to determine any of attributes in t2.
My end result needs to be a count of all the values in each field where
the value is TRUE, as opposed to FALSE or NULL.
You really have to look at your result sets. Sometimes it is
better to run multiple simple queries instead of one big complex query
to be sure you're getting the data you want and the query executes in a
reasonable amount of time.
Also, consider that NULL values are generally considered bad to
purposefully enter. Logically, It would be better to create one table
for each field and then create a record for each item_id as you need it
so you never have NULLs. The problem with that is one of performance if
you end up doing large number of JOINs. In that case, it might be
better to use integers instead of Boolean fields, since you have three
explicit states of TRUE, FALSE, and NOT YET DETERMINED.
Regarding the NULL Values:
I have been thinking that I might want to leave my NULLs as they are.
However, I will be concentrating mostly on the items that are active.
Inactive items are only counted as part of the total number of items.
Their attributes are a moot point, but technically FALSE. (If they are
inactive, the attributes are no longer TRUE in any case.)
I am counting only those fields that I know (through verification) to be
TRUE. I can use FALSE where the actual value is not known, and just
change the attribute to TRUE when I discover that to be the case. I
just need to be sure in my counts that I note the fact that FALSE values
include the unverified values (meaning some of those might actually be
true). Does that sound fairly logical to you?
Regarding the Table Layout:
The columns in t2 are ordered, essentially according to the category of
attributes involved. I had thought about using multiple tables, one for
each category of attributes. However, I chose a monolithic table for
the attributes to represent a single survey of each item. Each item
might be surveyed again in the future to determine any changes, which
would introduce a new version of the current table.
I'll tinker around with the queries a bit, and see what I come up with.
Thanks for the input.
On 9/13/06, Brandon Aiken <BAiken@winemantech.com> wrote:
I think I mistakenly sent this to General instead of Novice. Oops.
Oh well. I posted back to Novice.
Yeah, I either skipped over or forgot the bit in the OP about bools. Mea
culpa.
Hehe.
You should be able to use OR instead of AND in any logical expression.
Well this sounds more like what I want. Given t2.fielda, t2.fieldb,
t2.fieldc, any one (or all three) could be true, but frequently at least
one of the fields is false. Initially, all of the fields might be unknown
(thus NULL) for a given item until I am able to investigate the items to
determine TRUE/FALSE. I frequently have items that are inactive, and thus
unable to determine any of attributes in t2.My end result needs to be a count of all the values in each field where
the value is TRUE, as opposed to FALSE or NULL.Yeah, I would probably run 4 separate, simple queries. That will get you
the best performance since you're doing no JOINs and no composite queries.
There are actually 12 fields involved. I figured if I could just learn how
to handle the first few, I could probably take it from there.
If you need to enter the results into another table, try INSERT … to insert
the defaults and any primary key you have (like timestamp), then four UPDATE
… SELECT statements.
Cool. I'll play around with it a while, see what I come up with.
The real problem with NULLs is some of the (in my mind) nonsensical results
you get, especially with logical operators:
NULL AND TRUE => NULL
NULL OR TRUE => TRUE
NULL AND FALSE => FALSE
NULL OR FALSE => NULL
Plus you have to use IS instead of = since any NULL in an = expression
makes the result NULL (yes, this is an error in my previous queries). NULL
just has all these special cases. I find it much nicer to avoid it wherever
possible since it has somewhat unpredictable results.
Wow. Guess that could give me some interesting results, depending on what I
do! Thanks again for the input.
Don