histogram
I have a column of 2 million float values from 0 to 1.
I would like to figure out how many values fit into buckets spaced by 0.10, e.g. from 0 to 0.10, from 0.10 to 0.20, etc.
What is the best way to do this?
Thanks, Joel
--------------------------------------------------------------------------
- for hire: mac osx device driver ninja, kernel extensions and usb drivers
---------------------+------------+---------------------------------------
http://wagerlabs.com | @wagerlabs | http://www.linkedin.com/in/joelreymont
---------------------+------------+---------------------------------------
Hi,
try something like this:
select
trunc(random() * 10.)/10.
, count(*)
from
generate_series(1,2000000)
group by 1 order by 2
regards
Thomas
Am 30.04.2011 18:37, schrieb Joel Reymont:
Show quoted text
I have a column of 2 million float values from 0 to 1.
I would like to figure out how many values fit into buckets spaced by 0.10, e.g. from 0 to 0.10, from 0.10 to 0.20, etc.
What is the best way to do this?
Thanks, Joel
--------------------------------------------------------------------------
- for hire: mac osx device driver ninja, kernel extensions and usb drivers
---------------------+------------+---------------------------------------
http://wagerlabs.com | @wagerlabs | http://www.linkedin.com/in/joelreymont
---------------------+------------+---------------------------------------
Thank you Thomas!
Is there a way for the code below to determine the number of rows in the table and use it?
Thanks, Joel
On Apr 30, 2011, at 5:48 PM, Thomas Markus wrote:
Hi,
try something like this:
select
trunc(random() * 10.)/10.
, count(*)
from
generate_series(1,2000000)
group by 1 order by 2
--------------------------------------------------------------------------
- for hire: mac osx device driver ninja, kernel extensions and usb drivers
---------------------+------------+---------------------------------------
http://wagerlabs.com | @wagerlabs | http://www.linkedin.com/in/joelreymont
---------------------+------------+---------------------------------------
What is the meaning of
group by 1 order by 2
e.g. what to the numbers 1 and 2 stand for?
What would change if I do the following?
group by 1 order by 1
On Apr 30, 2011, at 5:48 PM, Thomas Markus wrote:
Hi,
try something like this:
select
trunc(random() * 10.)/10.
, count(*)
from
generate_series(1,2000000)
group by 1 order by 2
--------------------------------------------------------------------------
- for hire: mac osx device driver ninja, kernel extensions and usb drivers
---------------------+------------+---------------------------------------
http://wagerlabs.com | @wagerlabs | http://www.linkedin.com/in/joelreymont
---------------------+------------+---------------------------------------
I think this should do what I want
select trunc(distance * 10.)/10., count(*)
from doc_ads
group by 1 order by 1
Thanks, Joel
--------------------------------------------------------------------------
- for hire: mac osx device driver ninja, kernel extensions and usb drivers
---------------------+------------+---------------------------------------
http://wagerlabs.com | @wagerlabs | http://www.linkedin.com/in/joelreymont
---------------------+------------+---------------------------------------
re: 1 and 2. They're horrible (imho) reference to the attributes of the
returned tuple. Or at best an exposure of the implementation. :)
Order by "2" if you want the most frequent (highest counts) of your
distances at the bottom of the output (or ordery by 2 desc) if you want
them at the top of your output.
Joel Reymont wrote:
Show quoted text
I think this should do what I want
select trunc(distance * 10.)/10., count(*)
from doc_ads
group by 1 order by 1Thanks, Joel
--------------------------------------------------------------------------
- for hire: mac osx device driver ninja, kernel extensions and usb drivers
---------------------+------------+---------------------------------------
http://wagerlabs.com | @wagerlabs | http://www.linkedin.com/in/joelreymont
---------------------+------------+---------------------------------------
Given that you are actively implementing the code that uses the 1 and 2 I
don't see how it is that egregious. When generating calculated fields it is
cleaner than the alternative:
Select trunc(distance * 10.)/10., count(*)
From doc_ads
Group by (trunc(distance * 10.))
Order by (trunc(distance * 10.))
It would be nice if you could do:
Select trunc(distance * 10.)/10. AS bin, count(*) AS frequency
From doc_ads
Group by bin
Order by bin
But I do not believe that is allowed (though I may have my syntax wrong...)
David J.
re: 1 and 2. They're horrible (imho) reference to the attributes of the
returned tuple. Or at best an exposure of the implementation. :)
Show quoted text
Joel Reymont wrote:
I think this should do what I want
select trunc(distance * 10.)/10., count(*)
from doc_ads
group by 1 order by 1Thanks, Joel
David Johnston wrote:
Given that you are actively implementing the code that uses the 1 and 2 I
don't see how it is that egregious. When generating calculated fields it is
cleaner than the alternative:Select trunc(distance * 10.)/10., count(*)
From doc_ads
Group by (trunc(distance * 10.))
Order by (trunc(distance * 10.))It would be nice if you could do:
Select trunc(distance * 10.)/10. AS bin, count(*) AS frequency
From doc_ads
Group by bin
Order by binBut I do not believe that is allowed (though I may have my syntax wrong...)
David J.
re: 1 and 2. They're horrible (imho) reference to the attributes of the
returned tuple. Or at best an exposure of the implementation. :)
Joel Reymont wrote:
I think this should do what I want
select trunc(distance * 10.)/10., count(*)
from doc_ads
group by 1 order by 1Thanks, Joel
I think we're supposed to bottom-post here.
I agree in the case of generated columns and old servers but you see the
practice more commonly than really necessary. But in 8.4 at least
select trunc(distance * 10.0 )/10.0 as histo, count(*) as tally
from d group by histo order by tally;
works just fine for me
hi,
group by 1 means group by first output column
order by 2 means order by second output column ascending
Am 30.04.2011 19:00, schrieb Joel Reymont:
Show quoted text
What is the meaning of
group by 1 order by 2
e.g. what to the numbers 1 and 2 stand for?
What would change if I do the following?
group by 1 order by 1
On Apr 30, 2011, at 5:48 PM, Thomas Markus wrote:
Hi,
try something like this:
select
trunc(random() * 10.)/10.
, count(*)
from
generate_series(1,2000000)
group by 1 order by 2--------------------------------------------------------------------------
- for hire: mac osx device driver ninja, kernel extensions and usb drivers
---------------------+------------+---------------------------------------
http://wagerlabs.com | @wagerlabs | http://www.linkedin.com/in/joelreymont
---------------------+------------+---------------------------------------