histogram

Started by Joel Reymontalmost 15 years ago9 messagesgeneral
Jump to latest
#1Joel Reymont
joelr1@gmail.com

I have a column of 2 million float values from 0 to 1.

I would like to figure out how many values fit into buckets spaced by 0.10, e.g. from 0 to 0.10, from 0.10 to 0.20, etc.

What is the best way to do this?

Thanks, Joel

--------------------------------------------------------------------------
- for hire: mac osx device driver ninja, kernel extensions and usb drivers
---------------------+------------+---------------------------------------
http://wagerlabs.com | @wagerlabs | http://www.linkedin.com/in/joelreymont
---------------------+------------+---------------------------------------

#2Thomas Markus
t.markus@proventis.net
In reply to: Joel Reymont (#1)
Re: histogram

Hi,

try something like this:

select
trunc(random() * 10.)/10.
, count(*)
from
generate_series(1,2000000)
group by 1 order by 2

regards
Thomas

Am 30.04.2011 18:37, schrieb Joel Reymont:

Show quoted text

I have a column of 2 million float values from 0 to 1.

I would like to figure out how many values fit into buckets spaced by 0.10, e.g. from 0 to 0.10, from 0.10 to 0.20, etc.

What is the best way to do this?

Thanks, Joel

--------------------------------------------------------------------------
- for hire: mac osx device driver ninja, kernel extensions and usb drivers
---------------------+------------+---------------------------------------
http://wagerlabs.com | @wagerlabs | http://www.linkedin.com/in/joelreymont
---------------------+------------+---------------------------------------

#3Joel Reymont
joelr1@gmail.com
In reply to: Thomas Markus (#2)
Re: histogram

Thank you Thomas!

Is there a way for the code below to determine the number of rows in the table and use it?

Thanks, Joel

On Apr 30, 2011, at 5:48 PM, Thomas Markus wrote:

Hi,

try something like this:

select
trunc(random() * 10.)/10.
, count(*)
from
generate_series(1,2000000)
group by 1 order by 2

--------------------------------------------------------------------------
- for hire: mac osx device driver ninja, kernel extensions and usb drivers
---------------------+------------+---------------------------------------
http://wagerlabs.com | @wagerlabs | http://www.linkedin.com/in/joelreymont
---------------------+------------+---------------------------------------

#4Joel Reymont
joelr1@gmail.com
In reply to: Thomas Markus (#2)
Re: histogram

What is the meaning of

group by 1 order by 2

e.g. what to the numbers 1 and 2 stand for?

What would change if I do the following?

group by 1 order by 1

On Apr 30, 2011, at 5:48 PM, Thomas Markus wrote:

Hi,

try something like this:

select
trunc(random() * 10.)/10.
, count(*)
from
generate_series(1,2000000)
group by 1 order by 2

--------------------------------------------------------------------------
- for hire: mac osx device driver ninja, kernel extensions and usb drivers
---------------------+------------+---------------------------------------
http://wagerlabs.com | @wagerlabs | http://www.linkedin.com/in/joelreymont
---------------------+------------+---------------------------------------

#5Joel Reymont
joelr1@gmail.com
In reply to: Thomas Markus (#2)
Re: histogram

I think this should do what I want

select trunc(distance * 10.)/10., count(*)
from doc_ads
group by 1 order by 1

Thanks, Joel

--------------------------------------------------------------------------
- for hire: mac osx device driver ninja, kernel extensions and usb drivers
---------------------+------------+---------------------------------------
http://wagerlabs.com | @wagerlabs | http://www.linkedin.com/in/joelreymont
---------------------+------------+---------------------------------------

#6Rob Sargent
robjsargent@gmail.com
In reply to: Joel Reymont (#5)
Re: histogram

re: 1 and 2. They're horrible (imho) reference to the attributes of the
returned tuple. Or at best an exposure of the implementation. :)

Order by "2" if you want the most frequent (highest counts) of your
distances at the bottom of the output (or ordery by 2 desc) if you want
them at the top of your output.

Joel Reymont wrote:

Show quoted text

I think this should do what I want

select trunc(distance * 10.)/10., count(*)
from doc_ads
group by 1 order by 1

Thanks, Joel

--------------------------------------------------------------------------
- for hire: mac osx device driver ninja, kernel extensions and usb drivers
---------------------+------------+---------------------------------------
http://wagerlabs.com | @wagerlabs | http://www.linkedin.com/in/joelreymont
---------------------+------------+---------------------------------------

#7David G. Johnston
david.g.johnston@gmail.com
In reply to: Rob Sargent (#6)
Re: histogram

Given that you are actively implementing the code that uses the 1 and 2 I
don't see how it is that egregious. When generating calculated fields it is
cleaner than the alternative:

Select trunc(distance * 10.)/10., count(*)

From doc_ads

Group by (trunc(distance * 10.))
Order by (trunc(distance * 10.))

It would be nice if you could do:

Select trunc(distance * 10.)/10. AS bin, count(*) AS frequency

From doc_ads

Group by bin
Order by bin

But I do not believe that is allowed (though I may have my syntax wrong...)

David J.

re: 1 and 2. They're horrible (imho) reference to the attributes of the

returned tuple. Or at best an exposure of the implementation. :)

Show quoted text

Joel Reymont wrote:

I think this should do what I want

select trunc(distance * 10.)/10., count(*)
from doc_ads
group by 1 order by 1

Thanks, Joel

#8Rob Sargent
robjsargent@gmail.com
In reply to: David G. Johnston (#7)
Re: histogram

David Johnston wrote:

Given that you are actively implementing the code that uses the 1 and 2 I
don't see how it is that egregious. When generating calculated fields it is
cleaner than the alternative:

Select trunc(distance * 10.)/10., count(*)
From doc_ads
Group by (trunc(distance * 10.))
Order by (trunc(distance * 10.))

It would be nice if you could do:

Select trunc(distance * 10.)/10. AS bin, count(*) AS frequency
From doc_ads
Group by bin
Order by bin

But I do not believe that is allowed (though I may have my syntax wrong...)

David J.

re: 1 and 2. They're horrible (imho) reference to the attributes of the

returned tuple. Or at best an exposure of the implementation. :)

Joel Reymont wrote:

I think this should do what I want

select trunc(distance * 10.)/10., count(*)
from doc_ads
group by 1 order by 1

Thanks, Joel

I think we're supposed to bottom-post here.

I agree in the case of generated columns and old servers but you see the
practice more commonly than really necessary. But in 8.4 at least

select trunc(distance * 10.0 )/10.0 as histo, count(*) as tally
from d group by histo order by tally;

works just fine for me

#9Thomas Markus
t.markus@proventis.net
In reply to: Joel Reymont (#4)
Re: histogram

hi,

group by 1 means group by first output column
order by 2 means order by second output column ascending

Am 30.04.2011 19:00, schrieb Joel Reymont:

Show quoted text

What is the meaning of

group by 1 order by 2

e.g. what to the numbers 1 and 2 stand for?

What would change if I do the following?

group by 1 order by 1

On Apr 30, 2011, at 5:48 PM, Thomas Markus wrote:

Hi,

try something like this:

select
trunc(random() * 10.)/10.
, count(*)
from
generate_series(1,2000000)
group by 1 order by 2

--------------------------------------------------------------------------
- for hire: mac osx device driver ninja, kernel extensions and usb drivers
---------------------+------------+---------------------------------------
http://wagerlabs.com | @wagerlabs | http://www.linkedin.com/in/joelreymont
---------------------+------------+---------------------------------------