playing with timestamp entries

Started by Dale Walkerabout 25 years ago3 messagesgeneral

dale@icr.com.au

about 25 years ago

I record our radius logs in a firly basic table, schema is as follows:
----------------------------------------------------------
CREATE TABLE "history_new" (
"username" character varying(50) NOT NULL,
"time_stamp" int4 NOT NULL,
"acctstatustype" character varying(8) NOT NULL ,
"acctdelay" int2 NOT NULL,
"acctinputoctets" int4 ,
"acctoutputoctets" int4 ,
"acctsessionid" character varying(30),
"acctsessiontime" int4 ,
"acctterminatecause" character varying(50),
"nasidentifier" character varying(22),
"nasport" character varying(4),
"framedipaddress" character varying(16),
"callingstationid" character varying(16),
"ascenddatarate" character varying(16),
"calledstationid" character varying(16)
);
-----------------------------------------------------------

I then create an index on the username column by:

---------------------------------------
CREATE INDEX "i_h_uh" on HISTORY using hash (username);
---------------------------------------

I use the 'hash' type as queries regarding usage will always be of the
form "select ...... where username='xxx';"

I also calculate a 'summary' in the form of a table I call sumlog:

-----------------------------------------------------------
CREATE TABLE "sumlog" (
"username" character varying(8) NOT NULL,
"period" character varying(8) NOT NULL,
"sumtime" int4 DEFAULT '0' ,
"mbup" float8 DEFAULT '0.0' ,
"mbdn" float8 DEFAULT '0.0' );
------------------------------------------------------------

This table is populated by the following query:
-----------------------------------------------------
insert into sumlog
select s.username,
to_char(timestamp(h.time_stamp),'YYYY-MM') as date,
sum(h.acctsessiontime),
sum(float8(h.acctinputoctets)/1000000),
sum(float8(h.acctoutputoctets)/1000000)
from subscribers as s,history as h
where s.username=h.username
group by s.username,date;
-----------------------------------------------------------

This works fine, but as the database size is constantly growing the
summary table takes a while to calculate...

I was thinking of doing the following:
1. only update 'sumlog' for the current period (eg. 2001-04)
2. adding an index on the timestamp column to speed-up the query for
the insert into sumlog.

My questions are:
1. is it possible to create an index entry on the function applied to
the time_stamp.
eg. something of the form [ create index "i_ts" on history
(to_char(timestamp(h.time_stamp),'YYYY-MM')) ]
2. what is the best way to access the data from the history table for a
known period..
eg. knowing period='2001-04' is there a better function to use than
'to_char' against the timestamp, any sort of indexing I should use,
etc...

I've been going around in circles, and I'm sure I've missed some
basic/common-sense sort of step, but now I'm running out of time to
spend on it....

Has anyone here done anything similar??

--
Dale Walker < dale@icr.com.au >
Independent Computer Retailers (ICR) Pty Ltd
http://www.icr.com.au/

Tom Lane

tgl@sss.pgh.pa.us

about 25 years ago

In reply to: Dale Walker (#1)

Re: playing with timestamp entries

Dale Walker <dale@icr.com.au> writes:

I use the 'hash' type as queries regarding usage will always be of the
form "select ...... where username='xxx';"

Use a btree anyway. Postgres' btree implementation is much better than
its hash index implementation.

insert into sumlog
select s.username,
to_char(timestamp(h.time_stamp),'YYYY-MM') as date,
sum(h.acctsessiontime),
sum(float8(h.acctinputoctets)/1000000),
sum(float8(h.acctoutputoctets)/1000000)
from subscribers as s,history as h
where s.username=h.username
group by s.username,date;

This works fine, but as the database size is constantly growing the
summary table takes a while to calculate...

What plan does EXPLAIN show for this query?

regards, tom lane

Dale Walker

dale@icr.com.au

about 25 years ago

In reply to: Dale Walker (#1)

Re: playing with timestamp entries

Tom Lane wrote:

Dale Walker <dale@icr.com.au> writes:

I use the 'hash' type as queries regarding usage will always be of the
form "select ...... where username='xxx';"

Use a btree anyway. Postgres' btree implementation is much better than
its hash index implementation.

OK, I'll give that a whirl...

insert into sumlog
select s.username,
to_char(timestamp(h.time_stamp),'YYYY-MM') as date,
sum(h.acctsessiontime),
sum(float8(h.acctinputoctets)/1000000),
sum(float8(h.acctoutputoctets)/1000000)
from subscribers as s,history as h
where s.username=h.username
group by s.username,date;

This works fine, but as the database size is constantly growing the
summary table takes a while to calculate...

What plan does EXPLAIN show for this query?

regards, tom lane

psql:zz.sql:7: NOTICE: QUERY PLAN:

Aggregate (cost=349984.03..365862.83 rows=127030 width=40)
-> Group (cost=349984.03..356335.55 rows=1270304 width=40)
-> Sort (cost=349984.03..349984.03 rows=1270304 width=40)
-> Hash Join (cost=27.35..87635.90 rows=1270304
width=40)
-> Seq Scan on history h (cost=0.00..36786.04
rows=1270304 width=28)
-> Hash (cost=25.28..25.28 rows=828 width=12)
-> Seq Scan on subscribers s
(cost=0.00..25.28 rows=828 width=12)

EXPLAIN

----------

The way I read this, I think my biggest problem is in the
sorting/grouping...

--
Dale Walker < dale@icr.com.au >
Independent Computer Retailers (ICR) Pty Ltd
http://www.icr.com.au/