Removing redundant itemsets

Started by Allan Kamauabout 18 years ago3 messagesgeneral
Jump to latest
#1Allan Kamau
allank@sanbi.ac.za

Hi all,
I have a plain sql problem (didn't know where else to post it).
I have a list of transactions (market basket) and I would like to select
non redundant longest possible patterns by eliminating
(creating/populating other table to contain only non redandant itemsets)
transactions having item lists which are fully included in at least one
other transaction.

(Am assuming all the items of all the transactions have met the minimum
support currently set at 1)

Below is a sample case, table schema and data(DDL and DML)

Transaction Itemset
'100' 'a','b','c','d'
'200' 'c','d'
'300' 'a','c','e'
'400' 'e','d'

On successful removal out of 'redanduant' or smaller transactions having
items contained in totality by at least one other transaction, the
transaction '200' would be weeded out as it's itemset {'c','d'} is
contained in '100' {'a','b','c','d'} transaction.

drop sequence if exists togo_seq cascade;
create sequence togo_seq;
drop table if exists togo cascade;
create table togo
(
id integer not null default nextval('togo_seq')
,tid char(3) not null
,item char(1) not null
,primary key(id)
,unique(tid,item)
)
;
insert into togo(tid,item)values('100','b');
insert into togo(tid,item)values('100','a');
insert into togo(tid,item)values('100','c');
insert into togo(tid,item)values('100','d');
insert into togo(tid,item)values('200','c');
insert into togo(tid,item)values('200','d');
insert into togo(tid,item)values('300','a');
insert into togo(tid,item)values('300','c');
insert into togo(tid,item)values('300','e');
insert into togo(tid,item)values('400','e');
insert into togo(tid,item)values('400','d');

Allan.

#2Allan Kamau
allank@sanbi.ac.za
In reply to: Allan Kamau (#1)
Re: Removing redundant itemsets

Thanks, I have "reworded" the email (replaced "transaction" with
"purchase"). The email has now been sent to the pgsql-sql mailing list.

Craig Ringer wrote:

Show quoted text

Allan Kamau wrote:

Hi all,
I have a plain sql problem (didn't know where else to post it).

pgsql-sql

I have a list of transactions (market basket) ...

Your use of the term "transaction" for a database entity is really
confusing, and will make it significantly harder for others to
understand and work with your schema. It might be worth renaming the
entity to avoid the conflict with the meaning of "transaction" as
"atomic unit of work" as controlled by BEGIN/COMMIT/ROLLBACK .

--
Craig Ringer

#3Craig Ringer
craig@2ndquadrant.com
In reply to: Allan Kamau (#1)
Re: Removing redundant itemsets

Allan Kamau wrote:

Hi all,
I have a plain sql problem (didn't know where else to post it).

pgsql-sql

I have a list of transactions (market basket) ...

Your use of the term "transaction" for a database entity is really
confusing, and will make it significantly harder for others to
understand and work with your schema. It might be worth renaming the
entity to avoid the conflict with the meaning of "transaction" as
"atomic unit of work" as controlled by BEGIN/COMMIT/ROLLBACK .

--
Craig Ringer