Re: UNIQUE constraint on KEY VALUE PAIRS
I am afraid I over-simplyfied my use-case. Sorry.
In fact, I am not using a character sequence, but a custom COMPOSITE TYPE;
part of which is a hstore:
CREATE TYPE SSTORE AS (scope TEXT, kvp hstore);
I created some functions and operators around it -- sort-of costly
operations for comparing SSTOREs. Well: more costly than it would be to
compare INTEGERs, I guess.
The sets of Key/Values held by SSTORE will show extensive similarities, that
is: common pairs of key/value pairs.
So, essentially it looks like this:
CREATE TABLE textseqs(the_key SSTORE)
CREATE UNIQUE INDEX ON textseqs USING BTREE( hash_sstore(the_key), the_key )
Does this make sense? I am seeking some sensible trade-off between
performance and flexibility...
Regards,
Rob
2011/5/14 David Johnston <polobo@yahoo.com>
Show quoted text
Could you index the reverse of the string so the unique part appears first?
On May 14, 2011, at 11:20, InterRob <rob.marjot@gmail.com> wrote:
Dear list,
I would be pleased if you could share some thoughts with me on the
following: say I wish to maintain a table with all distinct character
sequences (variable length) showing series with strong similarities.
Example:"abbbabacccdef"
"abbbabaccdcdf"
"abbbabaccdcgf"
...
"qwtrhdffdd"
...
"qwtrhdffdds"
...
"qwtrhdffddsspp"
"qwtrhdffddsspf"
"qwtrhdffddssph"
"qwtrhdffddsspL"
etc.Think of them as ordered values (array-like), as a set having many values
in common, in the same "elements" (that is: positions; in my application,
each position has some particular meaning -- the sequence represents a set
of particular settings)CREATE TABLE textseqs(txtseq TEXT)
What would be an efficient approach in enforcing a UNIQUE constraint?
I was thinking of using hashbuckets in a b-tree:
CREATE UNIQUE INDEX ON textseqs USING BTREE( hashtext(txtseq), txtseq )This index would "cache" hashes for each row. Upon inserting of a new
row, traversing the index involves the comparison of two single integers for
each node in the b-tree, until the actual hash value (if it exists) was
reached. Then, only within that bucket (the hashes won't be unique), the
more expensive string comparing is required; involving a sequencial
comparison of (potentially maaaany) characters. Yet, within that bucket, the
character series may be expected to show stronger differences than a plain
sorted list of all values in the table would have, indexed by a b-tree
index. Wouldn't traversing such a plain (non-composite, single column, on:
"txtseq") b-tree index involve a sequencial comparison of (potentially many)
characters *at EACH NODE* of the tree ? Or am I mistaken that each node is
filled with actual values from the txtseq column?Thank you for your input!
Cheers,
Rob