Values larger than 1/3 of a buffer page cannot be indexed.

Started by Viktor Nagy · about 15 years ago · 8 messages · general
#1Viktor Nagy
viktor.nagy@toolpart.hu

hi,

when trying to insert a long-long value, I get the following error:

index row size 3120 exceeds maximum 2712 for index "ir_translation_ltns"
HINT: Values larger than 1/3 of a buffer page cannot be indexed.
Consider a function index of an MD5 hash of the value, or use full text
indexing.

is there a way to generate this recommended function index of an md5 hash on
an already existing database and tables (filled with data)?

thanks,
Viktor

#2Dmitriy Igrishin
dmitigr@gmail.com
In reply to: Viktor Nagy (#1)
Re: Values larger than 1/3 of a buffer page cannot be indexed.

Hey Viktor,

2011/3/13 Viktor Nagy <viktor.nagy@toolpart.hu>

is there a way to generate this recommended function index of an md5 hash
on an already existing database and tables (filled with data)?

Just create the index this way, e.g.
CREATE INDEX ir_translation_ltns ON tab ((md5(col)));
where "tab" and "col" are the table and column on which you want
to create the btree index.


--
// Dmitriy.

#3Brian Hirt
bhirt@me.com
In reply to: Dmitriy Igrishin (#2)
Re: Values larger than 1/3 of a buffer page cannot be indexed.

On Mar 13, 2011, at 12:05 PM, Dmitriy Igrishin wrote:

Just create index this way, e.g.
CREATE INDEX ir_translation_ltns ON tab ((md5(col)));

This probably goes without saying, but you'll have to use md5(col) = md5('blahblahblahblah') in your qualifiers to get the benefit of the index.
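
Spelled out as a query (same placeholder names as Dmitriy's example; the literal is just an illustration):

```sql
-- The qualifier must match the indexed expression md5(col)
-- for the planner to consider the functional index:
SELECT *
FROM tab
WHERE md5(col) = md5('blahblahblahblah');
```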

--brian

#4Rob Sargent
robjsargent@gmail.com
In reply to: Brian Hirt (#3)
Re: Values larger than 1/3 of a buffer page cannot be indexed.

Brian Hirt wrote:

This probably goes without saying, but you'll have to use md5(col) =
md5('blahblahblahblah') in your qualifiers to get the benefit of the
index.

--brian

Unless the point is to guarantee uniqueness of the "long-long value"s.
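
If uniqueness is the goal, the same functional index can simply be declared unique. A sketch, reusing Dmitriy's placeholder names (and note a genuine md5 collision would then reject two distinct values):

```sql
-- Enforce uniqueness of the long values via their hash
CREATE UNIQUE INDEX ir_translation_ltns_uniq ON tab ((md5(col)));
```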

#5Merlin Moncure
mmoncure@gmail.com
In reply to: Rob Sargent (#4)
Re: Values larger than 1/3 of a buffer page cannot be indexed.

On Sun, Mar 13, 2011 at 4:37 PM, Rob Sargent <robjsargent@gmail.com> wrote:

Unless the point is to guarantee uniqueness of the "long-long value"s.

md5 will do that too: the main thing you lose going to hash indexing
is ordering.

merlin

#6Viktor Nagy
viktor.nagy@toolpart.hu
In reply to: Dmitriy Igrishin (#2)
Re: Values larger than 1/3 of a buffer page cannot be indexed.

thanks, this worked.

On Sun, Mar 13, 2011 at 7:05 PM, Dmitriy Igrishin <dmitigr@gmail.com> wrote:

Just create index this way, e.g.
CREATE INDEX ir_translation_ltns ON tab ((md5(col)));

#7Craig Ringer
craig@2ndquadrant.com
In reply to: Merlin Moncure (#5)
Re: Values larger than 1/3 of a buffer page cannot be indexed.

On 03/14/2011 09:25 PM, Merlin Moncure wrote:

Unless the point is to guarantee uniqueness of the "long-long value"s.

md5 will do that too: the main thing you lose going to hash indexing
is ordering.

MD5 will *probably* guarantee the uniqueness of the values. Personally
I'm not a big fan of betting on that, and tend to like to test for
equality against the hash first and then, if the hashes are equal,
against the values.
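
In query form that double check might look like this (placeholder names as in the earlier examples):

```sql
-- The hash comparison can use the functional index; the direct
-- value comparison then weeds out any hash collision.
SELECT *
FROM tab
WHERE md5(col) = md5('blahblahblahblah')
  AND col = 'blahblahblahblah';
```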

I have several md5 collisions in one of my tables, so it's far from
impossible. In fact, it's much more likely than you'd expect thanks to
the birthday paradox.

--
Craig Ringer

#8Merlin Moncure
mmoncure@gmail.com
In reply to: Craig Ringer (#7)
Re: Values larger than 1/3 of a buffer page cannot be indexed.

On Mon, Mar 28, 2011 at 10:28 PM, Craig Ringer
<craig@postnewspapers.com.au> wrote:

I have several md5 collisions in one of my tables, so it's far from
impossible. In fact, it's much more likely than you'd expect thanks to the
birthday paradox.

I'll agree that a 128-bit key is just on the cusp of discomfort if you
have a large data set and/or want to be really safe from random
collisions, but with any larger key it's just worry for the sake of
worrying.

You mentioned the birthday paradox -- let's consult the chart. For MD5
(128 bits), if you are ok with a 10^-15 chance of *one* collision (odds
of 1 in 1,000,000,000,000,000), you can safely hash up to 100,000,000,000
unique values. Of course, md5 is not perfectly uniformly distributed, so
the numbers are not in fact quite that good, but those are pretty tall odds.
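
For anyone who wants to sanity-check that figure, the usual birthday approximation p ≈ n^2 / 2^(b+1), for n hashed values and a b-bit hash, can be evaluated right in psql (a back-of-the-envelope sketch, not an exact birthday computation):

```sql
-- Approximate collision probability for n = 10^11 values
-- hashed to 128 bits: p ≈ n^2 / 2^129
SELECT (10::numeric ^ 11) ^ 2 / 2::numeric ^ 129
    AS approx_collision_probability;
-- comes out around 1.5 * 10^-17, comfortably below 10^-15
```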
If you are concerned about safety, you are much better off jumping to
SHA-1 (or SHA-256 if you're really nervous) than comparing the base
string, since your chance of a random collision is then well below that of
other low-probability events, such as a drive bit error (which would make
the base-string comparison itself unreliable) or getting hit by a comet.

Now, if you're worried about intentionally forged data, that's
different, but it doesn't sound like that's the case here. When the NIST
hash competition resolves (my money is on Skein), you'll likely be
able to just pick an output size that matches your requirements.

merlin