B-tree index row size limit

Started by Florian Weimerover 9 years ago2 messages
#1Florian Weimer
fw@deneb.enyo.de

The index row size limit reared its ugly head again.

My current use of PostgreSQL is to load structured data into it but
from sources I don't have control over, to support a wide range of
queries whose precise nature is not yet known to me. (Is this called
a data warehouse?)

Anyway, what happens from time to time is that some data which has
been processed successfully in the past suddenly failed to load
because there happens to be a very long string in it. I know how to
work around this, but it's still annoying when it happens, and the
workarounds may make it much, much harder to write efficient queries.

What it would it take to eliminate the B-tree index row size limit (or
rather, increase it to several hundred megabytes)? I don't care about
performance for index-based lookups for overlong columns, I just want
to be able to load arbitrary data and index it.

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#2Heikki Linnakangas
hlinnaka@iki.fi
In reply to: Florian Weimer (#1)
Re: B-tree index row size limit

On 10/09/2016 02:39 PM, Florian Weimer wrote:

What it would it take to eliminate the B-tree index row size limit (or
rather, increase it to several hundred megabytes)? I don't care about
performance for index-based lookups for overlong columns, I just want
to be able to load arbitrary data and index it.

A few ideas:

* Add support for "truncate" B-tree support functions. Long values
wouldn't be stored, but they would be cut at a suitable length. This
would complicate things when you have two values that only differ in the
truncated-away portion. You'd need to still be able to order them
correctly in the index, perhaps by fetching the actual value from the heap.

* Use TOAST for index datums. That would involve adding a whole new
toast table for the index, with the index for the toast table.

* Have something like TOAST, implemented within the B-tree AM. When a
large datum is stored, chop it into chunks that are stored in special
"toast" pages in the index.

* Add smarts to the planner, to support using an expression index even
if the predicate doesn't contain the expression verbatim. For example,
if you have an index on SUBSTR(column, 100), and a predicate "column =
'foo'", you could use the index, if the planner just knew enough about
SUBSTR to realize that.

* Don't do it. Use a hash index instead. If all goes well, it'll be
WAL-logged in PostgreSQL 10.

- Heikki

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers