Feature Request: bigtsvector

Started by CPT · over 10 years ago · 4 messages
#1 CPT
cpt@novozymes.com

Hi all;

We are running a multi-TB bioinformatics system on PostgreSQL and, in
places, use a denormalized schema in which many tsvectors are aggregated
together for centralized searching. This is very important to the
system's performance. These aggregates cover many documents (sometimes
tens of thousands), many of which contain large numbers of references
to other documents, so it isn't uncommon for a single vector to hold
tens of thousands of lexemes. The tsvectors mix document-id and
natural-language search information (all of which comes from the same
documents).
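
For concreteness, a minimal sketch of the aggregation pattern (table and
column names here are made up for illustration; PostgreSQL ships no
built-in tsvector aggregate, but one can be declared over
tsvector_concat, the function behind the || operator):

    -- Declare an aggregate that concatenates tsvectors.
    CREATE AGGREGATE tsvector_agg (tsvector) (
        SFUNC    = tsvector_concat,
        STYPE    = tsvector,
        INITCOND = ''
    );

    -- Roll per-document vectors into one centrally searchable vector
    -- (hypothetical "documents" table and columns).
    SELECT doc_group_id, tsvector_agg(doc_tsv) AS search_tsv
    FROM documents
    GROUP BY doc_group_id;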

Recently we have started hitting the 1MB limit on tsvector size. We have
found it possible to patch PostgreSQL to make tsvectors larger, but doing
so changes the on-disk layout. How likely is it that the tsvector size
limit could be raised in future versions to allow vectors up to toastable
size (1GB logical)? I can't imagine we are the only ones with such a
problem. Since changing the on-disk layout might not be such a good idea,
maybe it would be worth considering a new bigtsvector type instead?
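
To show how the limit surfaces (a contrived sketch, not our actual
workload): building a single vector from enough distinct lexemes fails
once its internal representation passes 1MB:

    -- ~50,000 distinct 32-character lexemes comfortably exceed the 1MB
    -- tsvector limit and should fail with something like:
    --   ERROR:  string is too long for tsvector (... bytes, max 1048575 bytes)
    SELECT string_agg(md5(i::text), ' ')::tsvector
    FROM generate_series(1, 50000) AS i;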

Btw, we've been very impressed with how well PostgreSQL has tolerated
all the kinds of load we have thrown at it.

Regards,
CPT

#2 Bruce Momjian
bruce@momjian.us
In reply to: CPT (#1)
Re: [GENERAL] Feature Request: bigtsvector

On Wed, Jun 17, 2015 at 07:58:21AM +0200, CPT wrote:

> Hi all;
>
> We are running a multi-TB bioinformatics system on PostgreSQL [...]
> Recently we have started hitting the 1MB limit on tsvector size. [...]
> How likely is it that the tsvector size limit could be raised in future
> versions to allow vectors up to toastable size (1GB logical)? [...]
> maybe it would be worth considering a new bigtsvector type instead?

Can anyone on hackers answer this question from June?

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ Everyone has their own god. +

#3 Ildus Kurbangaliev
i.kurbangaliev@postgrespro.ru
In reply to: Bruce Momjian (#2)
Re: [GENERAL] Feature Request: bigtsvector

On Wed, 9 Sep 2015 10:52:02 -0400
Bruce Momjian <bruce@momjian.us> wrote:

> On Wed, Jun 17, 2015 at 07:58:21AM +0200, CPT wrote:
> > Recently we have started hitting the 1MB limit on tsvector size. [...]
> > maybe it would be worth considering a new bigtsvector type instead?
>
> Can anyone on hackers answer this question from June?

Hi, I'm working on a patch now that removes this limit with no (or only
small) changes to the on-disk layout. I think it'll be ready within this
month.

--
Ildus Kurbangaliev
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

#4 Bruce Momjian
bruce@momjian.us
In reply to: Ildus Kurbangaliev (#3)
Re: [GENERAL] Feature Request: bigtsvector

On Wed, Sep 9, 2015 at 06:14:28PM +0300, Ildus Kurbangaliev wrote:

> > > Recently we have started hitting the 1MB limit on tsvector size. [...]
> >
> > Can anyone on hackers answer this question from June?
>
> Hi, I'm working on a patch now that removes this limit with no (or only
> small) changes to the on-disk layout. I think it'll be ready within this
> month.

Oh, great, thanks.

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ Everyone has their own god. +
