Questions about TSearch2 and PG 8.2

Started by Markus Wollnyalmost 19 years ago3 messagesgeneral
Jump to latest
#1Markus Wollny
Markus.Wollny@computec.de

Hello!

I'm in the process of upgrading our PostgreSQL 8.1 databases to PostgreSQL 8.2.4. I have stumbled over a minor issue with the upgrade and some helpful suggestions here:
http://people.planetpostgresql.org/xzilla/index.php?/archives/291-The-pain-that-is-tsearch2-8.1-8.2-upgrading.html

I shall try tonight with an plain SQL dump, but as some of my DBs are quite large, I usually use the custom dump format. As I would like to move the tsearch2-stuff in ist own schema as suggested, I tried using a restore list. I'd like to report that everything works as expected, but I've got a slight problem with the custom schema part. I created the target-db, created a schema tsearch2 and installed the tsearch2-functions, operators, configuration and whatnot into this new schema. Then I edited the restore list so that the tsearch2-bits would not be created from the dump file again. However, the binary-dump tries to create the textindex-columns with a tsvector-type which explicitly references the public schema.

Instead of

CREATE TABLE someschema.article
(
id integer,
mytext text,
idxfti tsvector
);

it tries to create the table like this

CREATE TABLE someschema.article
(
id integer,
mytext text,
idxfti public.tsvector
);

As the tsvector-type is defined in the tsearch2-schema, this is bound to fail, even with the search_path set to include the tsearch2-schema. I assume that this happens because the table article is not in the same schema as the original tsvector-type and the default search_path is being ignored on the dump in order to be on the safe side. This "double-checking" breaks the migration in my case, however, so is there some way that would allow me to change the table definition on restore from using just tsvector instead of the explicit public.tsvector? I already tried editing the binary dump, but that just resulted in a corrupted dump-file. I there's no other way, I'll go the plain dump route, of course, but I'd just like to check this issue.

My second question concerns the new Gin (Generalized Inverted Index) index type. Is it stable enough for production yet and would it yield a high enough performance gain in comparison the GiST? Does it make much sense using a Gin-index alongside the GiST-one? Would we need to change anything in the application code in order to make use of Gin - like using

where idxfti @> to_tsquery('default_german', 'Fundstück')

instead of

where idxfti @@ to_tsquery('default_german', 'Fundstück')

? The docs here http://www.sai.msu.su/~megera/wiki/Gin are still a bit thin, so any hint to some further examples would be greatly appreciated.

Kind regards

Markus

Computec Media AG
Sitz der Gesellschaft und Registergericht: Fürth (HRB 8818)
Vorstandsmitglieder: Johannes S. Gözalan (Vorsitzender) und Niels Herrmann
Vorsitzender des Aufsichtsrates: Jürg Marquard
Umsatzsteuer-Identifikationsnummer: DE 812 575 276

#2Oleg Bartunov
oleg@sai.msu.su
In reply to: Markus Wollny (#1)
Re: Questions about TSearch2 and PG 8.2

On Mon, 30 Apr 2007, Markus Wollny wrote:

Hello!

I'm in the process of upgrading our PostgreSQL 8.1 databases to PostgreSQL 8.2.4. I have stumbled over a minor issue with the upgrade and some helpful suggestions here:
http://people.planetpostgresql.org/xzilla/index.php?/archives/291-The-pain-that-is-tsearch2-8.1-8.2-upgrading.html

I shall try tonight with an plain SQL dump, but as some of my DBs are quite large, I usually use the custom dump format. As I would like to move the tsearch2-stuff in ist own schema as suggested, I tried using a restore list. I'd like to report that everything works as expected, but I've got a slight problem with the custom schema part. I created the target-db, created a schema tsearch2 and installed the tsearch2-functions, operators, configuration and whatnot into this new schema. Then I edited the restore list so that the tsearch2-bits would not be created from the dump file again. However, the binary-dump tries to create the textindex-columns with a tsvector-type which explicitly references the public schema.

Instead of

CREATE TABLE someschema.article
(
id integer,
mytext text,
idxfti tsvector
);

it tries to create the table like this

CREATE TABLE someschema.article
(
id integer,
mytext text,
idxfti public.tsvector
);

As the tsvector-type is defined in the tsearch2-schema, this is bound to fail, even with the search_path set to include the tsearch2-schema. I assume that this happens because the table article is not in the same schema as the original tsvector-type and the default search_path is being ignored on the dump in order to be on the safe side. This "double-checking" breaks the migration in my case, however, so is there some way that would allow me to change the table definition on restore from using just tsvector instead of the explicit public.tsvector? I already tried editing the binary dump, but that just resulted in a corrupted dump-file. I there's no other way, I'll go the plain dump route, of course, but I'd just like to check this issue.

I think you need plain sql dump.

My second question concerns the new Gin (Generalized Inverted Index) index type. Is it stable enough for production yet and would it yield a high enough performance gain in comparison the GiST? Does it make much sense using a Gin-index alongside the GiST-one? Would we need to change anything in the application code in order to make use of Gin - like using

where idxfti @> to_tsquery('default_german', 'Fundst?ck')

instead of

where idxfti @@ to_tsquery('default_german', 'Fundst?ck')

No, use @@ operator with Gin as well, you may need @@@ operator for
Gin, if you use weights in tsquery. btw, read
http://www.sai.msu.su/~megera/postgres/fts/doc for information about
Gin+Gist index. It's true, that the best combination is
GiST for online stuff and Gin for archived. Also, if you read russian
there are some papers available http://www.sai.msu.su/~megera/postgres/talks/

? The docs here http://www.sai.msu.su/~megera/wiki/Gin are still a bit thin, so any hint to some further examples would be greatly appreciated.

See above

Kind regards

Markus

Computec Media AG
Sitz der Gesellschaft und Registergericht: F?rth (HRB 8818)
Vorstandsmitglieder: Johannes S. G?zalan (Vorsitzender) und Niels Herrmann
Vorsitzender des Aufsichtsrates: J?rg Marquard
Umsatzsteuer-Identifikationsnummer: DE 812 575 276

---------------------------(end of broadcast)---------------------------
TIP 4: Have you searched our list archives?

http://archives.postgresql.org/

Regards,
Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83

#3Robert Treat
xzilla@users.sourceforge.net
In reply to: Markus Wollny (#1)
Re: Questions about TSearch2 and PG 8.2

On Monday 30 April 2007 13:20, Markus Wollny wrote:

Hello!

I'm in the process of upgrading our PostgreSQL 8.1 databases to PostgreSQL
8.2.4. I have stumbled over a minor issue with the upgrade and some helpful
suggestions here:
http://people.planetpostgresql.org/xzilla/index.php?/archives/291-The-pain-
that-is-tsearch2-8.1-8.2-upgrading.html

I shall try tonight with an plain SQL dump, but as some of my DBs are quite
large, I usually use the custom dump format. As I would like to move the
tsearch2-stuff in ist own schema as suggested, I tried using a restore
list. I'd like to report that everything works as expected, but I've got a
slight problem with the custom schema part. I created the target-db,
created a schema tsearch2 and installed the tsearch2-functions, operators,
configuration and whatnot into this new schema. Then I edited the restore
list so that the tsearch2-bits would not be created from the dump file
again. However, the binary-dump tries to create the textindex-columns with
a tsvector-type which explicitly references the public schema.

Instead of

CREATE TABLE someschema.article
(
id integer,
mytext text,
idxfti tsvector
);

it tries to create the table like this

CREATE TABLE someschema.article
(
id integer,
mytext text,
idxfti public.tsvector
);

As the tsvector-type is defined in the tsearch2-schema, this is bound to
fail, even with the search_path set to include the tsearch2-schema. I
assume that this happens because the table article is not in the same
schema as the original tsvector-type and the default search_path is being
ignored on the dump in order to be on the safe side. This "double-checking"
breaks the migration in my case, however, so is there some way that would
allow me to change the table definition on restore from using just tsvector
instead of the explicit public.tsvector? I already tried editing the binary
dump, but that just resulted in a corrupted dump-file. I there's no other
way, I'll go the plain dump route, of course, but I'd just like to check
this issue.

I think you want to split it into 2 parts... do a schema only dump in plain
sql to get the schema loaded, then use the custom format to pull the data
over.

My second question concerns the new Gin (Generalized Inverted Index) index
type. Is it stable enough for production yet and would it yield a high
enough performance gain in comparison the GiST? Does it make much sense
using a Gin-index alongside the GiST-one? Would we need to change anything
in the application code in order to make use of Gin - like using

where idxfti @> to_tsquery('default_german', 'Fundstück')

instead of

where idxfti @@ to_tsquery('default_german', 'Fundstück')

? The docs here http://www.sai.msu.su/~megera/wiki/Gin are still a bit
thin, so any hint to some further examples would be greatly appreciated.

Gin is certainly stable enough for production use, we're using it on some
projects at OmniTI at least.

I've got some gin/gist information up as well at
http://people.planetpostgresql.org/xzilla/index.php?/archives/278-PostgreSQL-full-text-search-testing.html,
also be sure check the trackback links... and also I think magnus had a post
on gin/gist wrt the postgresql.org website.

--
Robert Treat
Build A Brighter LAMP :: Linux Apache {middleware} PostgreSQL