Combining multiple text search configurations
Hi,
I want to know whether I can combine multiple text search configurations
when using FTS.
Is there an option like this:
    to_tsvector(['english', 'french'], document)
I also tried creating a new text search configuration:
    CREATE TEXT SEARCH CONFIGURATION test (COPY = simple);
    ALTER TEXT SEARCH CONFIGURATION test
        ADD MAPPING FOR asciiword WITH english_stem, french_stem;
This doesn't work. How can I combine multiple text search configurations
if I need more than one in my query to search for a word?
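For comparison, a minimal sketch of what the configuration attempt above does in stock PostgreSQL, keeping the name test from the question. Two documented behaviors explain the result: ADD MAPPING FOR raises an error when a token type already has a mapping (and a copy of simple already maps asciiword), and within a mapping the dictionaries are consulted left to right, stopping at the first one that recognizes the token. A Snowball stemmer such as english_stem recognizes every word, so french_stem listed after it is never reached.

```sql
-- Same configuration name as in the question ("test").
CREATE TEXT SEARCH CONFIGURATION test (COPY = simple);

-- The copied "simple" configuration already maps asciiword, so
-- ADD MAPPING FOR asciiword would fail; ALTER MAPPING replaces the
-- existing dictionary list instead.
ALTER TEXT SEARCH CONFIGURATION test
    ALTER MAPPING FOR asciiword WITH english_stem, french_stem;

-- Dictionaries are tried in order and the first one that recognizes a
-- token wins. english_stem recognizes every word, so for plain ASCII
-- words this gives the same result as the built-in 'english' config.
SELECT to_tsvector('test', 'The horses were running');
```

With two Snowball stemmers, reordering the list only changes which one wins; the second one is never consulted.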
Hi,
On 2017-11-06 09:17, hmidi slim wrote:
> I also tried creating a new text search configuration:
>     CREATE TEXT SEARCH CONFIGURATION test (COPY = simple);
>     ALTER TEXT SEARCH CONFIGURATION test
>         ADD MAPPING FOR asciiword WITH english_stem, french_stem;
> This doesn't work. How can I combine multiple text search configurations
> if I need more than one in my query to search for a word?
What about using two indexes, one for each language? If your documents
can either be English OR French, the English OR the French vector should
match an English OR French tsquery.
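A minimal sketch of the two-index approach, assuming a hypothetical table docs whose rows each hold either English or French text, with 'chevaux' as a placeholder search term. The two-argument form of to_tsvector is immutable, so it can be used directly in a GIN expression index; a query has to repeat the same expression for that index to be considered.

```sql
-- Hypothetical table: each row holds either English or French text.
CREATE TABLE docs (
    id       bigserial PRIMARY KEY,
    document text NOT NULL
);

-- One GIN expression index per language. The two-argument form of
-- to_tsvector is immutable, so it is allowed in an index expression.
CREATE INDEX docs_fts_en ON docs USING gin (to_tsvector('english', document));
CREATE INDEX docs_fts_fr ON docs USING gin (to_tsvector('french', document));

-- Parse the same search term with both configurations and match each
-- against the vector of the same language. Each side repeats its index
-- expression exactly, so the planner is able to use both indexes
-- (for example via a BitmapOr).
SELECT id, document
FROM docs
WHERE to_tsvector('english', document) @@ to_tsquery('english', 'chevaux')
   OR to_tsvector('french',  document) @@ to_tsquery('french',  'chevaux');
```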
It is not clear to me how combining two stemmers should practically work
since each word can only have one stem. If you have multilingual
documents or texts with code switching, you could also try combining the
two vectors both for the documents and the query:
(to_tsvector('english', document) || to_tsvector('french', document)) @@
(to_tsquery('english', query) || to_tsquery('french', query))
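The combined variant can be indexed as a single expression. This sketch assumes a hypothetical table docs_mixed for multilingual or code-switched text, again with 'chevaux' standing in for the user's query. tsvector || tsvector merges lexemes that both configurations produce, and tsquery || tsquery joins the two parsed queries with OR.

```sql
-- Hypothetical table for multilingual / code-switched documents.
CREATE TABLE docs_mixed (
    id       bigserial PRIMARY KEY,
    document text NOT NULL
);

-- A single GIN index over the concatenated vector. The inner
-- parentheses are required because the expression is not a plain
-- function call.
CREATE INDEX docs_mixed_fts ON docs_mixed USING gin (
    (to_tsvector('english', document) || to_tsvector('french', document))
);

-- The WHERE clause repeats the indexed expression exactly, so the
-- index can be used; the query side ORs the English and French parses.
SELECT id, document
FROM docs_mixed
WHERE (to_tsvector('english', document) || to_tsvector('french', document))
   @@ (to_tsquery('english', 'chevaux') || to_tsquery('french', 'chevaux'));
```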
--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
Hi,
Thanks for your suggestion, but when using this query:
(to_tsvector('english', document) || to_tsvector('french', document)) @@
(to_tsquery('english', query) || to_tsquery('french', query))
I think performance will decrease, so it is not a good solution for a
large amount of data. Is that right?
2017-11-06 20:46 GMT+01:00 Johannes Graën <johannes@selfnet.de>:
> What about using two indexes, one for each language? If your documents
> can either be English OR French, the English OR the French vector should
> match an English OR French tsquery.
> It is not clear to me how combining two stemmers should practically work
> since each word can only have one stem. If you have multilingual
> documents or texts with code switching, you could also try combining the
> two vectors both for the documents and the query:
> (to_tsvector('english', document) || to_tsvector('french', document)) @@
> (to_tsquery('english', query) || to_tsquery('french', query))
On 2017-11-07 08:27, hmidi slim wrote:
> Thanks for your suggestion, but when using this query:
> (to_tsvector('english', document) || to_tsvector('french', document)) @@
> (to_tsquery('english', query) || to_tsquery('french', query))
> I think performance will decrease, so it is not a good solution for a
> large amount of data. Is that right?
You have more lexemes when you combine two languages, but not twice as
many, since there will be some overlap. That means your index will also
be bigger than a single-language index. Anyhow, I would expect this
variant to perform better than querying two single-language columns
simultaneously. Maybe one of the FTS developers could comment on this?
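The lexeme-count argument can be checked on a sample string with length(tsvector), which returns the number of distinct lexemes. The sample sentence below is made up for illustration. Because concatenation merges identical lexemes, the combined count is at least the larger single-language count and at most their sum.

```sql
-- Distinct-lexeme counts for one made-up sample sentence.
SELECT length(en)       AS english_lexemes,
       length(fr)       AS french_lexemes,
       length(en || fr) AS combined_lexemes  -- shared lexemes counted once
FROM (
    SELECT to_tsvector('english', d) AS en,
           to_tsvector('french',  d) AS fr
    FROM (VALUES ('The horses were running past les chevaux')) AS t(d)
) AS v;
```

Running the same query over a representative sample of real documents gives a rough idea of how much larger a combined index would be.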
On Thu, 9 Nov 2017 09:11:07 +0100
Johannes Graën <johannes@selfnet.de> wrote:
> On 2017-11-07 08:27, hmidi slim wrote:
>> Thanks for your suggestion, but when using this query:
>> (to_tsvector('english', document) || to_tsvector('french', document)) @@
>> (to_tsquery('english', query) || to_tsquery('french', query))
>> I think performance will decrease, so it is not a good solution for a
>> large amount of data. Is that right?
> You have more lexemes when you combine two languages, but not twice as
> many, since there will be some overlap. That means your index will also
> be bigger than a single-language index. Anyhow, I would expect this
> variant to perform better than querying two single-language columns
> simultaneously. Maybe one of the FTS developers could comment on this?
Hi,
You are right in your assumption about index size. However, the difference
between a shared index and two single-language indices depends on the
dictionaries, because some of them don't return lexemes for unknown words.
Unfortunately, there is no alternative way in PostgreSQL 10 or earlier
to do multilingual text processing.
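A small illustration of that dictionary behavior, calling dictionaries directly with ts_lexize. It assumes the sample Ispell files (ispell_sample.dict and ispell_sample.affix) that ship in PostgreSQL's tsearch_data directory are installed; the dictionary name ispell_demo and the word 'qwertyuiop' are made up for the example. Per the documentation, a Snowball stemmer recognizes every word, whereas an Ispell dictionary returns NULL for a word it does not know, so a Snowball-based configuration contributes a lexeme for every non-stop word it sees, in any language.

```sql
-- Hypothetical dictionary built from the sample Ispell files shipped
-- with PostgreSQL (assumed to be present in tsearch_data).
CREATE TEXT SEARCH DICTIONARY ispell_demo (
    TEMPLATE = ispell,
    DictFile = ispell_sample,
    AffFile  = ispell_sample
);

-- A Snowball stemmer returns a lexeme even for a made-up word ...
SELECT ts_lexize('english_stem', 'qwertyuiop');

-- ... while the Ispell dictionary returns NULL for a word it does not
-- know, so in a mapping the word would be passed on to the next
-- dictionary, or dropped if there is none.
SELECT ts_lexize('ispell_demo', 'qwertyuiop');
```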
I'm working on a patch for flexible full-text search configuration, and
one of the problems I want to solve is multilingual search without
separate indices for each language. The patch allows combining the output
of more than one dictionary using a UNION operator.
The current version of the patch is a demonstration of new features and
syntax for FTS configuration. The syntax itself is still at the
discussion stage. You can check it out on the pgsql-hackers mailing list
if you are interested [1]. Any feedback on the patch in terms of
internals, syntax, behavior or idea is welcome.
[1]: /messages/by-id/20171019172409.731f52a7@asp437-24-g082ur/
--
Aleksandr Parfenov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company