querying both text and non-text properties

Started by Rob Nikanderover 7 years ago4 messagesgeneral
Jump to latest
#1Rob Nikander
rob.nikander@gmail.com

Hi,

I’ve got an application where I’d like to search a collection of objects based on various properties, some text and others non-text (bools, enums, ints, etc). I’ve used full text search before, following the PG docs to set up a index on a ts_vector. And of course I’ve used normal indexes before for accelerating basic queries that use non-text columns.

Any recommendations on combining the two? For example, if I search for objects where the description text matches [some ts_query] and the color = red, I can imagine putting the color property into the text index somehow - maybe with tokens like: ’color_red’, ‘color_blue’, but for something like an integer … ?

Rob

#2Laurenz Albe
laurenz.albe@cybertec.at
In reply to: Rob Nikander (#1)
Re: querying both text and non-text properties

Rob Nikander wrote:

I’ve got an application where I’d like to search a collection of objects
based on various properties, some text and others non-text (bools, enums,
ints, etc). I’ve used full text search before, following the PG docs to
set up a index on a ts_vector. And of course I’ve used normal indexes
before for accelerating basic queries that use non-text columns.

Any recommendations on combining the two? For example, if I search for
objects where the description text matches [some ts_query] and the
color = red, I can imagine putting the color property into the text
index somehow - maybe with tokens like: ’color_red’, ‘color_blue’,
but for something like an integer … ?

You have two options:

A combined index:

CREATE EXTENSION btree_gin;
CREATE INDEX ON fulltext USING gin (to_tsvector('english', doc), color);

That is the perfect match for a query with

WHERE color = 'red' AND to_tsvector('german', doc) @@ to_tsquery('english', 'word');

But you can also create two indexes:

CREATE INDEX ON fulltext USING gin (to_tsvector('english', doc));
CREATE INDEX ON fulltext (color);

Then you don't need the extension, and PostgreSQL can still use them for the search,
either only one of them if the condition is selective enough, or a "BitmapAnd" of both.

Yours,
Laurenz Albe
--
Cybertec | https://www.cybertec-postgresql.com

#3Rob Nikander
rob.nikander@gmail.com
In reply to: Laurenz Albe (#2)
Re: querying both text and non-text properties

On Dec 4, 2018, at 4:59 PM, Laurenz Albe <laurenz.albe@cybertec.at> wrote:

You have two options:

A combined index:

CREATE EXTENSION btree_gin;
CREATE INDEX ON fulltext USING gin (to_tsvector('english', doc), color);

That is the perfect match for a query with

WHERE color = 'red' AND to_tsvector('german', doc) @@ to_tsquery('english', 'word');

But you can also create two indexes:

CREATE INDEX ON fulltext USING gin (to_tsvector('english', doc));
CREATE INDEX ON fulltext (color);

Then you don't need the extension, and PostgreSQL can still use them for the search,
either only one of them if the condition is selective enough, or a "BitmapAnd" of both.

Thanks! I will try both these methods and compare the performance.

Rob

#4Tony Shelver
tshelver@gmail.com
In reply to: Rob Nikander (#3)
Re: querying both text and non-text properties

I would suggest doing testing out btree_gin with a non-insignificant amount
of data before going ahead with it.

I did a test case last month, and the size of the generated index was
_much_ bigger than the base table.
The case involved a compound key if 1 int column and 1 timestamp range
column.

On Wed, 5 Dec 2018 at 16:02, Rob Nikander <rob.nikander@gmail.com> wrote:

Show quoted text

On Dec 4, 2018, at 4:59 PM, Laurenz Albe <laurenz.albe@cybertec.at>

wrote:

You have two options:

A combined index:

CREATE EXTENSION btree_gin;
CREATE INDEX ON fulltext USING gin (to_tsvector('english', doc), color);

That is the perfect match for a query with

WHERE color = 'red' AND to_tsvector('german', doc) @@

to_tsquery('english', 'word');

But you can also create two indexes:

CREATE INDEX ON fulltext USING gin (to_tsvector('english', doc));
CREATE INDEX ON fulltext (color);

Then you don't need the extension, and PostgreSQL can still use them for

the search,

either only one of them if the condition is selective enough, or a

"BitmapAnd" of both.

Thanks! I will try both these methods and compare the performance.

Rob