Why is there a tsquery data type?

Started by Bruce Momjianover 18 years ago15 messages
#1Bruce Momjian
bruce@momjian.us

Why does text search need a tsquery data type? I realize it needs
tsvector so it can create indexes and updated trigger columns, but it
seems tsquery could instead just be a simple text string.

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://www.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +

#2Tom Lane
tgl@sss.pgh.pa.us
In reply to: Bruce Momjian (#1)
Re: Why is there a tsquery data type?

Bruce Momjian <bruce@momjian.us> writes:

Why does text search need a tsquery data type? I realize it needs
tsvector so it can create indexes and updated trigger columns, but it
seems tsquery could instead just be a simple text string.

By that logic, we don't need any data types other than text.

regards, tom lane

#3Bruce Momjian
bruce@momjian.us
In reply to: Tom Lane (#2)
Re: Why is there a tsquery data type?

Tom Lane wrote:

Bruce Momjian <bruce@momjian.us> writes:

Why does text search need a tsquery data type? I realize it needs
tsvector so it can create indexes and updated trigger columns, but it
seems tsquery could instead just be a simple text string.

By that logic, we don't need any data types other than text.

What is tsquery giving us that text would not?

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://www.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +

#4Tom Lane
tgl@sss.pgh.pa.us
In reply to: Bruce Momjian (#3)
Re: Why is there a tsquery data type?

Bruce Momjian <bruce@momjian.us> writes:

Tom Lane wrote:

By that logic, we don't need any data types other than text.

What is tsquery giving us that text would not?

A preprocessed representation that can be compared to tsvector
efficiently.

Now, if you'd asked whether we need *both* tsvector and tsquery,
that'd be a fair question. I'm not 100% clear on what the differences
are, but they seem pretty subtle. Do you think that having only
one datatype would be clearer?

regards, tom lane

#5Bruce Momjian
bruce@momjian.us
In reply to: Tom Lane (#4)
Re: Why is there a tsquery data type?

Tom Lane wrote:

Bruce Momjian <bruce@momjian.us> writes:

Tom Lane wrote:

By that logic, we don't need any data types other than text.

What is tsquery giving us that text would not?

A preprocessed representation that can be compared to tsvector
efficiently.

Now, if you'd asked whether we need *both* tsvector and tsquery,
that'd be a fair question. I'm not 100% clear on what the differences
are, but they seem pretty subtle. Do you think that having only
one datatype would be clearer?

There is no question things would be clearer with only one text search
data type. The only value I can see to having a tsquery data type is
that you can store a tsquery value in a column, but why would that be
much better than just storing it in a TEXT field?

Internally I assume you would have to generate a tsquery structure from
a TEXT string, so the idea of a query representation wouldn't go away;
it would just be internal.

The one thing we would lose is the ability to process the query string
with a named configuration. If we always cast to TEXT, I assume we
would always be using "default_text_search_config", and I am a little
worried about queries in triggers that have to wire-down the
configuration name. As I understand it the tsquery goes through the
configuration just like the tsvector.

Right now you can already do:

'query' @@ 'boy girl'::tsvector

and the system casts your text string to tsquery automatically. Perhaps
we just need to minimize tsquery in the documentation and mention its
special purpose.

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://www.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +

#6Gregory Stark
stark@enterprisedb.com
In reply to: Bruce Momjian (#5)
Re: Why is there a tsquery data type?

"Bruce Momjian" <bruce@momjian.us> writes:

There is no question things would be clearer with only one text search
data type. The only value I can see to having a tsquery data type is
that you can store a tsquery value in a column, but why would that be
much better than just storing it in a TEXT field?

When you try storing a tsquery in a column does it alert you if you have an
invalid syntax at that point? Storing it as text would mean not finding out
until you try to use the query.

Is converting a text query into the internal format faster or less memory
intensive than converting text into the internal representation? When you run
a query like "WHERE '...' @@ col" if there wasn't a tsquery data type then
'...' would have to be parsed over and over again for each row.

--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com

#7Bruce Momjian
bruce@momjian.us
In reply to: Gregory Stark (#6)
Re: Why is there a tsquery data type?

Gregory Stark wrote:

"Bruce Momjian" <bruce@momjian.us> writes:

There is no question things would be clearer with only one text search
data type. The only value I can see to having a tsquery data type is
that you can store a tsquery value in a column, but why would that be
much better than just storing it in a TEXT field?

When you try storing a tsquery in a column does it alert you if you have an
invalid syntax at that point? Storing it as text would mean not finding out
until you try to use the query.

Yes it does check syntax:

test=> select 'lkjadsf kjfdsa'::tsquery;
ERROR: syntax error in tsearch query: "lkjadsf kjfdsa"

A larger question is how many people store queries in the database to
make it worth the complexity.

Is converting a text query into the internal format faster or less memory
intensive than converting text into the internal representation? When you run
a query like "WHERE '...' @@ col" if there wasn't a tsquery data type then
'...' would have to be parsed over and over again for each row.

No, internally the TEXT string would be converted to something the
system could deal with for that query, which is probably what 99% of all
queries are going to do anyway by calling to_tsquery().

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://www.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +

#8Gregory Stark
stark@enterprisedb.com
In reply to: Bruce Momjian (#7)
Re: Why is there a tsquery data type?

"Bruce Momjian" <bruce@momjian.us> writes:

Gregory Stark wrote:

"Bruce Momjian" <bruce@momjian.us> writes:

There is no question things would be clearer with only one text search
data type. The only value I can see to having a tsquery data type is
that you can store a tsquery value in a column, but why would that be
much better than just storing it in a TEXT field?

When you try storing a tsquery in a column does it alert you if you have an
invalid syntax at that point? Storing it as text would mean not finding out
until you try to use the query.

Yes it does check syntax:

test=> select 'lkjadsf kjfdsa'::tsquery;
ERROR: syntax error in tsearch query: "lkjadsf kjfdsa"

A larger question is how many people store queries in the database to
make it worth the complexity.

So would this still happen if you didn't have a tsquery type? Or would it
throw the error only when the actual matching operator executed and tried to
parse the text?

Is converting a text query into the internal format faster or less memory
intensive than converting text into the internal representation? When you run
a query like "WHERE '...' @@ col" if there wasn't a tsquery data type then
'...' would have to be parsed over and over again for each row.

No, internally the TEXT string would be converted to something the
system could deal with for that query, which is probably what 99% of all
queries are going to do anyway by calling to_tsquery().

How would that happen if there wasn't a tsquery type?

--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com

#9Bruce Momjian
bruce@momjian.us
In reply to: Gregory Stark (#8)
Re: Why is there a tsquery data type?

Gregory Stark wrote:

"Bruce Momjian" <bruce@momjian.us> writes:

Gregory Stark wrote:

"Bruce Momjian" <bruce@momjian.us> writes:

There is no question things would be clearer with only one text search
data type. The only value I can see to having a tsquery data type is
that you can store a tsquery value in a column, but why would that be
much better than just storing it in a TEXT field?

When you try storing a tsquery in a column does it alert you if you have an
invalid syntax at that point? Storing it as text would mean not finding out
until you try to use the query.

Yes it does check syntax:

test=> select 'lkjadsf kjfdsa'::tsquery;
ERROR: syntax error in tsearch query: "lkjadsf kjfdsa"

A larger question is how many people store queries in the database to
make it worth the complexity.

So would this still happen if you didn't have a tsquery type? Or would it
throw the error only when the actual matching operator executed and tried to
parse the text?

Well, if you didn't have a tsquery data type then you would get the
error when the TEXT was cast to tsquery for the search itself.

Is converting a text query into the internal format faster or less memory
intensive than converting text into the internal representation? When you run
a query like "WHERE '...' @@ col" if there wasn't a tsquery data type then
'...' would have to be parsed over and over again for each row.

No, internally the TEXT string would be converted to something the
system could deal with for that query, which is probably what 99% of all
queries are going to do anyway by calling to_tsquery().

How would that happen if there wasn't a tsquery type?

There is an internal C structure which holds the tsquery information.
My guess is that we would internally have something like tsquery but it
wouldn't be user-visible perhaps. Right now I am a little confused
about how to do this and keep the data-type-independent nature of the
backend. You are right that we might have to end up re-parsing the TEXT
field every time it hits the @@ operator, which is a pain.

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://www.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +

#10Tom Lane
tgl@sss.pgh.pa.us
In reply to: Gregory Stark (#8)
Re: Why is there a tsquery data type?

Gregory Stark <stark@enterprisedb.com> writes:

How would that happen if there wasn't a tsquery type?

I don't think Bruce is suggesting that the search operand can really
be plain text (or if he is, he's nuts). The question here is whether
there's really a need for a distinction between tsvector and tsquery
datatypes; could we have tsvector serve both purposes instead?

I can see that there are differences: tsquery can tell the difference
between "x AND y" and "x OR y", whereas tsvector just knows "x, y".
A superset datatype that can do both is certainly possible, but whether
it's practical, or would be easier to use than the current design,
I dunno.

Perhaps a suitable analogy is regexp pattern matching. Traditionally
regexps are conceived of as strings, but if they'd originated in more
strongly typed languages than they did, they'd certainly be thought
of as a distinct data type. Had we implemented ~ as taking a right
operand of type 'regexp', we could win on a number of levels: entry-time
syntax checking for regexps, and a precompiled internal representation,
for instance. For regexps it seems clear to me that the target text
string and the pattern really are different datatypes, and fuzzing that
distinction is not an improvement.

regards, tom lane

#11Bruce Momjian
bruce@momjian.us
In reply to: Tom Lane (#10)
Re: Why is there a tsquery data type?

Tom Lane wrote:

Gregory Stark <stark@enterprisedb.com> writes:

How would that happen if there wasn't a tsquery type?

I don't think Bruce is suggesting that the search operand can really
be plain text (or if he is, he's nuts). The question here is whether

Yes, that is what I was suggesting because as I mentioned TEXT already
functions fine as tsquery.

there's really a need for a distinction between tsvector and tsquery
datatypes; could we have tsvector serve both purposes instead?

I can see that there are differences: tsquery can tell the difference
between "x AND y" and "x OR y", whereas tsvector just knows "x, y".
A superset datatype that can do both is certainly possible, but whether
it's practical, or would be easier to use than the current design,
I dunno.

Because of the special behavior of & and |, I assume tsquery and
tsvector cannot be the same data type.

Perhaps a suitable analogy is regexp pattern matching. Traditionally
regexps are conceived of as strings, but if they'd originated in more
strongly typed languages than they did, they'd certainly be thought
of as a distinct data type. Had we implemented ~ as taking a right
operand of type 'regexp', we could win on a number of levels: entry-time
syntax checking for regexps, and a precompiled internal representation,
for instance. For regexps it seems clear to me that the target text
string and the pattern really are different datatypes, and fuzzing that
distinction is not an improvement.

Yes, this is a good analogy.

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://www.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +

#12Oleg Bartunov
oleg@sai.msu.su
In reply to: Bruce Momjian (#7)
Re: Why is there a tsquery data type?

On Thu, 30 Aug 2007, Bruce Momjian wrote:

Gregory Stark wrote:

"Bruce Momjian" <bruce@momjian.us> writes:

There is no question things would be clearer with only one text search
data type. The only value I can see to having a tsquery data type is
that you can store a tsquery value in a column, but why would that be
much better than just storing it in a TEXT field?

When you try storing a tsquery in a column does it alert you if you have an
invalid syntax at that point? Storing it as text would mean not finding out
until you try to use the query.

Yes it does check syntax:

test=> select 'lkjadsf kjfdsa'::tsquery;
ERROR: syntax error in tsearch query: "lkjadsf kjfdsa"

A larger question is how many people store queries in the database to
make it worth the complexity.

you forget about very powerfull query rewriting, which is table driven

Is converting a text query into the internal format faster or less memory
intensive than converting text into the internal representation? When you run
a query like "WHERE '...' @@ col" if there wasn't a tsquery data type then
'...' would have to be parsed over and over again for each row.

No, internally the TEXT string would be converted to something the
system could deal with for that query, which is probably what 99% of all
queries are going to do anyway by calling to_tsquery().

Regards,
Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83

#13Ron Mayer
rm_pg@cheapcomplexdevices.com
In reply to: Tom Lane (#2)
Re: Why is there a tsquery data type?

Tom Lane wrote:

Bruce Momjian <bruce@momjian.us> writes:

Why does text search need a tsquery data type? I realize it needs
tsvector so it can create indexes and updated trigger columns, but it
seems tsquery could instead just be a simple text string.

By that logic, we don't need any data types other than text.

Could similar logic argue that we'd want special types for regular
expressions too? That seems quite parallel to the tsquery type to me.

#14Tom Lane
tgl@sss.pgh.pa.us
In reply to: Ron Mayer (#13)
Re: Why is there a tsquery data type?

Ron Mayer <rm_pg@cheapcomplexdevices.com> writes:

Could similar logic argue that we'd want special types for regular
expressions too? That seems quite parallel to the tsquery type to me.

Yeah, it certainly seems like something we might want to consider in
future --- we could get rid of that klugy cache for compiled regexps,
for one thing. An implicit cast from text to the regexp type would
cover backwards compatibility issues, I think.

regards, tom lane

#15Bruce Momjian
bruce@momjian.us
In reply to: Tom Lane (#14)
Re: Why is there a tsquery data type?

Added to TODO:

* Consider a special data type for regular expressions

http://archives.postgresql.org/pgsql-hackers/2007-08/msg01067.php

---------------------------------------------------------------------------

Tom Lane wrote:

Ron Mayer <rm_pg@cheapcomplexdevices.com> writes:

Could similar logic argue that we'd want special types for regular
expressions too? That seems quite parallel to the tsquery type to me.

Yeah, it certainly seems like something we might want to consider in
future --- we could get rid of that klugy cache for compiled regexps,
for one thing. An implicit cast from text to the regexp type would
cover backwards compatibility issues, I think.

regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 4: Have you searched our list archives?

http://archives.postgresql.org

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://postgres.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +