pgsql: Phrase full text search.

Started by Teodor Sigaev, about 10 years ago, 6 messages, committers
#1 Teodor Sigaev
teodor@sigaev.ru

Phrase full text search.

This patch introduces a new text search operator (<-> or <DISTANCE>) into tsquery.
The on-disk and binary in/out formats of tsquery are backward compatible.
It has two side effects:
- it changes the sort order for tsquery, so users who have a btree index over
tsquery should reindex it
- tsquery output contains fewer parentheses, so tsquery becomes more
readable

Authors: Teodor Sigaev, Oleg Bartunov, Dmitry Ivanov
Reviewers: Alexander Korotkov, Artur Zakirov

Branch
------
master

Details
-------
http://git.postgresql.org/pg/commitdiff/bb140506df605fab58f48926ee1db1f80bdafb59

Modified Files
--------------
contrib/tsearch2/expected/tsearch2.out | 56 ++---
doc/src/sgml/datatype.sgml | 9 +-
doc/src/sgml/func.sgml | 39 ++++
doc/src/sgml/textsearch.sgml | 182 ++++++++++++++-
src/backend/tsearch/to_tsany.c | 187 +++++++--------
src/backend/tsearch/ts_parse.c | 15 +-
src/backend/tsearch/ts_selfuncs.c | 3 +-
src/backend/tsearch/wparser_def.c | 31 ++-
src/backend/utils/adt/tsginidx.c | 57 +++--
src/backend/utils/adt/tsgistidx.c | 4 +-
src/backend/utils/adt/tsquery.c | 311 +++++++++++++++++++------
src/backend/utils/adt/tsquery_cleanup.c | 362 +++++++++++++++++++++++++++--
src/backend/utils/adt/tsquery_op.c | 54 ++++-
src/backend/utils/adt/tsquery_util.c | 11 +-
src/backend/utils/adt/tsrank.c | 263 ++++++++++++++-------
src/backend/utils/adt/tsvector.c | 2 +-
src/backend/utils/adt/tsvector_op.c | 326 +++++++++++++++++++++++---
src/backend/utils/adt/tsvector_parser.c | 10 +-
src/include/catalog/catversion.h | 2 +-
src/include/catalog/pg_operator.h | 3 +
src/include/catalog/pg_proc.h | 7 +
src/include/tsearch/ts_public.h | 22 +-
src/include/tsearch/ts_type.h | 30 ++-
src/include/tsearch/ts_utils.h | 15 +-
src/test/regress/expected/tsdicts.out | 36 ++-
src/test/regress/expected/tsearch.out | 395 +++++++++++++++++++++++++++++---
src/test/regress/expected/tstypes.out | 369 ++++++++++++++++++++++++++++-
src/test/regress/sql/tsdicts.sql | 3 +
src/test/regress/sql/tsearch.sql | 101 ++++++++
src/test/regress/sql/tstypes.sql | 75 +++++-
30 files changed, 2536 insertions(+), 444 deletions(-)

--
Sent via pgsql-committers mailing list (pgsql-committers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-committers

#2 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Teodor Sigaev (#1)
Re: pgsql: Phrase full text search.

Teodor Sigaev <teodor@sigaev.ru> writes:

> Phrase full text search.

Hasn't this patch broken on-disk compatibility of type tsquery by
renumbering the values of QueryOperator.operator? I'm looking at
the patch delta in ts_type.h.

regards, tom lane


#3 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tom Lane (#2)
Re: pgsql: Phrase full text search.

I wrote:

> ... I'm looking at the patch delta in ts_type.h.

BTW, while I'm looking at that: comparePos() was a perfectly OK
name for a static function within tsvector.c, but it seems like a
pretty horrid name for a globally exposed linker symbol. Please
rename it to something less generic.

regards, tom lane


#4 Teodor Sigaev
teodor@sigaev.ru
In reply to: Tom Lane (#2)
Re: pgsql: Phrase full text search.

>> Phrase full text search.

> Hasn't this patch broken on-disk compatibility of type tsquery by
> renumbering the values of QueryOperator.operator? I'm looking at
> the patch delta in ts_type.h.

The distance field is placed exactly in the hole between the two uint8_t fields
and the uint32_t field. As far as I know, every platform we support uses 4-byte
alignment for the int32 type. Am I wrong? If so, I will move distance to the end
of the struct.
The QueryOperator struct isn't stored to disk directly; it's used in the union
QueryItem.
sizeof(QueryItem) = 12
sizeof(QueryOperator) = 8, so we can add distance to the end without growing
the size of QueryItem.
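
The layout argument above can be checked mechanically. What follows is a minimal sketch, assuming a simplified operator struct: the names SketchOperatorOld and SketchOperatorNew, the int16_t type of distance, and the helper functions are hypothetical stand-ins, not the actual definitions in ts_type.h. It only illustrates that, on a platform where uint32_t is 4-byte aligned, a 16-bit field placed in the padding hole leaves the struct size and the offset of the following field unchanged.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical pre-patch layout: two 1-byte fields, then a 4-byte field. */
typedef struct
{
    uint8_t  type;
    uint8_t  oper;
    /* typically 2 bytes of compiler padding here when uint32_t is 4-byte aligned */
    uint32_t left;
} SketchOperatorOld;

/* Hypothetical post-patch layout: distance occupies the former padding. */
typedef struct
{
    uint8_t  type;
    uint8_t  oper;
    int16_t  distance;          /* assumed width; chosen to fill the hole */
    uint32_t left;
} SketchOperatorNew;

/* Returns 1 if adding distance changed neither the size nor the offset of left. */
int
sketch_layout_is_compatible(void)
{
    return sizeof(SketchOperatorOld) == sizeof(SketchOperatorNew) &&
           offsetof(SketchOperatorOld, left) == offsetof(SketchOperatorNew, left);
}

/* Aborts (in non-NDEBUG builds) if the platform does not have the expected hole. */
void
sketch_assert_layout(void)
{
    assert(sketch_layout_is_compatible());
}
```

Note that this check says nothing about the values stored in the existing fields; it only covers where the bytes live.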

--
Teodor Sigaev E-mail: teodor@sigaev.ru
WWW: http://www.sigaev.ru/


#5 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Teodor Sigaev (#4)
Re: pgsql: Phrase full text search.

Teodor Sigaev <teodor@sigaev.ru> writes:

>> Hasn't this patch broken on-disk compatibility of type tsquery by
>> renumbering the values of QueryOperator.operator? I'm looking at
>> the patch delta in ts_type.h.

> The distance field is placed exactly in the hole between the two uint8_t fields
> and the uint32_t field. As far as I know, every platform we support uses 4-byte
> alignment for the int32 type. Am I wrong?

No, I'm worried about the fact that you changed the OP_xxx constants.
Won't that cause a pre-existing tsquery operator to be read incorrectly?

Assuming that I'm right, you need to revert OP_AND/OP_OR/OP_NOT to what
they were before, which means you need to give up on the assumption that
the numerical values of the OP_xxx constants correspond directly to their
syntactic priority. But that assumption was never going to survive the
next tsquery expansion anyway. I'd suggest a static const array mapping
the OP values into their syntactic priorities.
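
The mapping suggested above could look roughly like the following. This is a minimal sketch, not the committed fix: the SKETCH_ names, the numeric operator codes, and the priority values are all assumptions made for illustration. The point is only that the stored operator codes stay fixed while a separate table decides precedence.

```c
#include <assert.h>

/*
 * Hypothetical operator codes. The pre-existing operators keep whatever
 * values are already stored on disk (assumed here to be 1..3); the new
 * operator takes the next free code.
 */
#define SKETCH_OP_NOT     1
#define SKETCH_OP_AND     2
#define SKETCH_OP_OR      3
#define SKETCH_OP_PHRASE  4
#define SKETCH_OP_COUNT   4

/*
 * Syntactic priority, indexed by (code - 1). A larger number binds
 * tighter. The values are assumptions for this example.
 */
static const int sketch_op_priority[SKETCH_OP_COUNT] = {
    4,                          /* SKETCH_OP_NOT */
    2,                          /* SKETCH_OP_AND */
    1,                          /* SKETCH_OP_OR */
    3                           /* SKETCH_OP_PHRASE */
};

/* Look up the priority of a stored operator code. */
int
sketch_priority(int op)
{
    assert(op >= 1 && op <= SKETCH_OP_COUNT);
    return sketch_op_priority[op - 1];
}

/* Returns 1 if a child expression binds more loosely than its parent. */
int
sketch_needs_parens(int parent_op, int child_op)
{
    return sketch_priority(child_op) < sketch_priority(parent_op);
}
```

With this arrangement, a later operator would get a new code and a new table entry, and no stored value would need renumbering.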

regards, tom lane


#6 Teodor Sigaev
teodor@sigaev.ru
In reply to: Tom Lane (#5)
Re: pgsql: Phrase full text search.

> Assuming that I'm right, you need to revert OP_AND/OP_OR/OP_NOT to what
> they were before, which means you need to give up on the assumption that
> the numerical values of the OP_xxx constants correspond directly to their
> syntactic priority. But that assumption was never going to survive the
> next tsquery expansion anyway. I'd suggest a static const array mapping
> the OP values into their syntactic priorities.

Oh, I see. Will fix.

--
Teodor Sigaev E-mail: teodor@sigaev.ru
WWW: http://www.sigaev.ru/
