pgsql: Phrase full text search.
Phrase full text search.
This patch introduces a new text search operator (<-> or <DISTANCE>) into tsquery.
The on-disk and binary in/out formats of tsquery remain backward compatible.
It has two side effects:
- the sort order of tsquery values changes, so users who have a btree index
over tsquery should reindex it
- tsquery output contains fewer parentheses, which makes it more
readable
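To illustrate what the new operator is meant to match, here is a minimal C sketch. It assumes, as the PostgreSQL documentation describes, that `a <N> b` matches when `b` occurs exactly N positions after `a`, and that `<->` is shorthand for `<1>`. The function `phrase_match` and its plain sorted `uint16_t` position arrays are hypothetical stand-ins for tsvector position lists; this is not the patch's code.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper (not from the patch): returns true if some
 * position in `right` equals some position in `left` plus `distance`,
 * i.e. whether "left <distance> right" matches. Both arrays must be
 * sorted ascending; a two-pointer merge makes this O(nleft + nright).
 */
static bool
phrase_match(const uint16_t *left, size_t nleft,
             const uint16_t *right, size_t nright,
             uint16_t distance)
{
    size_t i = 0;
    size_t j = 0;

    while (i < nleft && j < nright)
    {
        /* Widen before adding so large positions cannot wrap around. */
        uint32_t want = (uint32_t) left[i] + distance;
        uint32_t have = right[j];

        if (have == want)
            return true;
        if (have < want)
            j++;            /* right occurrence too early: try the next one */
        else
            i++;            /* right occurrence too late: try a later left one */
    }
    return false;
}
```

For the text `fat cat sat` (positions 1, 2, 3), calling `phrase_match` with `fat`'s positions on the left, `cat`'s on the right and distance 1 returns true, while swapping the two lists returns false: the operator is order-sensitive, unlike `&`.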
Authors: Teodor Sigaev, Oleg Bartunov, Dmitry Ivanov
Reviewers: Alexander Korotkov, Artur Zakirov
Branch
------
master
Details
-------
http://git.postgresql.org/pg/commitdiff/bb140506df605fab58f48926ee1db1f80bdafb59
Modified Files
--------------
contrib/tsearch2/expected/tsearch2.out | 56 ++---
doc/src/sgml/datatype.sgml | 9 +-
doc/src/sgml/func.sgml | 39 ++++
doc/src/sgml/textsearch.sgml | 182 ++++++++++++++-
src/backend/tsearch/to_tsany.c | 187 +++++++--------
src/backend/tsearch/ts_parse.c | 15 +-
src/backend/tsearch/ts_selfuncs.c | 3 +-
src/backend/tsearch/wparser_def.c | 31 ++-
src/backend/utils/adt/tsginidx.c | 57 +++--
src/backend/utils/adt/tsgistidx.c | 4 +-
src/backend/utils/adt/tsquery.c | 311 +++++++++++++++++++------
src/backend/utils/adt/tsquery_cleanup.c | 362 +++++++++++++++++++++++++++--
src/backend/utils/adt/tsquery_op.c | 54 ++++-
src/backend/utils/adt/tsquery_util.c | 11 +-
src/backend/utils/adt/tsrank.c | 263 ++++++++++++++-------
src/backend/utils/adt/tsvector.c | 2 +-
src/backend/utils/adt/tsvector_op.c | 326 +++++++++++++++++++++++---
src/backend/utils/adt/tsvector_parser.c | 10 +-
src/include/catalog/catversion.h | 2 +-
src/include/catalog/pg_operator.h | 3 +
src/include/catalog/pg_proc.h | 7 +
src/include/tsearch/ts_public.h | 22 +-
src/include/tsearch/ts_type.h | 30 ++-
src/include/tsearch/ts_utils.h | 15 +-
src/test/regress/expected/tsdicts.out | 36 ++-
src/test/regress/expected/tsearch.out | 395 +++++++++++++++++++++++++++++---
src/test/regress/expected/tstypes.out | 369 ++++++++++++++++++++++++++++-
src/test/regress/sql/tsdicts.sql | 3 +
src/test/regress/sql/tsearch.sql | 101 ++++++++
src/test/regress/sql/tstypes.sql | 75 +++++-
30 files changed, 2536 insertions(+), 444 deletions(-)
--
Sent via pgsql-committers mailing list (pgsql-committers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-committers
Teodor Sigaev <teodor@sigaev.ru> writes:
> Phrase full text search.
Hasn't this patch broken on-disk compatibility of type tsquery by
renumbering the values of QueryOperator.operator? I'm looking at
the patch delta in ts_type.h.
regards, tom lane
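To make the concern concrete, here is a small C sketch of what happens when a stored operator code is decoded under a different numbering. Both numberings are assumptions for illustration: the pre-patch codes are taken to be NOT=1, AND=2, OR=3, and the renumbered set is a hypothetical priority-ordered NOT=1, PHRASE=2, AND=3, OR=4, not copied from the patch.

```c
#include <stdint.h>

/* Operator identities, independent of any on-disk numbering. */
typedef enum
{
    OPNAME_INVALID,
    OPNAME_NOT,
    OPNAME_AND,
    OPNAME_OR,
    OPNAME_PHRASE
} OpName;

/* Assumed pre-patch on-disk codes: NOT=1, AND=2, OR=3. */
static OpName
decode_old(int8_t code)
{
    switch (code)
    {
        case 1: return OPNAME_NOT;
        case 2: return OPNAME_AND;
        case 3: return OPNAME_OR;
        default: return OPNAME_INVALID;
    }
}

/* Hypothetical renumbering by priority: NOT=1, PHRASE=2, AND=3, OR=4. */
static OpName
decode_renumbered(int8_t code)
{
    switch (code)
    {
        case 1: return OPNAME_NOT;
        case 2: return OPNAME_PHRASE;
        case 3: return OPNAME_AND;
        case 4: return OPNAME_OR;
        default: return OPNAME_INVALID;
    }
}
```

Under these assumed numberings, a stored `a & b` carries code 2 for its operator: `decode_old(2)` yields `OPNAME_AND`, but `decode_renumbered(2)` yields `OPNAME_PHRASE`, so the same bytes would be read back as a different query. Keeping existing codes fixed and appending new ones avoids this.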
--
I wrote:
> ... I'm looking at the patch delta in ts_type.h.
BTW, while I'm looking at that: comparePos() was a perfectly OK
name for a static function within tsvector.c, but it seems like a
pretty horrid name for a globally exposed linker symbol. Please
rename it to something less generic.
regards, tom lane
--
>> Phrase full text search.
> Hasn't this patch broken on-disk compatibility of type tsquery by
> renumbering the values of QueryOperator.operator? I'm looking at
> the patch delta in ts_type.h.
The distance field is placed exactly in the padding hole between the two uint8_t
fields and the uint32_t field; as far as I know, every platform we support uses
4-byte alignment for the int32 type. Am I wrong? If so, I will move distance to
the end of the struct.
The QueryOperator struct isn't stored to disk directly; it's used inside the
union QueryItem.
sizeof(QueryItem) = 12
sizeof(QueryOperator) = 8, so we can add distance to the end without growing
the size of QueryItem.
--
Teodor Sigaev E-mail: teodor@sigaev.ru
WWW: http://www.sigaev.ru/
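The layout argument above can be checked at compile time. The following C11 sketch uses hypothetical mirror structs with <stdint.h> types; the field names follow the thread, but the 16-bit width of `distance` and the operand's field list (type, weight, prefix, valcrc, and a 12/20-bit length/distance bitfield) are assumptions for illustration, not copied from ts_type.h. The static assertions encode the sizes quoted above and hold only on ABIs where 32-bit integers are 4-byte aligned, which is exactly the assumption being asked about.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical operator layout before the change. */
typedef struct
{
    uint8_t  type;      /* item kind tag */
    uint8_t  oper;      /* operator code */
    /* 2 bytes of padding here on common ABIs */
    uint32_t left;      /* offset to the left operand */
} OperatorBefore;

/* Hypothetical layout after the change: distance fills the padding. */
typedef struct
{
    uint8_t  type;
    uint8_t  oper;
    int16_t  distance;  /* assumed 16-bit width */
    uint32_t left;
} OperatorAfter;

/* Assumed operand layout; it is what makes the union 12 bytes. */
typedef struct
{
    uint8_t  type;
    uint8_t  weight;
    uint8_t  prefix;
    int32_t  valcrc;
    uint32_t length : 12,
             distance : 20;
} Operand;

typedef union
{
    uint8_t       type;
    OperatorAfter qoperator;
    Operand       qoperand;
} Item;

_Static_assert(offsetof(OperatorBefore, left) == 4, "padding before left");
_Static_assert(sizeof(OperatorBefore) == 8, "old operator is 8 bytes");
_Static_assert(offsetof(OperatorAfter, distance) == 2, "distance fills the hole");
_Static_assert(offsetof(OperatorAfter, left) == 4, "left keeps its offset");
_Static_assert(sizeof(OperatorAfter) == 8, "operator size unchanged");
_Static_assert(sizeof(Item) == 12, "union size set by the operand");
```

If these assertions compile, existing fields keep their offsets and no size changes, which is the layout side of the compatibility question; the objection that follows concerns the operator codes rather than the layout.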
--
Teodor Sigaev <teodor@sigaev.ru> writes:
>> Hasn't this patch broken on-disk compatibility of type tsquery by
>> renumbering the values of QueryOperator.operator? I'm looking at
>> the patch delta in ts_type.h.
> The distance field is placed exactly in the padding hole between the two uint8_t
> fields and the uint32_t field; as far as I know, every platform we support uses
> 4-byte alignment for the int32 type. Am I wrong?
No, I'm worried about the fact that you changed the OP_xxx constants.
Won't that cause a pre-existing tsquery operator to be read incorrectly?
Assuming that I'm right, you need to revert OP_AND/OP_OR/OP_NOT to what
they were before, which means you need to give up on the assumption that
the numerical values of the OP_xxx constants correspond directly to their
syntactic priority. But that assumption was never going to survive the
next tsquery expansion anyway. I'd suggest a static const array mapping
the OP values into their syntactic priorities.
regards, tom lane
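A minimal sketch of the suggested static const array: the on-disk codes keep their old values (assumed here to be NOT=1, AND=2, OR=3, with the phrase operator appended as 4), and a separate table supplies syntactic priority. The priority order follows the PostgreSQL documentation (! binds tightest, then <->, then &, then |); the names `op_priority`, `op_priority_of` and `needs_parens` are hypothetical. `needs_parens` shows one way such a table could drive the reduced parenthesization mentioned in the commit message; it is a simplification that ignores associativity.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * On-disk operator codes (illustrative). The old codes keep their
 * assumed pre-patch values so stored tsquery data decodes unchanged;
 * the new phrase operator takes the next free code.
 */
#define OP_NOT     1
#define OP_AND     2
#define OP_OR      3
#define OP_PHRASE  4
#define OP_COUNT   4

/*
 * Syntactic priority indexed by (code - 1); larger binds tighter:
 * ! > <-> > & > |.
 */
static const int op_priority[OP_COUNT] = {
    4,  /* OP_NOT */
    2,  /* OP_AND */
    1,  /* OP_OR */
    3,  /* OP_PHRASE */
};

static int
op_priority_of(int op)
{
    assert(op >= 1 && op <= OP_COUNT);
    return op_priority[op - 1];
}

/*
 * Simplified printing rule: an operand needs parentheses only when its
 * operator binds more loosely than its parent's (associativity ignored).
 */
static bool
needs_parens(int parent_op, int child_op)
{
    return op_priority_of(child_op) < op_priority_of(parent_op);
}
```

With this table, `needs_parens(OP_AND, OP_OR)` is true, so `a & (b | c)` keeps its parentheses, while `needs_parens(OP_OR, OP_AND)` is false, so `(a & b) | c` can be printed as `a & b | c`. A later operator only needs a new code and a new table entry, leaving stored codes untouched.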
--
> Assuming that I'm right, you need to revert OP_AND/OP_OR/OP_NOT to what
> they were before, which means you need to give up on the assumption that
> the numerical values of the OP_xxx constants correspond directly to their
> syntactic priority. But that assumption was never going to survive the
> next tsquery expansion anyway. I'd suggest a static const array mapping
> the OP values into their syntactic priorities.
Oh, I see. Will fix.
--
Teodor Sigaev E-mail: teodor@sigaev.ru
WWW: http://www.sigaev.ru/