ts_count

Started by Andrew Dunstan over 14 years ago, 6 messages
#1Andrew Dunstan
andrew@dunslane.net

One of our PostgreSQL Experts Inc customers wanted a function to count
all the occurrences of terms in a tsquery in a tsvector. This has been
written as a loadable module function, and initial testing shows it is
working well. With the client's permission we are releasing the code -
it's available at <https://github.com/pgexperts/ts_count>. The actual
new code involved here is tiny, some of the code is C&P'd from tsrank.c
and much of the rest is boilerplate.

A snippet from the regression test:

select ts_count(to_tsvector('managing managers manage peons managerially'),
                to_tsquery('managers | peon'));
 ts_count
----------
        4

We'd like to add something like this for 9.2, so I'd like to get the API agreed and then I'll prepare a patch and submit it for the next CF.

Comments?

cheers

andrew

#2Oleg Bartunov
oleg@sai.msu.su
In reply to: Andrew Dunstan (#1)
Re: ts_count

Well, there are several functions available around tsearch2, so I suggest
that somebody collect all of them and create one extension, ts_addon.
For example, these are the ones I remember:
1. tsvector2array
2. noccurences(tsvector, tsquery) - like your ts_count
3. nmatches(tsvector, tsquery) - number of matched lexemes in the query
Of course, we need to think about better names for the functions, since
ts_count is a bit ambiguous.
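
To make the difference between items 2 and 3 concrete, here is a minimal Python sketch. It is not code from the ts_count module. It models a tsvector as a plain list of lexemes, one entry per position, and assumes the stems 'manag', 'peon' and 'manageri' for the example sentence from the first message. The names count_occurrences and count_matches are hypothetical stand-ins for noccurences and nmatches.

```python
from collections import Counter


def count_occurrences(doc_lexemes, query_lexemes):
    """Total number of positions in the document held by any query lexeme."""
    per_lexeme = Counter(doc_lexemes)
    return sum(per_lexeme[lex] for lex in set(query_lexemes))


def count_matches(doc_lexemes, query_lexemes):
    """Number of distinct query lexemes that appear at least once."""
    present = set(doc_lexemes)
    return sum(1 for lex in set(query_lexemes) if lex in present)


# Assumed stemming of 'managing managers manage peons managerially'.
doc = ["manag", "manag", "manag", "peon", "manageri"]
# Assumed stemming of the query 'managers | peon'.
query = ["manag", "peon"]

print(count_occurrences(doc, query))  # 4
print(count_matches(doc, query))      # 2
```

Under these assumptions the same input gives 4 for one function and 2 for the other, which is why a name as generic as ts_count could be read either way.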

Oleg

Regards,
Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83

#3Andrew Dunstan
andrew@dunslane.net
In reply to: Oleg Bartunov (#2)
Re: ts_count

On 06/04/2011 04:51 PM, Oleg Bartunov wrote:

> Well, there are several functions available around tsearch2, so I suggest
> that somebody collect all of them and create one extension, ts_addon.
> For example, these are the ones I remember:
> 1. tsvector2array
> 2. noccurences(tsvector, tsquery) - like your ts_count
> 3. nmatches(tsvector, tsquery) - number of matched lexemes in the query
> Of course, we need to think about better names for the functions, since
> ts_count is a bit ambiguous.

Getting agreed names was one reason for posting. I don't know why these
need to be an extension. I think they are of sufficiently general
interest (and sufficiently lightweight) that we could just build them in.

cheers

andrew

#4Alvaro Herrera
alvherre@commandprompt.com
In reply to: Andrew Dunstan (#1)
Re: ts_count

Excerpts from Andrew Dunstan's message of Sat Jun 04 08:47:02 -0400 2011:

> A snippet from the regression test:
>
> select ts_count(to_tsvector('managing managers manage peons managerially'),
>                 to_tsquery('managers | peon'));
>  ts_count
> ----------
>         4

Err, shouldn't this return 5?

--
Álvaro Herrera <alvherre@commandprompt.com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

#5Andrew Dunstan
andrew@dunslane.net
In reply to: Alvaro Herrera (#4)
Re: ts_count

On 06/04/2011 08:59 PM, Alvaro Herrera wrote:

> Excerpts from Andrew Dunstan's message of Sat Jun 04 08:47:02 -0400 2011:
>
>> A snippet from the regression test:
>>
>> select ts_count(to_tsvector('managing managers manage peons managerially'),
>>                 to_tsquery('managers | peon'));
>>  ts_count
>> ----------
>>         4
>
> Err, shouldn't this return 5?

No. 'managerially' doesn't get the same stemming.
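
The count can be traced word by word. The sketch below is illustrative Python, not the module's C code. The stem table is an assumption about what the English text search configuration produces for these words, and it is hard-coded rather than computed. The helper name words_counted is hypothetical.

```python
# Assumed stems, hard-coded for illustration only.
ASSUMED_STEMS = {
    "managing": "manag",
    "managers": "manag",
    "manage": "manag",
    "peons": "peon",
    "peon": "peon",
    "managerially": "manageri",
}


def words_counted(text, query_terms):
    """Return the words of text whose assumed stem matches a query term's stem."""
    query_stems = {ASSUMED_STEMS[term] for term in query_terms}
    return [word for word in text.split() if ASSUMED_STEMS[word] in query_stems]


counted = words_counted(
    "managing managers manage peons managerially",
    ["managers", "peon"],
)
print(counted)       # ['managing', 'managers', 'manage', 'peons']
print(len(counted))  # 4
```

With this table, 'managerially' maps to a stem that neither query term shares, so it is the one word left out.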

cheers

andrew

#6Andrew Dunstan
andrew@dunslane.net
In reply to: Oleg Bartunov (#2)
Re: ts_count

On 06/04/2011 04:51 PM, Oleg Bartunov wrote:

> Well, there are several functions available around tsearch2, so I suggest
> that somebody collect all of them and create one extension, ts_addon.
> For example, these are the ones I remember:
> 1. tsvector2array
> 2. noccurences(tsvector, tsquery) - like your ts_count
> 3. nmatches(tsvector, tsquery) - number of matched lexemes in the query
> Of course, we need to think about better names for the functions, since
> ts_count is a bit ambiguous.

Oleg, are you doing this? I'd rather this stuff didn't get dropped on
the floor.

cheers

andrew