ts_count
One of our PostgreSQL Experts Inc customers wanted a function to count
all the occurrences of terms in a tsquery in a tsvector. This has been
written as a loadable module function, and initial testing shows it is
working well. With the client's permission we are releasing the code -
it's available at <https://github.com/pgexperts/ts_count>. The actual
new code involved here is tiny, some of the code is C&P'd from tsrank.c
and much of the rest is boilerplate.
A snippet from the regression test:
    select ts_count(to_tsvector('managing managers manage peons managerially'),
                    to_tsquery('managers | peon'));
     ts_count
    ----------
            4
We'd like to add something like this for 9.2, so I'd like to get the API agreed and then I'll prepare a patch and submit it for the next CF.
Comments?
cheers
andrew
Well, there are several functions available around tsearch2, so I suggest
that somebody collect all of them and create one extension - ts_addon.
For example, these are what I remember:
1. tsvector2array
2. noccurences(tsvector, tsquery) - like your ts_count
3. nmatches(tsvector, tsquery) - # of matched lexemes in query
Of course, we need to think about better names for functions, since
ts_count is a bit ambiguous.
Oleg
On Sat, 4 Jun 2011, Andrew Dunstan wrote:
> One of our PostgreSQL Experts Inc customers wanted a function to count all
> the occurrences of terms in a tsquery in a tsvector. This has been written as
> a loadable module function, and initial testing shows it is working well.
> With the client's permission we are releasing the code - it's available at
> <https://github.com/pgexperts/ts_count>. The actual new code involved here is
> tiny, some of the code is C&P'd from tsrank.c and much of the rest is
> boilerplate.
> A snippet from the regression test:
>     select ts_count(to_tsvector('managing managers manage peons managerially'),
>                     to_tsquery('managers | peon'));
>      ts_count
>     ----------
>             4
> We'd like to add something like this for 9.2, so I'd like to get the API
> agreed and then I'll prepare a patch and submit it for the next CF.
> Comments?
> cheers
> andrew
Regards,
Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83
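To make the difference between items 2 and 3 in the list above concrete, here is a rough sketch that uses only built-in functions. It rests on several assumptions: PostgreSQL 9.6 or later, where unnest(tsvector) and array_to_tsvector exist (neither was available when this thread was written); the english text search configuration; and a query made only of plain OR'ed terms. The column names occurrences and matched_lexemes are illustrative, not an agreed API, and this is not how the ts_count module is implemented.

```sql
-- Sketch only: approximates "count occurrences" and "count matched lexemes"
-- for a tsquery of plain OR'ed terms, using built-ins from PostgreSQL 9.6+.
select sum(cardinality(positions)) as occurrences,      -- total positions of matching lexemes
       count(*)                    as matched_lexemes   -- number of distinct lexemes that matched
from unnest(to_tsvector('english',
                        'managing managers manage peons managerially'))
       as t(lexeme, positions, weights)
-- Test each lexeme of the document on its own against the query.
where array_to_tsvector(array[lexeme])
      @@ to_tsquery('english', 'managers | peon');
```

Testing each lexeme separately only makes sense for OR'ed terms; a query using & or ! would need the real function, which walks the query tree itself.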
On 06/04/2011 04:51 PM, Oleg Bartunov wrote:
> Well, there are several functions available around tsearch2, so I suggest
> that somebody collect all of them and create one extension - ts_addon.
> For example, these are what I remember:
> 1. tsvector2array
> 2. noccurences(tsvector, tsquery) - like your ts_count
> 3. nmatches(tsvector, tsquery) - # of matched lexemes in query
> Of course, we need to think about better names for functions, since
> ts_count is a bit ambiguous.
Getting agreed names was one reason for posting. I don't know why these
need to be an extension. I think they are of sufficiently general
interest (and sufficiently lightweight) that we could just build them in.
cheers
andrew
Excerpts from Andrew Dunstan's message of Sat Jun 04 08:47:02 -0400 2011:
> A snippet from the regression test:
>     select ts_count(to_tsvector('managing managers manage peons managerially'),
>                     to_tsquery('managers | peon'));
>      ts_count
>     ----------
>             4
Err, shouldn't this return 5?
--
Álvaro Herrera <alvherre@commandprompt.com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
On 06/04/2011 08:59 PM, Alvaro Herrera wrote:
> Excerpts from Andrew Dunstan's message of Sat Jun 04 08:47:02 -0400 2011:
> > A snippet from the regression test:
> >     select ts_count(to_tsvector('managing managers manage peons managerially'),
> >                     to_tsquery('managers | peon'));
> >      ts_count
> >     ----------
> >             4
> Err, shouldn't this return 5?
No. 'managerially' doesn't get the same stemming as 'managers', so it is
not counted.
cheers
andrew
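The stemming difference can be checked with built-in functions alone. A minimal sketch, assuming the english text search configuration (the thread does not say which configuration the regression test ran under):

```sql
-- Lexemes and positions produced for the sample document.
select to_tsvector('english', 'managing managers manage peons managerially');

-- Lexemes the query terms are normalized to.
select to_tsquery('english', 'managers | peon');
```

Any word whose lexeme in the first result differs from both lexemes in the second result contributes nothing to the count.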
On 06/04/2011 04:51 PM, Oleg Bartunov wrote:
> Well, there are several functions available around tsearch2, so I suggest
> that somebody collect all of them and create one extension - ts_addon.
> For example, these are what I remember:
> 1. tsvector2array
> 2. noccurences(tsvector, tsquery) - like your ts_count
> 3. nmatches(tsvector, tsquery) - # of matched lexemes in query
> Of course, we need to think about better names for functions, since
> ts_count is a bit ambiguous.
Oleg, are you doing this? I'd rather this stuff didn't get dropped on
the floor.
cheers
andrew