Out of the box, full text search feature suggestion for postgresql

Started by aaover 2 years ago5 messagesbugs
Jump to latest
#1aa
ghevge@gmail.com

Hello Postgres Team!

First of all, a big THANK YOU for the great work you folks are doing!

The reason I am writing to you is to suggest a feature in future Postgres
versions, a feature that is partially there but is not quite where it
should be in my opinion: the full text search functionality. This
functionality in my opinion, should be available out of the box, for any
possible language available, including east Asia character based languages.
You would probably say that this will require a huge amount of work, and I
would say, a postgres extension which does exactly this, already exists,
and it is called : pgroonga (https://pgroonga.github.io/)

This tool is very good at doing full text search, with minimum effort
from the user side (you just have to create some pgroonga indexes and
slightly alter your select queries ), but it is a bit unstable, I believe
mainly because it is not fully integrated in postgres code.

The reason I am asking for this functionality to be part of the postgres
future release is that currently, to set up postgres to do some decent full
text search operations, will require a lot of NL knowledge and DB
configurations, which the vast majority of postgres users don't have. So
providing something out of the box will be much handier for the majority,
while the experts could tweak their DBs using the existing tools.

Thanks, and I hope you will at least discuss this suggestion!
A

#2Bruce Momjian
bruce@momjian.us
In reply to: aa (#1)
Re: Out of the box, full text search feature suggestion for postgresql

On Thu, Dec 28, 2023 at 10:15:07AM -0500, aa wrote:

Hello Postgres Team!

First of all, a big THANK YOU for the great work you folks are doing!

The reason I am writing to you is to suggest a feature in future Postgres
versions, a feature that is partially there but is not quite where it should be
in my opinion: the full text search functionality. This functionality in my
opinion, should be available out of the box, for any possible language
available, including east Asia character based languages. You would probably
say that this will require a huge amount of work, and I would say, a postgres
extension which does exactly this, already exists, and it is called : pgroonga
(https://pgroonga.github.io/) 

Please explain how this is different from what we already have:

https://www.postgresql.org/docs/current/textsearch.html

--
Bruce Momjian <bruce@momjian.us> https://momjian.us
EDB https://enterprisedb.com

Only you can decide what is important to you.

#3aa
ghevge@gmail.com
In reply to: Bruce Momjian (#2)
Re: Out of the box, full text search feature suggestion for postgresql

Hi Bruce,

As I have mentioned in the initial email, what is currently available in
postgresql in regards to full text search, is tailored for NL experts IMO.
As an average user you will have to dig a lot to configure your DB and your
queries to get some decent speech text results. And you will have to do
these configurations, for every language you will want to support.

I personally spent a couple of weeks trying to set up full text search on a
postgres instance, for a couple of different languages, just to realize
that is very time consuming and hard to maintain it. So in the end I found
proonga, which out of the box, offered me fast full text search support for
any language I wanted, including Chinese, Korean and Japanese. I only
needed to change my select queries to use their full text search operators
(although pgroonnga support out of the box the use of like and ilike
operators) plus I had to create some proonga indexes on the text fields I
wanted to query on. That was it.
They are still missing fuzzy search support and still have to work a bit on
stability (especially on huge DBs), but all these are already on their todo
list.

So in essence, it took me a couple of weeks of tests and a lot of failures
to realize that the postgres existing full text search functionality is too
complicated to setup and use, if you want some decent results. With
pgroonga I got decent full text search results in a matter of hours,
without much changes on my queries.

Thanks,
A

On Thu, Dec 28, 2023 at 11:46 AM Bruce Momjian <bruce@momjian.us> wrote:

Show quoted text

On Thu, Dec 28, 2023 at 10:15:07AM -0500, aa wrote:

Hello Postgres Team!

First of all, a big THANK YOU for the great work you folks are doing!

The reason I am writing to you is to suggest a feature in future Postgres
versions, a feature that is partially there but is not quite where it

should be

in my opinion: the full text search functionality. This functionality in

my

opinion, should be available out of the box, for any possible language
available, including east Asia character based languages. You would

probably

say that this will require a huge amount of work, and I would say, a

postgres

extension which does exactly this, already exists, and it is called :

pgroonga

(https://pgroonga.github.io/)

Please explain how this is different from what we already have:

https://www.postgresql.org/docs/current/textsearch.html

--
Bruce Momjian <bruce@momjian.us> https://momjian.us
EDB https://enterprisedb.com

Only you can decide what is important to you.

#4Artur Zakirov
zaartur@gmail.com
In reply to: Bruce Momjian (#2)
Re: Out of the box, full text search feature suggestion for postgresql

On Thu, 28 Dec 2023 at 17:46, Bruce Momjian <bruce@momjian.us> wrote:

On Thu, Dec 28, 2023 at 10:15:07AM -0500, aa wrote:

Hello Postgres Team!

First of all, a big THANK YOU for the great work you folks are doing!

The reason I am writing to you is to suggest a feature in future Postgres
versions, a feature that is partially there but is not quite where it should be
in my opinion: the full text search functionality. This functionality in my
opinion, should be available out of the box, for any possible language
available, including east Asia character based languages. You would probably
say that this will require a huge amount of work, and I would say, a postgres
extension which does exactly this, already exists, and it is called : pgroonga
(https://pgroonga.github.io/)

Please explain how this is different from what we already have:

https://www.postgresql.org/docs/current/textsearch.html

I'm not familiar with pgroonga, but the main issue with built-in text
search is that it cannot tokenize asian and many other languages
properly.

Here default parser cannot tokenize Japanese text:

=# select * from ts_parse('default', 'これはペンです');
tokid | token
-------+----------------
2 | これはペンです

Unlike Latin:

=# select * from ts_parse('default', 'this is a pen');
tokid | token
-------+-------
1 | this
12 |
1 | is
12 |
1 | a
12 |
1 | pen

To add support for Japanese (and other languages) it is necessary to
write a new parser or fix the existing default parser.

On the other hand pgroonga's source code looks complex, and I doubt
that there are pgsql-hackers who know it and target languages well and
who will be able to port it to Postgres core.

--
Artur

#5aa
ghevge@gmail.com
In reply to: Artur Zakirov (#4)
Re: Out of the box, full text search feature suggestion for postgresql

Pgroonga project seems to have solved that problem as it supports any
language, out of the box.
As for the "pgsql-hackers" you are looking for, I would say Kou (the main
developer in pgroonga) will be the right candidate for that.

I guess it will be just a matter for you guys to convince him to join
forces.

IMO integrating pgroonga logic into postgres will be a huge benefit for the
whole postgres community, in regards to full text search functionality.

On Tue, Jan 2, 2024 at 12:21 PM Artur Zakirov <zaartur@gmail.com> wrote:

Show quoted text

On Thu, 28 Dec 2023 at 17:46, Bruce Momjian <bruce@momjian.us> wrote:

On Thu, Dec 28, 2023 at 10:15:07AM -0500, aa wrote:

Hello Postgres Team!

First of all, a big THANK YOU for the great work you folks are doing!

The reason I am writing to you is to suggest a feature in future

Postgres

versions, a feature that is partially there but is not quite where it

should be

in my opinion: the full text search functionality. This functionality

in my

opinion, should be available out of the box, for any possible language
available, including east Asia character based languages. You would

probably

say that this will require a huge amount of work, and I would say, a

postgres

extension which does exactly this, already exists, and it is called :

pgroonga

(https://pgroonga.github.io/)

Please explain how this is different from what we already have:

https://www.postgresql.org/docs/current/textsearch.html

I'm not familiar with pgroonga, but the main issue with built-in text
search is that it cannot tokenize asian and many other languages
properly.

Here default parser cannot tokenize Japanese text:

=# select * from ts_parse('default', 'これはペンです');
tokid | token
-------+----------------
2 | これはペンです

Unlike Latin:

=# select * from ts_parse('default', 'this is a pen');
tokid | token
-------+-------
1 | this
12 |
1 | is
12 |
1 | a
12 |
1 | pen

To add support for Japanese (and other languages) it is necessary to
write a new parser or fix the existing default parser.

On the other hand pgroonga's source code looks complex, and I doubt
that there are pgsql-hackers who know it and target languages well and
who will be able to port it to Postgres core.

--
Artur