Let's play bash the search engine

Started by Joshua D. Drake | over 19 years ago | 37 messages | general
#1Joshua D. Drake
jd@commandprompt.com

Hello,

search.postgresql.org is now served directly from PostgreSQL 8.2,
Tsearch2 and GIN. We have been testing thoroughly for the last couple of
weeks but of course... it is now open to the general public.

Take a look and let us know what you think and how it performs for you.

Sincerely,

Joshua D. Drake

--

=== The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
Providing the most comprehensive PostgreSQL solutions since 1997
http://www.commandprompt.com/

Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate

#2Thomas H.
me@alternize.com
In reply to: Joshua D. Drake (#1)
Re: Let's play bash the search engine

Take a look and let us know what you think and how it performs for you.

i would love an advanced search where you can limit the results to a
particular version of the documentation. the query for "SELECT" returns too
many results from too many versions, obviously.

its fast & quick tho :-)

regards,
thomas

#3Jorge Godoy
jgodoy@gmail.com
In reply to: Thomas H. (#2)
Re: Let's play bash the search engine

"Thomas H." <me@alternize.com> writes:

i would love an advanced search where you can limit the results to a
particular version of the documentation. the query for "SELECT" returns too
many results from too many versions, obviously.

+1 on that.

its fast & quick tho :-)

Indeed.

Be seeing you,
--
Jorge Godoy <jgodoy@gmail.com>

#4Reece Hart
reece@harts.net
In reply to: Joshua D. Drake (#1)
Re: Let's play bash the search engine

On Mon, 2006-12-18 at 15:47 -0800, Joshua D. Drake wrote:

Take a look and let us know what you think and how it performs for you.

Terrific. Fast and meaningful.

I echo Thomas' request to have docs limited to a version (or, better,
most recent). Perhaps archived docs should be searched via a separate
page entirely.

Most of the queries I did hit what I expected, except that the docs were
for old versions. (In fact, I don't think 8.2 docs ever showed up
first.)
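
One way to get "most recent only" is to filter hits by the version in the docs URL. The sketch below is only an illustration in Python: it assumes docs URLs follow the /docs/<major>.<minor>/ layout seen elsewhere in this thread, and the function names and sample URLs are made up for the example, not taken from the site's code.

```python
import re

# Matches the version component of a docs URL such as /docs/8.2/
DOCS_VERSION = re.compile(r"/docs/(\d+)\.(\d+)/")


def docs_version(url: str):
    """Return (major, minor) parsed from a docs URL, or None if it has none."""
    m = DOCS_VERSION.search(url)
    return (int(m.group(1)), int(m.group(2))) if m else None


def newest_only(urls: list) -> list:
    """Keep hits from the highest docs version present; pass through non-docs hits."""
    versions = [v for v in map(docs_version, urls) if v is not None]
    if not versions:
        return list(urls)
    newest = max(versions)
    return [u for u in urls if docs_version(u) in (None, newest)]


# Sample data for illustration only
hits = [
    "http://www.postgresql.org/docs/7.4/interactive/sql-select.html",
    "http://www.postgresql.org/docs/8.2/interactive/sql-select.html",
    "http://www.postgresql.org/docs/8.1/interactive/sql-select.html",
]
print(newest_only(hits))  # only the 8.2 URL remains
```

Hits without a version in the URL (news, mailing-list pages) are deliberately left in, since the complaint here is only about old documentation crowding out the current one.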

I tried "defer constraints" and got a few not-too-useful hits. However,
"deferred constraints" returned meaningful links. Is that a stemmer
problem?

-Reece

--
Reece Hart, http://harts.net/reece/, GPG:0x25EC91A0
./universe -G 6.672e-11 -e 1.602e-19 -protonmass 1.673e-27 -uspres bush
kernel warning: universe consuming too many resources. Killing.
universe killed due to catastrophic leadership. Try -uspres carter.

#5Henrik Zagerholm
henke@mac.se
In reply to: Joshua D. Drake (#1)
Re: Let's play bash the search engine

Hello,

Searching for "tsearch"
5. PostgreSQL: Documentation: Manuals: PostgreSQL 7.4: Examples [0.1]
...tsearch and tsearch2Full text
indexingPrevHomeNextLimitationsUpPage Files User Comments No comments
could be found for this...
http://www.postgresql.org/docs/7.4/interactive/examples.html

Searching for "tsearch2"
An error occured while searching.

Searching for "tsearch2full"
An error occured while searching.

Why is it so? =)

Cheers,
Henrik

19 dec 2006 kl. 00:47 skrev Joshua D. Drake:

Hello,

search.postgresql.org is now served directly from PostgreSQL 8.2,
Tsearch2 and GIN. We have been testing thoroughly for the last couple of
weeks but of course... it is now open to the general public.

Take a look and let us know what you think and how it performs for you.

Sincerely,

Joshua D. Drake

--

=== The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
Providing the most comprehensive PostgreSQL solutions since 1997
http://www.commandprompt.com/

Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate

---------------------------(end of broadcast)---------------------------
TIP 3: Have you checked our extensive FAQ?

http://www.postgresql.org/docs/faq

#6Shane Ambler
pgsql@007Marketing.com
In reply to: Reece Hart (#4)
Re: Let's play bash the search engine

Reece Hart wrote:

On Mon, 2006-12-18 at 15:47 -0800, Joshua D. Drake wrote:

Take a look and let us know what you think and how it performs for you.

Terrific. Fast and meaningful.

I echo Thomas' request to have docs limited to a version (or, better,
most recent). Perhaps archived docs should be searched via a separate
page entirely.

+1 there - current docs should be searched unless you specify all/older
docs. Maybe old docs can be included in the archives section, although the
link doesn't make it very clear that it points to the mail archives.

Most of the queries I did hit what I expected, except that the docs were
for old versions. (In fact, I don't think 8.2 docs ever showed up
first.)

I tried "defer constraints" and got a few not-too-useful hits. However,
"deferred constraints" returned meaningful links. Is that a stemmer
problem?

-Reece

I reckon the old search showed 3-4 lines in the 'preview' of the
listing, now we only get 1 which sometimes wraps partially onto 2.
Personally I preferred having 3-4 lines in the results - it can be
easier to pick out the page that you are searching for without going there.

One thing that I have seen on a few searches (eg. yahoo cached pages) is
when you follow a link it then highlights the search criteria on the
page. Would be a nice feature to quickly find the search result on the
destination page.

I found a little error in the page coding calculations -

Search for create
it states "Pages 1-20 of more than 1000." - that's ok
if you go to page 50 you get "Pages 981-1000 of more than 1000." - fine
then on page 51 you get "Your search for create returned no hits."

search for 'select' or 'update' gets the same thing. It would seem that
you have a 'limit 1000' which gives the 'more than 1000' in the hits
description but it generates an extra page (51) that tries to fetch
1001-1020
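
The off-by-one described above can be sketched as follows. This is only a guess at the cause: it assumes the site fetches one row more than its 1000-hit cap to detect "more than 1000" and then counts that extra row when working out the number of pages. The names PAGE_SIZE, MAX_HITS and both functions are hypothetical, not taken from the actual site code.

```python
import math

PAGE_SIZE = 20    # hits per page, as seen in "Pages 1-20"
MAX_HITS = 1000   # assumed cap behind "more than 1000"


def page_count_buggy(rows_fetched: int) -> int:
    """Count pages from the raw row count, including any probe row past the cap."""
    return math.ceil(rows_fetched / PAGE_SIZE)


def page_count_fixed(rows_fetched: int) -> int:
    """Clamp to the cap first, so the last page is always one that can be filled."""
    return math.ceil(min(rows_fetched, MAX_HITS) / PAGE_SIZE)


# Suppose the query ran with LIMIT 1001 and came back full.
rows = MAX_HITS + 1
print(page_count_buggy(rows))   # 51
print(page_count_fixed(rows))   # 50
```

With the clamp in place, page 50 ("Pages 981-1000") is the last link generated, so there is no page 51 to come back empty.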

--

Shane Ambler
pgSQL@007Marketing.com

Get Sheeky @ http://Sheeky.Biz

#7Gurjeet Singh
singh.gurjeet@gmail.com
In reply to: Henrik Zagerholm (#5)
Re: Let's play bash the search engine

On 12/19/06, Henrik Zagerholm <henke@mac.se> wrote:

Hello,

Searching after "tsearch"
5. PostgreSQL: Documentation: Manuals: PostgreSQL 7.4: Examples [0.1]
...tsearch and tsearch2Full text
indexingPrevHomeNextLimitationsUpPage Files User Comments No comments
could be found for this...
http://www.postgresql.org/docs/7.4/interactive/examples.html

Searching after "tsearch2"
An error occured while searching.

Searching after "tsearch2full"
An error occured while searching.

This error can be generalized to the reg-ex [[:alpha:]]+[[:digit:]]+
Examples:
A1
A2 etc...
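
A quick way to check which queries fall into this class is to test them against the POSIX-style pattern [[:alpha:]]+[[:digit:]]+. Python's re module does not support POSIX bracket classes, so the sketch below assumes ASCII input and spells the classes out as [A-Za-z] and [0-9]; the helper name is made up for illustration.

```python
import re

# ASCII equivalent of the POSIX pattern [[:alpha:]]+[[:digit:]]+
LETTERS_THEN_DIGITS = re.compile(r"[A-Za-z]+[0-9]+")


def is_letters_then_digits(query: str) -> bool:
    """Return True if the whole query is letters followed by digits."""
    return LETTERS_THEN_DIGITS.fullmatch(query) is not None


for q in ["tsearch", "tsearch2", "A1", "tsearch2full"]:
    print(q, is_letters_then_digits(q))
```

Note that "tsearch2full" also failed in the report above even though it is not a full match; it only contains such a run, so search() rather than fullmatch() would be needed to catch it.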

Why is it so? =)

Cheers,
Henrik

---------------------------(end of broadcast)---------------------------
TIP 4: Have you searched our list archives?

http://archives.postgresql.org/

--
gurjeet[.singh]@EnterpriseDB.com
singh.gurjeet@{ gmail | hotmail | yahoo }.com

#8Gurjeet Singh
singh.gurjeet@gmail.com
In reply to: Shane Ambler (#6)
Re: Let's play bash the search engine

On 12/19/06, Shane Ambler <pgsql@007marketing.com> wrote:

I echo Thomas' request to have docs limited to a version (or, better,
most recent). Perhaps archived docs should be searched via a separate
page entirely.

+1 there - current docs should be searched unless you specify all/older

count me in too...

I reckon the old search showed 3-4 lines in the 'preview' of the
listing, now we only get 1 which sometimes wraps partially onto 2.
Personally I preferred having 3-4 lines in the results - it can be
easier to pick out the page that you are searching for without going there.

same sentiments

One thing that I have seen on a few searches (eg. yahoo cached pages) is
when you follow a link it then highlights the search criteria on the
page. Would be a nice feature to quickly find the search result on the
destination page.

+1

I found a little error in the page coding calculations -

Search for create
it states "Pages 1-20 of more than 1000." - that's ok
if you go to page 50 you get "Pages 981-1000 of more than 1000." - fine
then on page 51 you get "Your search for create returned no hits."

search for 'select' or 'update' gets the same thing. It would seem that
you have a 'limit 1000' which gives the 'more than 1000' in the hits
description but it generates an extra page (51) that tries to fetch
1001-1020

Or is it possible that the LIMIT ... OFFSET combination is erroneous?

my '2 cents (or die tryin)',

--
gurjeet[.singh]@EnterpriseDB.com
singh.gurjeet@{ gmail | hotmail | yahoo }.com

#9Magnus Hagander
magnus@hagander.net
In reply to: Thomas H. (#2)
Re: Let's play bash the search engine

On Tue, Dec 19, 2006 at 12:56:25AM +0100, Thomas H. wrote:

Take a look and let us know what you think and how it performs for you.

i would love an advanced search where you can limit the results to a
particular version of the documentation. the query for "SELECT" returns too
many results from too many versions, obviously.

You get this if you go into say the 8.2 docs, and use the search form
there - same as before.

That said, it's not a bad idea to add it anyway, but so far the main
concern has been feature-identical to what we had before.

//Magnus

#10Magnus Hagander
magnus@hagander.net
In reply to: Gurjeet Singh (#7)
Re: Let's play bash the search engine

On Tue, Dec 19, 2006 at 01:48:22PM +0530, Gurjeet Singh wrote:

On 12/19/06, Henrik Zagerholm <henke@mac.se> wrote:

Hello,

Searching after "tsearch"
5. PostgreSQL: Documentation: Manuals: PostgreSQL 7.4: Examples [0.1]
...tsearch and tsearch2Full text
indexingPrevHomeNextLimitationsUpPage Files User Comments No comments
could be found for this...
http://www.postgresql.org/docs/7.4/interactive/examples.html

Searching after "tsearch2"
An error occured while searching.

Searching after "tsearch2full"
An error occured while searching.

This error can be generalized to the reg-ex [[:alpha:]]+[[:digit:]]+
Examples:
A1
A2 etc...

Why is it so? =)

Seems to_tsvector() returns NULL for tsearch2 or for, as you say,
anything that ends in a digit.

Oleg, can you comment on why this is happening? What can we do to fix
that?

//Magnus

#11Oleg Bartunov
oleg@sai.msu.su
In reply to: Magnus Hagander (#10)
Re: Let's play bash the search engine

On Tue, 19 Dec 2006, Magnus Hagander wrote:

On Tue, Dec 19, 2006 at 01:48:22PM +0530, Gurjeet Singh wrote:

On 12/19/06, Henrik Zagerholm <henke@mac.se> wrote:

Hello,

Searching after "tsearch"
5. PostgreSQL: Documentation: Manuals: PostgreSQL 7.4: Examples [0.1]
...tsearch and tsearch2Full text
indexingPrevHomeNextLimitationsUpPage Files User Comments No comments
could be found for this...
http://www.postgresql.org/docs/7.4/interactive/examples.html

Searching after "tsearch2"
An error occured while searching.

Searching after "tsearch2full"
An error occured while searching.

This error can be generalized to the reg-ex [[:alpha:]]+[[:digit:]]+
Examples:
A1
A2 etc...

Why is it so? =)

Seems to_tsvector() returns NULL for tsearch2 or for, as you say,
anything that ends in a digit.

Oleg, can you comment on why this is happening? What can we do to fix
that?

Most probably, token type 'word' just isn't indexed, if you
didn't correct this part of the pgweb configuration:

-- we won't index/search some tokens
update pg_ts_cfgmap set dict_name = NULL
where tok_alias in ('email', 'url', 'sfloat', 'uri', 'float','word')
and ts_name = 'pg';
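
To illustrate why that mapping makes such queries come back empty, here is a toy model in Python. It is not tsearch2's parser: the classification rule (letters only gives 'lword', anything else alphanumeric gives 'word'), the dictionary names, and the names CFGMAP, token_type and to_lexemes are all assumptions made for this sketch.

```python
import re

# Toy stand-in for pg_ts_cfgmap: token type -> dictionary name (None = not indexed)
CFGMAP = {
    "lword": "en_stem",   # placeholder dictionary name
    "word": None,         # the situation described above
}


def token_type(token: str) -> str:
    """Very rough classifier: pure ASCII letters vs. anything else alphanumeric."""
    return "lword" if re.fullmatch(r"[A-Za-z]+", token) else "word"


def to_lexemes(text: str, cfgmap: dict) -> list:
    """Keep only tokens whose type is mapped to a dictionary."""
    tokens = re.findall(r"[A-Za-z0-9]+", text.lower())
    return [t for t in tokens if cfgmap.get(token_type(t)) is not None]


print(to_lexemes("tsearch", CFGMAP))    # ['tsearch']
print(to_lexemes("tsearch2", CFGMAP))   # []

# Mapping 'word' to some dictionary makes the token survive again.
fixed = dict(CFGMAP, word="simple")
print(to_lexemes("tsearch2", fixed))    # ['tsearch2']
```

In this model a query consisting only of 'word' tokens yields nothing to search for, which would match the error seen for "tsearch2" and "A1".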

//Magnus

---------------------------(end of broadcast)---------------------------
TIP 6: explain analyze is your friend

Regards,
Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83

#12Magnus Hagander
magnus@hagander.net
In reply to: Oleg Bartunov (#11)
Re: Let's play bash the search engine

On Tue, Dec 19, 2006 at 01:13:16PM +0300, Oleg Bartunov wrote:

Seems to_tsvector() returns NULL for tsearch2 or for, as you say,
anything that ends in a digit.

Oleg, can you comment on why this is happening? What can we do to fix
that?

Most probably, token type 'word' just isn't indexed, if you
didn't correct this part of the pgweb configuration:

-- we won't index/search some tokens
update pg_ts_cfgmap set dict_name = NULL
where tok_alias in ('email', 'url', 'sfloat', 'uri', 'float','word')
and ts_name = 'pg';

That sounds like it's the problem. I'll update the configuration, and I
assume I have to regenerate all the tsvectors as well, right?

Should I set it to 'simple' or one of the others?

//Magnus

#13Hannes Dorbath
light@theendofthetunnel.de
In reply to: Joshua D. Drake (#1)
Don't split on underscore

I think it would be useful to adjust the parser to not split on underscores:

If I want to look up PG's to_number() function, I won't get
anything useful: "number" appears nearly everywhere and "to" is
configured as a stop word. The same goes for most other functions.
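
The effect can be shown with a toy tokenizer in Python. This is not the tsearch2 parser; the stop-word list and the function name are placeholders, and the only thing varied is whether the underscore counts as a word character.

```python
import re

STOP_WORDS = {"to", "the", "a", "of"}   # placeholder list; "to" is the relevant entry


def tokenize(text: str, split_on_underscore: bool) -> list:
    """Split text into lowercase tokens and drop stop words."""
    pattern = r"[A-Za-z0-9]+" if split_on_underscore else r"[A-Za-z0-9_]+"
    tokens = re.findall(pattern, text.lower())
    return [t for t in tokens if t not in STOP_WORDS]


print(tokenize("to_number()", split_on_underscore=True))    # ['number']
print(tokenize("to_number()", split_on_underscore=False))   # ['to_number']
```

With splitting, the query degrades to the very common word "number"; without it, the function name survives as a single, highly selective token.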

On 19.12.2006 00:47, Joshua D. Drake wrote:

search.postgresql.org is now served directly from PostgreSQL 8.2,
Tsearch2 and GIN. We have been testing thoroughly for the last couple of
weeks but of course... it is now open to the general public.

Take a look and let us know what you think and how it performs for you.

--
Regards,
Hannes Dorbath

#14Magnus Hagander
magnus@hagander.net
In reply to: Magnus Hagander (#12)
Re: Let's play bash the search engine

On Tue, Dec 19, 2006 at 01:24:01PM +0100, Magnus Hagander wrote:

On Tue, Dec 19, 2006 at 01:13:16PM +0300, Oleg Bartunov wrote:

Seems to_tsvector() returns NULL for tsearch2 or for, as you say,
anything that ends in a digit.

Oleg, can you comment on why this is happening? What can we do to fix
that?

Most probably, token type 'word' just isn't indexed, if you
didn't correct this part of the pgweb configuration:

-- we won't index/search some tokens
update pg_ts_cfgmap set dict_name = NULL
where tok_alias in ('email', 'url', 'sfloat', 'uri', 'float','word')
and ts_name = 'pg';

That sounds like it's the problem. I'll update the configuration, and I
assume I have to regenerate all the tsvectors as well, right?

Should I set it to 'simple' or one of the others?

This has now been fixed for both website and archive search. So now you
can search for the technology that made the search possible in the first
place again :-)

//Magnus

#15Lincoln Yeoh
lyeoh@pop.jaring.my
In reply to: Joshua D. Drake (#1)
Re: Let's play bash the search engine

Hi,

Seems ok. Works better than most corporate search engines - some tend
to show pages and pages of useless press releases when you are
searching for drivers, specifications etc.

But as long as the sites remain indexable to outside search engines,
people get to use whichever search engine they prefer.

For example: Google works fine with: site:postgresql.org

It also does phrase searches, pdfs (you can even do filetype: inurl:
and other stuff[1]).

Have fun!

Link.

[1]: For example: filetype:pdf "company confidential" or filetype:xls confidential price

At 07:47 AM 12/19/2006, Joshua D. Drake wrote:

Hello,

search.postgresql.org is now served directly from PostgreSQL 8.2,
Tsearch2 and GIN. We have been testing thoroughly for the last couple of
weeks but of course... it is now open to the general public.

Take a look and let us know what you think and how it performs for you.

Sincerely,

Joshua D. Drake

--

=== The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
Providing the most comprehensive PostgreSQL solutions since 1997
http://www.commandprompt.com/

Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate

#16Shane Ambler
pgsql@007Marketing.com
In reply to: Magnus Hagander (#9)
Re: Let's play bash the search engine

Magnus Hagander wrote:

On Tue, Dec 19, 2006 at 12:56:25AM +0100, Thomas H. wrote:

Take a look and let us know what you think and how it performs for you.

i would love an advanced search where you can limit the results to a
particular version of the documentation. the query for "SELECT" returns too
many results from too many versions, obviously.

You get this if you go into say the 8.2 docs, and use the search form
there - same as before.

I would search from the home page rather than navigate to docs first.
(open browser - type postgres.org (enter) - tab to search field - type
whattofind (enter))

That said, it's not a bad idea to add it anyway, but so far the main
concern has been feature-identical to what we had before.

It would seem to be one of those 'accept it as it is' when you get here,
and now that you ask us to look we say 'but why?' ;-)

--

Shane Ambler
pgSQL@007Marketing.com

Get Sheeky @ http://Sheeky.Biz

#17Magnus Hagander
magnus@hagander.net
In reply to: Shane Ambler (#16)
Re: Let's play bash the search engine

On Wed, Dec 20, 2006 at 01:35:57AM +1030, Shane Ambler wrote:

Magnus Hagander wrote:

You get this if you go into say the 8.2 docs, and use the search form
there - same as before.

I would search from the home page rather than navigate to docs first.
(open browser - type postgres.org (enter) - tab to search field - type
whattofind (enter)

Yeah, I can see how that's a common usage pattern.

That said, it's not a bad idea to add it anyway, but so far the main
concern has been feature-identical to what we had before.

It would seem to be one of those 'accept it as it is' when you get here,
and now that you ask us to look we say 'but why?' ;-)

Heh, it was actually Josh that asked you to look ;)

But seriously, I'm definitely interested in ways it can be improved - and
that's true of the whole web team, I'm sure. It was just my way of
saying "it will take a while", but I'll file it away as a good thing to
do when there is a moment of spare time.

//Magnus

#18Matthew T. O'Connor
matthew@zeut.net
In reply to: Magnus Hagander (#17)
Re: Let's play bash the search engine

Magnus Hagander wrote:

But seriously, I'm definitely interested in ways it can be improved - and
that's true of the whole web team, I'm sure. It was just my way of
saying "it will take a while", but I'll file it away as a good thing to
do when there is a moment of spare time.

I like the way the php.net homepage has a search box on the homepage
with a dropdown next to it to specify what to search.

#19Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Matthew T. O'Connor (#18)
Re: Let's play bash the search engine

Matthew O'Connor wrote:

Magnus Hagander wrote:

But seriously, I'm definitely interested in ways it can be improved - and
that's true of the whole web team, I'm sure. It was just my way of
saying "it will take a while", but I'll file it away as a good thing to
do when there is a moment of spare time.

I like the way the php.net homepage has a search box on the homepage
with a dropdown next to it to specify what to search.

Yeah, that would be very appropriate, allowing you to search a specific
version of the docs. Heck, if it allowed searching of specific mailing
lists, that would rock.

--
Alvaro Herrera http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

#20Gurjeet Singh
singh.gurjeet@gmail.com
In reply to: Matthew T. O'Connor (#18)
Re: Let's play bash the search engine

On 12/19/06, Matthew O'Connor <matthew@zeut.net> wrote:

Magnus Hagander wrote:

But seriously, I'm definitely interested in ways it can be improved - and
that's true of the whole web team, I'm sure. It was just my way of
saying "it will take a while", but I'll file it away as a good thing to
do when there is a moment of spare time.

I like the way the php.net homepage has a search box on the homepage
with a dropdown next to it to specify what to search.

I would recommend a set of check-boxes, so that the user can select multiple
places to search, e.g. the 8.2 release, the 8.0 release, and, as just
suggested by Alvaro, the pgsql-hackers mailing list as well.

--
gurjeet[.singh]@EnterpriseDB.com
singh.gurjeet@{ gmail | hotmail | yahoo }.com

#21Magnus Hagander
magnus@hagander.net
In reply to: Gurjeet Singh (#20)
#22Oleg Bartunov
oleg@sai.msu.su
In reply to: Alvaro Herrera (#19)
#23Magnus Hagander
magnus@hagander.net
In reply to: Alvaro Herrera (#19)
#24Oleg Bartunov
oleg@sai.msu.su
In reply to: Gurjeet Singh (#20)
#25Matthew T. O'Connor
matthew@zeut.net
In reply to: Magnus Hagander (#21)
#26Gurjeet Singh
singh.gurjeet@gmail.com
In reply to: Oleg Bartunov (#24)
#27Joshua D. Drake
jd@commandprompt.com
In reply to: Magnus Hagander (#21)
#28Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Magnus Hagander (#23)
#29Filip Rembiałkowski
plk.zuber@gmail.com
In reply to: Joshua D. Drake (#1)
#30Oleg Bartunov
oleg@sai.msu.su
In reply to: Filip Rembiałkowski (#29)
#31Steve Atkins
steve@blighty.com
In reply to: Magnus Hagander (#23)
#32Thomas H.
me@alternize.com
In reply to: Joshua D. Drake (#1)
#33Oleg Bartunov
oleg@sai.msu.su
In reply to: Thomas H. (#32)
#34Reece Hart
reece@harts.net
In reply to: Hannes Dorbath (#13)
#35Thomas H.
me@alternize.com
In reply to: Joshua D. Drake (#1)
#36Hannes Dorbath
light@theendofthetunnel.de
In reply to: Reece Hart (#34)
#37Magnus Hagander
magnus@hagander.net
In reply to: Thomas H. (#35)