website doc search is extremely SLOW

oleg@sai.msu.su

over 22 years ago

In reply to: D. Dante Lorenso (#1)

Re: website doc search is extremely SLOW

On Mon, 29 Dec 2003, D. Dante Lorenso wrote:

Trying to use the 'search' in the docs section of PostgreSQL.org
is extremely SLOW. Considering this is a website for a database
and databases are supposed to be good for indexing content, I'd
expect a much faster performance.

I submitted my search over two minutes ago. I just finished this
email to the list. The results have still not come back. I only
searched for:

SECURITY INVOKER

Perhaps this should be worked on?

Your query takes 0.01 sec to complete (134 documents found) on my development
server, which I hope to present to the community soon after New Year. We've
crawled 27 PostgreSQL-related sites. A screenshot is available at
http://www.sai.msu.su/~megera/postgres/pgsql.ru.gif

Dante

---------------------------(end of broadcast)---------------------------
TIP 9: the planner will ignore your desire to choose an index scan if your
joining column's datatypes do not match

Regards,
Oleg
_____________________________________________________________
Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
Sternberg Astronomical Institute, Moscow University (Russia)
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(095)939-16-83, +007(095)939-23-83

scrappy@hub.org

over 22 years ago

In reply to: D. Dante Lorenso (#1)

Re: website doc search is extremely SLOW

On Mon, 29 Dec 2003, D. Dante Lorenso wrote:

Trying to use the 'search' in the docs section of PostgreSQL.org
is extremely SLOW. Considering this is a website for a database
and databases are supposed to be good for indexing content, I'd
expect a much faster performance.

What is the full URL for the page you are looking at? Just the 'search
link' at the top of the page?

Perhaps this should be worked on?

Looking into it right now ...

----
Marc G. Fournier Hub.Org Networking Services (http://www.hub.org)
Email: scrappy@hub.org Yahoo!: yscrappy ICQ: 7615664

scrappy@hub.org

over 22 years ago

In reply to: The Hermit Hacker (#3)

Re: website doc search is extremely SLOW

On Tue, 30 Dec 2003, Marc G. Fournier wrote:

Looking into it right now ...

just ran it from archives.postgresql.org (security invoker) and it comes
back in 10 seconds ... I think it might be a problem with doing a search
while indexing is happening ... am looking at that ...

----
Marc G. Fournier Hub.Org Networking Services (http://www.hub.org)
Email: scrappy@hub.org Yahoo!: yscrappy ICQ: 7615664

http://www.postgresql.org/search.cgi?ul=http://www.postgresql.org/docs/7.4/static/&q=SECURITY+INVOKER

jd@commandprompt.com

over 22 years ago

In reply to: The Hermit Hacker (#3)

Re: website doc search is extremely SLOW

When you go to docs and then click static, it has the ability to
search. It is slowwwwwwwww....

Sincerely,

Joshua D. Drake

-- 
Command Prompt, Inc., home of Mammoth PostgreSQL - S/ODBC and S/JDBC
Postgresql support, programming shared hosting and dedicated hosting.
+1-503-667-4564 - jd@commandprompt.com - http://www.commandprompt.com
Mammoth PostgreSQL Replicator. Integrated Replication for PostgreSQL

D. Dante Lorenso

dante@lorenso.com

over 22 years ago

In reply to: The Hermit Hacker (#3)

Re: website doc search is extremely SLOW

Marc G. Fournier wrote:

On Mon, 29 Dec 2003, D. Dante Lorenso wrote:

Trying to use the 'search' in the docs section of PostgreSQL.org
is extremely SLOW. Considering this is a website for a database
and databases are supposed to be good for indexing content, I'd
expect a much faster performance.

What is the full URL for the page you are looking at? Just the 'search
link' at the top of the page?

Perhaps this should be worked on?

Looking into it right now ...

http://www.postgresql.org/ *click Docs on top of page*
http://www.postgresql.org/docs/ * click PostgreSQL static
documentation *

Search this document set: [ SECURITY INVOKER ] Search!

http://www.postgresql.org/search.cgi?ul=http://www.postgresql.org/docs/7.4/static/&q=SECURITY+INVOKER

I loaded that URL in IE and waited 2 minutes or more for a response.
Then it usually returns with 1 result. I clicked the Search! button again
to refresh and it came back a little faster with 0 results?

Searched again from the top and it's a little faster now:

* click search *

date

Wed Dec 31 22:52:01 CST 2003

* results come back *

date

Wed Dec 31 22:52:27 CST 2003
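
The two `date` outputs above bracket a single search, so the wait can be worked out by subtracting them. A minimal sketch of that arithmetic follows; the helper name `parse_date_output` is hypothetical, and the zone name is dropped on the assumption that both timestamps come from the same machine.

```python
from datetime import datetime


def parse_date_output(line: str) -> datetime:
    """Parse a line printed by the Unix `date` command, ignoring the zone name."""
    parts = line.split()
    # Drop the zone token (e.g. "CST"); both timestamps share it, so it
    # does not affect the difference between them.
    without_zone = " ".join(parts[:4] + parts[5:])
    return datetime.strptime(without_zone, "%a %b %d %H:%M:%S %Y")


started = parse_date_output("Wed Dec 31 22:52:01 CST 2003")
finished = parse_date_output("Wed Dec 31 22:52:27 CST 2003")

# Seconds between clicking Search! and the results coming back.
elapsed_seconds = (finished - started).total_seconds()
print(elapsed_seconds)  # 26.0
```

That is 26 seconds for this run, compared with the two minutes or more reported for the first attempt.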

Still one result:

PostgreSQL 7.4 Documentation (SQL Key Words)
<http://www.postgresql.org/docs/7.4/static/sql-keywords-appendix.html>
[*0.087%*]
http://www.postgresql.org/docs/7.4/static/sql-keywords-appendix.html
Size: 65401 bytes, modified: Tue, 25 Nov 2003, 15:02:33 AST

However, the page that I SHOULD have found was this one:

http://www.postgresql.org/docs/current/static/sql-createfunction.html

That page has SECURITY INVOKER in a whole section:

[EXTERNAL] SECURITY INVOKER
[EXTERNAL] SECURITY DEFINER

SECURITY INVOKER indicates that the function is to be executed with
the privileges of the user that calls it. That is the default.
SECURITY DEFINER specifies that the function is to be executed with
the privileges of the user that created it.

Dante

----------
D. Dante Lorenso
dante@lorenso.com

pg@fastcrypt.com

over 22 years ago

In reply to: D. Dante Lorenso (#6)

Re: website doc search is extremely SLOW

A search for "create index" took 59 seconds?

I've got a fairly fast search engine (< 1 second for the same search) on
the docs at

http://postgresintl.com/search?query=create index

if that link doesn't work, try

postgres.fastcrypt.com/search?query=create index

For now you will have to type it; I'm working on indexing it, then making
it pretty

Dave

--
Dave Cramer
519 939 0336
ICQ # 1467551

scrappy@hub.org

over 22 years ago

In reply to: Dave Cramer (#7)

Re: website doc search is extremely SLOW

does anyone know anything better than mnogosearch, that works with
PostgreSQL, for doing indexing? the database server is a Dual Xeon 2.4G,
4G of RAM, and a load avg right now of a lowly 1.5 ... the file system is
3x72G drive in a RAID5 configuration, and the database server is 7.4 ...
the mnogosearch folk use mysql for their development, so it's possible
there is something they are doing that is slowing this process down, to
compensate for a fault in mysql, but this is ridiculous ...

note that I have it set up with what the mnogosearch folk list as being
'the fastest schema for large indexes', or 'crc-multi' ...

right now, we're running only 373k docs:

isvr5# indexer -S

Database statistics

Status  Expired   Total
-----------------------------
   415        0     311  Unsupported Media Type
   302        0    1171  Moved Temporarily
   502        0      43  Bad Gateway
   414        0       3  Request-URI Too Long
   301        0     307  Moved Permanently
   404        0    1960  Not found
   410        0       1  Gone
   401        0      51  Unauthorized
   304        0   16591  Not Modified
   200        0  373015  OK
   504        0      48  Gateway Timeout
   400        0       3  Bad Request
     0        2      47  Not indexed yet
-----------------------------
 Total        2  393551
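
As a quick sanity check on the statistics above, the per-status rows can be summed and compared with the reported totals. This is only a sketch: the `ROWS` list is transcribed by hand from the `indexer -S` output, and the variable names are illustrative, not part of mnogosearch.

```python
# Rows transcribed from the `indexer -S` output: (HTTP status, expired, total).
ROWS = [
    (415, 0, 311),
    (302, 0, 1171),
    (502, 0, 43),
    (414, 0, 3),
    (301, 0, 307),
    (404, 0, 1960),
    (410, 0, 1),
    (401, 0, 51),
    (304, 0, 16591),
    (200, 0, 373015),
    (504, 0, 48),
    (400, 0, 3),
    (0, 2, 47),  # status 0 is the "Not indexed yet" row
]

total_expired = sum(expired for _, expired, _ in ROWS)
total_docs = sum(total for _, _, total in ROWS)

# Share of all known URLs that were fetched successfully (status 200).
ok_docs = next(total for status, _, total in ROWS if status == 200)
ok_share = ok_docs / total_docs

print(total_expired, total_docs)  # 2 393551
print(f"{ok_share:.1%}")          # 94.8%
```

The sums match the `Total` line, and the 373,015 status-200 documents (the "373k docs") make up roughly 95% of the URLs in the index.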

and a vacuum analyze runs nightly ...

anyone with suggestions/ideas? has to be something client/server, like
mnogosearch, as we're dealing with multiple servers searching against the
same database ... so I don't *think* that ht://Dig is a solution, but I may
be wrong there ...

----
Marc G. Fournier Hub.Org Networking Services (http://www.hub.org)
Email: scrappy@hub.org Yahoo!: yscrappy ICQ: 7615664

pg@fastcrypt.com

over 22 years ago

In reply to: The Hermit Hacker (#8)

Re: website doc search is extremely SLOW

Why are there multiple servers hitting the same db?

What servers are searching through the db?

Dave

--
Dave Cramer
519 939 0336
ICQ # 1467551

#10

scrappy@hub.org

over 22 years ago

In reply to: Dave Cramer (#9)

Re: website doc search is extremely SLOW

On Wed, 31 Dec 2003, Dave Cramer wrote:

Why are there multiple servers hitting the same db?

What servers are searching through the db?

www.postgresql.org and archives.postgresql.org both hit the same DB ...
the point is more that whatever alternative that someone can suggest, it
has to be able to be accessed centrally from several different machines
... when I just tried a search, I was the only one hitting the database,
and the search was dreadful, so it isn't a problem with multiple
connections :(

Just as an FYI, the database server has sufficient RAM on her, so it isn't
a swapping issue ... swap usage right now, after 77 days uptime:

Device        1K-blocks   Used    Avail  Capacity  Type
/dev/da0s1b     8388480  17556  8370924        0%  Interleaved

----
Marc G. Fournier Hub.Org Networking Services (http://www.hub.org)
Email: scrappy@hub.org Yahoo!: yscrappy ICQ: 7615664

#11

pg@fastcrypt.com

over 22 years ago

In reply to: The Hermit Hacker (#8)

Re: website doc search is extremely SLOW

I can modify mine to be client/server if you want?

It is a Java app, so we need to be able to run jdk1.3 at least?

Dave

--
Dave Cramer
519 939 0336
ICQ # 1467551

#12

scrappy@hub.org

over 22 years ago

In reply to: Dave Cramer (#11)

Re: website doc search is extremely SLOW

On Wed, 31 Dec 2003, Dave Cramer wrote:

I can modify mine to be client/server if you want?

It is a Java app, so we need to be able to run jdk1.3 at least?

jdk1.4 is available on the VMs ... does your spider? for instance, you
mention that you have the docs indexed right now, but we are currently
indexing:

Server http://archives.postgresql.org/
Server http://advocacy.postgresql.org/
Server http://developer.postgresql.org/
Server http://gborg.postgresql.org/
Server http://pgadmin.postgresql.org/
Server http://techdocs.postgresql.org/
Server http://www.postgresql.org/

will it be able to handle:

186_archives=# select count(*) from url;
 count
--------
 393551
(1 row)

as fast as you are finding with just the docs?

----
Marc G. Fournier Hub.Org Networking Services (http://www.hub.org)
Email: scrappy@hub.org Yahoo!: yscrappy ICQ: 7615664

#13

jd@commandprompt.com

over 22 years ago

In reply to: The Hermit Hacker (#10)

Re: website doc search is extremely SLOW

Hello,

Why are we not using Tsearch2?

Besides the obvious of getting everything into the database?

Sincerely,

Joshua D. Drake
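
For readers unfamiliar with why an index kept next to the data can answer such queries quickly, here is a toy inverted index. It is an illustrative sketch only, not how Tsearch2 or mnogosearch are implemented; the document ids and texts in `DOCS` are made-up stand-ins for pages from the docs.

```python
from collections import defaultdict


def build_index(docs: dict[str, str]) -> dict[str, set[str]]:
    """Map each lowercased word to the set of document ids containing it."""
    index: dict[str, set[str]] = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index


def search(index: dict[str, set[str]], query: str) -> set[str]:
    """Return ids of documents that contain every word of the query."""
    words = query.lower().split()
    if not words:
        return set()
    # Start from the first word's postings and intersect with the rest.
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result


# Hypothetical miniature corpus, loosely modelled on the pages in this thread.
DOCS = {
    "sql-createfunction": "security invoker indicates that the function "
                          "is executed with the privileges of the caller",
    "sql-keywords-appendix": "sql key words security invoker definer",
    "sql-createindex": "create index constructs an index on a table",
}

index = build_index(DOCS)
print(sorted(search(index, "SECURITY INVOKER")))
# ['sql-createfunction', 'sql-keywords-appendix']
```

Each lookup touches only the postings for the query words rather than scanning every document, which is the general property a database-resident full-text index relies on.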

subscribe-nomail command to majordomo@postgresql.org so that your
message can get through to the mailing list cleanly

--
Dave Cramer
519 939 0336
ICQ # 1467551

----
Marc G. Fournier Hub.Org Networking Services (http://www.hub.org)
Email: scrappy@hub.org Yahoo!: yscrappy ICQ: 7615664

---------------------------(end of broadcast)---------------------------
TIP 7: don't forget to increase your free space map settings

-- 
Command Prompt, Inc., home of Mammoth PostgreSQL - S/ODBC and S/JDBC
Postgresql support, programming shared hosting and dedicated hosting.
+1-503-667-4564 - jd@commandprompt.com - http://www.commandprompt.com
Mammoth PostgreSQL Replicator. Integrated Replication for PostgreSQL

#14

Arjen van der Meijden

acmmailing@vulcanus.its.tudelft.nl

over 22 years ago

In reply to: The Hermit Hacker (#8)

Re: website doc search is extremely SLOW

Marc,

At our website we had an "in database" search as well... It was terribly
slow (it was a custom-built vector space model implemented in mysql+php,
so that explains a bit).

We replaced it with the Xapian library (www.xapian.org), with its Omega
frontend as a middle end. I.e., our php scripts call the omega search
frontend and postprocess the results (some double checks of access
rights and so on); from the results we build a very simple
SELECT ... FROM documents ... WHERE docid IN implode($docids_array)
(you understand enough php to understand this, I suppose)
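
A minimal sketch of that last step, written in Python rather than PHP and run against an in-memory SQLite database so it is self-contained. The `documents` table layout, the `title` column, and the `fetch_documents` helper are assumptions made for illustration; they are not taken from the actual site. It uses bound parameters instead of splicing the id list into the SQL string, and re-sorts the rows because IN (...) does not preserve the ranking returned by the search engine.

```python
import sqlite3


def fetch_documents(conn, docids):
    """Fetch rows for the given ids, keeping the search engine's ranking order."""
    if not docids:
        return []  # an empty IN () list is a syntax error, so short-circuit
    placeholders = ", ".join("?" for _ in docids)
    sql = f"SELECT docid, title FROM documents WHERE docid IN ({placeholders})"
    rows = {row[0]: row for row in conn.execute(sql, list(docids))}
    # IN (...) returns rows in no particular order; restore the ranking and
    # silently skip ids that no longer exist in the database.
    return [rows[d] for d in docids if d in rows]


# Hypothetical data standing in for the real documents table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE documents (docid INTEGER PRIMARY KEY, title TEXT)")
conn.executemany(
    "INSERT INTO documents VALUES (?, ?)",
    [(1, "ext3 basics"), (2, "undelete howto"), (3, "raid notes")],
)

ranked_ids = [2, 1, 99]  # ids as the search frontend might return them
results = fetch_documents(conn, ranked_ids)
print(results)
```

Keeping the id list as bound parameters avoids quoting problems if the ids ever come from an untrusted source.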

With our 10GB of text, we have a 14GB (uncompressed, 9G compressed
or so) xapian database (the largest part is the 6.7G positional
table). I'm pretty sure that if we'd store that information in something
like tsearch it'd be more than that 14GB...

Searches take less than a second (unless you do phrase searches of
course; those take a few seconds and sometimes a few minutes).

I did a query on 'ext3 undelete' just a few minutes ago and it searched
827150 documents in only 0.027 seconds (0.006 on a second run)
(ext3 was found in 753 and undelete in 360 documents). Of course that
excludes the results parsing; the total time to create the webpage was
"much" longer (0.43 seconds or so) because the results need to be
transferred via xinetd and then extracted from mysql (which is terrible
with the "search supporting queries" we issue :/ ). Our search machine
is very similar to the machine you use as database, but it doesn't do
much heavy work apart from running the xapian/omega search combination.

If you are interested in this, I can provide (much) more information
about our implementation. Since you don't need rights checks, you could
even get away with just the omega front end all by itself (it has a nice
scripting language, but can't interface with anything but xapian).

The main advantage of taking this out of your sql database is that it
runs on its own custom-built storage system (and you could offload it to
another machine, like we did).
Btw, if you really need an "in database" solution, read back the
postings of Eric Ridge at 26-12-2003 20:54 on the hackers list (he's
working on integrating xapian in postgresql as an FTI).

Best regards,

Arjen van der Meijden

#15

pg@fastcrypt.com

over 22 years ago

In reply to: The Hermit Hacker (#12)

Re: website doc search is extremely SLOW

Marc,

No, it doesn't spider; it is a specialized tool for searching documents.

I'm curious, what value is there in being able to count the number of
URLs?

It does do things like query all documents where CREATE AND TABLE are n
words apart, just as fast. I would think these are more valuable for
document searching?
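
To make the "n words apart" idea concrete, here is a small Python sketch of a proximity query over a positional inverted index. It illustrates the general technique only and is not how the tool described in this message implements it; the function names, document ids, and sample texts are all made up for the example.

```python
import re
from collections import defaultdict


def build_index(docs):
    """Map each lowercased word to {doc_id: [word positions]}."""
    index = defaultdict(lambda: defaultdict(list))
    for doc_id, text in docs.items():
        for pos, word in enumerate(re.findall(r"\w+", text.lower())):
            index[word][doc_id].append(pos)
    return index


def within(index, a, b, n):
    """Return sorted ids of documents where a and b occur at most n words apart."""
    postings_a = index.get(a.lower(), {})
    postings_b = index.get(b.lower(), {})
    hits = []
    # Only documents containing both words can match.
    for doc_id in postings_a.keys() & postings_b.keys():
        if any(abs(pa - pb) <= n
               for pa in postings_a[doc_id]
               for pb in postings_b[doc_id]):
            hits.append(doc_id)
    return sorted(hits)


# Hypothetical sample documents.
docs = {
    "sql-createtable": "CREATE TABLE defines a new table",
    "sql-createindex": "CREATE INDEX builds an index on a column of a table",
    "sql-droptable": "DROP TABLE removes a table",
}
index = build_index(docs)
print(within(index, "create", "table", 1))
print(within(index, "create", "table", 10))
```

Because the index stores word positions, the proximity check only touches the posting lists of the two query words, which is why such queries can stay fast as the collection grows.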

I think the challenge here is what do we want to search. I am betting
that folks use this page as they would man, i.e. what is the command for
create trigger?

As I said my offer stands to help out, but I think if the goal is to
search the entire website, then this particular tool is not useful.

At this point I am working on indexing the sgml directly as it has less
cruft in it. For instance all the links that appear in every summary are
just noise.

Dave

On Wed, 2003-12-31 at 00:44, Marc G. Fournier wrote:

On Wed, 31 Dec 2003, Dave Cramer wrote:

I can modify mine to be client server if you want?

It is a java app, so we need to be able to run jdk1.3 at least?

jdk1.4 is available on the VMs ... does your spider? for instance, you
mention that you have the docs indexed right now, but we are currently
indexing:

Server http://archives.postgresql.org/
Server http://advocacy.postgresql.org/
Server http://developer.postgresql.org/
Server http://gborg.postgresql.org/
Server http://pgadmin.postgresql.org/
Server http://techdocs.postgresql.org/
Server http://www.postgresql.org/

will it be able to handle:

186_archives=# select count(*) from url;
count
--------
393551
(1 row)

as fast as you are finding with just the docs?

----
Marc G. Fournier Hub.Org Networking Services (http://www.hub.org)
Email: scrappy@hub.org Yahoo!: yscrappy ICQ: 7615664

--
Dave Cramer
519 939 0336
ICQ # 1467551

#16

John Sidney-Woollett

johnsw@wardbrook.com

over 22 years ago

In reply to: Dave Cramer (#15)

Re: website doc search is extremely SLOW

I think that Oleg's new search offering looks really good and fast. (I
can't wait till I have some task that needs tsearch!).

I agree with Dave that searching the docs is more important for me than
the sites - but it would be really nice to have both, in one tool.

I built something similar for the Tate Gallery in the UK - here you can
select the type of content that you want returned, either static pages or
dynamic. You can see the idea at
http://www.tate.org.uk/search/default.jsp?terms=sunset%20oil&action=new

This is custom built (using java/Oracle), supports stemming, boolean
operators, exact phrase matching, relevancy and matched term highlighting.

You can switch on/off the types of documents that you are not interested
in. Using this analogy, a search facility that could offer you results
from i) the docs and/or ii) the postgres sites' static pages would be very
useful.

John Sidney-Woollett

#17

Ericson Smith

eric@did-it.com

over 22 years ago

In reply to: John Sidney-Woollett (#16)

Re: website doc search is extremely SLOW

You should probably take a look at the Swish project. For a certain
project, we tried Tsearch2/Tsearch, even (gasp) MySQL fulltext search,
but with over 600,000 documents to index, both took too long to conduct
searches, especially as the database was swapped in and out of memory
based on search segment. MySQL full text was the most unusable.

Swish uses its own internal DB format, and comes with a simple spider as
well. You can make it search by category, date and other nifty criteria
also.
http://swish-e.org

You can take a look over at the project and do some searches to see what
I mean:
http://cbd-net.com

Warmest regards, 
Ericson Smith
Tracking Specialist/DBA
+-----------------------+----------------------------+
| http://www.did-it.com | "When I'm paid, I always   |
| eric@did-it.com       | follow the job through.    |
| 516-255-0500          | You know that." -Angel Eyes|
+-----------------------+----------------------------+

#18

John Sidney-Woollett

johnsw@wardbrook.com

over 22 years ago

In reply to: Ericson Smith (#17)

Re: website doc search is extremely SLOW

Wow, you're right - I could have probably saved myself a load of time! :)

Although you do learn a lot reinventing the wheel... ...or at least you
hit the same issues and insights others did before...

John

#19

pg@fastcrypt.com

over 22 years ago

In reply to: John Sidney-Woollett (#18)

Re: website doc search is extremely SLOW

The search engine I am using is Lucene:
http://jakarta.apache.org/lucene/docs/index.html

It too uses its own internal database format, optimized for searching.
It is quite flexible, and allows searching on arbitrary fields as well.

The section on querying explains more:
http://jakarta.apache.org/lucene/docs/queryparsersyntax.html

It is even possible to index text data inside a database.

Dave

--
Dave Cramer
519 939 0336
ICQ # 1467551

#20

pg@fastcrypt.com

over 22 years ago

In reply to: John Sidney-Woollett (#18)

Re: website doc search is extremely SLOW

Well it appears there are quite a few solutions to use so the next
question should be what are we trying to accomplish here?

One thing that I think is that the documentation search should be
limited to the documentation.

Who is in a position to make the decision of which solution to use?

Dave

--
Dave Cramer
519 939 0336
ICQ # 1467551

#21

George Essig

george_essig@yahoo.com

over 22 years ago

In reply to: Dave Cramer (#20)

#22

Eric Ridge

ebr@tcdi.com

over 22 years ago

In reply to: Arjen van der Meijden (#14)

#23

D. Dante Lorenso

dante@lorenso.com

over 22 years ago

In reply to: Dave Cramer (#20)

#24

scrappy@hub.org

over 22 years ago

In reply to: Dave Cramer (#15)

#25

scrappy@hub.org

over 22 years ago

In reply to: Joshua D. Drake (#13)

#26

scrappy@hub.org

over 22 years ago

In reply to: Dave Cramer (#20)

#27

Bruce Momjian

bruce@momjian.us

over 22 years ago

In reply to: The Hermit Hacker (#25)

#28

scrappy@hub.org

over 22 years ago

In reply to: Bruce Momjian (#27)

#29

Bruce Momjian

bruce@momjian.us

over 22 years ago

In reply to: The Hermit Hacker (#28)

#30

pg@fastcrypt.com

over 22 years ago

In reply to: Bruce Momjian (#27)

#31

scrappy@hub.org

over 22 years ago

In reply to: Bruce Momjian (#29)

#32

scrappy@hub.org

over 22 years ago

In reply to: Dave Cramer (#30)

#33

pg@fastcrypt.com

over 22 years ago

In reply to: The Hermit Hacker (#31)

#34

scrappy@hub.org

over 22 years ago

In reply to: Dave Cramer (#33)

#35

Bruce Momjian

bruce@momjian.us

over 22 years ago

In reply to: The Hermit Hacker (#31)

#36

scrappy@hub.org

over 22 years ago

In reply to: Bruce Momjian (#35)

#37

Mark Kirkwood

mark.kirkwood@catalyst.net.nz

over 22 years ago

In reply to: The Hermit Hacker (#36)

#38

Arjen van der Meijden

acmmailing@vulcanus.its.tudelft.nl

over 22 years ago

In reply to: The Hermit Hacker (#36)

#39

scrappy@hub.org

over 22 years ago

In reply to: Arjen van der Meijden (#38)

#40

tgl@sss.pgh.pa.us

over 22 years ago

In reply to: Mark Kirkwood (#37)

#41

scrappy@hub.org

over 22 years ago

In reply to: Tom Lane (#40)

#42

tgl@sss.pgh.pa.us

over 22 years ago

In reply to: The Hermit Hacker (#41)

#43

scrappy@hub.org

over 22 years ago

In reply to: Bruce Momjian (#35)

#44

scrappy@hub.org

over 22 years ago

In reply to: Tom Lane (#42)

#45

Arjen van der Meijden

acmmailing@vulcanus.its.tudelft.nl

over 22 years ago

In reply to: The Hermit Hacker (#39)

#46

tgl@sss.pgh.pa.us

over 22 years ago

In reply to: The Hermit Hacker (#44)

#47

scrappy@hub.org

over 22 years ago

In reply to: Arjen van der Meijden (#45)

#48

scrappy@hub.org

over 22 years ago

In reply to: Tom Lane (#46)

#49

tgl@sss.pgh.pa.us

over 22 years ago

In reply to: The Hermit Hacker (#31)

#50

scrappy@hub.org

over 22 years ago

In reply to: Tom Lane (#49)

#51

tgl@sss.pgh.pa.us

over 22 years ago

In reply to: The Hermit Hacker (#48)

#52

tgl@sss.pgh.pa.us

over 22 years ago

In reply to: The Hermit Hacker (#50)

#53

scrappy@hub.org

over 22 years ago

In reply to: Tom Lane (#52)

#54

Mark Kirkwood

mark.kirkwood@catalyst.net.nz

over 22 years ago

In reply to: Tom Lane (#51)

#55

tgl@sss.pgh.pa.us

over 22 years ago

In reply to: The Hermit Hacker (#53)

#56

scrappy@hub.org

over 22 years ago

In reply to: Tom Lane (#55)

#57

ezra epstein

ee_newsgroup_post@prajnait.com

over 22 years ago

In reply to: D. Dante Lorenso (#1)

#58

scrappy@hub.org

over 22 years ago

In reply to: Tom Lane (#55)

#59

tgl@sss.pgh.pa.us

over 22 years ago

In reply to: The Hermit Hacker (#58)

#60

scrappy@hub.org

over 22 years ago

In reply to: Tom Lane (#59)

#61

tgl@sss.pgh.pa.us

over 22 years ago

In reply to: The Hermit Hacker (#60)

#62

scrappy@hub.org

over 22 years ago

In reply to: Tom Lane (#61)

#63

Dave Page

dpage@pgadmin.org

over 22 years ago

In reply to: ezra epstein (#57)

#64

jd@commandprompt.com

over 22 years ago

In reply to: The Hermit Hacker (#31)

#65

oleg@sai.msu.su

over 22 years ago

In reply to: The Hermit Hacker (#25)

#66

oleg@sai.msu.su

over 22 years ago

In reply to: The Hermit Hacker (#36)

#67

pg@fastcrypt.com

over 22 years ago

In reply to: Oleg Bartunov (#65)

#68

oleg@sai.msu.su

over 22 years ago

In reply to: Dave Cramer (#67)

#69

jd@commandprompt.com

over 22 years ago

In reply to: Dave Cramer (#67)

#70

oleg@sai.msu.su

over 22 years ago

In reply to: Joshua D. Drake (#69)

#71

jd@commandprompt.com

over 22 years ago

In reply to: Oleg Bartunov (#70)

#72

scrappy@hub.org

over 22 years ago

In reply to: Oleg Bartunov (#70)

#73

scrappy@hub.org

over 22 years ago

In reply to: Oleg Bartunov (#66)

#74

scrappy@hub.org

over 22 years ago

In reply to: Oleg Bartunov (#65)

#75

oleg@sai.msu.su

over 22 years ago

In reply to: The Hermit Hacker (#74)

#76

pg@fastcrypt.com

over 22 years ago

In reply to: Oleg Bartunov (#75)

#77

scrappy@hub.org

over 22 years ago

In reply to: Oleg Bartunov (#75)

#78

pg@fastcrypt.com

over 22 years ago

In reply to: The Hermit Hacker (#77)

#79

Greg Sabino Mullane

greg@turnstep.com

over 22 years ago

In reply to: The Hermit Hacker (#25)

#80

scrappy@hub.org

over 22 years ago

In reply to: ezra epstein (#57)

#81

scrappy@hub.org

over 22 years ago

In reply to: Greg Sabino Mullane (#79)

#82

Greg Sabino Mullane

greg@turnstep.com

over 22 years ago

In reply to: The Hermit Hacker (#81)

#83

pg@fastcrypt.com

over 22 years ago

In reply to: The Hermit Hacker (#80)

#84

Mark Kirkwood

mark.kirkwood@catalyst.net.nz

over 22 years ago

In reply to: Dave Cramer (#83)

#85

tgl@sss.pgh.pa.us

over 22 years ago

In reply to: Mark Kirkwood (#84)

#86

oleg@sai.msu.su

over 22 years ago

In reply to: ezra epstein (#57)

#87

Jeff Davis

pgsql@j-davis.com

over 22 years ago

In reply to: The Hermit Hacker (#73)

#88

oleg@sai.msu.su

over 22 years ago

In reply to: D. Dante Lorenso (#23)

#89