website doc search is extremely SLOW

Started by D. Dante Lorenso over 22 years ago, 89 messages, general
#1 D. Dante Lorenso
dante@lorenso.com

Trying to use the 'search' in the docs section of PostgreSQL.org
is extremely SLOW. Considering this is a website for a database
and databases are supposed to be good for indexing content, I'd
expect a much faster performance.

I submitted my search over two minutes ago. I just finished this
email to the list. The results have still not come back. I only
searched for:

SECURITY INVOKER

Perhaps this should be worked on?

Dante

#2 Oleg Bartunov
oleg@sai.msu.su
In reply to: D. Dante Lorenso (#1)
Re: website doc search is extremely SLOW

On Mon, 29 Dec 2003, D. Dante Lorenso wrote:

Trying to use the 'search' in the docs section of PostgreSQL.org
is extremely SLOW. Considering this is a website for a database
and databases are supposed to be good for indexing content, I'd
expect a much faster performance.

I submitted my search over two minutes ago. I just finished this
email to the list. The results have still not come back. I only
searched for:

SECURITY INVOKER

Perhaps this should be worked on?

Your query takes 0.01 sec to complete (134 documents found) on my development
server, which I hope to present to the community soon after New Year. We've
crawled 27 PostgreSQL-related sites. A screenshot is available at
http://www.sai.msu.su/~megera/postgres/pgsql.ru.gif

Dante

---------------------------(end of broadcast)---------------------------
TIP 9: the planner will ignore your desire to choose an index scan if your
joining column's datatypes do not match

Regards,
Oleg
_____________________________________________________________
Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
Sternberg Astronomical Institute, Moscow University (Russia)
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(095)939-16-83, +007(095)939-23-83

#3 The Hermit Hacker
scrappy@hub.org
In reply to: D. Dante Lorenso (#1)
Re: website doc search is extremely SLOW

On Mon, 29 Dec 2003, D. Dante Lorenso wrote:

Trying to use the 'search' in the docs section of PostgreSQL.org
is extremely SLOW. Considering this is a website for a database
and databases are supposed to be good for indexing content, I'd
expect a much faster performance.

What is the full URL for the page you are looking at? Just the 'search
link' at the top of the page?

Perhaps this should be worked on?

Looking into it right now ...

----
Marc G. Fournier Hub.Org Networking Services (http://www.hub.org)
Email: scrappy@hub.org Yahoo!: yscrappy ICQ: 7615664

#4 The Hermit Hacker
scrappy@hub.org
In reply to: The Hermit Hacker (#3)
Re: website doc search is extremely SLOW

On Tue, 30 Dec 2003, Marc G. Fournier wrote:

On Mon, 29 Dec 2003, D. Dante Lorenso wrote:

Trying to use the 'search' in the docs section of PostgreSQL.org
is extremely SLOW. Considering this is a website for a database
and databases are supposed to be good for indexing content, I'd
expect a much faster performance.

What is the full URL for the page you are looking at? Just the 'search
link' at the top of the page?

Perhaps this should be worked on?

Looking into it right now ...

just ran it from archives.postgresql.org (security invoker) and it comes
back in 10 seconds ... I think it might be a problem with doing a search
while indexing is happening ... am looking at that ...

----
Marc G. Fournier Hub.Org Networking Services (http://www.hub.org)
Email: scrappy@hub.org Yahoo!: yscrappy ICQ: 7615664

#5 Joshua D. Drake
jd@commandprompt.com
In reply to: The Hermit Hacker (#3)
Re: website doc search is extremely SLOW

When you go to docs and then click static, it has the ability to
search. It is slowwwwwwwww....

Sincerely,

Joshua D. Drake

-- 
Command Prompt, Inc., home of Mammoth PostgreSQL - S/ODBC and S/JDBC
Postgresql support, programming shared hosting and dedicated hosting.
+1-503-667-4564 - jd@commandprompt.com - http://www.commandprompt.com
Mammoth PostgreSQL Replicator. Integrated Replication for PostgreSQL

#6 D. Dante Lorenso
dante@lorenso.com
In reply to: The Hermit Hacker (#3)
Re: website doc search is extremely SLOW

Marc G. Fournier wrote:

On Mon, 29 Dec 2003, D. Dante Lorenso wrote:

Trying to use the 'search' in the docs section of PostgreSQL.org
is extremely SLOW. Considering this is a website for a database
and databases are supposed to be good for indexing content, I'd
expect a much faster performance.

What is the full URL for the page you are looking at? Just the 'search
link' at the top of the page?

Perhaps this should be worked on?

Looking into it right now ...

http://www.postgresql.org/ *click Docs on top of page*
http://www.postgresql.org/docs/ * click PostgreSQL static
documentation *

Search this document set: [ SECURITY INVOKER ] Search!

http://www.postgresql.org/search.cgi?ul=http://www.postgresql.org/docs/7.4/static/&q=SECURITY+INVOKER

I loaded that URL in IE and waited 2 minutes or more for a response.
Then it usually returns with 1 result. I clicked the Search! button again
to refresh and it came back a little faster, with 0 results?

Searched again from the top and it's a little faster now:

* click search *

date

Wed Dec 31 22:52:01 CST 2003

* results come back *

date

Wed Dec 31 22:52:27 CST 2003
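
For reference, the wait between the two `date` outputs above can be computed rather than read off by eye. This is a minimal sketch, assuming both timestamps are in the same time zone; the zone label is dropped before parsing, and `elapsed_seconds` is a made-up helper name, not something from the thread.

```python
from datetime import datetime


def elapsed_seconds(start: str, end: str) -> float:
    """Seconds between two `date`-style timestamps in the same time zone."""
    fmt = "%a %b %d %H:%M:%S %Y"

    def parse(stamp: str) -> datetime:
        # Drop the zone label (5th field, e.g. "CST"): strptime's %Z
        # handling is platform-dependent, and both stamps share a zone.
        parts = stamp.split()
        return datetime.strptime(" ".join(parts[:4] + parts[5:]), fmt)

    return (parse(end) - parse(start)).total_seconds()


print(elapsed_seconds("Wed Dec 31 22:52:01 CST 2003",
                      "Wed Dec 31 22:52:27 CST 2003"))  # 26.0
```

So this run of the search took 26 seconds.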

Still one result:

PostgreSQL 7.4 Documentation (SQL Key Words)
<http://www.postgresql.org/docs/7.4/static/sql-keywords-appendix.html>
[*0.087%*]
http://www.postgresql.org/docs/7.4/static/sql-keywords-appendix.html
Size: 65401 bytes, modified: Tue, 25 Nov 2003, 15:02:33 AST

However, the page that I SHOULD have found was this one:

http://www.postgresql.org/docs/current/static/sql-createfunction.html

That page has SECURITY INVOKER in a whole section:

[EXTERNAL] SECURITY INVOKER
[EXTERNAL] SECURITY DEFINER

SECURITY INVOKER indicates that the function is to be executed with
the privileges of the user that calls it. That is the default.
SECURITY DEFINER specifies that the function is to be executed with
the privileges of the user that created it.

Dante

----------
D. Dante Lorenso
dante@lorenso.com

#7 Dave Cramer
pg@fastcrypt.com
In reply to: D. Dante Lorenso (#6)
Re: website doc search is extremely SLOW

search for create index took 59 seconds ?

I've got a fairly fast search engine (< 1 second for the same search) on
the docs at

http://postgresintl.com/search?query=create index

if that link doesn't work, try

postgres.fastcrypt.com/search?query=create index
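
Both links above contain a literal space, and a mail reader that auto-links URLs will typically stop at it. Here is a small sketch of building the encoded form with Python's standard library. The host and the `query` parameter come from the message; `search_url` is only an illustrative helper, and it assumes the server decodes the form-style `+` as a space.

```python
from urllib.parse import urlencode


def search_url(base: str, terms: str) -> str:
    """Return base + '?query=...' with the search terms safely encoded."""
    # urlencode applies form encoding, so a space becomes '+'.
    return base + "?" + urlencode({"query": terms})


print(search_url("http://postgresintl.com/search", "create index"))
# http://postgresintl.com/search?query=create+index
```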

for now you will have to type it, I'm working on indexing it then making
it pretty

Dave

--
Dave Cramer
519 939 0336
ICQ # 1467551

#8 The Hermit Hacker
scrappy@hub.org
In reply to: Dave Cramer (#7)
Re: website doc search is extremely SLOW

does anyone know anything better than mnogosearch, that works with
PostgreSQL, for doing indexing? the database server is a Dual Xeon 2.4G,
4G of RAM, and a load avg right now of a lowly 1.5 ... the file system is
3x72G drive in a RAID5 configuration, and the database server is 7.4 ...
the mnogosearch folk use mysql for their development, so it's possible
there is something they are doing that is slowing this process down, to
compensate for a fault in mysql, but this is ridiculous ...

note that I have it set up with what the mnogosearch folk list as being
'the fastest schema for large indexes' or 'crc-multi' ...

right now, we're running only 373k docs:

isvr5# indexer -S

          Database statistics

    Status    Expired      Total
   -----------------------------
       415          0        311 Unsupported Media Type
       302          0       1171 Moved Temporarily
       502          0         43 Bad Gateway
       414          0          3 Request-URI Too Long
       301          0        307 Moved Permanently
       404          0       1960 Not found
       410          0          1 Gone
       401          0         51 Unauthorized
       304          0      16591 Not Modified
       200          0     373015 OK
       504          0         48 Gateway Timeout
       400          0          3 Bad Request
         0          2         47 Not indexed yet
   -----------------------------
     Total          2     393551
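
As a quick sanity check on the numbers above, the per-status totals can be summed and compared with the reported grand total. This is a sketch with the counts copied by hand from the `indexer -S` output; the dictionary layout is just for illustration.

```python
# Totals per HTTP status, copied from the `indexer -S` output above.
# Status 0 is the "Not indexed yet" row.
totals = {
    415: 311, 302: 1171, 502: 43, 414: 3, 301: 307, 404: 1960, 410: 1,
    401: 51, 304: 16591, 200: 373015, 504: 48, 400: 3, 0: 47,
}

grand_total = sum(totals.values())
ok_share = totals[200] / grand_total  # fraction of URLs fetched with 200 OK

print(grand_total)        # 393551
print(f"{ok_share:.1%}")  # 94.8%
```

The sum matches the reported total of 393551, and the "373k docs" figure is the 200 OK row alone.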

and a vacuum analyze runs nightly ...

anyone with suggestions/ideas? has to be something client/server, like
mnogosearch, as we're dealing with multiple servers searching against the
same database ... so I don't *think* that ht/Dig is a solution, but may be
wrong there ...

----
Marc G. Fournier Hub.Org Networking Services (http://www.hub.org)
Email: scrappy@hub.org Yahoo!: yscrappy ICQ: 7615664

#9 Dave Cramer
pg@fastcrypt.com
In reply to: The Hermit Hacker (#8)
Re: website doc search is extremely SLOW

Why are there multiple servers hitting the same db?

what servers are searching through the db?

Dave

--
Dave Cramer
519 939 0336
ICQ # 1467551

#10 The Hermit Hacker
scrappy@hub.org
In reply to: Dave Cramer (#9)
Re: website doc search is extremely SLOW

On Wed, 31 Dec 2003, Dave Cramer wrote:

Why are their multiple servers hitting the same db

what servers are searching through the db?

www.postgresql.org and archives.postgresql.org both hit the same DB ...
the point is more that whatever alternative that someone can suggest, it
has to be able to be accessed centrally from several different machines
... when I just tried a search, I was the only one hitting the database,
and the search was dreadful, so it isn't a problem with multiple
connections :(

Just as an FYI, the database server has sufficient RAM on her, so it isn't
a swapping issue ... swap usage right now, after 77 days uptime:

Device          1K-blocks     Used    Avail Capacity  Type
/dev/da0s1b       8388480    17556  8370924     0%    Interleaved

----
Marc G. Fournier Hub.Org Networking Services (http://www.hub.org)
Email: scrappy@hub.org Yahoo!: yscrappy ICQ: 7615664

#11 Dave Cramer
pg@fastcrypt.com
In reply to: The Hermit Hacker (#8)
Re: website doc search is extremely SLOW

I can modify mine to be client server if you want?

It is a java app, so we need to be able to run jdk1.3 at least?

Dave

--
Dave Cramer
519 939 0336
ICQ # 1467551

#12 The Hermit Hacker
scrappy@hub.org
In reply to: Dave Cramer (#11)
Re: website doc search is extremely SLOW

On Wed, 31 Dec 2003, Dave Cramer wrote:

I can modify mine to be client server if you want?

It is a java app, so we need to be able to run jdk1.3 at least?

jdk1.4 is available on the VMs ... does yours spider? for instance, you
mention that you have the docs indexed right now, but we are currently
indexing:

Server http://archives.postgresql.org/
Server http://advocacy.postgresql.org/
Server http://developer.postgresql.org/
Server http://gborg.postgresql.org/
Server http://pgadmin.postgresql.org/
Server http://techdocs.postgresql.org/
Server http://www.postgresql.org/

will it be able to handle:

186_archives=# select count(*) from url;
 count
--------
 393551
(1 row)

as fast as you are finding with just the docs?

----
Marc G. Fournier Hub.Org Networking Services (http://www.hub.org)
Email: scrappy@hub.org Yahoo!: yscrappy ICQ: 7615664

#13 Joshua D. Drake
jd@commandprompt.com
In reply to: The Hermit Hacker (#10)
Re: website doc search is extremely SLOW

Hello,

Why are we not using Tsearch2?

Besides the obvious of getting everything into the database?

Sincerely,

Joshua D. Drake
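
For readers wondering why an index-backed search can answer so much faster than a scan: the core idea behind full-text indexes is an inverted index, a map from each word to the documents that contain it. The toy sketch below is plain Python and is not Tsearch2's actual implementation or API; the sample documents and function names are made up for illustration.

```python
from collections import defaultdict


def build_index(docs: dict[str, str]) -> dict[str, set[str]]:
    """Map each lowercased word to the set of document ids containing it."""
    index: dict[str, set[str]] = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index


def search(index: dict[str, set[str]], query: str) -> set[str]:
    """Return ids of documents that contain every word of the query."""
    words = query.lower().split()
    if not words:
        return set()
    # Start from the first word's postings and intersect with the rest.
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result


# Hypothetical sample documents, loosely modelled on the pages in the thread.
docs = {
    "sql-createfunction": "security invoker runs the function with the privileges of the caller",
    "sql-keywords": "invoker and security are listed as sql key words",
    "sql-createindex": "create index builds an index on a table",
}
index = build_index(docs)
print(sorted(search(index, "SECURITY INVOKER")))
# ['sql-createfunction', 'sql-keywords']
```

Each query word costs one dictionary lookup plus a set intersection, so the work depends on how many documents match rather than on how many are stored.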

On Tue, 2003-12-30 at 21:24, Marc G. Fournier wrote:

On Wed, 31 Dec 2003, Dave Cramer wrote:

Why are their multiple servers hitting the same db

what servers are searching through the db?

www.postgresql.org and archives.postgresql.org both hit the same DB ...
the point is more that whatever alternative that someone can suggest, it
has to be able to be accessed centrally from several different machines
... when I just tried a search, I was the only one hitting the database,
and the search was dreadful, so it isn't a problem with multiple
connections :(

Just as an FYI, the database server has sufficient RAM on her, so it isn't
a swapping issue ... swap usuage right now, after 77 days uptime:

Device 1K-blocks Used Avail Capacity Type
/dev/da0s1b 8388480 17556 8370924 0% Interleaved

Dave
On Wed, 2003-12-31 at 00:04, Marc G. Fournier wrote:

does anyone know anything better then mnogosearch, that works with
PostgreSQL, for doing indexing? the database server is a Dual Xeon 2.4G,
4G of RAM, and a load avg right now of a lowly 1.5 ... the file system is
3x72G drive in a RAID5 configuration, and the database server is 7.4 ...
the mnogosearch folk use mysql for their development, so its possible
there is something they are doing that is slowing this process down, to
compensate for a fault in mysql, but this is ridiculous ...

note that I have it setup with what the mnogosearch folk lists as being
'the fastest schema for large indexes' or 'crc-multi' ...

right now, we're running only 373k docs:

isvr5# indexer -S

Database statistics

Status Expired Total
-----------------------------
415 0 311 Unsupported Media Type
302 0 1171 Moved Temporarily
502 0 43 Bad Gateway
414 0 3 Request-URI Too Long
301 0 307 Moved Permanently
404 0 1960 Not found
410 0 1 Gone
401 0 51 Unauthorized
304 0 16591 Not Modified
200 0 373015 OK
504 0 48 Gateway Timeout
400 0 3 Bad Request
0 2 47 Not indexed yet
-----------------------------
Total 2 393551

and a vacuum analyze runs nightly ...

anyone with suggestions/ideas? has to be something client/server, like
mnogosearch, as we're dealing with multiple servers searching against the
same database ... so I don't *think* that ht/Dig is a solution, but may be
wrong there ...

On Wed, 30 Dec 2003, Dave Cramer wrote:

search for create index took 59 seconds ?

I've got a fairly (< 1 second for the same search) fast search engine on
the docs at

http://postgresintl.com/search?query=create index

if that link doesn't work, try

postgres.fastcrypt.com/search?query=create index

for now you will have to type it, I'm working on indexing it then making
it pretty

Dave

On Tue, 2003-12-30 at 22:39, D. Dante Lorenso wrote:

Marc G. Fournier wrote:

On Mon, 29 Dec 2003, D. Dante Lorenso wrote:

Trying to use the 'search' in the docs section of PostgreSQL.org
is extremely SLOW. Considering this is a website for a database
and databases are supposed to be good for indexing content, I'd
expect a much faster performance.

What is the full URL for the page you are looking at? Just the 'search
link' at the top of the page?

Perhaps this should be worked on?

Looking into it right now ...

http://www.postgresql.org/ *click Docs on top of page*
http://www.postgresql.org/docs/ * click PostgreSQL static
documentation *

Search this document set: [ SECURITY INVOKER ] Search!

http://www.postgresql.org/search.cgi?ul=http://www.postgresql.org/docs/7.4/static/&q=SECURITY+INVOKER

I loaded that URL on IE and I wait like 2 minutes or more for a response.
then, it usually returns with 1 result. I click the Search! button again
to refresh and it came back a little faster with 0 results?

Searched again from the top and it's a little faster now:

* click search *

date

Wed Dec 31 22:52:01 CST 2003

* results come back *

date

Wed Dec 31 22:52:27 CST 2003

Still one result:

PostgreSQL 7.4 Documentation (SQL Key Words)
<http://www.postgresql.org/docs/7.4/static/sql-keywords-appendix.html>
[*0.087%*]
http://www.postgresql.org/docs/7.4/static/sql-keywords-appendix.html
Size: 65401 bytes, modified: Tue, 25 Nov 2003, 15:02:33 AST

However, the page that I SHOULD have found was this one:

http://www.postgresql.org/docs/current/static/sql-createfunction.html

That page has SECURITY INVOKER in a whole section:

[EXTERNAL] SECURITY INVOKER
[EXTERNAL] SECURITY DEFINER

SECURITY INVOKER indicates that the function is to be executed with
the privileges of the user that calls it. That is the default.
SECURITY DEFINER specifies that the function is to be executed with
the privileges of the user that created it.

Dante

----------
D. Dante Lorenso
dante@lorenso.com

---------------------------(end of broadcast)---------------------------
TIP 3: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to majordomo@postgresql.org so that your
message can get through to the mailing list cleanly

--
Dave Cramer
519 939 0336
ICQ # 1467551

----
Marc G. Fournier Hub.Org Networking Services (http://www.hub.org)
Email: scrappy@hub.org Yahoo!: yscrappy ICQ: 7615664

---------------------------(end of broadcast)---------------------------
TIP 7: don't forget to increase your free space map settings

-- 
Command Prompt, Inc., home of Mammoth PostgreSQL - S/ODBC and S/JDBC
Postgresql support, programming shared hosting and dedicated hosting.
+1-503-667-4564 - jd@commandprompt.com - http://www.commandprompt.com
Mammoth PostgreSQL Replicator. Integrated Replication for PostgreSQL
#14Arjen van der Meijden
acmmailing@vulcanus.its.tudelft.nl
In reply to: The Hermit Hacker (#8)
Re: website doc search is extremely SLOW

Marc,

At our website we had an "in database" search as well... It was terribly
slow (it was a custom built vector space model implemented in mysql+php
so that explains a bit).

We replaced it by the Xapian library (www.xapian.org) with its Omega
frontend as a middle end. I.e. we call the omega search frontend from our
php-scripts and postprocess the results with the scripts (some
rights double checks and so on), from the results we build a very simple
SELECT ... FROM documents ... WHERE docid IN implode($docids_array)
(you understand enough php to understand this, I suppose)
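
A minimal sketch of that last step, written in Python rather than PHP and with an in-memory sqlite3 database standing in for mysql. The table layout (documents, docid, title), the sample rows and the helper name fetch_documents are assumptions for illustration only, not the schema described above. It builds the IN (...) list with placeholders instead of string concatenation, and puts the rows back into the order the search engine returned, since IN by itself does not keep any ranking.

```python
import sqlite3


def fetch_documents(conn, docids):
    """Fetch rows for the docids returned by the search engine.

    The rows come back in the same order as `docids` (the engine's ranking);
    ids that are not present in the table are skipped.
    """
    if not docids:
        return []
    # One placeholder per id, so the ids are passed as bound parameters.
    placeholders = ", ".join("?" for _ in docids)
    sql = f"SELECT docid, title FROM documents WHERE docid IN ({placeholders})"
    rows = {row[0]: row for row in conn.execute(sql, list(docids))}
    return [rows[d] for d in docids if d in rows]


# Hypothetical data, only here to make the sketch runnable.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE documents (docid INTEGER PRIMARY KEY, title TEXT)")
conn.executemany(
    "INSERT INTO documents VALUES (?, ?)",
    [(1, "ext3 undelete howto"), (2, "raid5 setup"), (3, "ext3 tuning")],
)

# Pretend the search frontend ranked docid 3 first and also returned a stale id 99.
print(fetch_documents(conn, [3, 1, 99]))
```

Any rights checks would go between the search call and this query, by filtering the docid list before it is handed to fetch_documents.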

With our 10GB of text, we have a 14GB (uncompressed, 9G compressed
or so) xapian database (the largest part is for the 6.7G positional
table), I'm pretty sure that if we'd store that information in something
like tsearch it'd be more than that 14GB...

Searches take less than a second (unless you do phrase searches of
course, that takes a few seconds and sometimes a few minutes).

I did a query on 'ext3 undelete' just a few minutes ago and it did the
search in 827150 documents in only 0.027 (a second run 0.006) seconds
(ext3 was found in 753 and undelete in 360 documents). Of course that is
excluding the results parsing, the total time to create the webpage was
"much" longer (0.43 seconds or so) due to the fact that the results
need to be transferred via xinetd and the results need to be extracted
from mysql (which is terrible with the "search supporting queries" we
issue :/ ) Our search machine is very similar to the machine you use as
database, but it doesn't do much heavy work apart from running the
xapian/omega search combination.

If you are interested in this, I can provide (much) more information
about our implementation. Since you don't need rights checks, you could
even get away with just the omega front end all by itself (it has a nice
scripting language, but can't interface with anything but xapian).

The main advantage of taking this out of your sql database is that it
runs on its own custom built storage system (and you could offload it to
another machine, like we did).
Btw, if you really need an "in database" solution, read back the
postings of Eric Ridge at 26-12-2003 20:54 on the hackers list (he's
working on integrating xapian in postgresql as a FTI)

Best regards,

Arjen van der Meijden

#15Dave Cramer
pg@fastcrypt.com
In reply to: The Hermit Hacker (#12)
Re: website doc search is extremely SLOW

Marc,

No it doesn't spider, it is a specialized tool for searching documents.

I'm curious, what value is there to being able to count the number of
url's ?

It does do things like query all documents where CREATE AND TABLE are n
words apart, just as fast, I would think these are more valuable to
document searching?
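
To make the "n words apart" kind of query concrete, here is a toy sketch in Python of a positional index and a proximity lookup over it. It is not how lucene implements this; the function names (build_positional_index, within), the whitespace tokenizer and the three sample documents are assumptions made up for the example. The point is only that once word positions are stored per document, a proximity query is a comparison of two position lists.

```python
from collections import defaultdict


def build_positional_index(docs):
    """Map term -> {doc_id: [positions]} using lowercase whitespace tokens."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for pos, token in enumerate(text.lower().split()):
            index[token].setdefault(doc_id, []).append(pos)
    return index


def within(index, a, b, n):
    """Return sorted doc ids where terms a and b occur at most n positions apart."""
    a_post = index.get(a.lower(), {})
    b_post = index.get(b.lower(), {})
    hits = []
    # Only documents containing both terms can match.
    for doc_id in a_post.keys() & b_post.keys():
        if any(abs(pa - pb) <= n for pa in a_post[doc_id] for pb in b_post[doc_id]):
            hits.append(doc_id)
    return sorted(hits)


# Hypothetical mini corpus standing in for the docs.
docs = {
    "sql-createtable": "CREATE TABLE defines a new table",
    "sql-createindex": "CREATE INDEX builds an index on a column of a table",
    "sql-droptable": "DROP TABLE removes a table",
}
index = build_positional_index(docs)

print(within(index, "create", "table", 1))   # adjacent words only
print(within(index, "create", "table", 10))  # looser window
```

With a window of 1 only the CREATE TABLE page matches; widening the window to 10 also pulls in the CREATE INDEX page, where "table" appears ten positions after "create".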

I think the challenge here is what do we want to search. I am betting
that folks use this page as they would man, i.e. what is the command for
create trigger?

As I said my offer stands to help out, but I think if the goal is to
search the entire website, then this particular tool is not useful.

At this point I am working on indexing the sgml directly as it has less
cruft in it. For instance all the links that appear in every summary are
just noise.

Dave

On Wed, 2003-12-31 at 00:44, Marc G. Fournier wrote:

On Wed, 31 Dec 2003, Dave Cramer wrote:

I can modify mine to be client server if you want?

It is a java app, so we need to be able to run jdk1.3 at least?

jdk1.4 is available on the VMs ... does yours spider? for instance, you
mention that you have the docs indexed right now, but we are currently
indexing:

Server http://archives.postgresql.org/
Server http://advocacy.postgresql.org/
Server http://developer.postgresql.org/
Server http://gborg.postgresql.org/
Server http://pgadmin.postgresql.org/
Server http://techdocs.postgresql.org/
Server http://www.postgresql.org/

will it be able to handle:

186_archives=# select count(*) from url;
 count
--------
 393551
(1 row)

as fast as you are finding with just the docs?

----
Marc G. Fournier Hub.Org Networking Services (http://www.hub.org)
Email: scrappy@hub.org Yahoo!: yscrappy ICQ: 7615664

--
Dave Cramer
519 939 0336
ICQ # 1467551

#16John Sidney-Woollett
johnsw@wardbrook.com
In reply to: Dave Cramer (#15)
Re: website doc search is extremely SLOW

I think that Oleg's new search offering looks really good and fast. (I
can't wait till I have some task that needs tsearch!).

I agree with Dave that searching the docs is more important for me than
the sites - but it would be really nice to have both, in one tool.

I built something similar for the Tate Gallery in the UK - here you can
select the type of content that you want returned, either static pages or
dynamic. You can see the idea at
http://www.tate.org.uk/search/default.jsp?terms=sunset%20oil&action=new

This is custom built (using java/Oracle), supports stemming, boolean
operators, exact phrase matching, relevancy and matched term highlighting.

You can switch on/off the types of documents that you are not interested
in. Using this analogy, a search facility that could offer you results
from i) the docs and/or ii) the postgres sites static pages would be very
useful.

John Sidney-Woollett

#17Ericson Smith
eric@did-it.com
In reply to: John Sidney-Woollett (#16)
Re: website doc search is extremely SLOW

You should probably take a look at the Swish project. For a certain
project, we tried Tsearch2/Tsearch, even (gasp) MySQL fulltext search,
but with over 600,000 documents to index, both took too long to conduct
searches, especially as the database was swapped in and out of memory
based on search segment. MySQL full text was the most unusable.

Swish uses its own internal DB format, and comes with a simple spider as
well. You can make it search by category, date and other nifty criteria
also.
http://swish-e.org

You can take a look over at the project and do some searches to see what
I mean:
http://cbd-net.com

Warmest regards, 
Ericson Smith
Tracking Specialist/DBA
+-----------------------+----------------------------+
| http://www.did-it.com | "When I'm paid, I always   |
| eric@did-it.com       | follow the job through.    |
| 516-255-0500          | You know that." -Angel Eyes|
+-----------------------+----------------------------+ 

#18John Sidney-Woollett
johnsw@wardbrook.com
In reply to: Ericson Smith (#17)
Re: website doc search is extremely SLOW

Wow, you're right - I could have probably saved myself a load of time! :)

Although you do learn a lot reinventing the wheel... ...or at least you
hit the same issues and insights others did before...

John

#19Dave Cramer
pg@fastcrypt.com
In reply to: John Sidney-Woollett (#18)
Re: website doc search is extremely SLOW

The search engine I am using is lucene
http://jakarta.apache.org/lucene/docs/index.html

it too uses its own internal database format, optimized for searching,
it is quite flexible, and allows searching on arbitrary fields as well.

The section on querying explains more

http://jakarta.apache.org/lucene/docs/queryparsersyntax.html

It is even possible to index text data inside a database.

Dave

--
Dave Cramer
519 939 0336
ICQ # 1467551

#20Dave Cramer
pg@fastcrypt.com
In reply to: John Sidney-Woollett (#18)
Re: website doc search is extremely SLOW

Well it appears there are quite a few solutions to use so the next
question should be what are we trying to accomplish here?

One thing that I think is that the documentation search should be
limited to the documentation.

Who is in a position to make the decision of which solution to use?

Dave

--
Dave Cramer
519 939 0336
ICQ # 1467551

#21George Essig
george_essig@yahoo.com
In reply to: Dave Cramer (#20)
#22Eric Ridge
ebr@tcdi.com
In reply to: Arjen van der Meijden (#14)
#23D. Dante Lorenso
dante@lorenso.com
In reply to: Dave Cramer (#20)
#24The Hermit Hacker
scrappy@hub.org
In reply to: Dave Cramer (#15)
#25The Hermit Hacker
scrappy@hub.org
In reply to: Joshua D. Drake (#13)
#26The Hermit Hacker
scrappy@hub.org
In reply to: Dave Cramer (#20)
#27Bruce Momjian
bruce@momjian.us
In reply to: The Hermit Hacker (#25)
#28The Hermit Hacker
scrappy@hub.org
In reply to: Bruce Momjian (#27)
#29Bruce Momjian
bruce@momjian.us
In reply to: The Hermit Hacker (#28)
#30Dave Cramer
pg@fastcrypt.com
In reply to: Bruce Momjian (#27)
#31The Hermit Hacker
scrappy@hub.org
In reply to: Bruce Momjian (#29)
#32The Hermit Hacker
scrappy@hub.org
In reply to: Dave Cramer (#30)
#33Dave Cramer
pg@fastcrypt.com
In reply to: The Hermit Hacker (#31)
#34The Hermit Hacker
scrappy@hub.org
In reply to: Dave Cramer (#33)
#35Bruce Momjian
bruce@momjian.us
In reply to: The Hermit Hacker (#31)
#36The Hermit Hacker
scrappy@hub.org
In reply to: Bruce Momjian (#35)
#37Mark Kirkwood
mark.kirkwood@catalyst.net.nz
In reply to: The Hermit Hacker (#36)
#38Arjen van der Meijden
acmmailing@vulcanus.its.tudelft.nl
In reply to: The Hermit Hacker (#36)
#39The Hermit Hacker
scrappy@hub.org
In reply to: Arjen van der Meijden (#38)
#40Tom Lane
tgl@sss.pgh.pa.us
In reply to: Mark Kirkwood (#37)
#41The Hermit Hacker
scrappy@hub.org
In reply to: Tom Lane (#40)
#42Tom Lane
tgl@sss.pgh.pa.us
In reply to: The Hermit Hacker (#41)
#43The Hermit Hacker
scrappy@hub.org
In reply to: Bruce Momjian (#35)
#44The Hermit Hacker
scrappy@hub.org
In reply to: Tom Lane (#42)
#45Arjen van der Meijden
acmmailing@vulcanus.its.tudelft.nl
In reply to: The Hermit Hacker (#39)
#46Tom Lane
tgl@sss.pgh.pa.us
In reply to: The Hermit Hacker (#44)
#47The Hermit Hacker
scrappy@hub.org
In reply to: Arjen van der Meijden (#45)
#48The Hermit Hacker
scrappy@hub.org
In reply to: Tom Lane (#46)
#49Tom Lane
tgl@sss.pgh.pa.us
In reply to: The Hermit Hacker (#31)
#50The Hermit Hacker
scrappy@hub.org
In reply to: Tom Lane (#49)
#51Tom Lane
tgl@sss.pgh.pa.us
In reply to: The Hermit Hacker (#48)
#52Tom Lane
tgl@sss.pgh.pa.us
In reply to: The Hermit Hacker (#50)
#53The Hermit Hacker
scrappy@hub.org
In reply to: Tom Lane (#52)
#54Mark Kirkwood
mark.kirkwood@catalyst.net.nz
In reply to: Tom Lane (#51)
#55Tom Lane
tgl@sss.pgh.pa.us
In reply to: The Hermit Hacker (#53)
#56The Hermit Hacker
scrappy@hub.org
In reply to: Tom Lane (#55)
#57ezra epstein
ee_newsgroup_post@prajnait.com
In reply to: D. Dante Lorenso (#1)
#58The Hermit Hacker
scrappy@hub.org
In reply to: Tom Lane (#55)
#59Tom Lane
tgl@sss.pgh.pa.us
In reply to: The Hermit Hacker (#58)
#60The Hermit Hacker
scrappy@hub.org
In reply to: Tom Lane (#59)
#61Tom Lane
tgl@sss.pgh.pa.us
In reply to: The Hermit Hacker (#60)
#62The Hermit Hacker
scrappy@hub.org
In reply to: Tom Lane (#61)
#63Dave Page
dpage@pgadmin.org
In reply to: ezra epstein (#57)
#64Joshua D. Drake
jd@commandprompt.com
In reply to: The Hermit Hacker (#31)
#65Oleg Bartunov
oleg@sai.msu.su
In reply to: The Hermit Hacker (#25)
#66Oleg Bartunov
oleg@sai.msu.su
In reply to: The Hermit Hacker (#36)
#67Dave Cramer
pg@fastcrypt.com
In reply to: Oleg Bartunov (#65)
#68Oleg Bartunov
oleg@sai.msu.su
In reply to: Dave Cramer (#67)
#69Joshua D. Drake
jd@commandprompt.com
In reply to: Dave Cramer (#67)
#70Oleg Bartunov
oleg@sai.msu.su
In reply to: Joshua D. Drake (#69)
#71Joshua D. Drake
jd@commandprompt.com
In reply to: Oleg Bartunov (#70)
#72The Hermit Hacker
scrappy@hub.org
In reply to: Oleg Bartunov (#70)
#73The Hermit Hacker
scrappy@hub.org
In reply to: Oleg Bartunov (#66)
#74The Hermit Hacker
scrappy@hub.org
In reply to: Oleg Bartunov (#65)
#75Oleg Bartunov
oleg@sai.msu.su
In reply to: The Hermit Hacker (#74)
#76Dave Cramer
pg@fastcrypt.com
In reply to: Oleg Bartunov (#75)
#77The Hermit Hacker
scrappy@hub.org
In reply to: Oleg Bartunov (#75)
#78Dave Cramer
pg@fastcrypt.com
In reply to: The Hermit Hacker (#77)
#79Greg Sabino Mullane
greg@turnstep.com
In reply to: The Hermit Hacker (#25)
#80The Hermit Hacker
scrappy@hub.org
In reply to: ezra epstein (#57)
#81The Hermit Hacker
scrappy@hub.org
In reply to: Greg Sabino Mullane (#79)
#82Greg Sabino Mullane
greg@turnstep.com
In reply to: The Hermit Hacker (#81)
#83Dave Cramer
pg@fastcrypt.com
In reply to: The Hermit Hacker (#80)
#84Mark Kirkwood
mark.kirkwood@catalyst.net.nz
In reply to: Dave Cramer (#83)
#85Tom Lane
tgl@sss.pgh.pa.us
In reply to: Mark Kirkwood (#84)
#86Oleg Bartunov
oleg@sai.msu.su
In reply to: ezra epstein (#57)
#87Jeff Davis
pgsql@j-davis.com
In reply to: The Hermit Hacker (#73)
#88Oleg Bartunov
oleg@sai.msu.su
In reply to: D. Dante Lorenso (#23)
#89The Hermit Hacker
scrappy@hub.org
In reply to: Tom Lane (#85)