website doc search is extremely SLOW
Trying to use the 'search' in the docs section of PostgreSQL.org
is extremely SLOW. Considering this is a website for a database
and databases are supposed to be good for indexing content, I'd
expect a much faster performance.
I submitted my search over two minutes ago. I just finished this
email to the list. The results have still not come back. I only
searched for:
SECURITY INVOKER
Perhaps this should be worked on?
Dante
On Mon, 29 Dec 2003, D. Dante Lorenso wrote:
Trying to use the 'search' in the docs section of PostgreSQL.org
is extremely SLOW. Considering this is a website for a database
and databases are supposed to be good for indexing content, I'd
expect a much faster performance.
I submitted my search over two minutes ago. I just finished this
email to the list. The results have still not come back. I only
searched for:
SECURITY INVOKER
Perhaps this should be worked on?
Your query takes 0.01 sec to complete (134 documents found) on my development
server, which I hope to present to the community soon after New Year. We've
crawled 27 PostgreSQL-related sites. A screenshot is available at
http://www.sai.msu.su/~megera/postgres/pgsql.ru.gif
Regards,
Oleg
_____________________________________________________________
Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
Sternberg Astronomical Institute, Moscow University (Russia)
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(095)939-16-83, +007(095)939-23-83
On Mon, 29 Dec 2003, D. Dante Lorenso wrote:
Trying to use the 'search' in the docs section of PostgreSQL.org
is extremely SLOW. Considering this is a website for a database
and databases are supposed to be good for indexing content, I'd
expect a much faster performance.
What is the full URL for the page you are looking at? Just the 'search
link' at the top of the page?
Perhaps this should be worked on?
Looking into it right now ...
----
Marc G. Fournier Hub.Org Networking Services (http://www.hub.org)
Email: scrappy@hub.org Yahoo!: yscrappy ICQ: 7615664
On Tue, 30 Dec 2003, Marc G. Fournier wrote:
On Mon, 29 Dec 2003, D. Dante Lorenso wrote:
Trying to use the 'search' in the docs section of PostgreSQL.org
is extremely SLOW. Considering this is a website for a database
and databases are supposed to be good for indexing content, I'd
expect a much faster performance.
What is the full URL for the page you are looking at? Just the 'search
link' at the top of the page?
Perhaps this should be worked on?
Looking into it right now ...
just ran it from archives.postgresql.org (security invoker) and it comes
back in 10 seconds ... I think it might be a problem with doing a search
while indexing is happening ... am looking at that ...
----
Marc G. Fournier Hub.Org Networking Services (http://www.hub.org)
Email: scrappy@hub.org Yahoo!: yscrappy ICQ: 7615664
When you go to docs and then click static, it has the ability to
search. It is slowwwwwwwww....
Sincerely,
Joshua D. Drake
--
Command Prompt, Inc., home of Mammoth PostgreSQL - S/ODBC and S/JDBC
Postgresql support, programming shared hosting and dedicated hosting.
+1-503-667-4564 - jd@commandprompt.com - http://www.commandprompt.com
Mammoth PostgreSQL Replicator. Integrated Replication for PostgreSQL
Marc G. Fournier wrote:
On Mon, 29 Dec 2003, D. Dante Lorenso wrote:
Trying to use the 'search' in the docs section of PostgreSQL.org
is extremely SLOW. Considering this is a website for a database
and databases are supposed to be good for indexing content, I'd
expect a much faster performance.
What is the full URL for the page you are looking at? Just the 'search
link' at the top of the page?
Perhaps this should be worked on?
Looking into it right now ...
http://www.postgresql.org/ *click Docs on top of page*
http://www.postgresql.org/docs/ * click PostgreSQL static
documentation *
Search this document set: [ SECURITY INVOKER ] Search!
I loaded that URL in IE and waited 2 minutes or more for a response.
Then it usually returns with 1 result. I clicked the Search! button again
to refresh and it came back a little faster with 0 results?
Searched again from the top and it's a little faster now:
* click search *
date
Wed Dec 31 22:52:01 CST 2003
* results come back *
date
Wed Dec 31 22:52:27 CST 2003
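For reference, the gap between the two `date` readings above can be worked out mechanically. This is only a minimal Python sketch: the "CST" label is dropped on the assumption that both readings share one zone, and the variable names are illustrative.

```python
from datetime import datetime

# Timestamps copied from the two `date` outputs above. The "CST" label is
# omitted because both readings are in the same zone and parsing zone
# abbreviations with strptime is platform-dependent.
FMT = "%a %b %d %H:%M:%S %Y"
started = datetime.strptime("Wed Dec 31 22:52:01 2003", FMT)
finished = datetime.strptime("Wed Dec 31 22:52:27 2003", FMT)

# Elapsed wall-clock time for the search, in seconds.
elapsed = (finished - started).total_seconds()
print(f"search took {elapsed:.0f} seconds")
```

That puts this run at under half a minute, against the two minutes or more seen on the first attempt.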
Still one result:
PostgreSQL 7.4 Documentation (SQL Key Words)
<http://www.postgresql.org/docs/7.4/static/sql-keywords-appendix.html>
[*0.087%*]
http://www.postgresql.org/docs/7.4/static/sql-keywords-appendix.html
Size: 65401 bytes, modified: Tue, 25 Nov 2003, 15:02:33 AST
However, the page that I SHOULD have found was this one:
http://www.postgresql.org/docs/current/static/sql-createfunction.html
That page has SECURITY INVOKER in a whole section:
[EXTERNAL] SECURITY INVOKER
[EXTERNAL] SECURITY DEFINER
SECURITY INVOKER indicates that the function is to be executed with
the privileges of the user that calls it. That is the default.
SECURITY DEFINER specifies that the function is to be executed with
the privileges of the user that created it.
Dante
----------
D. Dante Lorenso
dante@lorenso.com
search for create index took 59 seconds ?
I've got a fairly fast search engine (< 1 second for the same search) on
the docs at
http://postgresintl.com/search?query=create index
if that link doesn't work, try
postgres.fastcrypt.com/search?query=create index
For now you will have to type it; I'm working on indexing it, then making
it pretty.
Dave
--
Dave Cramer
519 939 0336
ICQ # 1467551
does anyone know anything better than mnogosearch, that works with
PostgreSQL, for doing indexing? the database server is a Dual Xeon 2.4G,
4G of RAM, and a load avg right now of a lowly 1.5 ... the file system is
3x72G drive in a RAID5 configuration, and the database server is 7.4 ...
the mnogosearch folk use mysql for their development, so it's possible
there is something they are doing that is slowing this process down, to
compensate for a fault in mysql, but this is ridiculous ...
note that I have it set up with what the mnogosearch folk list as being
'the fastest schema for large indexes' or 'crc-multi' ...
right now, we're running only 373k docs:
isvr5# indexer -S
          Database statistics
    Status    Expired      Total
   -----------------------------
       415          0        311 Unsupported Media Type
       302          0       1171 Moved Temporarily
       502          0         43 Bad Gateway
       414          0          3 Request-URI Too Long
       301          0        307 Moved Permanently
       404          0       1960 Not found
       410          0          1 Gone
       401          0         51 Unauthorized
       304          0      16591 Not Modified
       200          0     373015 OK
       504          0         48 Gateway Timeout
       400          0          3 Bad Request
         0          2         47 Not indexed yet
   -----------------------------
     Total          2     393551
and a vacuum analyze runs nightly ...
anyone with suggestions/ideas? has to be something client/server, like
mnogosearch, as we're dealing with multiple servers searching against the
same database ... so I don't *think* that ht/Dig is a solution, but may be
wrong there ...
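As a sanity check on the `indexer -S` figures above, the per-status rows can be summed and compared with the reported totals. This is a minimal Python sketch: the rows are copied by hand from the output, and the tuple layout and names are illustrative assumptions, not anything mnogosearch provides.

```python
# Rows copied from the `indexer -S` output: (status, expired, total, label).
ROWS = [
    (415, 0, 311, "Unsupported Media Type"),
    (302, 0, 1171, "Moved Temporarily"),
    (502, 0, 43, "Bad Gateway"),
    (414, 0, 3, "Request-URI Too Long"),
    (301, 0, 307, "Moved Permanently"),
    (404, 0, 1960, "Not found"),
    (410, 0, 1, "Gone"),
    (401, 0, 51, "Unauthorized"),
    (304, 0, 16591, "Not Modified"),
    (200, 0, 373015, "OK"),
    (504, 0, 48, "Gateway Timeout"),
    (400, 0, 3, "Bad Request"),
    (0, 2, 47, "Not indexed yet"),
]

# Column sums, to compare against the "Total" line of the report.
expired_sum = sum(row[1] for row in ROWS)
total_sum = sum(row[2] for row in ROWS)

# Share of all known URLs that were fetched successfully (status 200).
ok_docs = next(row[2] for row in ROWS if row[0] == 200)
ok_share = ok_docs / total_sum

print(f"expired={expired_sum} total={total_sum} ok_share={ok_share:.1%}")
```

The sums agree with the reported totals of 2 and 393551, so the "373k docs" figure is the status-200 row rather than the full URL count.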
----
Marc G. Fournier Hub.Org Networking Services (http://www.hub.org)
Email: scrappy@hub.org Yahoo!: yscrappy ICQ: 7615664
Why are there multiple servers hitting the same db?
What servers are searching through the db?
Dave
--
Dave Cramer
519 939 0336
ICQ # 1467551
On Wed, 31 Dec 2003, Dave Cramer wrote:
Why are there multiple servers hitting the same db?
What servers are searching through the db?
www.postgresql.org and archives.postgresql.org both hit the same DB ...
the point is more that whatever alternative that someone can suggest, it
has to be able to be accessed centrally from several different machines
... when I just tried a search, I was the only one hitting the database,
and the search was dreadful, so it isn't a problem with multiple
connections :(
Just as an FYI, the database server has sufficient RAM on her, so it isn't
a swapping issue ... swap usage right now, after 77 days uptime:
Device          1K-blocks     Used    Avail Capacity  Type
/dev/da0s1b       8388480    17556  8370924     0%    Interleaved
----
Marc G. Fournier Hub.Org Networking Services (http://www.hub.org)
Email: scrappy@hub.org Yahoo!: yscrappy ICQ: 7615664
I can modify mine to be client server if you want?
It is a java app, so we need to be able to run jdk1.3 at least?
Dave
--
Dave Cramer
519 939 0336
ICQ # 1467551
On Wed, 31 Dec 2003, Dave Cramer wrote:
I can modify mine to be client server if you want?
It is a java app, so we need to be able to run jdk1.3 at least?
jdk1.4 is available on the VMs ... does your spider? for instance, you
mention that you have the docs indexed right now, but we are currently
indexing:
Server http://archives.postgresql.org/
Server http://advocacy.postgresql.org/
Server http://developer.postgresql.org/
Server http://gborg.postgresql.org/
Server http://pgadmin.postgresql.org/
Server http://techdocs.postgresql.org/
Server http://www.postgresql.org/
will it be able to handle:
186_archives=# select count(*) from url;
count
--------
393551
(1 row)
as fast as you are finding with just the docs?
----
Marc G. Fournier Hub.Org Networking Services (http://www.hub.org)
Email: scrappy@hub.org Yahoo!: yscrappy ICQ: 7615664
Hello,
Why are we not using Tsearch2?
Besides the obvious of getting everything into the database?
Sincerely,
Joshua D. Drake
--
Command Prompt, Inc., home of Mammoth PostgreSQL - S/ODBC and S/JDBC
Postgresql support, programming shared hosting and dedicated hosting.
+1-503-667-4564 - jd@commandprompt.com - http://www.commandprompt.com
Mammoth PostgreSQL Replicator. Integrated Replication for PostgreSQL
Marc,
At our website we had an "in database" search as well... It was terribly
slow (it was a custom-built vector space model implemented in mysql+php,
so that explains a bit).
We replaced it with the Xapian library (www.xapian.org), with its Omega
frontend as a middle end. I.e. our php scripts call the omega
search frontend and postprocess the results (some
rights double checks and so on); from the results we build a very simple
SELECT ... FROM documents ... WHERE docid IN (implode(',', $docids_array))
(you understand enough php to understand this, I suppose)
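For readers who do not use php, the "search engine returns ids, database returns rows" step above might look roughly like the following Python sketch. It is only an illustration under assumptions: the `documents` table layout, the `title` column, and the `fetch_documents` helper are hypothetical, and an in-memory SQLite database stands in for the real one. Placeholders are used instead of pasting the ids into the SQL string.

```python
import sqlite3

def fetch_documents(conn, doc_ids):
    """Fetch rows for ids returned by an external search engine,
    keeping the engine's ranking order (hypothetical helper)."""
    if not doc_ids:
        return []  # "IN ()" is not valid SQL, so short-circuit
    placeholders = ", ".join("?" for _ in doc_ids)
    sql = f"SELECT docid, title FROM documents WHERE docid IN ({placeholders})"
    rows = {row[0]: row for row in conn.execute(sql, list(doc_ids))}
    # The database returns rows in arbitrary order; restore the ranking.
    return [rows[d] for d in doc_ids if d in rows]

# Assumed toy schema and data, standing in for the real document store.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE documents (docid INTEGER PRIMARY KEY, title TEXT)")
conn.executemany(
    "INSERT INTO documents VALUES (?, ?)",
    [(1, "ext3 basics"), (2, "undelete howto"), (3, "raid5 notes")],
)

ranked_ids = [2, 1, 99]  # ids as the search frontend might rank them; 99 is stale
results = fetch_documents(conn, ranked_ids)
print(results)
```

Ids that no longer exist in the database (99 here) simply drop out, which is one place where a rights or consistency check could hook in.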
With our 10GB of text, we have a 14GB (uncompressed, 9GB compressed
or so) xapian database (the largest part is the 6.7GB positional
table). I'm pretty sure that if we stored that information in something
like tsearch it would be more than that 14GB...
Searches take less than a second (unless you do phrase searches of
course, that takes a few seconds and sometimes a few minutes).
I did a query on 'ext3 undelete' just a few minutes ago and it
searched 827150 documents in only 0.027 seconds (0.006 on a second run);
ext3 was found in 753 documents and undelete in 360. Of course that
excludes the results parsing; the total time to create the webpage was
"much" longer (0.43 seconds or so) because the results
need to be transferred via xinetd and then extracted
from mysql (which is terrible with the "search supporting queries" we
issue :/ ). Our search machine is very similar to the machine you use as
database, but it doesn't do much heavy work apart from running the
xapian/omega search combination.
If you are interested in this, I can provide (much) more information
about our implementation. Since you don't need rights checks, you could
even get away with just the omega front end all by itself (it has a nice
scripting language, but can't interface with anything but xapian).
The main advantage of taking this out of your sql database is that it
runs on its own custom built storage system (and you could offload it to
another machine, like we did).
Btw, if you really need an "in database" solution, read back the
postings of Eric Ridge at 26-12-2003 20:54 on the hackers list (he's
working on integrating xapian in postgresql as a FTI)
Best regards,
Arjen van der Meijden
Marc G. Fournier wrote:
does anyone know anything better than mnogosearch, that works with
PostgreSQL, for doing indexing? the database server is a Dual Xeon 2.4G,
4G of RAM, and a load avg right now of a lowly 1.5 ... the file system is
3x72G drive in a RAID5 configuration, and the database server is 7.4 ...
the mnogosearch folk use mysql for their development, so it's possible
there is something they are doing that is slowing this process down, to
compensate for a fault in mysql, but this is ridiculous ...
note that I have it set up with what the mnogosearch folk list as being
'the fastest schema for large indexes' or 'crc-multi' ...
right now, we're running only 373k docs:
isvr5# indexer -S
Database statistics
Status   Expired    Total
-----------------------------
   415         0      311  Unsupported Media Type
   302         0     1171  Moved Temporarily
   502         0       43  Bad Gateway
   414         0        3  Request-URI Too Long
   301         0      307  Moved Permanently
   404         0     1960  Not found
   410         0        1  Gone
   401         0       51  Unauthorized
   304         0    16591  Not Modified
   200         0   373015  OK
   504         0       48  Gateway Timeout
   400         0        3  Bad Request
     0         2       47  Not indexed yet
-----------------------------
 Total         2   393551
and a vacuum analyze runs nightly ...
anyone with suggestions/ideas? has to be something client/server, like
mnogosearch, as we're dealing with multiple servers searching against the
same database ... so I don't *think* that ht/Dig is a solution, but may be
wrong there ...
Marc,
No it doesn't spider, it is a specialized tool for searching documents.
I'm curious, what value is there in being able to count the number of
URLs?
It does do things like query all documents where CREATE and TABLE are n
words apart, just as fast. I would think these are more valuable to
document searching?
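To make the "n words apart" kind of query concrete, here is a minimal Python sketch of a positional inverted index. It is not how Lucene (or any of the tools discussed in this thread) implements proximity search; the function names, the whitespace tokenizer, and the sample documents are all assumptions made up for illustration.

```python
from collections import defaultdict

def build_positional_index(docs):
    """Map each word to {doc_id: [positions]} using naive whitespace tokens."""
    index = defaultdict(lambda: defaultdict(list))
    for doc_id, text in docs.items():
        for pos, word in enumerate(text.lower().split()):
            index[word][doc_id].append(pos)
    return index

def within(index, a, b, n):
    """Return sorted ids of docs where words a and b occur at most n positions apart."""
    a_post = index.get(a.lower(), {})
    b_post = index.get(b.lower(), {})
    hits = []
    # Only documents containing both words can match.
    for doc_id in a_post.keys() & b_post.keys():
        if any(abs(pa - pb) <= n
               for pa in a_post[doc_id]
               for pb in b_post[doc_id]):
            hits.append(doc_id)
    return sorted(hits)

# Hypothetical stand-ins for documentation pages.
docs = {
    "sql-createtable": "create table defines a new table",
    "sql-createtrigger": "create trigger defines a new trigger on a table",
    "sql-droptable": "drop table removes a table",
}
index = build_positional_index(docs)
adjacent = within(index, "CREATE", "TABLE", 1)
loose = within(index, "CREATE", "TABLE", 8)
print(adjacent, loose)
```

Storing positions is what makes this possible, and it is also why a positional table tends to be the largest part of such an index.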
I think the challenge here is deciding what we want to search. I am betting
that folks use this page as they would man, i.e. "what is the command for
create trigger?"
As I said my offer stands to help out, but I think if the goal is to
search the entire website, then this particular tool is not useful.
At this point I am working on indexing the sgml directly as it has less
cruft in it. For instance all the links that appear in every summary are
just noise.
Dave
On Wed, 2003-12-31 at 00:44, Marc G. Fournier wrote:
On Wed, 31 Dec 2003, Dave Cramer wrote:
I can modify mine to be client server if you want?
It is a java app, so we need to be able to run jdk1.3 at least?
jdk1.4 is available on the VMs ... does yours spider? for instance, you
mention that you have the docs indexed right now, but we are currently
indexing:
Server http://archives.postgresql.org/
Server http://advocacy.postgresql.org/
Server http://developer.postgresql.org/
Server http://gborg.postgresql.org/
Server http://pgadmin.postgresql.org/
Server http://techdocs.postgresql.org/
Server http://www.postgresql.org/
will it be able to handle:
186_archives=# select count(*) from url;
 count
--------
 393551
(1 row)
as fast as you are finding with just the docs?
----
Marc G. Fournier Hub.Org Networking Services (http://www.hub.org)
Email: scrappy@hub.org Yahoo!: yscrappy ICQ: 7615664
--
Dave Cramer
519 939 0336
ICQ # 1467551
I think that Oleg's new search offering looks really good and fast. (I
can't wait till I have some task that needs tsearch!).
I agree with Dave that searching the docs is more important for me than
the sites - but it would be really nice to have both, in one tool.
I built something similar for the Tate Gallery in the UK - here you can
select the type of content that you want returned, either static pages or
dynamic. You can see the idea at
http://www.tate.org.uk/search/default.jsp?terms=sunset%20oil&action=new
This is custom built (using java/Oracle), supports stemming, boolean
operators, exact phrase matching, relevancy and matched term highlighting.
You can switch on/off the types of documents that you are not interested
in. Using this analogy, a search facility that could offer you results
from i) the docs and/or ii) the postgres sites static pages would be very
useful.
John Sidney-Woollett
You should probably take a look at the Swish project. For a certain
project, we tried Tsearch2/Tsearch, even (gasp) MySQL fulltext search,
but with over 600,000 documents to index, both took too long to conduct
searches, especially as the database was swapped in and out of memory
based on search segment. MySQL full text was the most unusable.
Swish uses its own internal DB format, and comes with a simple spider as
well. You can make it search by category, date and other nifty criteria
also.
http://swish-e.org
You can take a look over at the project and do some searches to see what
I mean:
http://cbd-net.com
Warmest regards,
Ericson Smith
Tracking Specialist/DBA
+-----------------------+----------------------------+
| http://www.did-it.com | "When I'm paid, I always |
| eric@did-it.com | follow the job through. |
| 516-255-0500 | You know that." -Angel Eyes|
+-----------------------+----------------------------+
Wow, you're right - I could have probably saved myself a load of time! :)
Although you do learn a lot reinventing the wheel... ...or at least you
hit the same issues and insights others did before...
John
The search engine I am using is Lucene
http://jakarta.apache.org/lucene/docs/index.html
It too uses its own internal database format, optimized for searching.
It is quite flexible, and allows searching on arbitrary fields as well.
The section on querying explains more
http://jakarta.apache.org/lucene/docs/queryparsersyntax.html
It is even possible to index text data inside a database.
Dave
--
Dave Cramer
519 939 0336
ICQ # 1467551
Well, it appears there are quite a few solutions to use, so the next
question should be: what are we trying to accomplish here?
One thing I think is that the documentation search should be
limited to the documentation.
Who is in a position to make the decision of which solution to use?
Dave
--
Dave Cramer
519 939 0336
ICQ # 1467551