integration of fulltext search in bytea/docs

Started by Radek Novotnýover 16 years ago8 messagesgeneral
Jump to latest
#1Radek Novotný
radek.novotny@mediawork.cz

Hello,
is there in the roadmap of postgre integration of fulltext searching in
documents saved in blobs (bytea)?

For example linux antiword can export fine text output that can be inserted into
varchar field.

Would be very very nice (postgre users can be proud to be first) to save
documents into bytea and search that field via to_tsvector, to_tsquery ...

--

www.publicstream.cz - vytvořili jsme produkt pro živé přenosy (online streaming)
a videoarchív ve formátu FLASH.

Bc. Radek Novotný
jednatel Mediawork group s.r.o.

tel.: +420 724 020 361
email: radek.novotny@mediawork.cz
http://www.mediawork.cz

#2Sam Mason
sam@samason.me.uk
In reply to: Radek Novotný (#1)
Re: integration of fulltext search in bytea/docs

On Wed, Jul 29, 2009 at 04:46:43PM +0200, Radek Novotnnn wrote:

is there in the roadmap of postgre integration of fulltext searching in
documents saved in blobs (bytea)?

Do you mean bytea or large-objects?

Would be very very nice (postgre users can be proud to be first) to save
documents into bytea and search that field via to_tsvector, to_tsquery ...

This seems easy; for large objects, just use lo_export() to dump the
blob out to the filesystem, and then use something like pl/perl to run
antiword on it, saving the results to another file and then returning
the file line-by-line as a SETOF TEXT (I think this is the best way of
handling things in case the resulting text file is enormous anyway). If
this code was called "runfilter" we can use it like:

UPDATE myfiles f SET tsidx = (
SELECT ts_accum(to_tsvector(t))
FROM runfilter(f.loid) t);

Where we've defined ts_accum to be:

CREATE AGGREGATE ts_accum (tsvector) (
SFUNC = tsvector_concat,
STYPE = tsvector,
INITCOND = ''
);

bytea is different because you know when the values has changed (i.e.
write a trigger) but you need to write more code to get the bytea value
out into the filesystem.

--
Sam http://samason.me.uk/

#3Pavel Stehule
pavel.stehule@gmail.com
In reply to: Radek Novotný (#1)
Re: integration of fulltext search in bytea/docs

Hello

Dne 29. červenec 2009 16:46 Radek Novotný <radek.novotny@mediawork.cz>
napsal(a):

Hello,
is there in the roadmap of postgre integration of fulltext searching in
documents saved in blobs (bytea)?

What I know, no. PostgreSQL doesn't know about others binary formats,
so it cannot do it.

For example linux antiword can export fine text output that can be inserted
into varchar field.

I understand it well. Three years ago, we used antiword. We stored two
values - bytea - original word doc, and text - antiword output. The
overhead is minimal, and this solution worked very well.

Would be very very nice (postgre users can be proud to be first) to save
documents into bytea and search that field via to_tsvector, to_tsquery ...

It's should be very slow. You have to do repeated transformation.

Pavel Stehule

Show quoted text

--

www.publicstream.cz - vytvořili jsme produkt pro živé přenosy (online
streaming) a videoarchív ve formátu FLASH.

Bc. Radek Novotný
jednatel Mediawork group s.r.o.

tel.: +420 724 020 361
email: radek.novotny@mediawork.cz
http://www.mediawork.cz

#4Radek Novotný
radek.novotny@mediawork.cz
In reply to: Pavel Stehule (#3)
Re: integration of fulltext search in bytea/docs

Is there possible to create pg trigger that runs shell script?
--

www.publicstream.cz - vytvořili jsme produkt pro živé přenosy (online streaming)
a videoarchív ve formátu FLASH.

Bc. Radek Novotný
jednatel Mediawork group s.r.o.

tel.: +420 724 020 361
email: radek.novotny@mediawork.cz
http://www.mediawork.cz

Dne St 29. července 2009 17:38:31 Pavel Stehule napsal(a):

Show quoted text

Hello

Dne 29. červenec 2009 16:46 Radek Novotný <radek.novotny@mediawork.cz>

napsal(a):

Hello,
is there in the roadmap of postgre integration of fulltext searching in
documents saved in blobs (bytea)?

What I know, no. PostgreSQL doesn't know about others binary formats,
so it cannot do it.

For example linux antiword can export fine text output that can be
inserted into varchar field.

I understand it well. Three years ago, we used antiword. We stored two
values - bytea - original word doc, and text - antiword output. The
overhead is minimal, and this solution worked very well.

Would be very very nice (postgre users can be proud to be first) to save
documents into bytea and search that field via to_tsvector, to_tsquery
...

It's should be very slow. You have to do repeated transformation.

Pavel Stehule

--

www.publicstream.cz - vytvořili jsme produkt pro živé přenosy (online
streaming) a videoarchív ve formátu FLASH.

Bc. Radek Novotný
jednatel Mediawork group s.r.o.

tel.: +420 724 020 361
email: radek.novotny@mediawork.cz
http://www.mediawork.cz

#5Sam Mason
sam@samason.me.uk
In reply to: Radek Novotný (#4)
Re: integration of fulltext search in bytea/docs

On Thu, Jul 30, 2009 at 03:09:12PM +0200, Radek Novotnnn wrote:

Is there possible to create pg trigger that runs shell script?

Yes, pl/perl can do this.

--
Sam http://samason.me.uk/

#6Michael Glaesemann
grzm@seespotcode.net
In reply to: Radek Novotný (#4)
Re: integration of fulltext search in bytea/docs

On Jul 30, 2009, at 9:09 , Radek Novotný wrote:

Is there possible to create pg trigger that runs shell script?

[Please don't top post.]

Yes. You can use an untrusted language such as pl/perlu to run system
commands.

Michael Glaesemann
grzm seespotcode net

#7A. Kretschmer
andreas.kretschmer@schollglas.com
In reply to: Radek Novotný (#4)
Re: integration of fulltext search in bytea/docs

In response to Radek Novotn� :

Is there possible to create pg trigger that runs shell script?

Sure, use an untrusted language for the trigger-function.

Andreas
--
Andreas Kretschmer
Kontakt: Heynitz: 035242/47150, D1: 0160/7141639 (mehr: -> Header)
GnuPG-ID: 0x3FFF606C, privat 0x7F4584DA http://wwwkeys.de.pgp.net

#8Tom Lane
tgl@sss.pgh.pa.us
In reply to: Michael Glaesemann (#6)
Re: integration of fulltext search in bytea/docs

Michael Glaesemann <grzm@seespotcode.net> writes:

On Jul 30, 2009, at 9:09 , Radek Novotn� wrote:

Is there possible to create pg trigger that runs shell script?

Yes. You can use an untrusted language such as pl/perlu to run system
commands.

The fact that you can do it doesn't make it a good idea ...

If you do this, you'll need to consider what happens if the database
transaction rolls back after calling your trigger. The effects in the
filesystem are still there, but the effects in the database aren't.

regards, tom lane