Text search
I have a text column in a table. We store XML in this column. Now I want to
search for tags and values
Example data:
<bank>
<name>Citi Bank</name>
.....
.....
/<bank>
I would like to run the following query:
select * from xxxx where to_tsvector('english',xml_column) @@
to_tsquery('<name>Citi
Bank</name>')
This works fine but it also works for any tag as long as the name 'Citi
Bank' is present.
How do I have to setup my search in order for this to work so I get an exact
match for the tag and value ?
On 16/03/10 10:29, Chris Roffler wrote:
I have a text column in a table. We store XML in this column. Now I want to
search for tags and values
select * from xxxx where to_tsvector('english',xml_column) @@
to_tsquery('<name>Citi
Bank</name>')This works fine but it also works for any tag as long as the name 'Citi
Bank' is present.How do I have to setup my search in order for this to work so I get an exact
match for the tag and value ?
Would you not be better off with an XML data-type if that's how you want
to treat the data?
http://www.postgresql.org/docs/8.4/static/datatype-xml.html
http://www.postgresql.org/docs/8.4/static/functions-xml.html
http://www.postgresql.org/docs/8.4/static/xml2.html
--
Richard Huxton
Archonet Ltd
Richard
I tried all that and you can see it on this thread, there are some
limitations on indexs on xpath work
http://archives.postgresql.org/pgsql-general/2010-03/msg00270.php
<http://archives.postgresql.org/pgsql-general/2010-03/msg00270.php>
On Tue, Mar 16, 2010 at 2:21 PM, Richard Huxton <dev@archonet.com> wrote:
Show quoted text
On 16/03/10 10:29, Chris Roffler wrote:
I have a text column in a table. We store XML in this column. Now I want
to
search for tags and valuesselect * from xxxx where to_tsvector('english',xml_column) @@
to_tsquery('<name>Citi
Bank</name>')This works fine but it also works for any tag as long as the name 'Citi
Bank' is present.How do I have to setup my search in order for this to work so I get an
exact
match for the tag and value ?Would you not be better off with an XML data-type if that's how you want to
treat the data?
http://www.postgresql.org/docs/8.4/static/datatype-xml.html
http://www.postgresql.org/docs/8.4/static/functions-xml.html
http://www.postgresql.org/docs/8.4/static/xml2.html--
Richard Huxton
Archonet Ltd--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
On 16/03/10 12:36, Chris Roffler wrote:
Richard
I tried all that and you can see it on this thread, there are some
limitations on indexs on xpath workhttp://archives.postgresql.org/pgsql-general/2010-03/msg00270.php
<http://archives.postgresql.org/pgsql-general/2010-03/msg00270.php>
OK - I've read through your other thread and I think you either:
1. Don't want to use a relational database for this (use an XML database
instead)
2. Want to make the data at least slightly relational.
You're trying to index multiple fragments of a value. In your case it
seems to be <name> tags within an XML fragment containing a list of
<attributes>. That's exactly equivalent to trying to get an index search
for '%ABC%' on text.
There's nothing you can do without exposing the structure of your value
to the database.
You could write a custom parser for tsearch so it picked out only the
relevant pieces of XML. That's probably more work than you want though.
You could run an xslt transform over the xml fragments and extract what
you want and then use tsearch to index that, I suppose. Similarly, you
might be able to do the same via xslt and xquery.
Finally, and to my mind most sensibly, if you want to search attributes,
then store attributes. Parse out your XML and have an "attributes" table
(id, name, value, last_changed, changed_by). That's not brilliant
because every value will just be text, but at least each attribute is
its own row.
--
Richard Huxton
Archonet Ltd
On 16/03/10 13:49, Richard Huxton wrote:
You could run an xslt transform over the xml fragments and extract what
you want and then use tsearch to index that, I suppose. Similarly, you
might be able to do the same via xslt and xquery.
Actually, if it's only attribute names you're interested in you could do
it with xpath
Something like (untested):
ALTER TABLE time_series ADD attr_names text;
UPDATE time_series SET attr_names = array_to_string(
xpath('*/Attribute/Name/text()', external_attributes)
,' '
);
CREATE INDEX fti_attr_names ON time_series USING gin(
to_tsvector('simple', attr_names)
);
SELECT * FROM time_series WHERE
to_tsvector('simple', attr_names)
@@
to_tsquery('simple', 'attribute22');
I'd probably just store the tsvector rather than text unless the text is
of some use in itself.
If you plan to do anything with the attributes it'd still be better to
split them out into their own table though.
--
Richard Huxton
Archonet Ltd
Richard
thanks for the pointers .... unfortunately its not just attribute names.
Here is what I am thinking of doing;
In a first step I run a query
SELECT id FROM time_series WHERE
to_tsvector(xml_string)
@@
to_tsquery( anystring );
then I load the actual xml string into memory for each id found and use
xpath to search the document in memory. This will at least use my text index
on the first hit.
Thanks
Chris
On Tue, Mar 16, 2010 at 4:16 PM, Richard Huxton <dev@archonet.com> wrote:
Show quoted text
On 16/03/10 13:49, Richard Huxton wrote:
You could run an xslt transform over the xml fragments and extract what
you want and then use tsearch to index that, I suppose. Similarly, you
might be able to do the same via xslt and xquery.Actually, if it's only attribute names you're interested in you could do it
with xpathSomething like (untested):
ALTER TABLE time_series ADD attr_names text;
UPDATE time_series SET attr_names = array_to_string(
xpath('*/Attribute/Name/text()', external_attributes)
,' '
);CREATE INDEX fti_attr_names ON time_series USING gin(
to_tsvector('simple', attr_names)
);SELECT * FROM time_series WHERE
to_tsvector('simple', attr_names)
@@
to_tsquery('simple', 'attribute22');I'd probably just store the tsvector rather than text unless the text is of
some use in itself.If you plan to do anything with the attributes it'd still be better to
split them out into their own table though.--
Richard Huxton
Archonet Ltd--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general