proposal casting from XML[] to int[], numeric[], text[]

Started by Pavel Stehuleover 18 years ago16 messages
#1Pavel Stehule
pavel.stehule@gmail.com

Hello

Current result from xpath function isn't indexable. It cannot be
problem with possibility cast it to some base types.

CREATE OR REPLACE FUNCTION xml_list_to_int(xml[])
RETURNS int[] AS $$
SELECT ARRAY(SELECT to_number(($1[i])::text,'999999.99')::int
FROM generate_series(1, array_upper($1,1)) g(i))
$$ LANGUAGE SQL IMMUTABLE;

CREATE CAST (xml[] AS int[]) WITH FUNCTION xml_list_to_int(xml[]);

-- now I can build functional index
CREATE INDEX fx ON foo USING
GIN((xpath('//id/text()',objednavka_v_xml)::int[]));

Does anybody know better solution?

Regards
Pavel Stehule

#2Peter Eisentraut
peter_e@gmx.net
In reply to: Pavel Stehule (#1)
Re: proposal casting from XML[] to int[], numeric[], text[]

Am Dienstag, 25. September 2007 schrieb Pavel Stehule:

Current result from xpath function isn't indexable. It cannot be
problem with possibility cast it to some base types.

Nikolay might be able to remind us what happened to the proposed functions
xpath_bool, xpath_text, etc.

--
Peter Eisentraut
http://developer.postgresql.org/~petere/

#3Nikolay Samokhvalov
nikolay@samokhvalov.com
In reply to: Peter Eisentraut (#2)
Re: proposal casting from XML[] to int[], numeric[], text[]

The problem with contrib/xml2's xpath_* functions (that return
scalars) was that they are very specific. If XPath expression
evaluation returns array of values (set of XML pieces), but the
function returns only the first, significant information is lost,
while there is no any gain in speed at all.

The key idea was to create only one generic function at the first
stage -- xpath(), returning an array of XML pieces.

We would create wrappers returning int[], bool[], string[], but there
are several issues with such functions:
- if the type of the data located on nodes that match XPath
expression differs from what is expected, what should we do?
- in XML world, if you request for a text under some node, all
descendants should be involved in generating result string (example:
what should be returned for XML like "<em><strong>PostgreSQL</strong>
is a powerful, open source relational database system</em>" if user
requests for text under "em" node? In XML world, the correct answer is
"PostgreSQL is a powerful, open source relational database system" --
concatenation of all strings from the node itself and all its
descendants, in the correct order. Will be this expected for RDBMS
users?).

Regarding GIN indexes, alternative approach would be creating opclass
for xml[], it should be pretty simple (and better than creating
implicit CASTs for xml[]<->int[], xml[]<->bool[], etc). Can we do this
for 8.3 or it's too late? It would be very helpful feature.

Without that, the only way to have indexes is to use functional btree
indexes over XPath expression (smth like "...btree(((xpath('...',
field)[1]::text))" -- pretty ugly construction...)

On 9/25/07, Peter Eisentraut <peter_e@gmx.net> wrote:

Am Dienstag, 25. September 2007 schrieb Pavel Stehule:

Current result from xpath function isn't indexable. It cannot be
problem with possibility cast it to some base types.

Nikolay might be able to remind us what happened to the proposed functions
xpath_bool, xpath_text, etc.

--
Peter Eisentraut
http://developer.postgresql.org/~petere/

--
Best regards,
Nikolay

#4Pavel Stehule
pavel.stehule@gmail.com
In reply to: Nikolay Samokhvalov (#3)
Re: proposal casting from XML[] to int[], numeric[], text[]

We would create wrappers returning int[], bool[], string[], but there
are several issues with such functions:
- if the type of the data located on nodes that match XPath
expression differs from what is expected, what should we do?

raise exception

- in XML world, if you request for a text under some node, all
descendants should be involved in generating result string (example:
what should be returned for XML like "<em><strong>PostgreSQL</strong>
is a powerful, open source relational database system</em>" if user
requests for text under "em" node? In XML world, the correct answer is
"PostgreSQL is a powerful, open source relational database system" --
concatenation of all strings from the node itself and all its
descendants, in the correct order. Will be this expected for RDBMS
users?).

It is corect. Or we can disallow any nested elements in casting array.
It's poblem only for text type. Numeric types are clear.

Regarding GIN indexes, alternative approach would be creating opclass
for xml[], it should be pretty simple (and better than creating
implicit CASTs for xml[]<->int[], xml[]<->bool[], etc). Can we do this
for 8.3 or it's too late? It would be very helpful feature.

It's not practic. If I would to use it for functional indexes for
xpath functions I need constructor for xml[], and I have not it
currently:

xpath('/root/id/text()', column)::int[] @< ARRAY[199,2200,222]

Without that, the only way to have indexes is to use functional btree
indexes over XPath expression (smth like "...btree(((xpath('...',
field)[1]::text))" -- pretty ugly construction...)

It's not usefull, if xpath returns more values

Regards
Pavel Stehule

#5Nikolay Samokhvalov
nikolay@samokhvalov.com
In reply to: Pavel Stehule (#4)
Re: proposal casting from XML[] to int[], numeric[], text[]

On 9/28/07, Pavel Stehule <pavel.stehule@gmail.com> wrote:

We would create wrappers returning int[], bool[], string[], but there
are several issues with such functions:
- if the type of the data located on nodes that match XPath
expression differs from what is expected, what should we do?

raise exception

Will it be convenient for cases when there are many different (various
structures) XMLs in one column (no single DTD)?

- in XML world, if you request for a text under some node, all
descendants should be involved in generating result string (example:
what should be returned for XML like "<em><strong>PostgreSQL</strong>
is a powerful, open source relational database system</em>" if user
requests for text under "em" node? In XML world, the correct answer is
"PostgreSQL is a powerful, open source relational database system" --
concatenation of all strings from the node itself and all its
descendants, in the correct order. Will be this expected for RDBMS
users?).

It is corect. Or we can disallow any nested elements in casting array.
It's poblem only for text type. Numeric types are clear.

Actually, casting to numeric types might seem to be odd. But there is
some sense from practical point of view -- it works and that's better
that nothing (like now). But it's too late for 8.3, isn't it?

Regarding GIN indexes, alternative approach would be creating opclass
for xml[], it should be pretty simple (and better than creating
implicit CASTs for xml[]<->int[], xml[]<->bool[], etc). Can we do this
for 8.3 or it's too late? It would be very helpful feature.

It's not practic. If I would to use it for functional indexes for
xpath functions I need constructor for xml[], and I have not it
currently:

xpath('/root/id/text()', column)::int[] @< ARRAY[199,2200,222]

I do not understand. Do you mean that there is no equality comparison
operator for type xml yet?

To implement GIN for xml[] we need to have comparison operator for
xml. Standard says "XML values are not comparable" (subclause 4.2.4 of
the latest draft from wiscorp.com), but without that cannot implement
straight GIN support, what is not good :-/

--
Best regards,
Nikolay

#6Pavel Stehule
pavel.stehule@gmail.com
In reply to: Nikolay Samokhvalov (#5)
Re: proposal casting from XML[] to int[], numeric[], text[]

2007/9/28, Nikolay Samokhvalov <nikolay@samokhvalov.com>:

On 9/28/07, Pavel Stehule <pavel.stehule@gmail.com> wrote:

We would create wrappers returning int[], bool[], string[], but there
are several issues with such functions:
- if the type of the data located on nodes that match XPath
expression differs from what is expected, what should we do?

raise exception

Will it be convenient for cases when there are many different (various
structures) XMLs in one column (no single DTD)?

I don't know

- in XML world, if you request for a text under some node, all
descendants should be involved in generating result string (example:
what should be returned for XML like "<em><strong>PostgreSQL</strong>
is a powerful, open source relational database system</em>" if user
requests for text under "em" node? In XML world, the correct answer is
"PostgreSQL is a powerful, open source relational database system" --
concatenation of all strings from the node itself and all its
descendants, in the correct order. Will be this expected for RDBMS
users?).

It is corect. Or we can disallow any nested elements in casting array.
It's poblem only for text type. Numeric types are clear.

Actually, casting to numeric types might seem to be odd. But there is
some sense from practical point of view -- it works and that's better
that nothing (like now). But it's too late for 8.3, isn't it?

I thing so SQL based casting like my cust functions are relative
simple for adding to core now.

Regarding GIN indexes, alternative approach would be creating opclass
for xml[], it should be pretty simple (and better than creating
implicit CASTs for xml[]<->int[], xml[]<->bool[], etc). Can we do this
for 8.3 or it's too late? It would be very helpful feature.

It's not practic. If I would to use it for functional indexes for
xpath functions I need constructor for xml[], and I have not it
currently:

xpath('/root/id/text()', column)::int[] @< ARRAY[199,2200,222]

I do not understand. Do you mean that there is no equality comparison
operator for type xml yet?

No, I mean some different. Nobody will construct special xml nodes for
quality comparision with xpath function when expect xpath's result as
int[], or float. So when result of xpath is xml[] but is with possible
casting to int[] it's more simple do casting and build index on int[]
because I can search int[].

To implement GIN for xml[] we need to have comparison operator for
xml. Standard says "XML values are not comparable" (subclause 4.2.4 of
the latest draft from wiscorp.com), but without that cannot implement
straight GIN support, what is not good :-/

I belive so xml values are not comparable, but I belive so the are
transferable to some of base types.

Pavel

#7Peter Eisentraut
peter_e@gmx.net
In reply to: Nikolay Samokhvalov (#3)
Re: proposal casting from XML[] to int[], numeric[], text[]

Am Freitag, 28. September 2007 schrieb Nikolay Samokhvalov:

what should be returned for XML like "<em><strong>PostgreSQL</strong>
is a powerful, open source relational database system</em>" if user
requests for text under "em" node? In XML world, the correct answer is
"PostgreSQL �is a powerful, open source relational database system" --
concatenation of all strings from the node itself and all its
descendants, in the correct order. Will be this expected for RDBMS
users?).

Well, if that is the defined behavior for XPath, then that's what we should
do.

--
Peter Eisentraut
http://developer.postgresql.org/~petere/

#8Andrew Dunstan
andrew@dunslane.net
In reply to: Peter Eisentraut (#7)
Re: proposal casting from XML[] to int[], numeric[], text[]

Peter Eisentraut wrote:

Am Freitag, 28. September 2007 schrieb Nikolay Samokhvalov:

what should be returned for XML like "<em><strong>PostgreSQL</strong>
is a powerful, open source relational database system</em>" if user
requests for text under "em" node? In XML world, the correct answer is
"PostgreSQL is a powerful, open source relational database system" --
concatenation of all strings from the node itself and all its
descendants, in the correct order. Will be this expected for RDBMS
users?).

Well, if that is the defined behavior for XPath, then that's what we should
do.

The xpath string value of a single node is the concatentation of the
text children of the node and all its children in document order, IIRC.
But that's not what we're dealing with here. xpath() doesn't return a
single node but a node set (or so say the docs). The string value of a
node set is in effect the string value of its first member, which seems
less than useful in this context, or at least no great guide for us.

I think there's probably a good case for a cast from xml[] to text[] if
we don't have one.

cheers

andrew

#9Bruce Momjian
bruce@momjian.us
In reply to: Andrew Dunstan (#8)
Re: proposal casting from XML[] to int[], numeric[], text[]

This has been saved for the 8.4 release:

http://momjian.postgresql.org/cgi-bin/pgpatches_hold

---------------------------------------------------------------------------

Andrew Dunstan wrote:

Peter Eisentraut wrote:

Am Freitag, 28. September 2007 schrieb Nikolay Samokhvalov:

what should be returned for XML like "<em><strong>PostgreSQL</strong>
is a powerful, open source relational database system</em>" if user
requests for text under "em" node? In XML world, the correct answer is
"PostgreSQL is a powerful, open source relational database system" --
concatenation of all strings from the node itself and all its
descendants, in the correct order. Will be this expected for RDBMS
users?).

Well, if that is the defined behavior for XPath, then that's what we should
do.

The xpath string value of a single node is the concatentation of the
text children of the node and all its children in document order, IIRC.
But that's not what we're dealing with here. xpath() doesn't return a
single node but a node set (or so say the docs). The string value of a
node set is in effect the string value of its first member, which seems
less than useful in this context, or at least no great guide for us.

I think there's probably a good case for a cast from xml[] to text[] if
we don't have one.

cheers

andrew

---------------------------(end of broadcast)---------------------------
TIP 9: In versions below 8.0, the planner will ignore your desire to
choose an index scan if your joining column's datatypes do not
match

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://postgres.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +

#10Nikolay Samokhvalov
samokhvalov@gmail.com
In reply to: Pavel Stehule (#1)
Re: proposal casting from XML[] to int[], numeric[], text[]

On Sep 25, 2007 10:57 AM, Pavel Stehule <pavel.stehule@gmail.com> wrote:

Hello

Current result from xpath function isn't indexable. It cannot be
problem with possibility cast it to some base types.

CREATE OR REPLACE FUNCTION xml_list_to_int(xml[])
RETURNS int[] AS $$
SELECT ARRAY(SELECT to_number(($1[i])::text,'999999.99')::int
FROM generate_series(1, array_upper($1,1)) g(i))
$$ LANGUAGE SQL IMMUTABLE;

CREATE CAST (xml[] AS int[]) WITH FUNCTION xml_list_to_int(xml[]);

-- now I can build functional index
CREATE INDEX fx ON foo USING
GIN((xpath('//id/text()',objednavka_v_xml)::int[]));

Does anybody know better solution?

Alternative (and maybe better) approach would be:
- create comparison functions that work in the same way as string
comparison functions do (currently, it's straight forward since XML is
stored as string);
- do NOT create comparison operators to avoid explicit comparing XML values
(to follow standard ways to work with XML and to avoid possible unexpected
behaviors);
- create opclass based on these functions and, therefore, obtain GIN
indexes support for xml[];
- describe in the docs, that one can use GIN indexes over XPath
expressions, but should be aware that comparison with non-trivial XML
constants have to be used carefully because of possible problems with
whitespaces, etc (in other words, comparison here is doing letter by letter,
as for varchar).

If there are no objections I'll send patch for this.

#11Pavel Stehule
pavel.stehule@gmail.com
In reply to: Nikolay Samokhvalov (#10)
Re: proposal casting from XML[] to int[], numeric[], text[]

On 11/11/2007, Nikolay Samokhvalov <samokhvalov@gmail.com> wrote:

On Sep 25, 2007 10:57 AM, Pavel Stehule <pavel.stehule@gmail.com> wrote:

Hello

Current result from xpath function isn't indexable. It cannot be
problem with possibility cast it to some base types.

CREATE OR REPLACE FUNCTION xml_list_to_int(xml[])
RETURNS int[] AS $$
SELECT ARRAY(SELECT to_number(($1[i])::text,'999999.99')::int
FROM generate_series(1, array_upper($1,1)) g(i))
$$ LANGUAGE SQL IMMUTABLE;

CREATE CAST (xml[] AS int[]) WITH FUNCTION xml_list_to_int(xml[]);

-- now I can build functional index
CREATE INDEX fx ON foo USING
GIN((xpath('//id/text()',objednavka_v_xml)::int[]));

Does anybody know better solution?

Alternative (and maybe better) approach would be:
- create comparison functions that work in the same way as string
comparison functions do (currently, it's straight forward since XML is
stored as string);
- do NOT create comparison operators to avoid explicit comparing XML values
(to follow standard ways to work with XML and to avoid possible unexpected
behaviors);
- create opclass based on these functions and, therefore, obtain GIN
indexes support for xml[];
- describe in the docs, that one can use GIN indexes over XPath
expressions, but should be aware that comparison with non-trivial XML
constants have to be used carefully because of possible problems with
whitespaces, etc (in other words, comparison here is doing letter by letter,
as for varchar).

If there are no objections I'll send patch for this.

It's good proposal. So only this is solution for indexing. I belive so
casting from xml[] to any others (mainly varchar[] and numeric[] can
be usefull.

Regards
Pavel

#12Tom Lane
tgl@sss.pgh.pa.us
In reply to: Nikolay Samokhvalov (#10)
Re: proposal casting from XML[] to int[], numeric[], text[]

"Nikolay Samokhvalov" <samokhvalov@gmail.com> writes:

Alternative (and maybe better) approach would be:
- create comparison functions that work in the same way as string
comparison functions do (currently, it's straight forward since XML is
stored as string);
- do NOT create comparison operators to avoid explicit comparing XML values
(to follow standard ways to work with XML and to avoid possible unexpected
behaviors);
- create opclass based on these functions and, therefore, obtain GIN
indexes support for xml[];

I'm not clear on what you're proposing. There is no such thing as an
opclass with no operators (or at least, not a useful one), so this seems
mutually contradictory.

regards, tom lane

#13Nikolay Samokhvalov
samokhvalov@gmail.com
In reply to: Tom Lane (#12)
Re: proposal casting from XML[] to int[], numeric[], text[]

On Nov 12, 2007 12:59 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

I'm not clear on what you're proposing. There is no such thing as an
opclass with no operators (or at least, not a useful one), so this seems
mutually contradictory.

regards, tom lane

You're right, that's my mistake, sorry. So, having casting rules seems
to be the only option..

#14Andrew Dunstan
andrew@dunslane.net
In reply to: Nikolay Samokhvalov (#13)
Re: proposal casting from XML[] to int[], numeric[], text[]

Nikolay Samokhvalov wrote:

On Nov 12, 2007 12:59 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

I'm not clear on what you're proposing. There is no such thing as an
opclass with no operators (or at least, not a useful one), so this seems
mutually contradictory.

regards, tom lane

You're right, that's my mistake, sorry. So, having casting rules seems
to be the only option..

We can already cast as text[], and so we can do this:

andrew=# select
xpath('//foo/text()','<a><foo>1</foo><foo>2</foo></a>')::text[]::int[];
xpath
-------
{1,2}
(1 row)

So why do we desperately need anything extra at all?

cheers

andrew

#15Pavel Stehule
pavel.stehule@gmail.com
In reply to: Andrew Dunstan (#14)
Re: proposal casting from XML[] to int[], numeric[], text[]

You're right, that's my mistake, sorry. So, having casting rules seems
to be the only option..

We can already cast as text[], and so we can do this:

andrew=# select
xpath('//foo/text()','<a><foo>1</foo><foo>2</foo></a>')::text[]::int[];
xpath
-------
{1,2}
(1 row)

So why do we desperately need anything extra at all?

I was blind. My problem was with function index over xml array, that
isn't indexable.

I didn't find multiple casting.

Regards
Pavel

Show quoted text

cheers

andrew

#16Bruce Momjian
bruce@momjian.us
In reply to: Pavel Stehule (#1)
Re: proposal casting from XML[] to int[], numeric[], text[]

Added to TODO:

* Allow xml arrays to be cast to other data types

http://archives.postgresql.org/pgsql-hackers/2007-09/msg00981.php
http://archives.postgresql.org/pgsql-hackers/2007-10/msg00231.php
http://archives.postgresql.org/pgsql-hackers/2007-11/msg00471.php

---------------------------------------------------------------------------

Pavel Stehule wrote:

Hello

Current result from xpath function isn't indexable. It cannot be
problem with possibility cast it to some base types.

CREATE OR REPLACE FUNCTION xml_list_to_int(xml[])
RETURNS int[] AS $$
SELECT ARRAY(SELECT to_number(($1[i])::text,'999999.99')::int
FROM generate_series(1, array_upper($1,1)) g(i))
$$ LANGUAGE SQL IMMUTABLE;

CREATE CAST (xml[] AS int[]) WITH FUNCTION xml_list_to_int(xml[]);

-- now I can build functional index
CREATE INDEX fx ON foo USING
GIN((xpath('//id/text()',objednavka_v_xml)::int[]));

Does anybody know better solution?

Regards
Pavel Stehule

---------------------------(end of broadcast)---------------------------
TIP 9: In versions below 8.0, the planner will ignore your desire to
choose an index scan if your joining column's datatypes do not
match

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://postgres.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +