pgsql: Fix XML tag namespace change inadvertently missed from previous

Started by Andrew Dunstan over 18 years ago, 5 messages, committers
#1Andrew Dunstan
andrew@dunslane.net

Log Message:
-----------
Fix XML tag namespace change inadvertently missed from previous fix. Add
regression test for XML names and numeric entities.

Modified Files:
--------------
pgsql/src/backend/tsearch:
wparser_def.c (r1.11 -> r1.12)
(http://developer.postgresql.org/cvsweb.cgi/pgsql/src/backend/tsearch/wparser_def.c?r1=1.11&r2=1.12)
pgsql/src/test/regress/expected:
tsearch.out (r1.9 -> r1.10)
(http://developer.postgresql.org/cvsweb.cgi/pgsql/src/test/regress/expected/tsearch.out?r1=1.9&r2=1.10)
pgsql/src/test/regress/sql:
tsearch.sql (r1.4 -> r1.5)
(http://developer.postgresql.org/cvsweb.cgi/pgsql/src/test/regress/sql/tsearch.sql?r1=1.4&r2=1.5)

#2Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andrew Dunstan (#1)
Re: pgsql: Fix XML tag namespace change inadvertently missed from previous

adunstan@postgresql.org (Andrew Dunstan) writes:

Fix XML tag namespace change inadvertently missed from previous fix. Add
regression test for XML names and numeric entities.

Still one gripe:

regression=# select * from ts_debug(' &#x3bb; &#X3BB;');
  alias  |       description        |  token  | dictionaries | dictionary | lexemes
---------+--------------------------+---------+--------------+------------+---------
 blank   | Space symbols            |         | {}           |            |
 entity  | XML entity               | &#x3bb; | {}           |            |
 blank   | Space symbols            |         | {}           |            |
 blank   | Space symbols            | &#      | {}           |            |
 numword | Word, letters and digits | X3BB    | {simple}     | simple     | {x3bb}
 blank   | Space symbols            | ;       | {}           |            |
(6 rows)

Aren't hexadecimal entities supposed to be case-insensitive?

regards, tom lane

#3Andrew Dunstan
andrew@dunslane.net
In reply to: Tom Lane (#2)
Re: pgsql: Fix XML tag namespace change inadvertently missed from previous

Tom Lane wrote:

adunstan@postgresql.org (Andrew Dunstan) writes:

Fix XML tag namespace change inadvertently missed from previous fix. Add
regression test for XML names and numeric entities.

Still one gripe:

regression=# select * from ts_debug(' &#x3bb; &#X3BB;');
  alias  |       description        |  token  | dictionaries | dictionary | lexemes
---------+--------------------------+---------+--------------+------------+---------
 blank   | Space symbols            |         | {}           |            |
 entity  | XML entity               | &#x3bb; | {}           |            |
 blank   | Space symbols            |         | {}           |            |
 blank   | Space symbols            | &#      | {}           |            |
 numword | Word, letters and digits | X3BB    | {simple}     | simple     | {x3bb}
 blank   | Space symbols            | ;       | {}           |            |
(6 rows)

Aren't hexadecimal entities supposed to be case-insensitive?

The 'x' must be lower case, the hex digits can be upper or lower. The
XML spec says:

CharRef ::= '&#' [0-9]+ ';'
          | '&#x' [0-9a-fA-F]+ ';'
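
To make the grammar concrete, here is a minimal sketch of a recognizer that follows the XML production literally. The function name xml_charref_len is hypothetical and this is not the code in wparser_def.c; it only illustrates why '&#X3BB;' is rejected under the XML rule while '&#x3BB;' is accepted.

```c
#include <ctype.h>
#include <stddef.h>

/*
 * Hypothetical helper, not taken from wparser_def.c.
 * Returns the length of the XML CharRef at the start of s,
 * or 0 if s does not start with one.
 */
size_t
xml_charref_len(const char *s)
{
    size_t i = 2;
    size_t start;
    int    hex = 0;

    if (s[0] != '&' || s[1] != '#')
        return 0;

    /* XML: only a lower-case 'x' introduces a hexadecimal reference */
    if (s[i] == 'x')
    {
        hex = 1;
        i++;
    }

    /* at least one digit is required: [0-9]+ or [0-9a-fA-F]+ */
    start = i;
    while (hex ? isxdigit((unsigned char) s[i]) : isdigit((unsigned char) s[i]))
        i++;

    if (i == start || s[i] != ';')
        return 0;

    return i + 1;               /* include the closing ';' */
}
```

With this sketch, "&#x3bb;" and "&#x3BB;" both match (the hex digits may be either case), while "&#X3BB;" does not.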

cheers

andrew

#4Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andrew Dunstan (#3)
Re: pgsql: Fix XML tag namespace change inadvertently missed from previous

Andrew Dunstan <andrew@dunslane.net> writes:

Tom Lane wrote:

Aren't hexadecimal entities supposed to be case-insensitive?

The 'x' must be lower case, the hex digits can be upper or lower. The
XML spec says:

But we're also interested in parsing HTML, and upper case X is
allowed in HTML:
http://www.w3.org/TR/REC-html40/charset.html#h-5.3.1

regards, tom lane

#5Andrew Dunstan
andrew@dunslane.net
In reply to: Andrew Dunstan (#3)
Re: pgsql: Fix XML tag namespace change inadvertently missed from previous

I wrote:

Tom Lane wrote:

Aren't hexadecimal entities supposed to be case-insensitive?

The 'x' must be lower case, the hex digits can be upper or lower. The
XML spec says:

CharRef ::= '&#' [0-9]+ ';'
          | '&#x' [0-9a-fA-F]+ ';'

But I also see that the HTML spec allows for 'X' as well as 'x', so I'll
change it.
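
For illustration only, a relaxed rule that takes either 'x' or 'X' might look like the sketch below. The name charref_codepoint is hypothetical and this is not the actual change to wparser_def.c; it decodes the reference to a code point so that the lower-case and upper-case spellings can be seen to mean the same character.

```c
#include <ctype.h>

/*
 * Hypothetical helper, not the actual parser change.
 * Decodes a numeric character reference at the start of s, accepting
 * both "&#x" (XML and HTML) and "&#X" (HTML only).
 * Returns the code point, or -1 if s is not a valid reference.
 */
long
charref_codepoint(const char *s)
{
    long value = 0;
    int  base = 10;
    int  ndigits = 0;

    if (s[0] != '&' || s[1] != '#')
        return -1;
    s += 2;

    /* the relaxed rule: either case of 'x' selects hexadecimal */
    if (*s == 'x' || *s == 'X')
    {
        base = 16;
        s++;
    }

    for (; isxdigit((unsigned char) *s); s++, ndigits++)
    {
        int c = tolower((unsigned char) *s);
        int digit = isdigit(c) ? c - '0' : c - 'a' + 10;

        /* reject hex letters in a decimal reference, and stop before overflow */
        if (digit >= base || value > 0x10FFFF)
            return -1;
        value = value * base + digit;
    }

    if (ndigits == 0 || *s != ';' || value > 0x10FFFF)
        return -1;

    return value;
}
```

Under this sketch "&#x3bb;", "&#X3BB;" and "&#955;" all decode to 955 (U+03BB, Greek small letter lambda).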

cheers

andrew