[PATCH] Expand character set for ltree labels
Dear hackers,
I am submitting a patch to expand the label requirements for ltree.
The current format is restricted to alphanumeric characters, plus _.
Unfortunately, for non-English labels, this set is insufficient. Rather
than figure out how to expand this set to include characters beyond the
ASCII limit, I have instead opted to provide users with some mechanism for
storing encoded UTF-8 characters which is widely used: punycode (
https://en.wikipedia.org/wiki/Punycode).
The punycode range of characters is the exact same set as the existing
ltree range, with the addition of a hyphen (-). Within this system, any
human language can be encoded using just A-Za-z0-9-.
On top of this, I added support for two more characters: # and ;, which are
used for HTML entities. Note that & and % have special significance in the
existing ltree logic; users would have to encode items as #20; (rather than
%20). This seems a fair compromise.
Since the encoding could make a regular slug even longer, I have also
doubled the character limit, from 256 to 512.
Please let me know if I can provide any more information or changes.
Very sincerely,
Garen
Attachments:
0001-Expand-character-set-for-ltree-labels.patchapplication/octet-stream; name=0001-Expand-character-set-for-ltree-labels.patchDownload
From 0fcef15b15879c73ff3e93b492fe02fe6ea7628f Mon Sep 17 00:00:00 2001
From: "Garen J. Torikian" <gjtorikian@users.noreply.github.com>
Date: Tue, 4 Oct 2022 12:45:12 -0400
Subject: [PATCH] Expand character set for ltree labels This patch expands the
character set for ltree labels to include four additional characters. These
characters can be used to represent non-alphanumeric characters, via punycode
or HTML encoding.
Furthermore, the label length is doubled to account for potentially
longer strings when saving an encoded format.
---
contrib/ltree/expected/ltree.out | 72 ++++++++++++++++++++++++--------
contrib/ltree/ltree.h | 13 ++++--
contrib/ltree/ltree_io.c | 10 ++---
contrib/ltree/ltxtquery_io.c | 4 +-
contrib/ltree/sql/ltree.sql | 19 ++++++---
doc/src/sgml/ltree.sgml | 6 +--
6 files changed, 87 insertions(+), 37 deletions(-)
diff --git a/contrib/ltree/expected/ltree.out b/contrib/ltree/expected/ltree.out
index c6d8f3ef75..6581c32b28 100644
--- a/contrib/ltree/expected/ltree.out
+++ b/contrib/ltree/expected/ltree.out
@@ -25,6 +25,24 @@ SELECT '1.2'::ltree;
1.2
(1 row)
+SELECT '1.2.#3'::ltree;
+ ltree
+--------
+ 1.2.#3
+(1 row)
+
+SELECT '1.2.-3'::ltree;
+ ltree
+--------
+ 1.2.-3
+(1 row)
+
+SELECT '1.2.;3'::ltree;
+ ltree
+--------
+ 1.2.;3
+(1 row)
+
SELECT '1.2._3'::ltree;
ltree
--------
@@ -45,15 +63,15 @@ ERROR: ltree syntax error
LINE 1: SELECT '1.2.'::ltree;
^
DETAIL: Unexpected end of input.
-SELECT repeat('x', 255)::ltree;
- repeat
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
- xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
+SELECT repeat('x', 511)::ltree;
+ repeat
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
+ xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
(1 row)
-SELECT repeat('x', 256)::ltree;
+SELECT repeat('x', 512)::ltree;
ERROR: label string is too long
-DETAIL: Label length is 256, must be at most 255, at character 257.
+DETAIL: Label length is 512, must be at most 511, at character 513.
SELECT ltree2text('1.2.3.34.sdf');
ltree2text
--------------
@@ -531,24 +549,24 @@ SELECT '1.2.3|@.4'::lquery;
ERROR: lquery syntax error at character 7
LINE 1: SELECT '1.2.3|@.4'::lquery;
^
-SELECT (repeat('x', 255) || '*@@*')::lquery;
- lquery
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
- xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx@*
+SELECT (repeat('x', 511) || '*@@*')::lquery;
+ lquery
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
+ xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx@*
(1 row)
-SELECT (repeat('x', 256) || '*@@*')::lquery;
+SELECT (repeat('x', 512) || '*@@*')::lquery;
ERROR: label string is too long
-DETAIL: Label length is 256, must be at most 255, at character 257.
-SELECT ('!' || repeat('x', 255))::lquery;
- lquery
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
- !xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
+DETAIL: Label length is 512, must be at most 511, at character 513.
+SELECT ('!' || repeat('x', 511))::lquery;
+ lquery
+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
+ !xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
(1 row)
-SELECT ('!' || repeat('x', 256))::lquery;
+SELECT ('!' || repeat('x', 512))::lquery;
ERROR: label string is too long
-DETAIL: Label length is 256, must be at most 255, at character 258.
+DETAIL: Label length is 512, must be at most 511, at character 514.
SELECT nlevel('1.2.3.4');
nlevel
--------
@@ -1195,6 +1213,24 @@ SELECT 'tree & aw_qw%*'::ltxtquery;
tree & aw_qw%*
(1 row)
+SELECT 'tree & aw#qw%*'::ltxtquery;
+ ltxtquery
+----------------
+ tree & aw#qw%*
+(1 row)
+
+SELECT 'tree & aw-qw%*'::ltxtquery;
+ ltxtquery
+----------------
+ tree & aw-qw%*
+(1 row)
+
+SELECT 'tree & aw;qw%*'::ltxtquery;
+ ltxtquery
+----------------
+ tree & aw;qw%*
+(1 row)
+
SELECT 'ltree.awdfg'::ltree @ '!tree & aWdf@*'::ltxtquery;
?column?
----------
diff --git a/contrib/ltree/ltree.h b/contrib/ltree/ltree.h
index 40aed0ca0c..cd585fa042 100644
--- a/contrib/ltree/ltree.h
+++ b/contrib/ltree/ltree.h
@@ -12,10 +12,10 @@
/*
* We want the maximum length of a label to be encoding-independent, so
- * set it somewhat arbitrarily at 255 characters (not bytes), while using
+ * set it somewhat arbitrarily at 511 characters (not bytes), while using
* uint16 fields to hold the byte length.
*/
-#define LTREE_LABEL_MAX_CHARS 255
+#define LTREE_LABEL_MAX_CHARS 511
/*
* LOWER_NODE used to be defined in the Makefile via the compile flags.
@@ -126,7 +126,14 @@ typedef struct
#define LQUERY_HASNOT 0x01
-#define ISALNUM(x) ( t_isalpha(x) || t_isdigit(x) || ( pg_mblen(x) == 1 && t_iseq((x), '_') ) )
+#define ISPRINT(x) (( pg_mblen(x) == 1 && \
+ (t_iseq((x), '#') || \
+ t_iseq((x), '-') || \
+ t_iseq((x), ';') || \
+ t_iseq((x), '_'))))
+#define ISALNUM(x) ( t_isalpha(x) || t_isdigit(x))
+
+#define ISVALID(x) ( ISALNUM(x) || ISPRINT(x) )
/* full text query */
diff --git a/contrib/ltree/ltree_io.c b/contrib/ltree/ltree_io.c
index 15115cb29f..b1dd7104e0 100644
--- a/contrib/ltree/ltree_io.c
+++ b/contrib/ltree/ltree_io.c
@@ -74,7 +74,7 @@ parse_ltree(const char *buf)
switch (state)
{
case LTPRS_WAITNAME:
- if (ISALNUM(ptr))
+ if (ISVALID(ptr))
{
lptr->start = ptr;
lptr->wlen = 0;
@@ -91,7 +91,7 @@ parse_ltree(const char *buf)
lptr++;
state = LTPRS_WAITNAME;
}
- else if (!ISALNUM(ptr))
+ else if (!ISVALID(ptr))
UNCHAR;
break;
default:
@@ -310,7 +310,7 @@ parse_lquery(const char *buf)
switch (state)
{
case LQPRS_WAITLEVEL:
- if (ISALNUM(ptr))
+ if (ISVALID(ptr))
{
GETVAR(curqlevel) = lptr = (nodeitem *) palloc0(sizeof(nodeitem) * (numOR + 1));
lptr->start = ptr;
@@ -333,7 +333,7 @@ parse_lquery(const char *buf)
UNCHAR;
break;
case LQPRS_WAITVAR:
- if (ISALNUM(ptr))
+ if (ISVALID(ptr))
{
lptr++;
lptr->start = ptr;
@@ -376,7 +376,7 @@ parse_lquery(const char *buf)
state = LQPRS_WAITLEVEL;
curqlevel = NEXTLEV(curqlevel);
}
- else if (ISALNUM(ptr))
+ else if (ISVALID(ptr))
{
/* disallow more chars after a flag */
if (lptr->flag)
diff --git a/contrib/ltree/ltxtquery_io.c b/contrib/ltree/ltxtquery_io.c
index 3eca5cb8ff..dfe641399e 100644
--- a/contrib/ltree/ltxtquery_io.c
+++ b/contrib/ltree/ltxtquery_io.c
@@ -76,7 +76,7 @@ gettoken_query(QPRS_STATE *state, int32 *val, int32 *lenval, char **strval, uint
(state->buf)++;
return OPEN;
}
- else if (ISALNUM(state->buf))
+ else if (ISVALID(state->buf))
{
state->state = INOPERAND;
*strval = state->buf;
@@ -89,7 +89,7 @@ gettoken_query(QPRS_STATE *state, int32 *val, int32 *lenval, char **strval, uint
errmsg("operand syntax error")));
break;
case INOPERAND:
- if (ISALNUM(state->buf))
+ if (ISVALID(state->buf))
{
if (*flag)
ereport(ERROR,
diff --git a/contrib/ltree/sql/ltree.sql b/contrib/ltree/sql/ltree.sql
index bf733ed17b..612c2072dc 100644
--- a/contrib/ltree/sql/ltree.sql
+++ b/contrib/ltree/sql/ltree.sql
@@ -8,6 +8,9 @@ WHERE opc.oid >= 16384 AND NOT amvalidate(opc.oid);
SELECT ''::ltree;
SELECT '1'::ltree;
SELECT '1.2'::ltree;
+SELECT '1.2.#3'::ltree;
+SELECT '1.2.-3'::ltree;
+SELECT '1.2.;3'::ltree;
SELECT '1.2._3'::ltree;
-- empty labels not allowed
@@ -15,8 +18,8 @@ SELECT '.2.3'::ltree;
SELECT '1..3'::ltree;
SELECT '1.2.'::ltree;
-SELECT repeat('x', 255)::ltree;
-SELECT repeat('x', 256)::ltree;
+SELECT repeat('x', 511)::ltree;
+SELECT repeat('x', 512)::ltree;
SELECT ltree2text('1.2.3.34.sdf');
SELECT text2ltree('1.2.3.34.sdf');
@@ -111,10 +114,10 @@ SELECT '1.!.3'::lquery;
SELECT '1.2.!'::lquery;
SELECT '1.2.3|@.4'::lquery;
-SELECT (repeat('x', 255) || '*@@*')::lquery;
-SELECT (repeat('x', 256) || '*@@*')::lquery;
-SELECT ('!' || repeat('x', 255))::lquery;
-SELECT ('!' || repeat('x', 256))::lquery;
+SELECT (repeat('x', 511) || '*@@*')::lquery;
+SELECT (repeat('x', 512) || '*@@*')::lquery;
+SELECT ('!' || repeat('x', 511))::lquery;
+SELECT ('!' || repeat('x', 512))::lquery;
SELECT nlevel('1.2.3.4');
SELECT nlevel(('1' || repeat('.1', 65534))::ltree);
@@ -233,6 +236,10 @@ SELECT 'QWER_GY'::ltree ~ 'q_t%@*';
--ltxtquery
SELECT '!tree & aWdf@*'::ltxtquery;
SELECT 'tree & aw_qw%*'::ltxtquery;
+SELECT 'tree & aw#qw%*'::ltxtquery;
+SELECT 'tree & aw-qw%*'::ltxtquery;
+SELECT 'tree & aw;qw%*'::ltxtquery;
+
SELECT 'ltree.awdfg'::ltree @ '!tree & aWdf@*'::ltxtquery;
SELECT 'tree.awdfg'::ltree @ '!tree & aWdf@*'::ltxtquery;
SELECT 'tree.awdfg'::ltree @ '!tree | aWdf@*'::ltxtquery;
diff --git a/doc/src/sgml/ltree.sgml b/doc/src/sgml/ltree.sgml
index 508f404ae8..144bccd2c5 100644
--- a/doc/src/sgml/ltree.sgml
+++ b/doc/src/sgml/ltree.sgml
@@ -24,9 +24,9 @@
<para>
A <firstterm>label</firstterm> is a sequence of alphanumeric characters
- and underscores (for example, in C locale the characters
- <literal>A-Za-z0-9_</literal> are allowed).
- Labels must be less than 256 characters long.
+ and a subset of printable characters. In C locale, the characters
+ <literal>A-Za-z0-9#-;_</literal> are allowed.
+ Labels must be less than 512 characters long.
</para>
<para>
--
2.37.3
On Tue, Oct 04, 2022 at 12:54:46PM -0400, Garen Torikian wrote:
The punycode range of characters is the exact same set as the existing
ltree range, with the addition of a hyphen (-). Within this system, any
human language can be encoded using just A-Za-z0-9-.
IIUC ASCII characters like '!' and '<' are valid Punycode characters, but
even with your proposal, those wouldn't be allowed.
--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
No, not quite.
Valid Punycode characters are `[A-Za-z0-9-]`. This proposal includes `-`,
as well as `#` and `;` for HTML entities.
I double-checked the RFC to see the valid Punycode characters and the set
above is indeed correct:
https://datatracker.ietf.org/doc/html/draft-ietf-idn-punycode-02#section-5
While it would be nice for ltree labels to support *any* printable
character, it can't because symbols like `!` and `%` already have special
meaning in the querying. This proposal leaves those as is and does not
depend on any existing special character.
On Tue, Oct 4, 2022 at 6:32 PM Nathan Bossart <nathandbossart@gmail.com>
wrote:
Show quoted text
On Tue, Oct 04, 2022 at 12:54:46PM -0400, Garen Torikian wrote:
The punycode range of characters is the exact same set as the existing
ltree range, with the addition of a hyphen (-). Within this system, any
human language can be encoded using just A-Za-z0-9-.IIUC ASCII characters like '!' and '<' are valid Punycode characters, but
even with your proposal, those wouldn't be allowed.--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
Garen Torikian <gjtorikian@gmail.com> writes:
I am submitting a patch to expand the label requirements for ltree.
The current format is restricted to alphanumeric characters, plus _.
Unfortunately, for non-English labels, this set is insufficient.
Hm? Perhaps the docs are a bit unclear about that, but it's not
restricted to ASCII alphanumerics. AFAICS the code will accept
whatever iswalpha() and iswdigit() will accept in the database's
default locale. There's certainly work that could/should be done
to allow use of not-so-default locales, but that's not specific
to ltree. I'm not sure that doing an application-side encoding
is attractive compared to just using that ability directly.
If you do want to do application-side encoding, I'm unsure why
punycode would be the choice anyway, as opposed to something
that can fit in the existing restrictions.
On top of this, I added support for two more characters: # and ;, which are
used for HTML entities.
That seems really pretty random.
regards, tom lane
Hi Tom,
Perhaps the docs are a bit unclear about that, but it's not
restricted to ASCII alphanumerics. AFAICS the code will accept
whatever iswalpha() and iswdigit() will accept in the database's
default locale.
Sorry but I don't think that is correct. Here is the single
definition check of what constitutes a valid character:
https://github.com/postgres/postgres/blob/c3315a7da57be720222b119385ed0f7ad7c15268/contrib/ltree/ltree.h#L129
As you can see, there are no `is_*` calls at all. Where in this contrib
package do you see `iswalpha`? Perhaps I missed it.
That seems really pretty random.
Ok. I am trying to avoid a situation where other users may wish to use
other delimiters other than `-`, due to its commonplace presence in words
(eg., compound ones).
On Wed, Oct 5, 2022 at 2:59 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Show quoted text
Garen Torikian <gjtorikian@gmail.com> writes:
I am submitting a patch to expand the label requirements for ltree.
The current format is restricted to alphanumeric characters, plus _.
Unfortunately, for non-English labels, this set is insufficient.Hm? Perhaps the docs are a bit unclear about that, but it's not
restricted to ASCII alphanumerics. AFAICS the code will accept
whatever iswalpha() and iswdigit() will accept in the database's
default locale. There's certainly work that could/should be done
to allow use of not-so-default locales, but that's not specific
to ltree. I'm not sure that doing an application-side encoding
is attractive compared to just using that ability directly.If you do want to do application-side encoding, I'm unsure why
punycode would be the choice anyway, as opposed to something
that can fit in the existing restrictions.On top of this, I added support for two more characters: # and ;, which
are
used for HTML entities.
That seems really pretty random.
regards, tom lane
Garen Torikian <gjtorikian@gmail.com> writes:
Perhaps the docs are a bit unclear about that, but it's not
restricted to ASCII alphanumerics. AFAICS the code will accept
whatever iswalpha() and iswdigit() will accept in the database's
default locale.
Sorry but I don't think that is correct. Here is the single
definition check of what constitutes a valid character:
https://github.com/postgres/postgres/blob/c3315a7da57be720222b119385ed0f7ad7c15268/contrib/ltree/ltree.h#L129
As you can see, there are no `is_*` calls at all.
Did you chase down what t_isalpha and t_isdigit do?
regards, tom lane
After digging into it, you are completely correct. I had to do a bit more
reading to understand the relationships between UTF-8 and wchar, but
ultimately the existing locale support works for my use case.
Therefore I have updated the patch with three much smaller changes:
* Support for `-` in addition to `_`
* Expanding the limit to 512 chars (from the existing 256); again it's not
uncommon for non-English strings to be much longer
* Fixed the documentation to expand on what the ltree label's relationship
to the DB locale is
Thank you,
Garen
On Wed, Oct 5, 2022 at 3:56 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Show quoted text
Garen Torikian <gjtorikian@gmail.com> writes:
Perhaps the docs are a bit unclear about that, but it's not
restricted to ASCII alphanumerics. AFAICS the code will accept
whatever iswalpha() and iswdigit() will accept in the database's
default locale.Sorry but I don't think that is correct. Here is the single
definition check of what constitutes a valid character:As you can see, there are no `is_*` calls at all.
Did you chase down what t_isalpha and t_isdigit do?
regards, tom lane
Attachments:
0002-Expand-character-set-for-ltree-labels.patchapplication/octet-stream; name=0002-Expand-character-set-for-ltree-labels.patchDownload
commit 3cb57bb601d28b923e0c6c5d0f5a938c9abf68aa
Author: Garen J. Torikian <gjtorikian@users.noreply.github.com>
Date: Wed Oct 5 17:59:54 2022 -0400
Expand character set for ltree labels
This patch expands the character set for ltree labels to one additional character:
the hyphen.
Furthermore, the label length is doubled to account for longer labels.
diff --git a/contrib/ltree/expected/ltree.out b/contrib/ltree/expected/ltree.out
index c6d8f3ef75..0f558bd7b4 100644
--- a/contrib/ltree/expected/ltree.out
+++ b/contrib/ltree/expected/ltree.out
@@ -25,6 +25,12 @@ SELECT '1.2'::ltree;
1.2
(1 row)
+SELECT '1.2.-3'::ltree;
+ ltree
+--------
+ 1.2.-3
+(1 row)
+
SELECT '1.2._3'::ltree;
ltree
--------
@@ -45,15 +51,15 @@ ERROR: ltree syntax error
LINE 1: SELECT '1.2.'::ltree;
^
DETAIL: Unexpected end of input.
-SELECT repeat('x', 255)::ltree;
- repeat
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
- xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
+SELECT repeat('x', 511)::ltree;
+ repeat
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
+ xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
(1 row)
-SELECT repeat('x', 256)::ltree;
+SELECT repeat('x', 512)::ltree;
ERROR: label string is too long
-DETAIL: Label length is 256, must be at most 255, at character 257.
+DETAIL: Label length is 512, must be at most 511, at character 513.
SELECT ltree2text('1.2.3.34.sdf');
ltree2text
--------------
@@ -531,24 +537,24 @@ SELECT '1.2.3|@.4'::lquery;
ERROR: lquery syntax error at character 7
LINE 1: SELECT '1.2.3|@.4'::lquery;
^
-SELECT (repeat('x', 255) || '*@@*')::lquery;
- lquery
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
- xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx@*
+SELECT (repeat('x', 511) || '*@@*')::lquery;
+ lquery
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
+ xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx@*
(1 row)
-SELECT (repeat('x', 256) || '*@@*')::lquery;
+SELECT (repeat('x', 512) || '*@@*')::lquery;
ERROR: label string is too long
-DETAIL: Label length is 256, must be at most 255, at character 257.
-SELECT ('!' || repeat('x', 255))::lquery;
- lquery
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
- !xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
+DETAIL: Label length is 512, must be at most 511, at character 513.
+SELECT ('!' || repeat('x', 511))::lquery;
+ lquery
+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
+ !xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
(1 row)
-SELECT ('!' || repeat('x', 256))::lquery;
+SELECT ('!' || repeat('x', 512))::lquery;
ERROR: label string is too long
-DETAIL: Label length is 256, must be at most 255, at character 258.
+DETAIL: Label length is 512, must be at most 511, at character 514.
SELECT nlevel('1.2.3.4');
nlevel
--------
@@ -1195,6 +1201,12 @@ SELECT 'tree & aw_qw%*'::ltxtquery;
tree & aw_qw%*
(1 row)
+SELECT 'tree & aw-qw%*'::ltxtquery;
+ ltxtquery
+----------------
+ tree & aw-qw%*
+(1 row)
+
SELECT 'ltree.awdfg'::ltree @ '!tree & aWdf@*'::ltxtquery;
?column?
----------
diff --git a/contrib/ltree/ltree.h b/contrib/ltree/ltree.h
index 40aed0ca0c..a9c9f881b0 100644
--- a/contrib/ltree/ltree.h
+++ b/contrib/ltree/ltree.h
@@ -12,10 +12,10 @@
/*
* We want the maximum length of a label to be encoding-independent, so
- * set it somewhat arbitrarily at 255 characters (not bytes), while using
+ * set it somewhat arbitrarily at 511 characters (not bytes), while using
* uint16 fields to hold the byte length.
*/
-#define LTREE_LABEL_MAX_CHARS 255
+#define LTREE_LABEL_MAX_CHARS 511
/*
* LOWER_NODE used to be defined in the Makefile via the compile flags.
@@ -126,7 +126,12 @@ typedef struct
#define LQUERY_HASNOT 0x01
-#define ISALNUM(x) ( t_isalpha(x) || t_isdigit(x) || ( pg_mblen(x) == 1 && t_iseq((x), '_') ) )
+#define ISDASH(x) (( pg_mblen(x) == 1 && \
+ (t_iseq((x), '-') || \
+ t_iseq((x), '_'))))
+#define ISALNUM(x) ( t_isalpha(x) || t_isdigit(x))
+
+#define ISVALID(x) ( ISALNUM(x) || ISDASH(x) )
/* full text query */
diff --git a/contrib/ltree/ltree_io.c b/contrib/ltree/ltree_io.c
index 15115cb29f..b1dd7104e0 100644
--- a/contrib/ltree/ltree_io.c
+++ b/contrib/ltree/ltree_io.c
@@ -74,7 +74,7 @@ parse_ltree(const char *buf)
switch (state)
{
case LTPRS_WAITNAME:
- if (ISALNUM(ptr))
+ if (ISVALID(ptr))
{
lptr->start = ptr;
lptr->wlen = 0;
@@ -91,7 +91,7 @@ parse_ltree(const char *buf)
lptr++;
state = LTPRS_WAITNAME;
}
- else if (!ISALNUM(ptr))
+ else if (!ISVALID(ptr))
UNCHAR;
break;
default:
@@ -310,7 +310,7 @@ parse_lquery(const char *buf)
switch (state)
{
case LQPRS_WAITLEVEL:
- if (ISALNUM(ptr))
+ if (ISVALID(ptr))
{
GETVAR(curqlevel) = lptr = (nodeitem *) palloc0(sizeof(nodeitem) * (numOR + 1));
lptr->start = ptr;
@@ -333,7 +333,7 @@ parse_lquery(const char *buf)
UNCHAR;
break;
case LQPRS_WAITVAR:
- if (ISALNUM(ptr))
+ if (ISVALID(ptr))
{
lptr++;
lptr->start = ptr;
@@ -376,7 +376,7 @@ parse_lquery(const char *buf)
state = LQPRS_WAITLEVEL;
curqlevel = NEXTLEV(curqlevel);
}
- else if (ISALNUM(ptr))
+ else if (ISVALID(ptr))
{
/* disallow more chars after a flag */
if (lptr->flag)
diff --git a/contrib/ltree/ltxtquery_io.c b/contrib/ltree/ltxtquery_io.c
index 3eca5cb8ff..dfe641399e 100644
--- a/contrib/ltree/ltxtquery_io.c
+++ b/contrib/ltree/ltxtquery_io.c
@@ -76,7 +76,7 @@ gettoken_query(QPRS_STATE *state, int32 *val, int32 *lenval, char **strval, uint
(state->buf)++;
return OPEN;
}
- else if (ISALNUM(state->buf))
+ else if (ISVALID(state->buf))
{
state->state = INOPERAND;
*strval = state->buf;
@@ -89,7 +89,7 @@ gettoken_query(QPRS_STATE *state, int32 *val, int32 *lenval, char **strval, uint
errmsg("operand syntax error")));
break;
case INOPERAND:
- if (ISALNUM(state->buf))
+ if (ISVALID(state->buf))
{
if (*flag)
ereport(ERROR,
diff --git a/contrib/ltree/sql/ltree.sql b/contrib/ltree/sql/ltree.sql
index bf733ed17b..373a7c6569 100644
--- a/contrib/ltree/sql/ltree.sql
+++ b/contrib/ltree/sql/ltree.sql
@@ -8,6 +8,7 @@ WHERE opc.oid >= 16384 AND NOT amvalidate(opc.oid);
SELECT ''::ltree;
SELECT '1'::ltree;
SELECT '1.2'::ltree;
+SELECT '1.2.-3'::ltree;
SELECT '1.2._3'::ltree;
-- empty labels not allowed
@@ -15,8 +16,8 @@ SELECT '.2.3'::ltree;
SELECT '1..3'::ltree;
SELECT '1.2.'::ltree;
-SELECT repeat('x', 255)::ltree;
-SELECT repeat('x', 256)::ltree;
+SELECT repeat('x', 511)::ltree;
+SELECT repeat('x', 512)::ltree;
SELECT ltree2text('1.2.3.34.sdf');
SELECT text2ltree('1.2.3.34.sdf');
@@ -111,10 +112,10 @@ SELECT '1.!.3'::lquery;
SELECT '1.2.!'::lquery;
SELECT '1.2.3|@.4'::lquery;
-SELECT (repeat('x', 255) || '*@@*')::lquery;
-SELECT (repeat('x', 256) || '*@@*')::lquery;
-SELECT ('!' || repeat('x', 255))::lquery;
-SELECT ('!' || repeat('x', 256))::lquery;
+SELECT (repeat('x', 511) || '*@@*')::lquery;
+SELECT (repeat('x', 512) || '*@@*')::lquery;
+SELECT ('!' || repeat('x', 511))::lquery;
+SELECT ('!' || repeat('x', 512))::lquery;
SELECT nlevel('1.2.3.4');
SELECT nlevel(('1' || repeat('.1', 65534))::ltree);
@@ -233,6 +234,8 @@ SELECT 'QWER_GY'::ltree ~ 'q_t%@*';
--ltxtquery
SELECT '!tree & aWdf@*'::ltxtquery;
SELECT 'tree & aw_qw%*'::ltxtquery;
+SELECT 'tree & aw-qw%*'::ltxtquery;
+
SELECT 'ltree.awdfg'::ltree @ '!tree & aWdf@*'::ltxtquery;
SELECT 'tree.awdfg'::ltree @ '!tree & aWdf@*'::ltxtquery;
SELECT 'tree.awdfg'::ltree @ '!tree | aWdf@*'::ltxtquery;
diff --git a/doc/src/sgml/ltree.sgml b/doc/src/sgml/ltree.sgml
index 508f404ae8..42f659537f 100644
--- a/doc/src/sgml/ltree.sgml
+++ b/doc/src/sgml/ltree.sgml
@@ -23,10 +23,11 @@
<title>Definitions</title>
<para>
- A <firstterm>label</firstterm> is a sequence of alphanumeric characters
- and underscores (for example, in C locale the characters
- <literal>A-Za-z0-9_</literal> are allowed).
- Labels must be less than 256 characters long.
+ A <firstterm>label</firstterm> is a sequence of alphanumeric characters,
+ underscores, and hyphens. Valid alphanumerics character ranges are dependent on
+ your database locale. For example, in C locale, the characters
+ <literal>A-Za-z0-9-_</literal> are allowed.
+ Labels must be less than 512 characters long.
</para>
<para>
2022年10月6日(木) 7:05 Garen Torikian <gjtorikian@gmail.com>:
After digging into it, you are completely correct. I had to do a bit more reading to understand the relationships between UTF-8 and wchar, but ultimately the existing locale support works for my use case.
Therefore I have updated the patch with three much smaller changes:
* Support for `-` in addition to `_`
* Expanding the limit to 512 chars (from the existing 256); again it's not uncommon for non-English strings to be much longer
* Fixed the documentation to expand on what the ltree label's relationship to the DB locale isThank you,
Garen
This entry was marked as "Needs review" in the CommitFest app but cfbot
reports the patch no longer applies.
We've marked it as "Waiting on Author". As CommitFest 2022-11 is
currently underway, this would be an excellent time update the patch.
Once you think the patchset is ready for review again, you (or any
interested party) can move the patch entry forward by visiting
https://commitfest.postgresql.org/40/3929/
and changing the status to "Needs review".
Thanks
Ian Barwick
On Thu, 6 Oct 2022 at 03:35, Garen Torikian <gjtorikian@gmail.com> wrote:
After digging into it, you are completely correct. I had to do a bit more reading to understand the relationships between UTF-8 and wchar, but ultimately the existing locale support works for my use case.
Therefore I have updated the patch with three much smaller changes:
* Support for `-` in addition to `_`
* Expanding the limit to 512 chars (from the existing 256); again it's not uncommon for non-English strings to be much longer
* Fixed the documentation to expand on what the ltree label's relationship to the DB locale is
The patch does not apply on top of HEAD as in [1]http://cfbot.cputube.org/patch_41_3929.log, please post a rebased patch:
=== Applying patches on top of PostgreSQL commit ID
e351f85418313e97c203c73181757a007dfda6d0 ===
=== applying patch ./0002-Expand-character-set-for-ltree-labels.patch
patching file contrib/ltree/expected/ltree.out
patching file contrib/ltree/ltree.h
Hunk #2 FAILED at 126.
1 out of 2 hunks FAILED -- saving rejects to file contrib/ltree/ltree.h.rej
[1]: http://cfbot.cputube.org/patch_41_3929.log
Regards,
Vignesh
Sure. Rebased onto HEAD.
On Tue, Jan 3, 2023 at 7:27 AM vignesh C <vignesh21@gmail.com> wrote:
Show quoted text
On Thu, 6 Oct 2022 at 03:35, Garen Torikian <gjtorikian@gmail.com> wrote:
After digging into it, you are completely correct. I had to do a bit
more reading to understand the relationships between UTF-8 and wchar, but
ultimately the existing locale support works for my use case.Therefore I have updated the patch with three much smaller changes:
* Support for `-` in addition to `_`
* Expanding the limit to 512 chars (from the existing 256); again it'snot uncommon for non-English strings to be much longer
* Fixed the documentation to expand on what the ltree label's
relationship to the DB locale is
The patch does not apply on top of HEAD as in [1], please post a rebased
patch:
=== Applying patches on top of PostgreSQL commit ID
e351f85418313e97c203c73181757a007dfda6d0 ===
=== applying patch ./0002-Expand-character-set-for-ltree-labels.patch
patching file contrib/ltree/expected/ltree.out
patching file contrib/ltree/ltree.h
Hunk #2 FAILED at 126.
1 out of 2 hunks FAILED -- saving rejects to file contrib/ltree/ltree.h.rej[1] - http://cfbot.cputube.org/patch_41_3929.log
Regards,
Vignesh
Attachments:
0003-Expand-character-set-for-ltree-labels.patchapplication/octet-stream; name=0003-Expand-character-set-for-ltree-labels.patchDownload
From c99abc8ef3d9367bc7442252afdf1e301d71111a Mon Sep 17 00:00:00 2001
From: "Garen J. Torikian" <gjtorikian@users.noreply.github.com>
Date: Tue, 3 Jan 2023 13:51:10 -0500
Subject: [PATCH] Expand character set for ltree labels This patch expands the
character set for ltree labels to one additional character: the hyphen.
Furthermore, the label length is doubled to account for longer labels.
---
contrib/ltree/expected/ltree.out | 48 ++++++++++++++++++++------------
contrib/ltree/ltree.h | 9 ++++--
contrib/ltree/ltree_io.c | 10 +++----
contrib/ltree/ltxtquery_io.c | 4 +--
contrib/ltree/sql/ltree.sql | 15 ++++++----
doc/src/sgml/ltree.sgml | 9 +++---
6 files changed, 57 insertions(+), 38 deletions(-)
diff --git a/contrib/ltree/expected/ltree.out b/contrib/ltree/expected/ltree.out
index b95be71c78..0c61f4fffc 100644
--- a/contrib/ltree/expected/ltree.out
+++ b/contrib/ltree/expected/ltree.out
@@ -25,6 +25,12 @@ SELECT '1.2'::ltree;
1.2
(1 row)
+SELECT '1.2.-3'::ltree;
+ ltree
+--------
+ 1.2.-3
+(1 row)
+
SELECT '1.2._3'::ltree;
ltree
--------
@@ -45,15 +51,15 @@ ERROR: ltree syntax error
LINE 1: SELECT '1.2.'::ltree;
^
DETAIL: Unexpected end of input.
-SELECT repeat('x', 255)::ltree;
- repeat
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
- xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
+SELECT repeat('x', 511)::ltree;
+ repeat
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
+ xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
(1 row)
-SELECT repeat('x', 256)::ltree;
+SELECT repeat('x', 512)::ltree;
ERROR: label string is too long
-DETAIL: Label length is 256, must be at most 255, at character 257.
+DETAIL: Label length is 512, must be at most 511, at character 513.
SELECT ltree2text('1.2.3.34.sdf');
ltree2text
--------------
@@ -531,24 +537,24 @@ SELECT '1.2.3|@.4'::lquery;
ERROR: lquery syntax error at character 7
LINE 1: SELECT '1.2.3|@.4'::lquery;
^
-SELECT (repeat('x', 255) || '*@@*')::lquery;
- lquery
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
- xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx@*
+SELECT (repeat('x', 511) || '*@@*')::lquery;
+ lquery
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
+ xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx@*
(1 row)
-SELECT (repeat('x', 256) || '*@@*')::lquery;
+SELECT (repeat('x', 512) || '*@@*')::lquery;
ERROR: label string is too long
-DETAIL: Label length is 256, must be at most 255, at character 257.
-SELECT ('!' || repeat('x', 255))::lquery;
- lquery
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
- !xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
+DETAIL: Label length is 512, must be at most 511, at character 513.
+SELECT ('!' || repeat('x', 511))::lquery;
+ lquery
+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
+ !xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
(1 row)
-SELECT ('!' || repeat('x', 256))::lquery;
+SELECT ('!' || repeat('x', 512))::lquery;
ERROR: label string is too long
-DETAIL: Label length is 256, must be at most 255, at character 258.
+DETAIL: Label length is 512, must be at most 511, at character 514.
SELECT nlevel('1.2.3.4');
nlevel
--------
@@ -1195,6 +1201,12 @@ SELECT 'tree & aw_qw%*'::ltxtquery;
tree & aw_qw%*
(1 row)
+SELECT 'tree & aw-qw%*'::ltxtquery;
+ ltxtquery
+----------------
+ tree & aw-qw%*
+(1 row)
+
SELECT 'ltree.awdfg'::ltree @ '!tree & aWdf@*'::ltxtquery;
?column?
----------
diff --git a/contrib/ltree/ltree.h b/contrib/ltree/ltree.h
index 2a80a02495..cfcdaba7e9 100644
--- a/contrib/ltree/ltree.h
+++ b/contrib/ltree/ltree.h
@@ -12,10 +12,10 @@
/*
* We want the maximum length of a label to be encoding-independent, so
- * set it somewhat arbitrarily at 255 characters (not bytes), while using
+ * set it somewhat arbitrarily at 511 characters (not bytes), while using
* uint16 fields to hold the byte length.
*/
-#define LTREE_LABEL_MAX_CHARS 255
+#define LTREE_LABEL_MAX_CHARS 511
/*
* LOWER_NODE used to be defined in the Makefile via the compile flags.
@@ -126,7 +126,10 @@ typedef struct
#define LQUERY_HASNOT 0x01
-#define ISALNUM(x) ( t_isalnum(x) || t_iseq(x, '_') )
+#define ISALNUM(x) ( t_isalnum(x) )
+#define ISDASH(x) ( t_iseq(x, '_') || t_iseq(x, '-') )
+
+#define ISVALID(x) ( ISALNUM(x) || ISDASH(x) )
/* full text query */
diff --git a/contrib/ltree/ltree_io.c b/contrib/ltree/ltree_io.c
index f0dd3df511..cd94265352 100644
--- a/contrib/ltree/ltree_io.c
+++ b/contrib/ltree/ltree_io.c
@@ -74,7 +74,7 @@ parse_ltree(const char *buf, struct Node *escontext)
switch (state)
{
case LTPRS_WAITNAME:
- if (ISALNUM(ptr))
+ if (ISVALID(ptr))
{
lptr->start = ptr;
lptr->wlen = 0;
@@ -92,7 +92,7 @@ parse_ltree(const char *buf, struct Node *escontext)
lptr++;
state = LTPRS_WAITNAME;
}
- else if (!ISALNUM(ptr))
+ else if (!ISVALID(ptr))
UNCHAR;
break;
default:
@@ -316,7 +316,7 @@ parse_lquery(const char *buf, struct Node *escontext)
switch (state)
{
case LQPRS_WAITLEVEL:
- if (ISALNUM(ptr))
+ if (ISVALID(ptr))
{
GETVAR(curqlevel) = lptr = (nodeitem *) palloc0(sizeof(nodeitem) * (numOR + 1));
lptr->start = ptr;
@@ -339,7 +339,7 @@ parse_lquery(const char *buf, struct Node *escontext)
UNCHAR;
break;
case LQPRS_WAITVAR:
- if (ISALNUM(ptr))
+ if (ISVALID(ptr))
{
lptr++;
lptr->start = ptr;
@@ -385,7 +385,7 @@ parse_lquery(const char *buf, struct Node *escontext)
state = LQPRS_WAITLEVEL;
curqlevel = NEXTLEV(curqlevel);
}
- else if (ISALNUM(ptr))
+ else if (ISVALID(ptr))
{
/* disallow more chars after a flag */
if (lptr->flag)
diff --git a/contrib/ltree/ltxtquery_io.c b/contrib/ltree/ltxtquery_io.c
index a16e577303..96a9de0bd9 100644
--- a/contrib/ltree/ltxtquery_io.c
+++ b/contrib/ltree/ltxtquery_io.c
@@ -80,7 +80,7 @@ gettoken_query(QPRS_STATE *state, int32 *val, int32 *lenval, char **strval, uint
(state->buf)++;
return OPEN;
}
- else if (ISALNUM(state->buf))
+ else if (ISVALID(state->buf))
{
state->state = INOPERAND;
*strval = state->buf;
@@ -93,7 +93,7 @@ gettoken_query(QPRS_STATE *state, int32 *val, int32 *lenval, char **strval, uint
errmsg("operand syntax error")));
break;
case INOPERAND:
- if (ISALNUM(state->buf))
+ if (ISVALID(state->buf))
{
if (*flag)
ereturn(state->escontext, ERR,
diff --git a/contrib/ltree/sql/ltree.sql b/contrib/ltree/sql/ltree.sql
index eabef4f851..2f2fda88ee 100644
--- a/contrib/ltree/sql/ltree.sql
+++ b/contrib/ltree/sql/ltree.sql
@@ -8,6 +8,7 @@ WHERE opc.oid >= 16384 AND NOT amvalidate(opc.oid);
SELECT ''::ltree;
SELECT '1'::ltree;
SELECT '1.2'::ltree;
+SELECT '1.2.-3'::ltree;
SELECT '1.2._3'::ltree;
-- empty labels not allowed
@@ -15,8 +16,8 @@ SELECT '.2.3'::ltree;
SELECT '1..3'::ltree;
SELECT '1.2.'::ltree;
-SELECT repeat('x', 255)::ltree;
-SELECT repeat('x', 256)::ltree;
+SELECT repeat('x', 511)::ltree;
+SELECT repeat('x', 512)::ltree;
SELECT ltree2text('1.2.3.34.sdf');
SELECT text2ltree('1.2.3.34.sdf');
@@ -111,10 +112,10 @@ SELECT '1.!.3'::lquery;
SELECT '1.2.!'::lquery;
SELECT '1.2.3|@.4'::lquery;
-SELECT (repeat('x', 255) || '*@@*')::lquery;
-SELECT (repeat('x', 256) || '*@@*')::lquery;
-SELECT ('!' || repeat('x', 255))::lquery;
-SELECT ('!' || repeat('x', 256))::lquery;
+SELECT (repeat('x', 511) || '*@@*')::lquery;
+SELECT (repeat('x', 512) || '*@@*')::lquery;
+SELECT ('!' || repeat('x', 511))::lquery;
+SELECT ('!' || repeat('x', 512))::lquery;
SELECT nlevel('1.2.3.4');
SELECT nlevel(('1' || repeat('.1', 65534))::ltree);
@@ -233,6 +234,8 @@ SELECT 'QWER_GY'::ltree ~ 'q_t%@*';
--ltxtquery
SELECT '!tree & aWdf@*'::ltxtquery;
SELECT 'tree & aw_qw%*'::ltxtquery;
+SELECT 'tree & aw-qw%*'::ltxtquery;
+
SELECT 'ltree.awdfg'::ltree @ '!tree & aWdf@*'::ltxtquery;
SELECT 'tree.awdfg'::ltree @ '!tree & aWdf@*'::ltxtquery;
SELECT 'tree.awdfg'::ltree @ '!tree | aWdf@*'::ltxtquery;
diff --git a/doc/src/sgml/ltree.sgml b/doc/src/sgml/ltree.sgml
index 508f404ae8..42f659537f 100644
--- a/doc/src/sgml/ltree.sgml
+++ b/doc/src/sgml/ltree.sgml
@@ -23,10 +23,11 @@
<title>Definitions</title>
<para>
- A <firstterm>label</firstterm> is a sequence of alphanumeric characters
- and underscores (for example, in C locale the characters
- <literal>A-Za-z0-9_</literal> are allowed).
- Labels must be less than 256 characters long.
+ A <firstterm>label</firstterm> is a sequence of alphanumeric characters,
+ underscores, and hyphens. Valid alphanumerics character ranges are dependent on
+ your database locale. For example, in C locale, the characters
+ <literal>A-Za-z0-9-_</literal> are allowed.
+ Labels must be less than 512 characters long.
</para>
<para>
--
2.39.0
On Wed, 4 Jan 2023 at 00:27, Garen Torikian <gjtorikian@gmail.com> wrote:
Sure. Rebased onto HEAD.
There is one more merge conflict, please post a rebased patch:
=== Applying patches on top of PostgreSQL commit ID
eb5ad4ff05fd382ac98cab60b82f7fd6ce4cfeb8 ===
=== applying patch ./0003-Expand-character-set-for-ltree-labels.patch
patching file contrib/ltree/expected/ltree.out
Hunk #1 succeeded at 25 with fuzz 2.
Hunk #2 FAILED at 51.
Hunk #3 FAILED at 537.
Hunk #4 succeeded at 1201 with fuzz 2.
2 out of 4 hunks FAILED -- saving rejects to file
contrib/ltree/expected/ltree.out.rej
Regards,
Vignesh
On 2022-10-05 We 18:05, Garen Torikian wrote:
After digging into it, you are completely correct. I had to do a bit
more reading to understand the relationships between UTF-8 and wchar,
but ultimately the existing locale support works for my use case.Therefore I have updated the patch with three much smaller changes:
* Support for `-` in addition to `_`
* Expanding the limit to 512 chars (from the existing 256); again it's
not uncommon for non-English strings to be much longer
* Fixed the documentation to expand on what the ltree label's
relationship to the DB locale is
Regardless of the punycode issue, allowing hyphens in ltree labels seems
quite reasonable. I haven't reviewed the patch yet, but if it's OK I
intend to commit it.
cheers
andrew
--
Andrew Dunstan
EDB: https://www.enterprisedb.com
Andrew Dunstan <andrew@dunslane.net> writes:
On 2022-10-05 We 18:05, Garen Torikian wrote:
Therefore I have updated the patch with three much smaller changes:
* Support for `-` in addition to `_`
* Expanding the limit to 512 chars (from the existing 256); again it's
not uncommon for non-English strings to be much longer
* Fixed the documentation to expand on what the ltree label's
relationship to the DB locale is
Regardless of the punycode issue, allowing hyphens in ltree labels seems
quite reasonable. I haven't reviewed the patch yet, but if it's OK I
intend to commit it.
No objection to allowing hyphens. If we're going to increase the length
limit, how about using 1000 characters? AFAICS we could even get away
with 10K, but it's probably best to hold a bit or two in reserve in case
we ever want flags or something associated with labels.
(I've not read the patch.)
regards, tom lane
On 2023-01-06 Fr 11:29, Tom Lane wrote:
Andrew Dunstan <andrew@dunslane.net> writes:
Regardless of the punycode issue, allowing hyphens in ltree labels seems
quite reasonable. I haven't reviewed the patch yet, but if it's OK I
intend to commit it.No objection to allowing hyphens. If we're going to increase the length
limit, how about using 1000 characters? AFAICS we could even get away
with 10K, but it's probably best to hold a bit or two in reserve in case
we ever want flags or something associated with labels.
OK, done that way.
cheers
andrew
--
Andrew Dunstan
EDB: https://www.enterprisedb.com