general purpose array_sort

zhjwpku@gmail.com

over 1 year ago

Hi hackers,

per David's suggestion, this patch implements general
purpose array sort.

We can do the following with this patch:

SELECT array_sort('{1.1,3.3,5.5,2.2,4.4,6.6}'::float8[], 'asc');
SELECT array_sort('{abc DEF 123abc,ábc sßs ßss DÉF,ǄxxǄ ǆxxǅ
ǅxxǆ,ȺȺȺ,ⱥⱥⱥ,ⱥȺ}'::text[]);
SELECT array_sort('{abc DEF 123abc,ábc sßs ßss DÉF,ǄxxǄ ǆxxǅ
ǅxxǆ,ȺȺȺ,ⱥⱥⱥ,ⱥȺ}'::text[], 'asc', 'pg_c_utf8');

--
Regards
Junwang Zhao

zhjwpku@gmail.com

over 1 year ago

In reply to: Junwang Zhao (#1)

Re: general purpose array_sort

On Fri, Sep 27, 2024 at 9:15 PM Junwang Zhao <zhjwpku@gmail.com> wrote:

Hi hackers,

per David's suggestion, this patch implements general
purpose array sort.

We can do the following with this patch:

SELECT array_sort('{1.1,3.3,5.5,2.2,4.4,6.6}'::float8[], 'asc');
SELECT array_sort('{abc DEF 123abc,ábc sßs ßss DÉF,ǄxxǄ ǆxxǅ
ǅxxǆ,ȺȺȺ,ⱥⱥⱥ,ⱥȺ}'::text[]);
SELECT array_sort('{abc DEF 123abc,ábc sßs ßss DÉF,ǄxxǄ ǆxxǅ
ǅxxǆ,ȺȺȺ,ⱥⱥⱥ,ⱥȺ}'::text[], 'asc', 'pg_c_utf8');

--
Regards
Junwang Zhao

PFA v2, use COLLATE keyword to supply the collation suggested by
Andreas offlist.

SELECT array_sort('{abc DEF 123abc,ábc sßs ßss DÉF,ǄxxǄ ǆxxǅ
ǅxxǆ,ȺȺȺ,ⱥⱥⱥ,ⱥȺ}'::text[]);
SELECT array_sort('{abc DEF 123abc,ábc sßs ßss DÉF,ǄxxǄ ǆxxǅ
ǅxxǆ,ȺȺȺ,ⱥⱥⱥ,ⱥȺ}'::text[] COLLATE "pg_c_utf8");

I also created a CF entry[1]https://commitfest.postgresql.org/50/5277/ so it can be easily reviewed.

[1]: https://commitfest.postgresql.org/50/5277/

--
Regards
Junwang Zhao

jian.universality@gmail.com

over 1 year ago

In reply to: Junwang Zhao (#2)

Re: general purpose array_sort

On Sat, Sep 28, 2024 at 7:52 PM Junwang Zhao <zhjwpku@gmail.com> wrote:

PFA v2, use COLLATE keyword to supply the collation suggested by
Andreas offlist.

this is better. otherwise we need extra care to handle case like:
SELECT array_sort('{1,3,5,2,4,6}'::int[] COLLATE "pg_c_utf8");

+      <row>
+       <entry role="func_table_entry"><para role="func_signature">
+        <indexterm>
+         <primary>array_sort</primary>
+        </indexterm>
+        <function>array_sort</function> ( <type>anyarray</type>
<optional>, <parameter>dir</parameter> </optional>)
+        <returnvalue>anyarray</returnvalue>
+       </para>
+       <para>
+        Sorts the array in either ascending or descending order.
+        <parameter>dir</parameter> must be <literal>asc</literal>
+        or <literal>desc</literal>. The array must be empty or one-dimensional.
+       </para>
+       <para>
+        <literal>array_sort(ARRAY[1,2,5,6,3,4])</literal>
+        <returnvalue>{1,2,3,4,5,6}</returnvalue>
+       </para></entry>
+      </row>
I am confused with <parameter>dir</parameter>. I guess you want to say
"direction"
But here, I think <parameter>sort_asc</parameter> would be more appropriate?

<parameter>dir</parameter> can have only two potential values, make it
as a boolean would be more easier?
you didn't mention information: "by default, it will sort by
ascending order; the sort collation by default is using the array
element type's collation"

tuplesort_begin_datum can do null-first, null-last, so the
one-dimension array can allow null values.

Based on the above and others, I did some refactoring, feel free to take it.
my changes, changed the function signature, so you need to pay
attention to sql test file.

zhjwpku@gmail.com

over 1 year ago

In reply to: jian he (#3)

Re: general purpose array_sort

On Sat, Sep 28, 2024 at 10:41 PM jian he <jian.universality@gmail.com> wrote:

On Sat, Sep 28, 2024 at 7:52 PM Junwang Zhao <zhjwpku@gmail.com> wrote:

PFA v2, use COLLATE keyword to supply the collation suggested by
Andreas offlist.

this is better. otherwise we need extra care to handle case like:
SELECT array_sort('{1,3,5,2,4,6}'::int[] COLLATE "pg_c_utf8");

+      <row>
+       <entry role="func_table_entry"><para role="func_signature">
+        <indexterm>
+         <primary>array_sort</primary>
+        </indexterm>
+        <function>array_sort</function> ( <type>anyarray</type>
<optional>, <parameter>dir</parameter> </optional>)
+        <returnvalue>anyarray</returnvalue>
+       </para>
+       <para>
+        Sorts the array in either ascending or descending order.
+        <parameter>dir</parameter> must be <literal>asc</literal>
+        or <literal>desc</literal>. The array must be empty or one-dimensional.
+       </para>
+       <para>
+        <literal>array_sort(ARRAY[1,2,5,6,3,4])</literal>
+        <returnvalue>{1,2,3,4,5,6}</returnvalue>
+       </para></entry>
+      </row>
I am confused with <parameter>dir</parameter>. I guess you want to say
"direction"
But here, I think <parameter>sort_asc</parameter> would be more appropriate?

This doc is mostly copied and edited from intarray.sgml sort part.

And the logic is basically the same, you can check the intarray module.

<parameter>dir</parameter> can have only two potential values, make it
as a boolean would be more easier?
you didn't mention information: "by default, it will sort by
ascending order; the sort collation by default is using the array
element type's collation"

tuplesort_begin_datum can do null-first, null-last, so the
one-dimension array can allow null values.

The following(create extension intarry first) will give an error, I
keep the same for array_sort.

SELECT sort('{1234234,-30,234234, null}');

Based on the above and others, I did some refactoring, feel free to take it.
my changes, changed the function signature, so you need to pay
attention to sql test file.

Thanks for your refactor, I will take some in the next version.

--
Regards
Junwang Zhao

david.g.johnston@gmail.com

over 1 year ago

In reply to: Junwang Zhao (#4)

Re: general purpose array_sort

On Sat, Sep 28, 2024 at 7:05 PM Junwang Zhao <zhjwpku@gmail.com> wrote:

On Sat, Sep 28, 2024 at 10:41 PM jian he <jian.universality@gmail.com>
wrote:

<parameter>dir</parameter> can have only two potential values, make it
as a boolean would be more easier?
you didn't mention information: "by default, it will sort by
ascending order; the sort collation by default is using the array
element type's collation"

tuplesort_begin_datum can do null-first, null-last, so the
one-dimension array can allow null values.

The following(create extension intarry first) will give an error, I
keep the same for array_sort.

SELECT sort('{1234234,-30,234234, null}');

I would suggest accepting:
asc
desc
asc nulls first
asc nulls last *
desc nulls first *
desc nulls last

As valid inputs for "dir" - and that the starred options are the defaults
when null position is omitted.

In short, mimic create index.

David J.

zhjwpku@gmail.com

over 1 year ago

In reply to: David G. Johnston (#5)

Re: general purpose array_sort

On Sun, Sep 29, 2024 at 10:51 AM David G. Johnston
<david.g.johnston@gmail.com> wrote:

On Sat, Sep 28, 2024 at 7:05 PM Junwang Zhao <zhjwpku@gmail.com> wrote:

On Sat, Sep 28, 2024 at 10:41 PM jian he <jian.universality@gmail.com> wrote:

<parameter>dir</parameter> can have only two potential values, make it
as a boolean would be more easier?
you didn't mention information: "by default, it will sort by
ascending order; the sort collation by default is using the array
element type's collation"

tuplesort_begin_datum can do null-first, null-last, so the
one-dimension array can allow null values.

The following(create extension intarry first) will give an error, I
keep the same for array_sort.

SELECT sort('{1234234,-30,234234, null}');

I would suggest accepting:
asc
desc
asc nulls first
asc nulls last *
desc nulls first *
desc nulls last

As valid inputs for "dir" - and that the starred options are the defaults when null position is omitted.

In short, mimic create index.

David J.

PFA v3 with David's suggestion addressed.

--
Regards
Junwang Zhao

jian.universality@gmail.com

over 1 year ago

In reply to: Junwang Zhao (#6)

Re: general purpose array_sort

On Mon, Sep 30, 2024 at 1:01 PM Junwang Zhao <zhjwpku@gmail.com> wrote:

I would suggest accepting:
asc
desc
asc nulls first
asc nulls last *
desc nulls first *
desc nulls last

As valid inputs for "dir" - and that the starred options are the defaults when null position is omitted.

In short, mimic create index.

David J.

PFA v3 with David's suggestion addressed.

I think just adding 2 bool arguments (asc/desc, nulls last/not nulls
last) would be easier.
but either way, (i don't have a huge opinion)
but document the second argument, imagine case
SELECT array_sort('{a,B}'::text[] , E'aSc NulLs LaST \t\r\n');
would be tricky?

errmsg("multidimensional arrays sorting are not supported")));
write a sql test to trigger the error message that would be great.

you can add two or one example to collate.icu.utf8.sql to demo that it
actually works with COLLATE collation_name
like:
SELECT array_sort('{a,B}'::text[] COLLATE case_insensitive);
SELECT array_sort('{a,B}'::text[] COLLATE "C");

#define WHITESPACE " \t\n\r"
you may also check function scanner_isspace

+ typentry = (TypeCacheEntry *) fcinfo->flinfo->fn_extra;
+ if (typentry == NULL || typentry->type_id != elmtyp)
+ {
+ typentry = lookup_type_cache(elmtyp, sort_asc ? TYPECACHE_LT_OPR :
TYPECACHE_GT_OPR);
+ fcinfo->flinfo->fn_extra = (void *) typentry;
+ }
you need to one-time check typentry->lt_opr or typentry->gt_opr exists?
see CreateStatistics.
            /* Disallow data types without a less-than operator */
            type = lookup_type_cache(attForm->atttypid, TYPECACHE_LT_OPR);
            if (type->lt_opr == InvalidOid)
                ereport(ERROR,
                        (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
                         errmsg("column \"%s\" cannot be used in
statistics because its type %s has no default btree operator class",
                                attname, format_type_be(attForm->atttypid))));

zhjwpku@gmail.com

over 1 year ago

In reply to: jian he (#7)

Re: general purpose array_sort

Hi Jian,

On Mon, Sep 30, 2024 at 11:13 PM jian he <jian.universality@gmail.com> wrote:

On Mon, Sep 30, 2024 at 1:01 PM Junwang Zhao <zhjwpku@gmail.com> wrote:

I would suggest accepting:
asc
desc
asc nulls first
asc nulls last *
desc nulls first *
desc nulls last

As valid inputs for "dir" - and that the starred options are the defaults when null position is omitted.

In short, mimic create index.

David J.

PFA v3 with David's suggestion addressed.

I think just adding 2 bool arguments (asc/desc, nulls last/not nulls
last) would be easier.

Yeah, this would be easier, it's just the intarray module use
the direction parameter, I keep it here for the same user
experience, I don't insist if some committer thinks 2 bool arguments
would be a better option.

but either way, (i don't have a huge opinion)
but document the second argument, imagine case
SELECT array_sort('{a,B}'::text[] , E'aSc NulLs LaST \t\r\n');
would be tricky?

The case you provide should give the correct results, but
I doubt users will do this.
I'm not good at document wording, so you might give me some help
with the document part.

errmsg("multidimensional arrays sorting are not supported")));
write a sql test to trigger the error message that would be great.

you can add two or one example to collate.icu.utf8.sql to demo that it
actually works with COLLATE collation_name
like:
SELECT array_sort('{a,B}'::text[] COLLATE case_insensitive);
SELECT array_sort('{a,B}'::text[] COLLATE "C");

Fixed.

#define WHITESPACE " \t\n\r"
you may also check function scanner_isspace

Fixed.

+ typentry = (TypeCacheEntry *) fcinfo->flinfo->fn_extra;
+ if (typentry == NULL || typentry->type_id != elmtyp)
+ {
+ typentry = lookup_type_cache(elmtyp, sort_asc ? TYPECACHE_LT_OPR :
TYPECACHE_GT_OPR);
+ fcinfo->flinfo->fn_extra = (void *) typentry;
+ }
you need to one-time check typentry->lt_opr or typentry->gt_opr exists?
see CreateStatistics.
/* Disallow data types without a less-than operator */
type = lookup_type_cache(attForm->atttypid, TYPECACHE_LT_OPR);
if (type->lt_opr == InvalidOid)
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
errmsg("column \"%s\" cannot be used in
statistics because its type %s has no default btree operator class",
attname, format_type_be(attForm->atttypid))));

I added an Assert for this part, not sure if that is enough.

--
Regards
Junwang Zhao

[1]: https://github.com/hlinnaka/colour-datatype/blob/master/colour.c

jian.universality@gmail.com

over 1 year ago

In reply to: Junwang Zhao (#8)

Re: general purpose array_sort

+ typentry = (TypeCacheEntry *) fcinfo->flinfo->fn_extra;
+ if (typentry == NULL || typentry->type_id != elmtyp)
+ {
+ typentry = lookup_type_cache(elmtyp, sort_asc ? TYPECACHE_LT_OPR :
TYPECACHE_GT_OPR);
+ fcinfo->flinfo->fn_extra = (void *) typentry;
+ }
you need to one-time check typentry->lt_opr or typentry->gt_opr exists?
see CreateStatistics.
/* Disallow data types without a less-than operator */
type = lookup_type_cache(attForm->atttypid, TYPECACHE_LT_OPR);
if (type->lt_opr == InvalidOid)
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
errmsg("column \"%s\" cannot be used in
statistics because its type %s has no default btree operator class",
attname, format_type_be(attForm->atttypid))));

I added an Assert for this part, not sure if that is enough.

i think it really should be:

if (typentry == NULL || typentry->type_id != elmtyp)
{
typentry = lookup_type_cache(elmtyp, sort_asc ? TYPECACHE_LT_OPR :
TYPECACHE_GT_OPR);
fcinfo->flinfo->fn_extra = (void *) typentry;
if ((sort_asc && !OidIsValid(typentry->lt_opr) || (!sort_as &&
OidIsValid(typentry->gt_opr));
ereport(ERROR,....)
}

Imagine a type that doesn't have TYPECACHE_LT_OPR or TYPECACHE_GT_OPR
then we cannot do the sort, we should just error out.

I just tried this colour type [1]https://github.com/hlinnaka/colour-datatype/blob/master/colour.c with (CREATE TYPE colour (INPUT =
colour_in, OUTPUT = colour_out, LIKE = pg_catalog.int4);

select array_sort('{#FF0000, #FF0000}'::colour[]);
of course it will segfault with your new Assert.

#10

https://cirrus-ci.com/github/postgresql-cfbot/postgresql/cf%2F5277

zhjwpku@gmail.com

over 1 year ago

In reply to: jian he (#9)

Re: general purpose array_sort

On Wed, Oct 2, 2024 at 9:51 AM jian he <jian.universality@gmail.com> wrote:

+ typentry = (TypeCacheEntry *) fcinfo->flinfo->fn_extra;
+ if (typentry == NULL || typentry->type_id != elmtyp)
+ {
+ typentry = lookup_type_cache(elmtyp, sort_asc ? TYPECACHE_LT_OPR :
TYPECACHE_GT_OPR);
+ fcinfo->flinfo->fn_extra = (void *) typentry;
+ }
you need to one-time check typentry->lt_opr or typentry->gt_opr exists?
see CreateStatistics.
/* Disallow data types without a less-than operator */
type = lookup_type_cache(attForm->atttypid, TYPECACHE_LT_OPR);
if (type->lt_opr == InvalidOid)
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
errmsg("column \"%s\" cannot be used in
statistics because its type %s has no default btree operator class",
attname, format_type_be(attForm->atttypid))));
I added an Assert for this part, not sure if that is enough.
i think it really should be:

if (typentry == NULL || typentry->type_id != elmtyp)
{
typentry = lookup_type_cache(elmtyp, sort_asc ? TYPECACHE_LT_OPR :
TYPECACHE_GT_OPR);
fcinfo->flinfo->fn_extra = (void *) typentry;
if ((sort_asc && !OidIsValid(typentry->lt_opr) || (!sort_as &&
OidIsValid(typentry->gt_opr));
ereport(ERROR,....)
}

Imagine a type that doesn't have TYPECACHE_LT_OPR or TYPECACHE_GT_OPR
then we cannot do the sort, we should just error out.

I just tried this colour type [1] with (CREATE TYPE colour (INPUT =
colour_in, OUTPUT = colour_out, LIKE = pg_catalog.int4);

select array_sort('{#FF0000, #FF0000}'::colour[]);
of course it will segfault with your new Assert.

[1] https://github.com/hlinnaka/colour-datatype/blob/master/colour.c

Make sense, PFA v5 with Jian's suggestion.

--
Regards
Junwang Zhao

#11

Amit Langote

Langote_Amit_f8@lab.ntt.co.jp

over 1 year ago

In reply to: Junwang Zhao (#10)

Re: general purpose array_sort

Hi Junwang,

On Wed, Oct 2, 2024 at 11:46 PM Junwang Zhao <zhjwpku@gmail.com> wrote:

On Wed, Oct 2, 2024 at 9:51 AM jian he <jian.universality@gmail.com> wrote:
+ typentry = (TypeCacheEntry *) fcinfo->flinfo->fn_extra;
+ if (typentry == NULL || typentry->type_id != elmtyp)
+ {
+ typentry = lookup_type_cache(elmtyp, sort_asc ? TYPECACHE_LT_OPR :
TYPECACHE_GT_OPR);
+ fcinfo->flinfo->fn_extra = (void *) typentry;
+ }
you need to one-time check typentry->lt_opr or typentry->gt_opr exists?
see CreateStatistics.
/* Disallow data types without a less-than operator */
type = lookup_type_cache(attForm->atttypid, TYPECACHE_LT_OPR);
if (type->lt_opr == InvalidOid)
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
errmsg("column \"%s\" cannot be used in
statistics because its type %s has no default btree operator class",
attname, format_type_be(attForm->atttypid))));
I added an Assert for this part, not sure if that is enough.
i think it really should be:

if (typentry == NULL || typentry->type_id != elmtyp)
{
typentry = lookup_type_cache(elmtyp, sort_asc ? TYPECACHE_LT_OPR :
TYPECACHE_GT_OPR);
fcinfo->flinfo->fn_extra = (void *) typentry;
if ((sort_asc && !OidIsValid(typentry->lt_opr) || (!sort_as &&
OidIsValid(typentry->gt_opr));
ereport(ERROR,....)
}

Imagine a type that doesn't have TYPECACHE_LT_OPR or TYPECACHE_GT_OPR
then we cannot do the sort, we should just error out.

I just tried this colour type [1] with (CREATE TYPE colour (INPUT =
colour_in, OUTPUT = colour_out, LIKE = pg_catalog.int4);

select array_sort('{#FF0000, #FF0000}'::colour[]);
of course it will segfault with your new Assert.

[1] https://github.com/hlinnaka/colour-datatype/blob/master/colour.c
Make sense, PFA v5 with Jian's suggestion.

Have you noticed that the tests have failed on Cirrus CI runs of this patch?

It might be related to the test machines having a different *default*
locale than your local environment, which could result in a different
sort order for the test data. You may need to add an explicit COLLATE
clause to the tests to ensure consistent sorting across systems.

--
Thanks, Amit Langote

#12

zhjwpku@gmail.com

over 1 year ago

In reply to: Amit Langote (#11)

Re: general purpose array_sort

Hi Amit,

On Thu, Oct 3, 2024 at 2:22 PM Amit Langote <amitlangote09@gmail.com> wrote:

Hi Junwang,

On Wed, Oct 2, 2024 at 11:46 PM Junwang Zhao <zhjwpku@gmail.com> wrote:
On Wed, Oct 2, 2024 at 9:51 AM jian he <jian.universality@gmail.com> wrote:
+ typentry = (TypeCacheEntry *) fcinfo->flinfo->fn_extra;
+ if (typentry == NULL || typentry->type_id != elmtyp)
+ {
+ typentry = lookup_type_cache(elmtyp, sort_asc ? TYPECACHE_LT_OPR :
TYPECACHE_GT_OPR);
+ fcinfo->flinfo->fn_extra = (void *) typentry;
+ }
you need to one-time check typentry->lt_opr or typentry->gt_opr exists?
see CreateStatistics.
/* Disallow data types without a less-than operator */
type = lookup_type_cache(attForm->atttypid, TYPECACHE_LT_OPR);
if (type->lt_opr == InvalidOid)
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
errmsg("column \"%s\" cannot be used in
statistics because its type %s has no default btree operator class",
attname, format_type_be(attForm->atttypid))));
I added an Assert for this part, not sure if that is enough.
i think it really should be:

if (typentry == NULL || typentry->type_id != elmtyp)
{
typentry = lookup_type_cache(elmtyp, sort_asc ? TYPECACHE_LT_OPR :
TYPECACHE_GT_OPR);
fcinfo->flinfo->fn_extra = (void *) typentry;
if ((sort_asc && !OidIsValid(typentry->lt_opr) || (!sort_as &&
OidIsValid(typentry->gt_opr));
ereport(ERROR,....)
}

Imagine a type that doesn't have TYPECACHE_LT_OPR or TYPECACHE_GT_OPR
then we cannot do the sort, we should just error out.

I just tried this colour type [1] with (CREATE TYPE colour (INPUT =
colour_in, OUTPUT = colour_out, LIKE = pg_catalog.int4);

select array_sort('{#FF0000, #FF0000}'::colour[]);
of course it will segfault with your new Assert.

[1] https://github.com/hlinnaka/colour-datatype/blob/master/colour.c
Make sense, PFA v5 with Jian's suggestion.
Have you noticed that the tests have failed on Cirrus CI runs of this patch?

https://cirrus-ci.com/github/postgresql-cfbot/postgresql/cf%2F5277

Sorry for the late reply due to my vacation. I should have paid
more attention to Cirrus CI earlier ;)

It might be related to the test machines having a different *default*
locale than your local environment, which could result in a different
sort order for the test data. You may need to add an explicit COLLATE
clause to the tests to ensure consistent sorting across systems.

I've changed the tests to use just ASCII characters, then added
*COLLATE "C"* to the tests and CI passed, PFA v6.

--
Thanks, Amit Langote

--
Regards
Junwang Zhao

#13

zhjwpku@gmail.com

over 1 year ago

In reply to: Junwang Zhao (#12)

Re: general purpose array_sort

On Wed, Oct 9, 2024 at 10:10 PM Junwang Zhao <zhjwpku@gmail.com> wrote:

Hi Amit,

On Thu, Oct 3, 2024 at 2:22 PM Amit Langote <amitlangote09@gmail.com> wrote:
Hi Junwang,

On Wed, Oct 2, 2024 at 11:46 PM Junwang Zhao <zhjwpku@gmail.com> wrote:
On Wed, Oct 2, 2024 at 9:51 AM jian he <jian.universality@gmail.com> wrote:
+ typentry = (TypeCacheEntry *) fcinfo->flinfo->fn_extra;
+ if (typentry == NULL || typentry->type_id != elmtyp)
+ {
+ typentry = lookup_type_cache(elmtyp, sort_asc ? TYPECACHE_LT_OPR :
TYPECACHE_GT_OPR);
+ fcinfo->flinfo->fn_extra = (void *) typentry;
+ }
you need to one-time check typentry->lt_opr or typentry->gt_opr exists?
see CreateStatistics.
/* Disallow data types without a less-than operator */
type = lookup_type_cache(attForm->atttypid, TYPECACHE_LT_OPR);
if (type->lt_opr == InvalidOid)
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
errmsg("column \"%s\" cannot be used in
statistics because its type %s has no default btree operator class",
attname, format_type_be(attForm->atttypid))));
I added an Assert for this part, not sure if that is enough.
i think it really should be:

if (typentry == NULL || typentry->type_id != elmtyp)
{
typentry = lookup_type_cache(elmtyp, sort_asc ? TYPECACHE_LT_OPR :
TYPECACHE_GT_OPR);
fcinfo->flinfo->fn_extra = (void *) typentry;
if ((sort_asc && !OidIsValid(typentry->lt_opr) || (!sort_as &&
OidIsValid(typentry->gt_opr));
ereport(ERROR,....)
}

Imagine a type that doesn't have TYPECACHE_LT_OPR or TYPECACHE_GT_OPR
then we cannot do the sort, we should just error out.

I just tried this colour type [1] with (CREATE TYPE colour (INPUT =
colour_in, OUTPUT = colour_out, LIKE = pg_catalog.int4);

select array_sort('{#FF0000, #FF0000}'::colour[]);
of course it will segfault with your new Assert.

[1] https://github.com/hlinnaka/colour-datatype/blob/master/colour.c
Make sense, PFA v5 with Jian's suggestion.
Have you noticed that the tests have failed on Cirrus CI runs of this patch?

https://cirrus-ci.com/github/postgresql-cfbot/postgresql/cf%2F5277
Sorry for the late reply due to my vacation. I should have paid
more attention to Cirrus CI earlier ;)

It might be related to the test machines having a different *default*
locale than your local environment, which could result in a different
sort order for the test data. You may need to add an explicit COLLATE
clause to the tests to ensure consistent sorting across systems.

I've changed the tests to use just ASCII characters, then added
*COLLATE "C"* to the tests and CI passed, PFA v6.

Sadly the CI only passed on my own github repo, it failed on
cfbot[1]https://cirrus-ci.com/task/5815925960605696, will dig into the reason later because I can not open the cirrus
ci page right now ;(

[1]: https://cirrus-ci.com/task/5815925960605696

--
Thanks, Amit Langote

--
Regards
Junwang Zhao

--
Regards
Junwang Zhao

#14

https://api.cirrus-ci.com/v1/artifact/task/5815925960605696/testrun/build/testrun/ssl/001_ssltests/log/001_ssltests_primary.log

zhjwpku@gmail.com

over 1 year ago

In reply to: Junwang Zhao (#13)

Re: general purpose array_sort

On Wed, Oct 9, 2024 at 11:46 PM Junwang Zhao <zhjwpku@gmail.com> wrote:

On Wed, Oct 9, 2024 at 10:10 PM Junwang Zhao <zhjwpku@gmail.com> wrote:
Hi Amit,

On Thu, Oct 3, 2024 at 2:22 PM Amit Langote <amitlangote09@gmail.com> wrote:
Hi Junwang,

On Wed, Oct 2, 2024 at 11:46 PM Junwang Zhao <zhjwpku@gmail.com> wrote:
On Wed, Oct 2, 2024 at 9:51 AM jian he <jian.universality@gmail.com> wrote:
+ typentry = (TypeCacheEntry *) fcinfo->flinfo->fn_extra;
+ if (typentry == NULL || typentry->type_id != elmtyp)
+ {
+ typentry = lookup_type_cache(elmtyp, sort_asc ? TYPECACHE_LT_OPR :
TYPECACHE_GT_OPR);
+ fcinfo->flinfo->fn_extra = (void *) typentry;
+ }
you need to one-time check typentry->lt_opr or typentry->gt_opr exists?
see CreateStatistics.
/* Disallow data types without a less-than operator */
type = lookup_type_cache(attForm->atttypid, TYPECACHE_LT_OPR);
if (type->lt_opr == InvalidOid)
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
errmsg("column \"%s\" cannot be used in
statistics because its type %s has no default btree operator class",
attname, format_type_be(attForm->atttypid))));
I added an Assert for this part, not sure if that is enough.
i think it really should be:

if (typentry == NULL || typentry->type_id != elmtyp)
{
typentry = lookup_type_cache(elmtyp, sort_asc ? TYPECACHE_LT_OPR :
TYPECACHE_GT_OPR);
fcinfo->flinfo->fn_extra = (void *) typentry;
if ((sort_asc && !OidIsValid(typentry->lt_opr) || (!sort_as &&
OidIsValid(typentry->gt_opr));
ereport(ERROR,....)
}

Imagine a type that doesn't have TYPECACHE_LT_OPR or TYPECACHE_GT_OPR
then we cannot do the sort, we should just error out.

I just tried this colour type [1] with (CREATE TYPE colour (INPUT =
colour_in, OUTPUT = colour_out, LIKE = pg_catalog.int4);

select array_sort('{#FF0000, #FF0000}'::colour[]);
of course it will segfault with your new Assert.

[1] https://github.com/hlinnaka/colour-datatype/blob/master/colour.c
Make sense, PFA v5 with Jian's suggestion.
Have you noticed that the tests have failed on Cirrus CI runs of this patch?

https://cirrus-ci.com/github/postgresql-cfbot/postgresql/cf%2F5277
Sorry for the late reply due to my vacation. I should have paid
more attention to Cirrus CI earlier ;)

It might be related to the test machines having a different *default*
locale than your local environment, which could result in a different
sort order for the test data. You may need to add an explicit COLLATE
clause to the tests to ensure consistent sorting across systems.

I've changed the tests to use just ASCII characters, then added
*COLLATE "C"* to the tests and CI passed, PFA v6.
Sadly the CI only passed on my own github repo, it failed on
cfbot[1], will dig into the reason later because I can not open the cirrus
ci page right now ;(

[1] https://cirrus-ci.com/task/5815925960605696

Seems the failure is not related to this patch, I guess the reason for
this is the stop phase doesn't
release the port properly?

2024-10-09 14:53:10.079 UTC [43052][checkpointer] LOG: checkpoint
complete: wrote 5617 buffers (34.3%), wrote 3 SLRU buffers; 0 WAL
file(s) added, 0 removed, 3 recycled; write=0.107 s, sync=0.001 s,
total=0.107 s; sync files=0, longest=0.000 s, average=0.000 s;
distance=45239 kB, estimate=45239 kB; lsn=0/414A138, redo
lsn=0/414A138
2024-10-09 14:53:10.084 UTC [43050][postmaster] LOG: database system
is shut down
2024-10-09 14:53:10.215 UTC [43270][postmaster] LOG: starting
PostgreSQL 18devel on x86_64-freebsd, compiled by clang-17.0.6, 64-bit
2024-10-09 14:53:10.215 UTC [43270][postmaster] LOG: could not bind
IPv4 address "127.0.0.1": Address already in use
2024-10-09 14:53:10.215 UTC [43270][postmaster] HINT: Is another
postmaster already running on port 11643? If not, wait a few seconds
and retry.
2024-10-09 14:53:10.215 UTC [43270][postmaster] WARNING: could not
create listen socket for "127.0.0.1"
2024-10-09 14:53:10.218 UTC [43270][postmaster] FATAL: could not
create any TCP/IP sockets
2024-10-09 14:53:10.218 UTC [43270][postmaster] LOG: database system
is shut down

--
Thanks, Amit Langote

--
Regards
Junwang Zhao

--
Regards
Junwang Zhao

--
Regards
Junwang Zhao

#15

jian.universality@gmail.com

over 1 year ago

In reply to: Junwang Zhao (#14)

Re: general purpose array_sort

tricky case:
should we allow array element type to be composite/domain?
currently seems to work fine.

create table t(b int[]);
insert into t values ('{{1,3}}'), ('{{1,2}}');

select array_sort((select array_agg(t) from t), 'desc');
array_sort
-----------------------------------
{"(\"{{1,3}}\")","(\"{{1,2}}\")"}

select array_sort((t.b)) from t;
ERROR: multidimensional arrays sorting are not supported

select array_sort((select array_agg(t.b) from t));
ERROR: multidimensional arrays sorting are not supported

#16

zhjwpku@gmail.com

over 1 year ago

In reply to: jian he (#15)

Re: general purpose array_sort

Hi Jian,

On Fri, Oct 11, 2024 at 1:12 PM jian he <jian.universality@gmail.com> wrote:

tricky case:
should we allow array element type to be composite/domain?
currently seems to work fine.

create table t(b int[]);
insert into t values ('{{1,3}}'), ('{{1,2}}');

select array_sort((select array_agg(t) from t), 'desc');
array_sort
-----------------------------------
{"(\"{{1,3}}\")","(\"{{1,2}}\")"}

select array_sort((t.b)) from t;
ERROR: multidimensional arrays sorting are not supported

select array_sort((select array_agg(t.b) from t));
ERROR: multidimensional arrays sorting are not supported

I tried the above cases, and the first one works because it's
a one dim array of composite type, the other two fails because
they are multidimensional.

It seems there is not much meaning to sort composite type,
so are you proposing we should error on that?

--
Regards
Junwang Zhao

#17

tgl@sss.pgh.pa.us

over 1 year ago

In reply to: Junwang Zhao (#16)

Re: general purpose array_sort

Junwang Zhao <zhjwpku@gmail.com> writes:

It seems there is not much meaning to sort composite type,
so are you proposing we should error on that?

It's hardly "general purpose" if it randomly refuses to
sort certain types. I would say it should be able to sort
anything that ORDER BY will handle --- and that certainly
includes the cases shown here.

regards, tom lane

#18

zhjwpku@gmail.com

over 1 year ago

In reply to: Tom Lane (#17)

Re: general purpose array_sort

Sorry for the late reply.

On Mon, Oct 14, 2024 at 4:10 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:

Junwang Zhao <zhjwpku@gmail.com> writes:

It seems there is not much meaning to sort composite type,
so are you proposing we should error on that?

It's hardly "general purpose" if it randomly refuses to
sort certain types. I would say it should be able to sort
anything that ORDER BY will handle --- and that certainly
includes the cases shown here.

Yeah, agreed.

PFA v7 with multi-array support.

regards, tom lane

--
Regards
Junwang Zhao

#19

jian.universality@gmail.com

over 1 year ago

In reply to: Junwang Zhao (#18)

Re: general purpose array_sort

On Wed, Oct 23, 2024 at 10:28 PM Junwang Zhao <zhjwpku@gmail.com> wrote:

PFA v7 with multi-array support.

if (ARR_NDIM(array) == 1)
{
}
else
{
}
can be simplified.
i think beginning part of array_sort can be like the following:
(newline emitted)

---------------------------------------------------------------------
if (ARR_NDIM(array) < 1)
PG_RETURN_ARRAYTYPE_P(array);
if (dirstr != NULL)
{
if (!parse_sort_order(text_to_cstring(dirstr), &sort_asc, &nulls_first))
ereport(ERROR,
(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
errmsg("second parameter must be a valid sort
direction")));
}
elmtyp = ARR_ELEMTYPE(array);
if (ARR_NDIM(array) > 1)
elmtyp = get_array_type(elmtyp);
typentry = (TypeCacheEntry *) fcinfo->flinfo->fn_extra;
if (typentry == NULL || typentry->type_id != elmtyp)
{
typentry = lookup_type_cache(elmtyp, sort_asc ?
TYPECACHE_LT_OPR : TYPECACHE_GT_OPR);
if ((sort_asc && !OidIsValid(typentry->lt_opr)) ||
(!sort_asc && !OidIsValid(typentry->gt_opr)))
ereport(ERROR,
(errcode(ERRCODE_UNDEFINED_FUNCTION),
errmsg("could not identify an ordering operator
for type %s",
format_type_be(elmtyp))));
fcinfo->flinfo->fn_extra = (void *) typentry;
}
---------------------------------------------------------------------
/*
* array_sort
*
* Sorts the array in either ascending or descending order.
* The array must be empty or one-dimensional.
*/
comments need to be updated.

typedef enum
PARSE_SORT_ORDER_DONE
} ParseSortOrderState;

last one, should have comma, like
"PARSE_SORT_ORDER_DONE, "

#20

zhjwpku@gmail.com

over 1 year ago

In reply to: jian he (#19)

Re: general purpose array_sort

On Thu, Oct 24, 2024 at 8:40 PM jian he <jian.universality@gmail.com> wrote:

On Wed, Oct 23, 2024 at 10:28 PM Junwang Zhao <zhjwpku@gmail.com> wrote:

PFA v7 with multi-array support.

if (ARR_NDIM(array) == 1)
{
}
else
{
}
can be simplified.
i think beginning part of array_sort can be like the following:
(newline emitted)

---------------------------------------------------------------------
if (ARR_NDIM(array) < 1)
PG_RETURN_ARRAYTYPE_P(array);
if (dirstr != NULL)
{
if (!parse_sort_order(text_to_cstring(dirstr), &sort_asc, &nulls_first))
ereport(ERROR,
(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
errmsg("second parameter must be a valid sort
direction")));
}
elmtyp = ARR_ELEMTYPE(array);
if (ARR_NDIM(array) > 1)
elmtyp = get_array_type(elmtyp);

I'm wondering should we cache the type entry for both element type and
the corresponding array type, for example if we have a table:

create table t(b int[]);
insert into t values ('{1,3}'),('{{2,3}}'),('{{1,2},{0,2}}');

with 1-d array and m-d array interleaved, then the following query will
call lookup_type_cache multiple times.

select array_sort((t.b)) from t;

typentry = (TypeCacheEntry *) fcinfo->flinfo->fn_extra;
if (typentry == NULL || typentry->type_id != elmtyp)
{
typentry = lookup_type_cache(elmtyp, sort_asc ?
TYPECACHE_LT_OPR : TYPECACHE_GT_OPR);
if ((sort_asc && !OidIsValid(typentry->lt_opr)) ||
(!sort_asc && !OidIsValid(typentry->gt_opr)))
ereport(ERROR,
(errcode(ERRCODE_UNDEFINED_FUNCTION),
errmsg("could not identify an ordering operator
for type %s",
format_type_be(elmtyp))));
fcinfo->flinfo->fn_extra = (void *) typentry;
}
---------------------------------------------------------------------
/*
* array_sort
*
* Sorts the array in either ascending or descending order.
* The array must be empty or one-dimensional.
*/
comments need to be updated.

will fix it in the next version of patch.

typedef enum
PARSE_SORT_ORDER_DONE
} ParseSortOrderState;

last one, should have comma, like
"PARSE_SORT_ORDER_DONE, "

will fix it.

--
Regards
Junwang Zhao

#21

aleksander@timescale.com

over 1 year ago

In reply to: Tom Lane (#17)

#22

david.g.johnston@gmail.com

over 1 year ago

In reply to: Aleksander Alekseev (#21)

#23

aleksander@timescale.com

over 1 year ago

In reply to: David G. Johnston (#22)

#24

david.g.johnston@gmail.com

over 1 year ago

In reply to: Aleksander Alekseev (#23)

#25

tgl@sss.pgh.pa.us

over 1 year ago

In reply to: David G. Johnston (#24)

#26

david.g.johnston@gmail.com

over 1 year ago

In reply to: Tom Lane (#25)

#27

aleksander@timescale.com

over 1 year ago

In reply to: David G. Johnston (#26)

#28

zhjwpku@gmail.com

over 1 year ago

In reply to: Aleksander Alekseev (#27)

#29

zhjwpku@gmail.com

over 1 year ago

In reply to: Junwang Zhao (#28)

#30

aleksander@timescale.com

over 1 year ago

In reply to: Junwang Zhao (#29)

#31

jian.universality@gmail.com

over 1 year ago

In reply to: Aleksander Alekseev (#30)

#32

aleksander@timescale.com

over 1 year ago

In reply to: jian he (#31)

#33

zhjwpku@gmail.com

over 1 year ago

In reply to: Aleksander Alekseev (#30)

#34

aleksander@timescale.com

over 1 year ago

In reply to: Junwang Zhao (#33)

#35

zhjwpku@gmail.com

over 1 year ago

In reply to: Aleksander Alekseev (#34)

#36

zhjwpku@gmail.com

over 1 year ago

In reply to: Junwang Zhao (#35)

#37

zhjwpku@gmail.com

over 1 year ago

In reply to: Junwang Zhao (#36)

#38

michael@paquier.xyz

over 1 year ago

In reply to: Junwang Zhao (#37)

#39

jian.universality@gmail.com

over 1 year ago

In reply to: Michael Paquier (#38)

#40

Dean Rasheed

dean.a.rasheed@gmail.com

over 1 year ago

In reply to: Junwang Zhao (#37)

#41

michael@paquier.xyz

over 1 year ago

In reply to: jian he (#39)

#42

jian.universality@gmail.com

over 1 year ago

In reply to: Dean Rasheed (#40)

#43

zhjwpku@gmail.com

over 1 year ago

In reply to: jian he (#42)

#44

jian.universality@gmail.com

over 1 year ago

In reply to: Junwang Zhao (#43)

#45

zhjwpku@gmail.com

over 1 year ago

In reply to: Michael Paquier (#38)

#46

zhjwpku@gmail.com

over 1 year ago

In reply to: Michael Paquier (#41)

#47

zhjwpku@gmail.com

over 1 year ago

In reply to: jian he (#42)

#48

Robert Haas

robertmhaas@gmail.com

over 1 year ago

In reply to: Junwang Zhao (#45)

#49

zhjwpku@gmail.com

over 1 year ago

In reply to: Robert Haas (#48)

#50

michael@paquier.xyz

over 1 year ago

In reply to: Robert Haas (#48)

#51

michael@paquier.xyz

over 1 year ago

In reply to: Junwang Zhao (#46)

#52

zhjwpku@gmail.com

over 1 year ago

In reply to: Michael Paquier (#51)

#53

jian.universality@gmail.com

over 1 year ago

In reply to: Michael Paquier (#51)

#54

jian.universality@gmail.com

about 1 year ago

In reply to: jian he (#53)

#55

jian.universality@gmail.com

about 1 year ago

In reply to: jian he (#54)

#56

jian.universality@gmail.com

12 months ago

In reply to: jian he (#55)

#57

zhjwpku@gmail.com

12 months ago

In reply to: jian he (#56)

#58

tgl@sss.pgh.pa.us

12 months ago

In reply to: Junwang Zhao (#57)

#59

zhjwpku@gmail.com

11 months ago

In reply to: Tom Lane (#58)

#60

tgl@sss.pgh.pa.us

11 months ago

In reply to: Junwang Zhao (#59)

#61

zhjwpku@gmail.com

11 months ago

In reply to: Tom Lane (#60)

#62

tgl@sss.pgh.pa.us

11 months ago

In reply to: Junwang Zhao (#61)

#63