general purpose array_sort

Started by Junwang Zhaoover 1 year ago63 messages
Jump to latest
#1Junwang Zhao
zhjwpku@gmail.com

Hi hackers,

per David's suggestion, this patch implements general
purpose array sort.

We can do the following with this patch:

SELECT array_sort('{1.1,3.3,5.5,2.2,4.4,6.6}'::float8[], 'asc');
SELECT array_sort('{abc DEF 123abc,ábc sßs ßss DÉF,DŽxxDŽ džxxDž
Džxxdž,ȺȺȺ,ⱥⱥⱥ,ⱥȺ}'::text[]);
SELECT array_sort('{abc DEF 123abc,ábc sßs ßss DÉF,DŽxxDŽ džxxDž
Džxxdž,ȺȺȺ,ⱥⱥⱥ,ⱥȺ}'::text[], 'asc', 'pg_c_utf8');

--
Regards
Junwang Zhao

Attachments:

v1-0001-general-purpose-array_sort.patchapplication/octet-stream; name=v1-0001-general-purpose-array_sort.patchDownload+229-1
#2Junwang Zhao
zhjwpku@gmail.com
In reply to: Junwang Zhao (#1)
Re: general purpose array_sort

On Fri, Sep 27, 2024 at 9:15 PM Junwang Zhao <zhjwpku@gmail.com> wrote:

Hi hackers,

per David's suggestion, this patch implements general
purpose array sort.

We can do the following with this patch:

SELECT array_sort('{1.1,3.3,5.5,2.2,4.4,6.6}'::float8[], 'asc');
SELECT array_sort('{abc DEF 123abc,ábc sßs ßss DÉF,DŽxxDŽ džxxDž
Džxxdž,ȺȺȺ,ⱥⱥⱥ,ⱥȺ}'::text[]);
SELECT array_sort('{abc DEF 123abc,ábc sßs ßss DÉF,DŽxxDŽ džxxDž
Džxxdž,ȺȺȺ,ⱥⱥⱥ,ⱥȺ}'::text[], 'asc', 'pg_c_utf8');

--
Regards
Junwang Zhao

PFA v2, use COLLATE keyword to supply the collation suggested by
Andreas offlist.

SELECT array_sort('{abc DEF 123abc,ábc sßs ßss DÉF,DŽxxDŽ džxxDž
Džxxdž,ȺȺȺ,ⱥⱥⱥ,ⱥȺ}'::text[]);
SELECT array_sort('{abc DEF 123abc,ábc sßs ßss DÉF,DŽxxDŽ džxxDž
Džxxdž,ȺȺȺ,ⱥⱥⱥ,ⱥȺ}'::text[] COLLATE "pg_c_utf8");

I also created a CF entry[1]https://commitfest.postgresql.org/50/5277/ so it can be easily reviewed.

[1]: https://commitfest.postgresql.org/50/5277/

--
Regards
Junwang Zhao

Attachments:

v2-0001-general-purpose-array_sort.patchapplication/octet-stream; name=v2-0001-general-purpose-array_sort.patchDownload+207-1
#3jian he
jian.universality@gmail.com
In reply to: Junwang Zhao (#2)
Re: general purpose array_sort

On Sat, Sep 28, 2024 at 7:52 PM Junwang Zhao <zhjwpku@gmail.com> wrote:

PFA v2, use COLLATE keyword to supply the collation suggested by
Andreas offlist.

this is better. otherwise we need extra care to handle case like:
SELECT array_sort('{1,3,5,2,4,6}'::int[] COLLATE "pg_c_utf8");

+      <row>
+       <entry role="func_table_entry"><para role="func_signature">
+        <indexterm>
+         <primary>array_sort</primary>
+        </indexterm>
+        <function>array_sort</function> ( <type>anyarray</type>
<optional>, <parameter>dir</parameter> </optional>)
+        <returnvalue>anyarray</returnvalue>
+       </para>
+       <para>
+        Sorts the array in either ascending or descending order.
+        <parameter>dir</parameter> must be <literal>asc</literal>
+        or <literal>desc</literal>. The array must be empty or one-dimensional.
+       </para>
+       <para>
+        <literal>array_sort(ARRAY[1,2,5,6,3,4])</literal>
+        <returnvalue>{1,2,3,4,5,6}</returnvalue>
+       </para></entry>
+      </row>
I am confused with <parameter>dir</parameter>. I guess you want to say
"direction"
But here, I think <parameter>sort_asc</parameter> would be more appropriate?

<parameter>dir</parameter> can have only two potential values, make it
as a boolean would be more easier?
you didn't mention information: "by default, it will sort by
ascending order; the sort collation by default is using the array
element type's collation"

tuplesort_begin_datum can do null-first, null-last, so the
one-dimension array can allow null values.

Based on the above and others, I did some refactoring, feel free to take it.
my changes, changed the function signature, so you need to pay
attention to sql test file.

Attachments:

array_sort_changes.no-cfbotapplication/octet-stream; name=array_sort_changes.no-cfbotDownload+74-47
#4Junwang Zhao
zhjwpku@gmail.com
In reply to: jian he (#3)
Re: general purpose array_sort

On Sat, Sep 28, 2024 at 10:41 PM jian he <jian.universality@gmail.com> wrote:

On Sat, Sep 28, 2024 at 7:52 PM Junwang Zhao <zhjwpku@gmail.com> wrote:

PFA v2, use COLLATE keyword to supply the collation suggested by
Andreas offlist.

this is better. otherwise we need extra care to handle case like:
SELECT array_sort('{1,3,5,2,4,6}'::int[] COLLATE "pg_c_utf8");

+      <row>
+       <entry role="func_table_entry"><para role="func_signature">
+        <indexterm>
+         <primary>array_sort</primary>
+        </indexterm>
+        <function>array_sort</function> ( <type>anyarray</type>
<optional>, <parameter>dir</parameter> </optional>)
+        <returnvalue>anyarray</returnvalue>
+       </para>
+       <para>
+        Sorts the array in either ascending or descending order.
+        <parameter>dir</parameter> must be <literal>asc</literal>
+        or <literal>desc</literal>. The array must be empty or one-dimensional.
+       </para>
+       <para>
+        <literal>array_sort(ARRAY[1,2,5,6,3,4])</literal>
+        <returnvalue>{1,2,3,4,5,6}</returnvalue>
+       </para></entry>
+      </row>
I am confused with <parameter>dir</parameter>. I guess you want to say
"direction"
But here, I think <parameter>sort_asc</parameter> would be more appropriate?

This doc is mostly copied and edited from intarray.sgml sort part.

And the logic is basically the same, you can check the intarray module.

<parameter>dir</parameter> can have only two potential values, make it
as a boolean would be more easier?
you didn't mention information: "by default, it will sort by
ascending order; the sort collation by default is using the array
element type's collation"

tuplesort_begin_datum can do null-first, null-last, so the
one-dimension array can allow null values.

The following(create extension intarry first) will give an error, I
keep the same for array_sort.

SELECT sort('{1234234,-30,234234, null}');

Based on the above and others, I did some refactoring, feel free to take it.
my changes, changed the function signature, so you need to pay
attention to sql test file.

Thanks for your refactor, I will take some in the next version.

--
Regards
Junwang Zhao

#5David G. Johnston
david.g.johnston@gmail.com
In reply to: Junwang Zhao (#4)
Re: general purpose array_sort

On Sat, Sep 28, 2024 at 7:05 PM Junwang Zhao <zhjwpku@gmail.com> wrote:

On Sat, Sep 28, 2024 at 10:41 PM jian he <jian.universality@gmail.com>
wrote:

<parameter>dir</parameter> can have only two potential values, make it
as a boolean would be more easier?
you didn't mention information: "by default, it will sort by
ascending order; the sort collation by default is using the array
element type's collation"

tuplesort_begin_datum can do null-first, null-last, so the
one-dimension array can allow null values.

The following(create extension intarry first) will give an error, I
keep the same for array_sort.

SELECT sort('{1234234,-30,234234, null}');

I would suggest accepting:
asc
desc
asc nulls first
asc nulls last *
desc nulls first *
desc nulls last

As valid inputs for "dir" - and that the starred options are the defaults
when null position is omitted.

In short, mimic create index.

David J.

#6Junwang Zhao
zhjwpku@gmail.com
In reply to: David G. Johnston (#5)
Re: general purpose array_sort

On Sun, Sep 29, 2024 at 10:51 AM David G. Johnston
<david.g.johnston@gmail.com> wrote:

On Sat, Sep 28, 2024 at 7:05 PM Junwang Zhao <zhjwpku@gmail.com> wrote:

On Sat, Sep 28, 2024 at 10:41 PM jian he <jian.universality@gmail.com> wrote:

<parameter>dir</parameter> can have only two potential values, make it
as a boolean would be more easier?
you didn't mention information: "by default, it will sort by
ascending order; the sort collation by default is using the array
element type's collation"

tuplesort_begin_datum can do null-first, null-last, so the
one-dimension array can allow null values.

The following(create extension intarry first) will give an error, I
keep the same for array_sort.

SELECT sort('{1234234,-30,234234, null}');

I would suggest accepting:
asc
desc
asc nulls first
asc nulls last *
desc nulls first *
desc nulls last

As valid inputs for "dir" - and that the starred options are the defaults when null position is omitted.

In short, mimic create index.

David J.

PFA v3 with David's suggestion addressed.

--
Regards
Junwang Zhao

Attachments:

v3-0001-general-purpose-array_sort.patchapplication/octet-stream; name=v3-0001-general-purpose-array_sort.patchDownload+364-1
#7jian he
jian.universality@gmail.com
In reply to: Junwang Zhao (#6)
Re: general purpose array_sort

On Mon, Sep 30, 2024 at 1:01 PM Junwang Zhao <zhjwpku@gmail.com> wrote:

I would suggest accepting:
asc
desc
asc nulls first
asc nulls last *
desc nulls first *
desc nulls last

As valid inputs for "dir" - and that the starred options are the defaults when null position is omitted.

In short, mimic create index.

David J.

PFA v3 with David's suggestion addressed.

I think just adding 2 bool arguments (asc/desc, nulls last/not nulls
last) would be easier.
but either way, (i don't have a huge opinion)
but document the second argument, imagine case
SELECT array_sort('{a,B}'::text[] , E'aSc NulLs LaST \t\r\n');
would be tricky?

errmsg("multidimensional arrays sorting are not supported")));
write a sql test to trigger the error message that would be great.

you can add two or one example to collate.icu.utf8.sql to demo that it
actually works with COLLATE collation_name
like:
SELECT array_sort('{a,B}'::text[] COLLATE case_insensitive);
SELECT array_sort('{a,B}'::text[] COLLATE "C");

#define WHITESPACE " \t\n\r"
you may also check function scanner_isspace

+ typentry = (TypeCacheEntry *) fcinfo->flinfo->fn_extra;
+ if (typentry == NULL || typentry->type_id != elmtyp)
+ {
+ typentry = lookup_type_cache(elmtyp, sort_asc ? TYPECACHE_LT_OPR :
TYPECACHE_GT_OPR);
+ fcinfo->flinfo->fn_extra = (void *) typentry;
+ }
you need to one-time check typentry->lt_opr or typentry->gt_opr exists?
see CreateStatistics.
            /* Disallow data types without a less-than operator */
            type = lookup_type_cache(attForm->atttypid, TYPECACHE_LT_OPR);
            if (type->lt_opr == InvalidOid)
                ereport(ERROR,
                        (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
                         errmsg("column \"%s\" cannot be used in
statistics because its type %s has no default btree operator class",
                                attname, format_type_be(attForm->atttypid))));
#8Junwang Zhao
zhjwpku@gmail.com
In reply to: jian he (#7)
Re: general purpose array_sort

Hi Jian,

On Mon, Sep 30, 2024 at 11:13 PM jian he <jian.universality@gmail.com> wrote:

On Mon, Sep 30, 2024 at 1:01 PM Junwang Zhao <zhjwpku@gmail.com> wrote:

I would suggest accepting:
asc
desc
asc nulls first
asc nulls last *
desc nulls first *
desc nulls last

As valid inputs for "dir" - and that the starred options are the defaults when null position is omitted.

In short, mimic create index.

David J.

PFA v3 with David's suggestion addressed.

I think just adding 2 bool arguments (asc/desc, nulls last/not nulls
last) would be easier.

Yeah, this would be easier, it's just the intarray module use
the direction parameter, I keep it here for the same user
experience, I don't insist if some committer thinks 2 bool arguments
would be a better option.

but either way, (i don't have a huge opinion)
but document the second argument, imagine case
SELECT array_sort('{a,B}'::text[] , E'aSc NulLs LaST \t\r\n');
would be tricky?

The case you provide should give the correct results, but
I doubt users will do this.
I'm not good at document wording, so you might give me some help
with the document part.

errmsg("multidimensional arrays sorting are not supported")));
write a sql test to trigger the error message that would be great.

you can add two or one example to collate.icu.utf8.sql to demo that it
actually works with COLLATE collation_name
like:
SELECT array_sort('{a,B}'::text[] COLLATE case_insensitive);
SELECT array_sort('{a,B}'::text[] COLLATE "C");

Fixed.

#define WHITESPACE " \t\n\r"
you may also check function scanner_isspace

Fixed.

+ typentry = (TypeCacheEntry *) fcinfo->flinfo->fn_extra;
+ if (typentry == NULL || typentry->type_id != elmtyp)
+ {
+ typentry = lookup_type_cache(elmtyp, sort_asc ? TYPECACHE_LT_OPR :
TYPECACHE_GT_OPR);
+ fcinfo->flinfo->fn_extra = (void *) typentry;
+ }
you need to one-time check typentry->lt_opr or typentry->gt_opr exists?
see CreateStatistics.
/* Disallow data types without a less-than operator */
type = lookup_type_cache(attForm->atttypid, TYPECACHE_LT_OPR);
if (type->lt_opr == InvalidOid)
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
errmsg("column \"%s\" cannot be used in
statistics because its type %s has no default btree operator class",
attname, format_type_be(attForm->atttypid))));

I added an Assert for this part, not sure if that is enough.

--
Regards
Junwang Zhao

Attachments:

v4-0001-general-purpose-array_sort.patchapplication/octet-stream; name=v4-0001-general-purpose-array_sort.patchDownload+387-1
#9jian he
jian.universality@gmail.com
In reply to: Junwang Zhao (#8)
Re: general purpose array_sort
+ typentry = (TypeCacheEntry *) fcinfo->flinfo->fn_extra;
+ if (typentry == NULL || typentry->type_id != elmtyp)
+ {
+ typentry = lookup_type_cache(elmtyp, sort_asc ? TYPECACHE_LT_OPR :
TYPECACHE_GT_OPR);
+ fcinfo->flinfo->fn_extra = (void *) typentry;
+ }
you need to one-time check typentry->lt_opr or typentry->gt_opr exists?
see CreateStatistics.
/* Disallow data types without a less-than operator */
type = lookup_type_cache(attForm->atttypid, TYPECACHE_LT_OPR);
if (type->lt_opr == InvalidOid)
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
errmsg("column \"%s\" cannot be used in
statistics because its type %s has no default btree operator class",
attname, format_type_be(attForm->atttypid))));

I added an Assert for this part, not sure if that is enough.

i think it really should be:

if (typentry == NULL || typentry->type_id != elmtyp)
{
typentry = lookup_type_cache(elmtyp, sort_asc ? TYPECACHE_LT_OPR :
TYPECACHE_GT_OPR);
fcinfo->flinfo->fn_extra = (void *) typentry;
if ((sort_asc && !OidIsValid(typentry->lt_opr) || (!sort_as &&
OidIsValid(typentry->gt_opr));
ereport(ERROR,....)
}

Imagine a type that doesn't have TYPECACHE_LT_OPR or TYPECACHE_GT_OPR
then we cannot do the sort, we should just error out.

I just tried this colour type [1]https://github.com/hlinnaka/colour-datatype/blob/master/colour.c with (CREATE TYPE colour (INPUT =
colour_in, OUTPUT = colour_out, LIKE = pg_catalog.int4);

select array_sort('{#FF0000, #FF0000}'::colour[]);
of course it will segfault with your new Assert.

[1]: https://github.com/hlinnaka/colour-datatype/blob/master/colour.c

#10Junwang Zhao
zhjwpku@gmail.com
In reply to: jian he (#9)
Re: general purpose array_sort

On Wed, Oct 2, 2024 at 9:51 AM jian he <jian.universality@gmail.com> wrote:

+ typentry = (TypeCacheEntry *) fcinfo->flinfo->fn_extra;
+ if (typentry == NULL || typentry->type_id != elmtyp)
+ {
+ typentry = lookup_type_cache(elmtyp, sort_asc ? TYPECACHE_LT_OPR :
TYPECACHE_GT_OPR);
+ fcinfo->flinfo->fn_extra = (void *) typentry;
+ }
you need to one-time check typentry->lt_opr or typentry->gt_opr exists?
see CreateStatistics.
/* Disallow data types without a less-than operator */
type = lookup_type_cache(attForm->atttypid, TYPECACHE_LT_OPR);
if (type->lt_opr == InvalidOid)
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
errmsg("column \"%s\" cannot be used in
statistics because its type %s has no default btree operator class",
attname, format_type_be(attForm->atttypid))));

I added an Assert for this part, not sure if that is enough.

i think it really should be:

if (typentry == NULL || typentry->type_id != elmtyp)
{
typentry = lookup_type_cache(elmtyp, sort_asc ? TYPECACHE_LT_OPR :
TYPECACHE_GT_OPR);
fcinfo->flinfo->fn_extra = (void *) typentry;
if ((sort_asc && !OidIsValid(typentry->lt_opr) || (!sort_as &&
OidIsValid(typentry->gt_opr));
ereport(ERROR,....)
}

Imagine a type that doesn't have TYPECACHE_LT_OPR or TYPECACHE_GT_OPR
then we cannot do the sort, we should just error out.

I just tried this colour type [1] with (CREATE TYPE colour (INPUT =
colour_in, OUTPUT = colour_out, LIKE = pg_catalog.int4);

select array_sort('{#FF0000, #FF0000}'::colour[]);
of course it will segfault with your new Assert.

[1] https://github.com/hlinnaka/colour-datatype/blob/master/colour.c

Make sense, PFA v5 with Jian's suggestion.

--
Regards
Junwang Zhao

Attachments:

v5-0001-general-purpose-array_sort.patchapplication/octet-stream; name=v5-0001-general-purpose-array_sort.patchDownload+392-1
#11Amit Langote
Langote_Amit_f8@lab.ntt.co.jp
In reply to: Junwang Zhao (#10)
Re: general purpose array_sort

Hi Junwang,

On Wed, Oct 2, 2024 at 11:46 PM Junwang Zhao <zhjwpku@gmail.com> wrote:

On Wed, Oct 2, 2024 at 9:51 AM jian he <jian.universality@gmail.com> wrote:

+ typentry = (TypeCacheEntry *) fcinfo->flinfo->fn_extra;
+ if (typentry == NULL || typentry->type_id != elmtyp)
+ {
+ typentry = lookup_type_cache(elmtyp, sort_asc ? TYPECACHE_LT_OPR :
TYPECACHE_GT_OPR);
+ fcinfo->flinfo->fn_extra = (void *) typentry;
+ }
you need to one-time check typentry->lt_opr or typentry->gt_opr exists?
see CreateStatistics.
/* Disallow data types without a less-than operator */
type = lookup_type_cache(attForm->atttypid, TYPECACHE_LT_OPR);
if (type->lt_opr == InvalidOid)
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
errmsg("column \"%s\" cannot be used in
statistics because its type %s has no default btree operator class",
attname, format_type_be(attForm->atttypid))));

I added an Assert for this part, not sure if that is enough.

i think it really should be:

if (typentry == NULL || typentry->type_id != elmtyp)
{
typentry = lookup_type_cache(elmtyp, sort_asc ? TYPECACHE_LT_OPR :
TYPECACHE_GT_OPR);
fcinfo->flinfo->fn_extra = (void *) typentry;
if ((sort_asc && !OidIsValid(typentry->lt_opr) || (!sort_as &&
OidIsValid(typentry->gt_opr));
ereport(ERROR,....)
}

Imagine a type that doesn't have TYPECACHE_LT_OPR or TYPECACHE_GT_OPR
then we cannot do the sort, we should just error out.

I just tried this colour type [1] with (CREATE TYPE colour (INPUT =
colour_in, OUTPUT = colour_out, LIKE = pg_catalog.int4);

select array_sort('{#FF0000, #FF0000}'::colour[]);
of course it will segfault with your new Assert.

[1] https://github.com/hlinnaka/colour-datatype/blob/master/colour.c

Make sense, PFA v5 with Jian's suggestion.

Have you noticed that the tests have failed on Cirrus CI runs of this patch?

https://cirrus-ci.com/github/postgresql-cfbot/postgresql/cf%2F5277

It might be related to the test machines having a different *default*
locale than your local environment, which could result in a different
sort order for the test data. You may need to add an explicit COLLATE
clause to the tests to ensure consistent sorting across systems.

--
Thanks, Amit Langote

#12Junwang Zhao
zhjwpku@gmail.com
In reply to: Amit Langote (#11)
Re: general purpose array_sort

Hi Amit,

On Thu, Oct 3, 2024 at 2:22 PM Amit Langote <amitlangote09@gmail.com> wrote:

Hi Junwang,

On Wed, Oct 2, 2024 at 11:46 PM Junwang Zhao <zhjwpku@gmail.com> wrote:

On Wed, Oct 2, 2024 at 9:51 AM jian he <jian.universality@gmail.com> wrote:

+ typentry = (TypeCacheEntry *) fcinfo->flinfo->fn_extra;
+ if (typentry == NULL || typentry->type_id != elmtyp)
+ {
+ typentry = lookup_type_cache(elmtyp, sort_asc ? TYPECACHE_LT_OPR :
TYPECACHE_GT_OPR);
+ fcinfo->flinfo->fn_extra = (void *) typentry;
+ }
you need to one-time check typentry->lt_opr or typentry->gt_opr exists?
see CreateStatistics.
/* Disallow data types without a less-than operator */
type = lookup_type_cache(attForm->atttypid, TYPECACHE_LT_OPR);
if (type->lt_opr == InvalidOid)
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
errmsg("column \"%s\" cannot be used in
statistics because its type %s has no default btree operator class",
attname, format_type_be(attForm->atttypid))));

I added an Assert for this part, not sure if that is enough.

i think it really should be:

if (typentry == NULL || typentry->type_id != elmtyp)
{
typentry = lookup_type_cache(elmtyp, sort_asc ? TYPECACHE_LT_OPR :
TYPECACHE_GT_OPR);
fcinfo->flinfo->fn_extra = (void *) typentry;
if ((sort_asc && !OidIsValid(typentry->lt_opr) || (!sort_as &&
OidIsValid(typentry->gt_opr));
ereport(ERROR,....)
}

Imagine a type that doesn't have TYPECACHE_LT_OPR or TYPECACHE_GT_OPR
then we cannot do the sort, we should just error out.

I just tried this colour type [1] with (CREATE TYPE colour (INPUT =
colour_in, OUTPUT = colour_out, LIKE = pg_catalog.int4);

select array_sort('{#FF0000, #FF0000}'::colour[]);
of course it will segfault with your new Assert.

[1] https://github.com/hlinnaka/colour-datatype/blob/master/colour.c

Make sense, PFA v5 with Jian's suggestion.

Have you noticed that the tests have failed on Cirrus CI runs of this patch?

https://cirrus-ci.com/github/postgresql-cfbot/postgresql/cf%2F5277

Sorry for the late reply due to my vacation. I should have paid
more attention to Cirrus CI earlier ;)

It might be related to the test machines having a different *default*
locale than your local environment, which could result in a different
sort order for the test data. You may need to add an explicit COLLATE
clause to the tests to ensure consistent sorting across systems.

I've changed the tests to use just ASCII characters, then added
*COLLATE "C"* to the tests and CI passed, PFA v6.

--
Thanks, Amit Langote

--
Regards
Junwang Zhao

Attachments:

v6-0001-general-purpose-array_sort.patchapplication/octet-stream; name=v6-0001-general-purpose-array_sort.patchDownload+385-1
#13Junwang Zhao
zhjwpku@gmail.com
In reply to: Junwang Zhao (#12)
Re: general purpose array_sort

On Wed, Oct 9, 2024 at 10:10 PM Junwang Zhao <zhjwpku@gmail.com> wrote:

Hi Amit,

On Thu, Oct 3, 2024 at 2:22 PM Amit Langote <amitlangote09@gmail.com> wrote:

Hi Junwang,

On Wed, Oct 2, 2024 at 11:46 PM Junwang Zhao <zhjwpku@gmail.com> wrote:

On Wed, Oct 2, 2024 at 9:51 AM jian he <jian.universality@gmail.com> wrote:

+ typentry = (TypeCacheEntry *) fcinfo->flinfo->fn_extra;
+ if (typentry == NULL || typentry->type_id != elmtyp)
+ {
+ typentry = lookup_type_cache(elmtyp, sort_asc ? TYPECACHE_LT_OPR :
TYPECACHE_GT_OPR);
+ fcinfo->flinfo->fn_extra = (void *) typentry;
+ }
you need to one-time check typentry->lt_opr or typentry->gt_opr exists?
see CreateStatistics.
/* Disallow data types without a less-than operator */
type = lookup_type_cache(attForm->atttypid, TYPECACHE_LT_OPR);
if (type->lt_opr == InvalidOid)
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
errmsg("column \"%s\" cannot be used in
statistics because its type %s has no default btree operator class",
attname, format_type_be(attForm->atttypid))));

I added an Assert for this part, not sure if that is enough.

i think it really should be:

if (typentry == NULL || typentry->type_id != elmtyp)
{
typentry = lookup_type_cache(elmtyp, sort_asc ? TYPECACHE_LT_OPR :
TYPECACHE_GT_OPR);
fcinfo->flinfo->fn_extra = (void *) typentry;
if ((sort_asc && !OidIsValid(typentry->lt_opr) || (!sort_as &&
OidIsValid(typentry->gt_opr));
ereport(ERROR,....)
}

Imagine a type that doesn't have TYPECACHE_LT_OPR or TYPECACHE_GT_OPR
then we cannot do the sort, we should just error out.

I just tried this colour type [1] with (CREATE TYPE colour (INPUT =
colour_in, OUTPUT = colour_out, LIKE = pg_catalog.int4);

select array_sort('{#FF0000, #FF0000}'::colour[]);
of course it will segfault with your new Assert.

[1] https://github.com/hlinnaka/colour-datatype/blob/master/colour.c

Make sense, PFA v5 with Jian's suggestion.

Have you noticed that the tests have failed on Cirrus CI runs of this patch?

https://cirrus-ci.com/github/postgresql-cfbot/postgresql/cf%2F5277

Sorry for the late reply due to my vacation. I should have paid
more attention to Cirrus CI earlier ;)

It might be related to the test machines having a different *default*
locale than your local environment, which could result in a different
sort order for the test data. You may need to add an explicit COLLATE
clause to the tests to ensure consistent sorting across systems.

I've changed the tests to use just ASCII characters, then added
*COLLATE "C"* to the tests and CI passed, PFA v6.

Sadly the CI only passed on my own github repo, it failed on
cfbot[1]https://cirrus-ci.com/task/5815925960605696, will dig into the reason later because I can not open the cirrus
ci page right now ;(

[1]: https://cirrus-ci.com/task/5815925960605696

--
Thanks, Amit Langote

--
Regards
Junwang Zhao

--
Regards
Junwang Zhao

#14Junwang Zhao
zhjwpku@gmail.com
In reply to: Junwang Zhao (#13)
Re: general purpose array_sort

On Wed, Oct 9, 2024 at 11:46 PM Junwang Zhao <zhjwpku@gmail.com> wrote:

On Wed, Oct 9, 2024 at 10:10 PM Junwang Zhao <zhjwpku@gmail.com> wrote:

Hi Amit,

On Thu, Oct 3, 2024 at 2:22 PM Amit Langote <amitlangote09@gmail.com> wrote:

Hi Junwang,

On Wed, Oct 2, 2024 at 11:46 PM Junwang Zhao <zhjwpku@gmail.com> wrote:

On Wed, Oct 2, 2024 at 9:51 AM jian he <jian.universality@gmail.com> wrote:

+ typentry = (TypeCacheEntry *) fcinfo->flinfo->fn_extra;
+ if (typentry == NULL || typentry->type_id != elmtyp)
+ {
+ typentry = lookup_type_cache(elmtyp, sort_asc ? TYPECACHE_LT_OPR :
TYPECACHE_GT_OPR);
+ fcinfo->flinfo->fn_extra = (void *) typentry;
+ }
you need to one-time check typentry->lt_opr or typentry->gt_opr exists?
see CreateStatistics.
/* Disallow data types without a less-than operator */
type = lookup_type_cache(attForm->atttypid, TYPECACHE_LT_OPR);
if (type->lt_opr == InvalidOid)
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
errmsg("column \"%s\" cannot be used in
statistics because its type %s has no default btree operator class",
attname, format_type_be(attForm->atttypid))));

I added an Assert for this part, not sure if that is enough.

i think it really should be:

if (typentry == NULL || typentry->type_id != elmtyp)
{
typentry = lookup_type_cache(elmtyp, sort_asc ? TYPECACHE_LT_OPR :
TYPECACHE_GT_OPR);
fcinfo->flinfo->fn_extra = (void *) typentry;
if ((sort_asc && !OidIsValid(typentry->lt_opr) || (!sort_as &&
OidIsValid(typentry->gt_opr));
ereport(ERROR,....)
}

Imagine a type that doesn't have TYPECACHE_LT_OPR or TYPECACHE_GT_OPR
then we cannot do the sort, we should just error out.

I just tried this colour type [1] with (CREATE TYPE colour (INPUT =
colour_in, OUTPUT = colour_out, LIKE = pg_catalog.int4);

select array_sort('{#FF0000, #FF0000}'::colour[]);
of course it will segfault with your new Assert.

[1] https://github.com/hlinnaka/colour-datatype/blob/master/colour.c

Make sense, PFA v5 with Jian's suggestion.

Have you noticed that the tests have failed on Cirrus CI runs of this patch?

https://cirrus-ci.com/github/postgresql-cfbot/postgresql/cf%2F5277

Sorry for the late reply due to my vacation. I should have paid
more attention to Cirrus CI earlier ;)

It might be related to the test machines having a different *default*
locale than your local environment, which could result in a different
sort order for the test data. You may need to add an explicit COLLATE
clause to the tests to ensure consistent sorting across systems.

I've changed the tests to use just ASCII characters, then added
*COLLATE "C"* to the tests and CI passed, PFA v6.

Sadly the CI only passed on my own github repo, it failed on
cfbot[1], will dig into the reason later because I can not open the cirrus
ci page right now ;(

[1] https://cirrus-ci.com/task/5815925960605696

Seems the failure is not related to this patch, I guess the reason for
this is the stop phase doesn't
release the port properly?

2024-10-09 14:53:10.079 UTC [43052][checkpointer] LOG: checkpoint
complete: wrote 5617 buffers (34.3%), wrote 3 SLRU buffers; 0 WAL
file(s) added, 0 removed, 3 recycled; write=0.107 s, sync=0.001 s,
total=0.107 s; sync files=0, longest=0.000 s, average=0.000 s;
distance=45239 kB, estimate=45239 kB; lsn=0/414A138, redo
lsn=0/414A138
2024-10-09 14:53:10.084 UTC [43050][postmaster] LOG: database system
is shut down
2024-10-09 14:53:10.215 UTC [43270][postmaster] LOG: starting
PostgreSQL 18devel on x86_64-freebsd, compiled by clang-17.0.6, 64-bit
2024-10-09 14:53:10.215 UTC [43270][postmaster] LOG: could not bind
IPv4 address "127.0.0.1": Address already in use
2024-10-09 14:53:10.215 UTC [43270][postmaster] HINT: Is another
postmaster already running on port 11643? If not, wait a few seconds
and retry.
2024-10-09 14:53:10.215 UTC [43270][postmaster] WARNING: could not
create listen socket for "127.0.0.1"
2024-10-09 14:53:10.218 UTC [43270][postmaster] FATAL: could not
create any TCP/IP sockets
2024-10-09 14:53:10.218 UTC [43270][postmaster] LOG: database system
is shut down

https://api.cirrus-ci.com/v1/artifact/task/5815925960605696/testrun/build/testrun/ssl/001_ssltests/log/001_ssltests_primary.log

--
Thanks, Amit Langote

--
Regards
Junwang Zhao

--
Regards
Junwang Zhao

--
Regards
Junwang Zhao

#15jian he
jian.universality@gmail.com
In reply to: Junwang Zhao (#14)
Re: general purpose array_sort

tricky case:
should we allow array element type to be composite/domain?
currently seems to work fine.

create table t(b int[]);
insert into t values ('{{1,3}}'), ('{{1,2}}');

select array_sort((select array_agg(t) from t), 'desc');
array_sort
-----------------------------------
{"(\"{{1,3}}\")","(\"{{1,2}}\")"}

select array_sort((t.b)) from t;
ERROR: multidimensional arrays sorting are not supported

select array_sort((select array_agg(t.b) from t));
ERROR: multidimensional arrays sorting are not supported

#16Junwang Zhao
zhjwpku@gmail.com
In reply to: jian he (#15)
Re: general purpose array_sort

Hi Jian,

On Fri, Oct 11, 2024 at 1:12 PM jian he <jian.universality@gmail.com> wrote:

tricky case:
should we allow array element type to be composite/domain?
currently seems to work fine.

create table t(b int[]);
insert into t values ('{{1,3}}'), ('{{1,2}}');

select array_sort((select array_agg(t) from t), 'desc');
array_sort
-----------------------------------
{"(\"{{1,3}}\")","(\"{{1,2}}\")"}

select array_sort((t.b)) from t;
ERROR: multidimensional arrays sorting are not supported

select array_sort((select array_agg(t.b) from t));
ERROR: multidimensional arrays sorting are not supported

I tried the above cases, and the first one works because it's
a one dim array of composite type, the other two fails because
they are multidimensional.

It seems there is not much meaning to sort composite type,
so are you proposing we should error on that?

--
Regards
Junwang Zhao

#17Tom Lane
tgl@sss.pgh.pa.us
In reply to: Junwang Zhao (#16)
Re: general purpose array_sort

Junwang Zhao <zhjwpku@gmail.com> writes:

It seems there is not much meaning to sort composite type,
so are you proposing we should error on that?

It's hardly "general purpose" if it randomly refuses to
sort certain types. I would say it should be able to sort
anything that ORDER BY will handle --- and that certainly
includes the cases shown here.

regards, tom lane

#18Junwang Zhao
zhjwpku@gmail.com
In reply to: Tom Lane (#17)
Re: general purpose array_sort

Sorry for the late reply.

On Mon, Oct 14, 2024 at 4:10 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:

Junwang Zhao <zhjwpku@gmail.com> writes:

It seems there is not much meaning to sort composite type,
so are you proposing we should error on that?

It's hardly "general purpose" if it randomly refuses to
sort certain types. I would say it should be able to sort
anything that ORDER BY will handle --- and that certainly
includes the cases shown here.

Yeah, agreed.

PFA v7 with multi-array support.

regards, tom lane

--
Regards
Junwang Zhao

Attachments:

v7-0001-general-purpose-array_sort.patchapplication/octet-stream; name=v7-0001-general-purpose-array_sort.patchDownload+408-1
#19jian he
jian.universality@gmail.com
In reply to: Junwang Zhao (#18)
Re: general purpose array_sort

On Wed, Oct 23, 2024 at 10:28 PM Junwang Zhao <zhjwpku@gmail.com> wrote:

PFA v7 with multi-array support.

if (ARR_NDIM(array) == 1)
{
}
else
{
}
can be simplified.
i think beginning part of array_sort can be like the following:
(newline emitted)

---------------------------------------------------------------------
if (ARR_NDIM(array) < 1)
PG_RETURN_ARRAYTYPE_P(array);
if (dirstr != NULL)
{
if (!parse_sort_order(text_to_cstring(dirstr), &sort_asc, &nulls_first))
ereport(ERROR,
(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
errmsg("second parameter must be a valid sort
direction")));
}
elmtyp = ARR_ELEMTYPE(array);
if (ARR_NDIM(array) > 1)
elmtyp = get_array_type(elmtyp);
typentry = (TypeCacheEntry *) fcinfo->flinfo->fn_extra;
if (typentry == NULL || typentry->type_id != elmtyp)
{
typentry = lookup_type_cache(elmtyp, sort_asc ?
TYPECACHE_LT_OPR : TYPECACHE_GT_OPR);
if ((sort_asc && !OidIsValid(typentry->lt_opr)) ||
(!sort_asc && !OidIsValid(typentry->gt_opr)))
ereport(ERROR,
(errcode(ERRCODE_UNDEFINED_FUNCTION),
errmsg("could not identify an ordering operator
for type %s",
format_type_be(elmtyp))));
fcinfo->flinfo->fn_extra = (void *) typentry;
}
---------------------------------------------------------------------
/*
* array_sort
*
* Sorts the array in either ascending or descending order.
* The array must be empty or one-dimensional.
*/
comments need to be updated.

typedef enum
PARSE_SORT_ORDER_DONE
} ParseSortOrderState;

last one, should have comma, like
"PARSE_SORT_ORDER_DONE, "

#20Junwang Zhao
zhjwpku@gmail.com
In reply to: jian he (#19)
Re: general purpose array_sort

On Thu, Oct 24, 2024 at 8:40 PM jian he <jian.universality@gmail.com> wrote:

On Wed, Oct 23, 2024 at 10:28 PM Junwang Zhao <zhjwpku@gmail.com> wrote:

PFA v7 with multi-array support.

if (ARR_NDIM(array) == 1)
{
}
else
{
}
can be simplified.
i think beginning part of array_sort can be like the following:
(newline emitted)

---------------------------------------------------------------------
if (ARR_NDIM(array) < 1)
PG_RETURN_ARRAYTYPE_P(array);
if (dirstr != NULL)
{
if (!parse_sort_order(text_to_cstring(dirstr), &sort_asc, &nulls_first))
ereport(ERROR,
(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
errmsg("second parameter must be a valid sort
direction")));
}
elmtyp = ARR_ELEMTYPE(array);
if (ARR_NDIM(array) > 1)
elmtyp = get_array_type(elmtyp);

I'm wondering should we cache the type entry for both element type and
the corresponding array type, for example if we have a table:

create table t(b int[]);
insert into t values ('{1,3}'),('{{2,3}}'),('{{1,2},{0,2}}');

with 1-d array and m-d array interleaved, then the following query will
call lookup_type_cache multiple times.

select array_sort((t.b)) from t;

typentry = (TypeCacheEntry *) fcinfo->flinfo->fn_extra;
if (typentry == NULL || typentry->type_id != elmtyp)
{
typentry = lookup_type_cache(elmtyp, sort_asc ?
TYPECACHE_LT_OPR : TYPECACHE_GT_OPR);
if ((sort_asc && !OidIsValid(typentry->lt_opr)) ||
(!sort_asc && !OidIsValid(typentry->gt_opr)))
ereport(ERROR,
(errcode(ERRCODE_UNDEFINED_FUNCTION),
errmsg("could not identify an ordering operator
for type %s",
format_type_be(elmtyp))));
fcinfo->flinfo->fn_extra = (void *) typentry;
}
---------------------------------------------------------------------
/*
* array_sort
*
* Sorts the array in either ascending or descending order.
* The array must be empty or one-dimensional.
*/
comments need to be updated.

will fix it in the next version of patch.

typedef enum
PARSE_SORT_ORDER_DONE
} ParseSortOrderState;

last one, should have comma, like
"PARSE_SORT_ORDER_DONE, "

will fix it.

--
Regards
Junwang Zhao

#21Aleksander Alekseev
aleksander@timescale.com
In reply to: Tom Lane (#17)
#22David G. Johnston
david.g.johnston@gmail.com
In reply to: Aleksander Alekseev (#21)
#23Aleksander Alekseev
aleksander@timescale.com
In reply to: David G. Johnston (#22)
#24David G. Johnston
david.g.johnston@gmail.com
In reply to: Aleksander Alekseev (#23)
#25Tom Lane
tgl@sss.pgh.pa.us
In reply to: David G. Johnston (#24)
#26David G. Johnston
david.g.johnston@gmail.com
In reply to: Tom Lane (#25)
#27Aleksander Alekseev
aleksander@timescale.com
In reply to: David G. Johnston (#26)
#28Junwang Zhao
zhjwpku@gmail.com
In reply to: Aleksander Alekseev (#27)
#29Junwang Zhao
zhjwpku@gmail.com
In reply to: Junwang Zhao (#28)
#30Aleksander Alekseev
aleksander@timescale.com
In reply to: Junwang Zhao (#29)
#31jian he
jian.universality@gmail.com
In reply to: Aleksander Alekseev (#30)
#32Aleksander Alekseev
aleksander@timescale.com
In reply to: jian he (#31)
#33Junwang Zhao
zhjwpku@gmail.com
In reply to: Aleksander Alekseev (#30)
#34Aleksander Alekseev
aleksander@timescale.com
In reply to: Junwang Zhao (#33)
#35Junwang Zhao
zhjwpku@gmail.com
In reply to: Aleksander Alekseev (#34)
#36Junwang Zhao
zhjwpku@gmail.com
In reply to: Junwang Zhao (#35)
#37Junwang Zhao
zhjwpku@gmail.com
In reply to: Junwang Zhao (#36)
#38Michael Paquier
michael@paquier.xyz
In reply to: Junwang Zhao (#37)
#39jian he
jian.universality@gmail.com
In reply to: Michael Paquier (#38)
#40Dean Rasheed
dean.a.rasheed@gmail.com
In reply to: Junwang Zhao (#37)
#41Michael Paquier
michael@paquier.xyz
In reply to: jian he (#39)
#42jian he
jian.universality@gmail.com
In reply to: Dean Rasheed (#40)
#43Junwang Zhao
zhjwpku@gmail.com
In reply to: jian he (#42)
#44jian he
jian.universality@gmail.com
In reply to: Junwang Zhao (#43)
#45Junwang Zhao
zhjwpku@gmail.com
In reply to: Michael Paquier (#38)
#46Junwang Zhao
zhjwpku@gmail.com
In reply to: Michael Paquier (#41)
#47Junwang Zhao
zhjwpku@gmail.com
In reply to: jian he (#42)
#48Robert Haas
robertmhaas@gmail.com
In reply to: Junwang Zhao (#45)
#49Junwang Zhao
zhjwpku@gmail.com
In reply to: Robert Haas (#48)
#50Michael Paquier
michael@paquier.xyz
In reply to: Robert Haas (#48)
#51Michael Paquier
michael@paquier.xyz
In reply to: Junwang Zhao (#46)
#52Junwang Zhao
zhjwpku@gmail.com
In reply to: Michael Paquier (#51)
#53jian he
jian.universality@gmail.com
In reply to: Michael Paquier (#51)
#54jian he
jian.universality@gmail.com
In reply to: jian he (#53)
#55jian he
jian.universality@gmail.com
In reply to: jian he (#54)
#56jian he
jian.universality@gmail.com
In reply to: jian he (#55)
#57Junwang Zhao
zhjwpku@gmail.com
In reply to: jian he (#56)
#58Tom Lane
tgl@sss.pgh.pa.us
In reply to: Junwang Zhao (#57)
#59Junwang Zhao
zhjwpku@gmail.com
In reply to: Tom Lane (#58)
#60Tom Lane
tgl@sss.pgh.pa.us
In reply to: Junwang Zhao (#59)
#61Junwang Zhao
zhjwpku@gmail.com
In reply to: Tom Lane (#60)
#62Tom Lane
tgl@sss.pgh.pa.us
In reply to: Junwang Zhao (#61)
#63Junwang Zhao
zhjwpku@gmail.com
In reply to: Tom Lane (#62)