MIN/MAX functions for a record

Started by Viliam Ďurinaover 2 years ago12 messageshackers

Jump to latest

Viliam Ďurina

viliam.durina@gmail.com

over 2 years ago

In my queries I often need to do MIN/MAX for tuples, for example:

SELECT MAX(row(year, month))
FROM (VALUES(2025, 1), (2024,2)) x(year, month);

This query throws:

ERROR: function max(record) does not exist

In this case you can replace it with `MAX((year||'-'||month||'-1')::date)`.
However in my case I have an event table with `event_time` and `text`
columns, I'm grouping that table by some key and want to have the text for
the newest event. I would do `MAX(ROW(event_time, text)).text`. Workarounds
for this are clumsy, e.g. with a subquery with LIMIT 1.

The lack of this feature is kind of unexpected, because the `>` operator or
`GREATEST` function are defined for records:

SELECT
GREATEST((2025, 1), (2024, 2)),
(2025, 1) > (2024, 2)

Was this ever discussed or is there something preventing the implementation?

Viliam

Aleksander Alekseev

aleksander@timescale.com

over 2 years ago

In reply to: Viliam Ďurina (#1)

Re: MIN/MAX functions for a record

Hi,

In my queries I often need to do MIN/MAX for tuples, for example:

SELECT MAX(row(year, month))
FROM (VALUES(2025, 1), (2024,2)) x(year, month);

This query throws:

ERROR: function max(record) does not exist

In this case you can replace it with `MAX((year||'-'||month||'-1')::date)`. However in my case I have an event table with `event_time` and `text` columns, I'm grouping that table by some key and want to have the text for the newest event. I would do `MAX(ROW(event_time, text)).text`. Workarounds for this are clumsy, e.g. with a subquery with LIMIT 1.

The lack of this feature is kind of unexpected, because the `>` operator or `GREATEST` function are defined for records:

SELECT
GREATEST((2025, 1), (2024, 2)),
(2025, 1) > (2024, 2)

Was this ever discussed or is there something preventing the implementation?

I believe it would be challenging to implement max(record) that would
work reasonably well in a general case.

What if, for instance, one of the columns is JOSNB or XML? Not to
mention the fact that Postgres supports user-defined types which don't
necessarily have a reasonable order. Take a point in a 2D or 3D space
as an example. On top of that I doubt that the proposed query will
perform well since I don't see how it could benefit from using
indexes. I don't claim that this is necessarily true in your case but
generally one could argue that the wrong schema is used here and
instead of (year, month) pair a table should have a date/timestamp(tz)
column.

Personally I would choose format() function [1]https://www.postgresql.org/docs/current/functions-string.html in cases like this in
order to play it safe. Assuming of course that the table is small and
the query is not executed often.

[1]: https://www.postgresql.org/docs/current/functions-string.html

--
Best regards,
Aleksander Alekseev

Tom Lane

tgl@sss.pgh.pa.us

over 2 years ago

In reply to: Aleksander Alekseev (#2)

Re: MIN/MAX functions for a record

Aleksander Alekseev <aleksander@timescale.com> writes:

In my queries I often need to do MIN/MAX for tuples, for example:
SELECT MAX(row(year, month))
FROM (VALUES(2025, 1), (2024,2)) x(year, month);
This query throws:
ERROR: function max(record) does not exist
Was this ever discussed or is there something preventing the implementation?

I believe it would be challenging to implement max(record) that would
work reasonably well in a general case.

As long as you define it as "works the same way record comparison
does", ie base it on record_cmp(), I don't think it would be much
more than a finger exercise [*]. And why would you want it to act
any differently from record_cmp()? Those semantics have been
established for a long time.

regards, tom lane

[*] Although conceivably there are some challenges in getting
record_cmp's caching logic to work in the context of an aggregate.

Viliam Ďurina

viliam.durina@gmail.com

over 2 years ago

In reply to: Tom Lane (#3)

Re: MIN/MAX functions for a record

Exactly Tom, I see no fundamental problem for it not to be implemented,
since comparison operator is already implemented. In fact, MIN/MAX should
work for all types for which comparison operator is defined.

Regarding index support, there should not be an issue if the index is
defined for the record (e.g. `CREATE INDEX ON my_table(ROW(field_a,
field_b))`). However such indexes seem not to be supported. Whether a
composite index is compatible with a record created on the indexed fields
in every edge case I'm not sure...

Alexander, rewriting the year-month example is easy, but how would you
rewrite this query?

CREATE TABLE events(event_time TIMESTAMP, message VARCHAR, user_id VARCHAR);

You want a newest message for each user. It's easy with MAX(record):

SELECT user_id, MAX(ROW(event_time, message)).message
FROM events
GROUP BY user_id;

One option is to rewrite to a subquery with LIMIT 1

SELECT user_id, (SELECT message FROM events e2 WHERE e1.user_id=e2.user_id
ORDER BY event_time DESC LIMIT 1)
FROM events e1
GROUP BY user_id;

If your query already has multiple levels of grouping, multiple joins,
UNIONs etc., it gets much more complex. I also wonder if the optimizer
would pick the same plan as it would be if the MAX(record) is supported.

Viliam

On Fri, Mar 22, 2024 at 4:12 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

Show quoted text

Aleksander Alekseev <aleksander@timescale.com> writes:

In my queries I often need to do MIN/MAX for tuples, for example:
SELECT MAX(row(year, month))
FROM (VALUES(2025, 1), (2024,2)) x(year, month);
This query throws:
ERROR: function max(record) does not exist
Was this ever discussed or is there something preventing the

implementation?

I believe it would be challenging to implement max(record) that would
work reasonably well in a general case.

As long as you define it as "works the same way record comparison
does", ie base it on record_cmp(), I don't think it would be much
more than a finger exercise [*]. And why would you want it to act
any differently from record_cmp()? Those semantics have been
established for a long time.

regards, tom lane

[*] Although conceivably there are some challenges in getting
record_cmp's caching logic to work in the context of an aggregate.

Aleksander Alekseev

aleksander@timescale.com

over 2 years ago

In reply to: Viliam Ďurina (#4)

Re: MIN/MAX functions for a record

Hi,

Exactly Tom, I see no fundamental problem for it not to be implemented, since comparison operator is already implemented. In fact, MIN/MAX should work for all types for which comparison operator is defined.

On second thought, this should work reasonably well.

PFA a WIP patch. At this point it implements only MAX(record), no MIN, no tests:

```
=# SELECT MAX(row(year, month)) FROM (VALUES(2025, 1), (2024,2)) x(year, month);
max
----------
(2025,1)
```

One thing I'm not 100% sure of is whether record_larger() should make
a copy of its arguments or the current implementation is safe.

--
Best regards,
Aleksander Alekseev

Tom Lane

tgl@sss.pgh.pa.us

over 2 years ago

In reply to: Aleksander Alekseev (#5)

Re: MIN/MAX functions for a record

Aleksander Alekseev <aleksander@timescale.com> writes:

One thing I'm not 100% sure of is whether record_larger() should make
a copy of its arguments or the current implementation is safe.

I don't see any copying happening in, say, text_larger or
numeric_larger, so this shouldn't need to either.

Personally I'd write "record_cmp(fcinfo) > 0" rather than indirecting
through record_gt. The way you have it is not strictly correct anyhow:
you're cheating by not using DirectFunctionCall.

Also, given that you're passing the fcinfo, there's no need
to extract the arguments from it before that call. So it
seems to me that code like

if (record_cmp(fcinfo) > 0)
PG_RETURN_HEAPTUPLEHEADER(PG_GETARG_HEAPTUPLEHEADER(0));
else
PG_RETURN_HEAPTUPLEHEADER(PG_GETARG_HEAPTUPLEHEADER(1));

should do, and possibly save one useless detoast step. Or you could
do

if (record_cmp(fcinfo) > 0)
PG_RETURN_DATUM(PG_GETARG_DATUM(0));
else
PG_RETURN_DATUM(PG_GETARG_DATUM(1));

because really there's no point in detoasting at all.

regards, tom lane

Aleksander Alekseev

aleksander@timescale.com

about 2 years ago

In reply to: Tom Lane (#6)

Re: MIN/MAX functions for a record

Hi,

I don't see any copying happening in, say, text_larger or
numeric_larger, so this shouldn't need to either.

Personally I'd write "record_cmp(fcinfo) > 0" rather than indirecting
through record_gt. The way you have it is not strictly correct anyhow:
you're cheating by not using DirectFunctionCall.

Also, given that you're passing the fcinfo, there's no need
to extract the arguments from it before that call. So it
seems to me that code like

if (record_cmp(fcinfo) > 0)
PG_RETURN_HEAPTUPLEHEADER(PG_GETARG_HEAPTUPLEHEADER(0));
else
PG_RETURN_HEAPTUPLEHEADER(PG_GETARG_HEAPTUPLEHEADER(1));

should do, and possibly save one useless detoast step. Or you could
do

if (record_cmp(fcinfo) > 0)
PG_RETURN_DATUM(PG_GETARG_DATUM(0));
else
PG_RETURN_DATUM(PG_GETARG_DATUM(1));

because really there's no point in detoasting at all.

Many thanks. Here is the corrected patch. Now it also includes MIN()
support and tests.

--
Best regards,
Aleksander Alekseev

Aleksander Alekseev

aleksander@timescale.com

almost 2 years ago

In reply to: Aleksander Alekseev (#7)

Re: MIN/MAX functions for a record

Hi,

Many thanks. Here is the corrected patch. Now it also includes MIN()
support and tests.

Michael Paquier (cc:'ed) commented offlist that I forgot to change the
documentation.

Here is the corrected patch.

--
Best regards,
Aleksander Alekseev

Michael Paquier

michael@paquier.xyz

almost 2 years ago

In reply to: Aleksander Alekseev (#8)

Re: MIN/MAX functions for a record

On Mon, Jul 08, 2024 at 12:20:30PM +0300, Aleksander Alekseev wrote:

Here is the corrected patch.

313f87a17155 is one example of a similar change with pg_lsn, with four
entries added to pg_proc and two to pg_aggregate. That's what this
patch is doing from what I can see.

-        and arrays of any of these types.
+        and also arrays and records of any of these types.

This update of the docs is incorrect, no? Records could include much
more types than the ones currently supported for min()/max().

I am not sure to get the concerns of upthread regarding the type
caching in the context of an aggregate, which is the business with
lookup_type_cache(), especially since there is a btree operator
relying on record_cmp(). Tom, what were your concerns here?
--
Michael

#10

Tom Lane

tgl@sss.pgh.pa.us

almost 2 years ago

In reply to: Michael Paquier (#9)

Re: MIN/MAX functions for a record

Michael Paquier <michael@paquier.xyz> writes:

I am not sure to get the concerns of upthread regarding the type
caching in the context of an aggregate, which is the business with
lookup_type_cache(), especially since there is a btree operator
relying on record_cmp(). Tom, what were your concerns here?

Don't recall right at this instant, but I've put myself down as
reviewer of this patch to remind me to look at it in the next
day or two.

regards, tom lane

#11

Michael Paquier

michael@paquier.xyz

almost 2 years ago

In reply to: Tom Lane (#10)

Re: MIN/MAX functions for a record

On Mon, Jul 08, 2024 at 08:54:31PM -0400, Tom Lane wrote:

Don't recall right at this instant, but I've put myself down as
reviewer of this patch to remind me to look at it in the next
day or two.

Thanks for the update. WFM.
--
Michael

#12

Tom Lane

tgl@sss.pgh.pa.us

almost 2 years ago

In reply to: Michael Paquier (#9)

Re: MIN/MAX functions for a record

Michael Paquier <michael@paquier.xyz> writes:

On Mon, Jul 08, 2024 at 12:20:30PM +0300, Aleksander Alekseev wrote:
-        and arrays of any of these types.
+        and also arrays and records of any of these types.

This update of the docs is incorrect, no? Records could include much
more types than the ones currently supported for min()/max().

Yeah, actually the contained data types could be anything with
btree sort support. This is true for arrays too, so the text
was wrong already. I changed it to

+ and also arrays and composite types containing sortable data types.

(Using "composite type" not "record" is a judgment call here, but
I don't see anyplace else in func.sgml preferring "record" for this
meaning.)

I am not sure to get the concerns of upthread regarding the type
caching in the context of an aggregate, which is the business with
lookup_type_cache(), especially since there is a btree operator
relying on record_cmp(). Tom, what were your concerns here?

Re-reading, I was just mentioning that as something to check,
not a major problem. It isn't, because array min/max are already
relying on the ability to use fcinfo->flinfo->fn_extra as cache space
in an aggregate. (Indeed, the array aggregate code is almost
identical to where we ended up.)

AFAICS this is good to go. I made a couple of tiny cosmetic
adjustments, added a catversion bump, and pushed.

regards, tom lane

MIN/MAX functions for a record

Attachments:

Attachments:

Attachments: