Custom tuplesorts for extensions

Started by Alexander Korotkovalmost 4 years ago21 messageshackers
Jump to latest
#1Alexander Korotkov
aekorotkov@gmail.com

Hackers,

Some PostgreSQL extensions need to sort their pieces of data. Then it
worth to re-use our tuplesort. But despite our tuplesort having
extensibility, it's hidden inside tuplesort.c. There are at least a
couple of examples of how extensions deal with that.

1. RUM table access method: https://github.com/postgrespro/rum
RUM repository contains a copy of tuplesort.c for each major
PostgreSQL release. A reliable solution, but this is not how things
are intended to work, right?
2. OrioleDB table access method: https://github.com/orioledb/orioledb
OrioleDB runs on patches PostgreSQL. It contains a patch, which just
exposes all the guts of tuplesort.c to the tuplesort.h
https://github.com/orioledb/postgres/commit/d42755f52c

I think we need a proper way to let extension re-use our core
tuplesort facility. The attached patchset is intended to do this the
right way. Patches don't revise all the comments and lack code
beautification. The intention behind publishing this revision is to
verify the direction and get some feedback for further work.

0001-Remove-Tuplesortstate.copytup-v1.patch
It's unclear for me how do we split functionality between
Tuplesortstate.copytup() function and tuplesort_put*() functions. For
instance, copytup_index() and copytup_datum() return error while
tuplesort_putindextuplevalues() and tuplesort_putdatum() do their
work. The patch removes Tuplesortstate.copytup() altogether, putting
their functions to tuplesort_put*().

0002-Tuplesortstate.getdatum1-method-v1.patch
0003-Put-abbreviation-logic-into-puttuple_common-v1.patch
The tuplesort_put*() functions contains common part related to dealing
with abbreviation. The 0002 extracts logic of getting value of
SortTuple.datum1 into Tuplesortstate.getdatum1() function. Thanks to
this new interface function, 0003 puts abbreviation logic into
puttuple().

0004-Move-freeing-memory-away-from-writetup-v1.patch
Assuming that SortTuple.tuple is always just a single chunk of memory,
we can put memory counting logic away from Tuplesortstate.writetup().
This makes Tuplesortstate.getdatum1() easier to implement without
knowledge of tuplesort.c guts.

0005-Reorganize-data-structures-v1.patch
This commit splits the "public" part of Tuplesortstate into
TuplesortOps, which is intended to be exposed outside tuplesort.c.

0006-Split-tuplesortops.c-v1.patch
This patch finally splits tuplesortops.c from tuplesort.c. tuplesort.c
leaves which generic routines for tuple sort, while tuplesortops.c
provides the implementation for particular tuple formats.

------
Regards,
Alexander Korotkov

Attachments:

0001-Remove-Tuplesortstate.copytup-v1.patchapplication/x-patch; name=0001-Remove-Tuplesortstate.copytup-v1.patchDownload+132-199
0002-Tuplesortstate.getdatum1-method-v1.patchapplication/x-patch; name=0002-Tuplesortstate.getdatum1-method-v1.patchDownload+64-37
0003-Put-abbreviation-logic-into-puttuple_common-v1.patchapplication/x-patch; name=0003-Put-abbreviation-logic-into-puttuple_common-v1.patchDownload+50-164
0004-Move-freeing-memory-away-from-writetup-v1.patchapplication/x-patch; name=0004-Move-freeing-memory-away-from-writetup-v1.patchDownload+26-39
0005-Reorganize-data-structures-v1.patchapplication/x-patch; name=0005-Reorganize-data-structures-v1.patchDownload+432-331
0006-Split-tuplesortops.c-v1.patchapplication/x-patch; name=0006-Split-tuplesortops.c-v1.patchDownload+1746-1719
#2Pavel Borisov
pashkin.elfe@gmail.com
In reply to: Alexander Korotkov (#1)
Re: Custom tuplesorts for extensions

I've bumped into this case in RUM extension. The need to build it with
tuplesort changes in different PG versions led me to reluctantly including
different tuplesort.c versions into the extension code. So I totally
support the intention of this patch and I'm planning to invest some time to
review it.

--
Best regards,
Pavel Borisov

Postgres Professional: http://postgrespro.com <http://www.postgrespro.com&gt;

#3Pavel Borisov
pashkin.elfe@gmail.com
In reply to: Alexander Korotkov (#1)
Re: Custom tuplesorts for extensions

Some PostgreSQL extensions need to sort their pieces of data. Then it
worth to re-use our tuplesort. But despite our tuplesort having
extensibility, it's hidden inside tuplesort.c. There are at least a
couple of examples of how extensions deal with that.

1. RUM table access method: https://github.com/postgrespro/rum
RUM repository contains a copy of tuplesort.c for each major
PostgreSQL release. A reliable solution, but this is not how things
are intended to work, right?
2. OrioleDB table access method: https://github.com/orioledb/orioledb
OrioleDB runs on patches PostgreSQL. It contains a patch, which just
exposes all the guts of tuplesort.c to the tuplesort.h
https://github.com/orioledb/postgres/commit/d42755f52c

I think we need a proper way to let extension re-use our core
tuplesort facility. The attached patchset is intended to do this the
right way. Patches don't revise all the comments and lack code
beautification. The intention behind publishing this revision is to
verify the direction and get some feedback for further work.

I still have one doubt about the thing: the compatibility with previous PG
versions requires me to support code paths that I already added into RUM
extension. I won't be able to drop it from extension for quite long time in
the future. It could be avoided if we backpatch this, which seems doubtful
to me provided the volume of code changes.

If we just change this thing since say v16 this will only help to
extensions that doesn't support earlier PG versions. I still consider the
change beneficial but wonder do you have some view on how should it be
managed in existing extensions to benefit them?

--
Best regards,
Pavel Borisov

Postgres Professional: http://postgrespro.com <http://www.postgrespro.com&gt;

#4Maxim Orlov
orlovmg@gmail.com
In reply to: Pavel Borisov (#3)
Re: Custom tuplesorts for extensions

Hi!

I've reviewed the patchset and noticed some minor issues:
- extra semicolon in macro (lead to warnings)
- comparison of var isWorker should be done in different way

Here is an upgraded version of the patchset.

Overall, I consider this patchset useful. Any opinions?

--
Best regards,
Maxim Orlov.

Attachments:

v2-0004-Move-freeing-memory-away-from-writetup.patchtext/x-patch; charset=US-ASCII; name=v2-0004-Move-freeing-memory-away-from-writetup.patchDownload+26-39
v2-0006-Split-tuplesortops.c.patchtext/x-patch; charset=US-ASCII; name=v2-0006-Split-tuplesortops.c.patchDownload+1746-1719
v2-0002-Tuplesortstate.getdatum1-method.patchtext/x-patch; charset=US-ASCII; name=v2-0002-Tuplesortstate.getdatum1-method.patchDownload+64-37
v2-0005-Reorganize-data-structures.patchtext/x-patch; charset=US-ASCII; name=v2-0005-Reorganize-data-structures.patchDownload+432-331
v2-0001-Remove-Tuplesortstate.copytup.patchtext/x-patch; charset=US-ASCII; name=v2-0001-Remove-Tuplesortstate.copytup.patchDownload+132-199
v2-0003-Put-abbreviation-logic-into-puttuple_common.patchtext/x-patch; charset=US-ASCII; name=v2-0003-Put-abbreviation-logic-into-puttuple_common.patchDownload+50-164
#5Alexander Korotkov
aekorotkov@gmail.com
In reply to: Pavel Borisov (#3)
Re: Custom tuplesorts for extensions

Hi, Pavel!

Thank you for your feedback.

On Thu, Jun 23, 2022 at 2:26 PM Pavel Borisov <pashkin.elfe@gmail.com> wrote:

Some PostgreSQL extensions need to sort their pieces of data. Then it
worth to re-use our tuplesort. But despite our tuplesort having
extensibility, it's hidden inside tuplesort.c. There are at least a
couple of examples of how extensions deal with that.

1. RUM table access method: https://github.com/postgrespro/rum
RUM repository contains a copy of tuplesort.c for each major
PostgreSQL release. A reliable solution, but this is not how things
are intended to work, right?
2. OrioleDB table access method: https://github.com/orioledb/orioledb
OrioleDB runs on patches PostgreSQL. It contains a patch, which just
exposes all the guts of tuplesort.c to the tuplesort.h
https://github.com/orioledb/postgres/commit/d42755f52c

I think we need a proper way to let extension re-use our core
tuplesort facility. The attached patchset is intended to do this the
right way. Patches don't revise all the comments and lack code
beautification. The intention behind publishing this revision is to
verify the direction and get some feedback for further work.

I still have one doubt about the thing: the compatibility with previous PG versions requires me to support code paths that I already added into RUM extension. I won't be able to drop it from extension for quite long time in the future. It could be avoided if we backpatch this, which seems doubtful to me provided the volume of code changes.

If we just change this thing since say v16 this will only help to extensions that doesn't support earlier PG versions. I still consider the change beneficial but wonder do you have some view on how should it be managed in existing extensions to benefit them?

I don't think there is a way to help extensions with earlier PG
versions. This is a significant patchset, which shouldn't be a subject
for backpatch. The existing extensions will benefit by simplification
of maintenance for PG 16+ releases. I think that's all we can do.

------
Regards,
Alexander Korotkov

#6Alexander Korotkov
aekorotkov@gmail.com
In reply to: Maxim Orlov (#4)
Re: Custom tuplesorts for extensions

Hi, Maxim!

On Thu, Jun 23, 2022 at 3:12 PM Maxim Orlov <orlovmg@gmail.com> wrote:

I've reviewed the patchset and noticed some minor issues:
- extra semicolon in macro (lead to warnings)
- comparison of var isWorker should be done in different way

Here is an upgraded version of the patchset.

Thank you for fixing this.

Overall, I consider this patchset useful. Any opinions?

Thank you.

------
Regards,
Alexander Korotkov

#7Maxim Orlov
orlovmg@gmail.com
In reply to: Alexander Korotkov (#6)
Re: Custom tuplesorts for extensions

Hi!

Overall patch looks good let's mark it as ready for committer, shall we?

--
Best regards,
Maxim Orlov.

#8Matthias van de Meent
boekewurm+postgres@gmail.com
In reply to: Maxim Orlov (#4)
Re: Custom tuplesorts for extensions

On Thu, 23 Jun 2022 at 14:12, Maxim Orlov <orlovmg@gmail.com> wrote:

Hi!

I've reviewed the patchset and noticed some minor issues:
- extra semicolon in macro (lead to warnings)
- comparison of var isWorker should be done in different way

Here is an upgraded version of the patchset.

Overall, I consider this patchset useful. Any opinions?

All of the patches are currently missing descriptive commit messages,
which I think is critical for getting this committed. As for per-patch
comments:

0001: This patch removes copytup, but it is not quite clear why -
please describe the reasoning in the commit message.

0002: getdatum1 needs comments on what it does. From the name, it
would return the datum1 from a sorttuple, but that's not what it does;
a better name would be putdatum1 or populatedatum1.

0003: in the various tuplesort_put*tuple[values] functions, the datum1
field is manually extracted. Shouldn't we use the getdatum1 functions
from 0002 instead? We could use either them directly to allow
inlining, or use the indirection through tuplesortstate.

0004: Needs a commit message, but otherwise seems fine.

0005:

+struct TuplesortOps

This struct has no comment on what it is. Something like "Public
interface of tuplesort operators, containing data directly accessable
to users of tuplesort" should suffice, but feel free to update the
wording.

+ void *arg;
+};

This field could use a comment on what it is used for, and how to use it.

+struct Tuplesortstate
+{
+    TuplesortOps ops;

This field needs a comment too.

0006: Needs a commit message, but otherwise seems fine.

Kind regards,

Matthias van de Meent

#9Andres Freund
andres@anarazel.de
In reply to: Maxim Orlov (#4)
Re: Custom tuplesorts for extensions

Hi,

I think this needs to be evaluated for performance...

I agree with the nearby comment that the commits need a bit of justification
at least to review them.

On 2022-06-23 15:12:27 +0300, Maxim Orlov wrote:

From 03b78cdade3b86a0e97723721fa1d0bd64d0c7df Mon Sep 17 00:00:00 2001
From: Alexander Korotkov <akorotkov@postgresql.org>
Date: Tue, 21 Jun 2022 13:28:27 +0300
Subject: [PATCH v2 1/6] Remove Tuplesortstate.copytup

Yea. I was recently complaining about the pointlessness of copytup.

From 1d78e271b22d7c6a1557defbe15ea5039ff28510 Mon Sep 17 00:00:00 2001
From: Alexander Korotkov <akorotkov@postgresql.org>
Date: Tue, 21 Jun 2022 14:03:13 +0300
Subject: [PATCH v2 2/6] Tuplesortstate.getdatum1 method

Hm. This adds a bunch of indirect function calls were there previously
weren't.

From 494d46dcf938e5f824a498e38ce77782751208e1 Mon Sep 17 00:00:00 2001
From: Alexander Korotkov <akorotkov@postgresql.org>
Date: Tue, 21 Jun 2022 14:13:56 +0300
Subject: [PATCH v2 3/6] Put abbreviation logic into puttuple_common()

There's definitely a lot of redundancy removed... But the list of branches in
puttuple_common() grew. Perhaps we instead can add a few flags to
putttuple_common() that determine whether abbreviation should happen, so that
we only do the work necessary for the "type" of sort?

From ee2dd46b07d62e13ed66b5a38272fb5667c943f3 Mon Sep 17 00:00:00 2001
From: Alexander Korotkov <akorotkov@postgresql.org>
Date: Wed, 22 Jun 2022 00:14:51 +0300
Subject: [PATCH v2 4/6] Move freeing memory away from writetup()

Seems to do more than just moving freeing around?

@@ -1973,8 +1963,13 @@ tuplesort_putdatum(Tuplesortstate *state, Datum val, bool isNull)
static void
puttuple_common(Tuplesortstate *state, SortTuple *tuple)
{
+	MemoryContext oldcontext = MemoryContextSwitchTo(state->sortcontext);
+
Assert(!LEADER(state));
+	if (tuple->tuple != NULL)
+		USEMEM(state, GetMemoryChunkSpace(tuple->tuple));
+

Adding even more branches into common code...

From 3a0e1fa7c7e4da46a86f7d5b9dd0392549f3b460 Mon Sep 17 00:00:00 2001
From: Alexander Korotkov <akorotkov@postgresql.org>
Date: Wed, 22 Jun 2022 18:11:26 +0300
Subject: [PATCH v2 5/6] Reorganize data structures

Hard to know what this is trying to achieve.

-struct Tuplesortstate
+struct TuplesortOps
{
-	TupSortStatus status;		/* enumerated value as shown above */
-	int			nKeys;			/* number of columns in sort key */
-	int			sortopt;		/* Bitmask of flags used to setup sort */
-	bool		bounded;		/* did caller specify a maximum number of
-								 * tuples to return? */
-	bool		boundUsed;		/* true if we made use of a bounded heap */
-	int			bound;			/* if bounded, the maximum number of tuples */
-	bool		tuples;			/* Can SortTuple.tuple ever be set? */
-	int64		availMem;		/* remaining memory available, in bytes */
-	int64		allowedMem;		/* total memory allowed, in bytes */
-	int			maxTapes;		/* max number of input tapes to merge in each
-								 * pass */
-	int64		maxSpace;		/* maximum amount of space occupied among sort
-								 * of groups, either in-memory or on-disk */
-	bool		isMaxSpaceDisk; /* true when maxSpace is value for on-disk
-								 * space, false when it's value for in-memory
-								 * space */
-	TupSortStatus maxSpaceStatus;	/* sort status when maxSpace was reached */
MemoryContext maincontext;	/* memory context for tuple sort metadata that
* persists across multiple batches */
MemoryContext sortcontext;	/* memory context holding most sort data */
MemoryContext tuplecontext; /* sub-context of sortcontext for tuple data */
-	LogicalTapeSet *tapeset;	/* logtape.c object for tapes in a temp file */

/*
* These function pointers decouple the routines that must know what kind

To me it seems odd to have memory contexts and similar things in a
datastructure calls *Ops.

From b06bcb5f3666f0541dfcc27c9c8462af2b5ec9e0 Mon Sep 17 00:00:00 2001
From: Alexander Korotkov <akorotkov@postgresql.org>
Date: Wed, 22 Jun 2022 21:48:05 +0300
Subject: [PATCH v2 6/6] Split tuplesortops.c

I strongly suspect this will cause a slowdown. There was potential for
cross-function optimization that's now removed.

Greetings,

Andres Freund

#10Alexander Korotkov
aekorotkov@gmail.com
In reply to: Andres Freund (#9)
Re: Custom tuplesorts for extensions

.Hi!

On Wed, Jul 6, 2022 at 6:01 PM Andres Freund <andres@anarazel.de> wrote:

I think this needs to be evaluated for performance...

Surely, this needs.

I agree with the nearby comment that the commits need a bit of justification
at least to review them.

Will do this.

From 1d78e271b22d7c6a1557defbe15ea5039ff28510 Mon Sep 17 00:00:00 2001
From: Alexander Korotkov <akorotkov@postgresql.org>
Date: Tue, 21 Jun 2022 14:03:13 +0300
Subject: [PATCH v2 2/6] Tuplesortstate.getdatum1 method

Hm. This adds a bunch of indirect function calls were there previously
weren't.

Yep. I think it worth changing this function to deal with many
SortTuple's at once.

From 494d46dcf938e5f824a498e38ce77782751208e1 Mon Sep 17 00:00:00 2001
From: Alexander Korotkov <akorotkov@postgresql.org>
Date: Tue, 21 Jun 2022 14:13:56 +0300
Subject: [PATCH v2 3/6] Put abbreviation logic into puttuple_common()

There's definitely a lot of redundancy removed... But the list of branches in
puttuple_common() grew. Perhaps we instead can add a few flags to
putttuple_common() that determine whether abbreviation should happen, so that
we only do the work necessary for the "type" of sort?

Good point, will refactor that.

From ee2dd46b07d62e13ed66b5a38272fb5667c943f3 Mon Sep 17 00:00:00 2001
From: Alexander Korotkov <akorotkov@postgresql.org>
Date: Wed, 22 Jun 2022 00:14:51 +0300
Subject: [PATCH v2 4/6] Move freeing memory away from writetup()

Seems to do more than just moving freeing around?

Yes, it also move memory accounting from tuplesort_put*() to
puttuple_common(). Will revise this.

@@ -1973,8 +1963,13 @@ tuplesort_putdatum(Tuplesortstate *state, Datum val, bool isNull)
static void
puttuple_common(Tuplesortstate *state, SortTuple *tuple)
{
+     MemoryContext oldcontext = MemoryContextSwitchTo(state->sortcontext);
+
Assert(!LEADER(state));
+     if (tuple->tuple != NULL)
+             USEMEM(state, GetMemoryChunkSpace(tuple->tuple));
+

Adding even more branches into common code...

I will see how to reduce branching here.

From 3a0e1fa7c7e4da46a86f7d5b9dd0392549f3b460 Mon Sep 17 00:00:00 2001
From: Alexander Korotkov <akorotkov@postgresql.org>
Date: Wed, 22 Jun 2022 18:11:26 +0300
Subject: [PATCH v2 5/6] Reorganize data structures

Hard to know what this is trying to achieve.

Split the public interface part out of Tuplesortstate.

-struct Tuplesortstate
+struct TuplesortOps
{
-     TupSortStatus status;           /* enumerated value as shown above */
-     int                     nKeys;                  /* number of columns in sort key */
-     int                     sortopt;                /* Bitmask of flags used to setup sort */
-     bool            bounded;                /* did caller specify a maximum number of
-                                                              * tuples to return? */
-     bool            boundUsed;              /* true if we made use of a bounded heap */
-     int                     bound;                  /* if bounded, the maximum number of tuples */
-     bool            tuples;                 /* Can SortTuple.tuple ever be set? */
-     int64           availMem;               /* remaining memory available, in bytes */
-     int64           allowedMem;             /* total memory allowed, in bytes */
-     int                     maxTapes;               /* max number of input tapes to merge in each
-                                                              * pass */
-     int64           maxSpace;               /* maximum amount of space occupied among sort
-                                                              * of groups, either in-memory or on-disk */
-     bool            isMaxSpaceDisk; /* true when maxSpace is value for on-disk
-                                                              * space, false when it's value for in-memory
-                                                              * space */
-     TupSortStatus maxSpaceStatus;   /* sort status when maxSpace was reached */
MemoryContext maincontext;      /* memory context for tuple sort metadata that
* persists across multiple batches */
MemoryContext sortcontext;      /* memory context holding most sort data */
MemoryContext tuplecontext; /* sub-context of sortcontext for tuple data */
-     LogicalTapeSet *tapeset;        /* logtape.c object for tapes in a temp file */

/*
* These function pointers decouple the routines that must know what kind

To me it seems odd to have memory contexts and similar things in a
datastructure calls *Ops.

Yep, it worth renaming TuplesortOps into TuplesortPublic or something.

From b06bcb5f3666f0541dfcc27c9c8462af2b5ec9e0 Mon Sep 17 00:00:00 2001
From: Alexander Korotkov <akorotkov@postgresql.org>
Date: Wed, 22 Jun 2022 21:48:05 +0300
Subject: [PATCH v2 6/6] Split tuplesortops.c

I strongly suspect this will cause a slowdown. There was potential for
cross-function optimization that's now removed.

I wonder how can cross-function optimizations bypass function
pointers. Is it possible?

------
Regards,
Alexander Korotkov

#11Alexander Korotkov
aekorotkov@gmail.com
In reply to: Alexander Korotkov (#10)
Re: Custom tuplesorts for extensions

On Wed, Jul 6, 2022 at 8:45 PM Alexander Korotkov <aekorotkov@gmail.com> wrote:

From b06bcb5f3666f0541dfcc27c9c8462af2b5ec9e0 Mon Sep 17 00:00:00 2001
From: Alexander Korotkov <akorotkov@postgresql.org>
Date: Wed, 22 Jun 2022 21:48:05 +0300
Subject: [PATCH v2 6/6] Split tuplesortops.c

I strongly suspect this will cause a slowdown. There was potential for
cross-function optimization that's now removed.

I wonder how can cross-function optimizations bypass function
pointers. Is it possible?

Oh, this is not just functions called by pointer. This is also
puttuple_common() etc. OK, this needs to be checked.

------
Regards,
Alexander Korotkov

#12Alexander Korotkov
aekorotkov@gmail.com
In reply to: Matthias van de Meent (#8)
Re: Custom tuplesorts for extensions

Hi, Matthias!

The revised patchset is attached.

On Wed, Jul 6, 2022 at 5:41 PM Matthias van de Meent
<boekewurm+postgres@gmail.com> wrote:

All of the patches are currently missing descriptive commit messages,
which I think is critical for getting this committed. As for per-patch
comments:

0001: This patch removes copytup, but it is not quite clear why -
please describe the reasoning in the commit message.

Because spit of logic between Tuplesortstate.copytup() function and
tuplesort_put*() functions is unclear. It doesn't look like we need
an abstraction here, while all the work could be done in
tuplesort_put*().

0002: getdatum1 needs comments on what it does. From the name, it
would return the datum1 from a sorttuple, but that's not what it does;
a better name would be putdatum1 or populatedatum1.

0003: in the various tuplesort_put*tuple[values] functions, the datum1
field is manually extracted. Shouldn't we use the getdatum1 functions
from 0002 instead? We could use either them directly to allow
inlining, or use the indirection through tuplesortstate.

getdatum1() was a bad name. Actually it restores original datum1
during rollback of abbreviations. I've replaced it with
removeabbrev(), which seems name to me and process many SortTuple's
during one call.

0004: Needs a commit message, but otherwise seems fine.

Commit message is added.

0005:

+struct TuplesortOps

This struct has no comment on what it is. Something like "Public
interface of tuplesort operators, containing data directly accessable
to users of tuplesort" should suffice, but feel free to update the
wording.

+ void *arg;
+};

This field could use a comment on what it is used for, and how to use it.

+struct Tuplesortstate
+{
+    TuplesortOps ops;

This field needs a comment too.

0006: Needs a commit message, but otherwise seems fine.

TuplesortOps was renamed to TuplesortPublic. Comments and commit
messages are added.

There are some places, which potentially could cause a slowdown. I'm
going to make some experiments with that.

------
Regards,
Alexander Korotkov

Attachments:

0001-Remove-Tuplesortstate.copytup-function-v2.patchapplication/x-patch; name=0001-Remove-Tuplesortstate.copytup-function-v2.patchDownload+132-199
0002-Add-new-Tuplesortstate.removeabbrev-function-v2.patchapplication/x-patch; name=0002-Add-new-Tuplesortstate.removeabbrev-function-v2.patchDownload+103-54
0003-Put-abbreviation-logic-into-puttuple_common-v2.patchapplication/x-patch; name=0003-Put-abbreviation-logic-into-puttuple_common-v2.patchDownload+56-167
0005-Split-TuplesortPublic-from-Tuplesortstate-v2.patchapplication/x-patch; name=0005-Split-TuplesortPublic-from-Tuplesortstate-v2.patchDownload+469-349
0004-Move-memory-management-away-from-writetup-and-tup-v2.patchapplication/x-patch; name=0004-Move-memory-management-away-from-writetup-and-tup-v2.patchDownload+33-46
0006-Split-tuplesortvariants.c-from-tuplesort.c-v2.patchapplication/x-patch; name=0006-Split-tuplesortvariants.c-from-tuplesort.c-v2.patchDownload+1774-1742
#13John Naylor
john.naylor@enterprisedb.com
In reply to: Alexander Korotkov (#12)
Re: Custom tuplesorts for extensions

On Tue, Jul 12, 2022 at 3:23 PM Alexander Korotkov <aekorotkov@gmail.com>
wrote:

There are some places, which potentially could cause a slowdown. I'm
going to make some experiments with that.

I haven't looked at the patches, so I don't know of a specific place to
look for a slowdown, but I thought it might help to perform the same query
tests as my most recent test for evaluating qsort variants (some
description in [1]/messages/by-id/CAFBsxsHeTACMP1JVQ+m35-v2NkmEqsJMHLhEfWk4sTB5aw_jkQ@mail.gmail.com -- John Naylor EDB: http://www.enterprisedb.com), and here is the spreadsheet. Overall, the differences
look like noise. A few cases with unabbreviatable text look a bit faster
with the patch. I'm not sure if that's a real difference, but in any case I
don't see a slowdown anywhere.

[1]: /messages/by-id/CAFBsxsHeTACMP1JVQ+m35-v2NkmEqsJMHLhEfWk4sTB5aw_jkQ@mail.gmail.com -- John Naylor EDB: http://www.enterprisedb.com
/messages/by-id/CAFBsxsHeTACMP1JVQ+m35-v2NkmEqsJMHLhEfWk4sTB5aw_jkQ@mail.gmail.com
--
John Naylor
EDB: http://www.enterprisedb.com

Attachments:

master-vs-custom-ext-v2.odsapplication/vnd.oasis.opendocument.spreadsheet; name=master-vs-custom-ext-v2.odsDownload
#14Alexander Korotkov
aekorotkov@gmail.com
In reply to: John Naylor (#13)
Re: Custom tuplesorts for extensions

Hi, John!

On Thu, Jul 21, 2022 at 6:44 AM John Naylor
<john.naylor@enterprisedb.com> wrote:

On Tue, Jul 12, 2022 at 3:23 PM Alexander Korotkov <aekorotkov@gmail.com> wrote:

There are some places, which potentially could cause a slowdown. I'm
going to make some experiments with that.

I haven't looked at the patches, so I don't know of a specific place to look for a slowdown, but I thought it might help to perform the same query tests as my most recent test for evaluating qsort variants (some description in [1]), and here is the spreadsheet. Overall, the differences look like noise. A few cases with unabbreviatable text look a bit faster with the patch. I'm not sure if that's a real difference, but in any case I don't see a slowdown anywhere.

[1] /messages/by-id/CAFBsxsHeTACMP1JVQ+m35-v2NkmEqsJMHLhEfWk4sTB5aw_jkQ@mail.gmail.com

Great, thank you very much for the feedback!

------
Regards,
Alexander Korotkov

#15Pavel Borisov
pashkin.elfe@gmail.com
In reply to: Alexander Korotkov (#14)
Re: Custom tuplesorts for extensions

I've looked through the updated patch. Overall it looks good enough.

Some minor things:

- PARALLEL_SORT macro is based on coordinate struct instead of state
struct. In some calls(i.e. from _bt_spools_heapscan) coordinate could
appear to be NULL, which can be a segfault on items dereference inside the
macro.

- state->worker and coordinate->isWorker a little bit differ in semantics
i.e.:
..............................................worker............... leader
state -> worker........................ >=0.....................-1
coordinate ->isWorker............. 1..........................0

- in tuplesort_begin_index_btree I suppose it should be base->nKeys instead
of state->nKeys

- Cfbot reports gcc warnings due to mixed code and declarations. So I used
this to beautify code in tuplesortvariants.c a little. (This is added as a
separate patch 0007)

All these things are corrected/done in a new version 3 of a patchset (PFA).
For me, the patchset seems like a long-needed thing to support PostgreSQL
extensibility. Overall corrections in v3 are minor, so I'd like to mark the
patch as RfC if there are no objections.

--
Best regards,
Pavel Borisov
Supabase.

Attachments:

v3-0001-Remove-Tuplesortstate.copytup-function.patchapplication/octet-stream; name=v3-0001-Remove-Tuplesortstate.copytup-function.patchDownload+132-199
v3-0007-Tuplesort-code-beautification.patchapplication/octet-stream; name=v3-0007-Tuplesort-code-beautification.patchDownload+252-189
v3-0006-Split-tuplesortvariants.c-from-tuplesort.c.patchapplication/octet-stream; name=v3-0006-Split-tuplesortvariants.c-from-tuplesort.c.patchDownload+1775-1743
v3-0004-Move-memory-management-away-from-writetup-and-tup.patchapplication/octet-stream; name=v3-0004-Move-memory-management-away-from-writetup-and-tup.patchDownload+33-46
v3-0005-Split-TuplesortPublic-from-Tuplesortstate.patchapplication/octet-stream; name=v3-0005-Split-TuplesortPublic-from-Tuplesortstate.patchDownload+471-350
v3-0003-Put-abbreviation-logic-into-puttuple_common.patchapplication/octet-stream; name=v3-0003-Put-abbreviation-logic-into-puttuple_common.patchDownload+56-167
v3-0002-Add-new-Tuplesortstate.removeabbrev-function.patchapplication/octet-stream; name=v3-0002-Add-new-Tuplesortstate.removeabbrev-function.patchDownload+103-54
#16Alexander Korotkov
aekorotkov@gmail.com
In reply to: Pavel Borisov (#15)
Re: Custom tuplesorts for extensions

Hi, Pavel!

Thank you for your review and corrections.

On Fri, Jul 22, 2022 at 6:57 PM Pavel Borisov <pashkin.elfe@gmail.com> wrote:

I've looked through the updated patch. Overall it looks good enough.

Some minor things:

- PARALLEL_SORT macro is based on coordinate struct instead of state struct. In some calls(i.e. from _bt_spools_heapscan) coordinate could appear to be NULL, which can be a segfault on items dereference inside the macro.

- state->worker and coordinate->isWorker a little bit differ in semantics i.e.:
..............................................worker............... leader
state -> worker........................ >=0.....................-1
coordinate ->isWorker............. 1..........................0

- in tuplesort_begin_index_btree I suppose it should be base->nKeys instead of state->nKeys

Perfect, thank you!

- Cfbot reports gcc warnings due to mixed code and declarations. So I used this to beautify code in tuplesortvariants.c a little. (This is added as a separate patch 0007)

It appears that warnings were caused by the extra semicolon in
TuplesortstateGetPublic() macro. I've removed that semicolon, and I
don't think we need a beautification patch. Also, please note that
there is no point to add indentation, which doesn't survive pgindent.

All these things are corrected/done in a new version 3 of a patchset (PFA). For me, the patchset seems like a long-needed thing to support PostgreSQL extensibility. Overall corrections in v3 are minor, so I'd like to mark the patch as RfC if there are no objections.

Thank you. I've also revised the comments in the top of tuplesort.c
and tuplesortvariants.c. The revised patchset is attached.

Also, my OrioleDB colleagues Ilya Kobets and Tatsiana Yaumenenka run
tests to check if the patchset causes a performance regression. The
script and results are present in the "tuplesort_patch_test.zip"
archive. The final comparison is given in the result/final_table.txt.
In short, they repeat each test 10 times and there is no difference
exceeding the random variation.

------
Regards,
Alexander Korotkov

Attachments:

0001-Remove-Tuplesortstate.copytup-function-v4.patchapplication/octet-stream; name=0001-Remove-Tuplesortstate.copytup-function-v4.patchDownload+132-199
0004-Move-memory-management-away-from-writetup-and-tup-v4.patchapplication/octet-stream; name=0004-Move-memory-management-away-from-writetup-and-tup-v4.patchDownload+33-46
0002-Add-new-Tuplesortstate.removeabbrev-function-v4.patchapplication/octet-stream; name=0002-Add-new-Tuplesortstate.removeabbrev-function-v4.patchDownload+103-54
0005-Split-TuplesortPublic-from-Tuplesortstate-v4.patchapplication/octet-stream; name=0005-Split-TuplesortPublic-from-Tuplesortstate-v4.patchDownload+471-350
0003-Put-abbreviation-logic-into-puttuple_common-v4.patchapplication/octet-stream; name=0003-Put-abbreviation-logic-into-puttuple_common-v4.patchDownload+56-167
0006-Split-tuplesortvariants.c-from-tuplesort.c-v4.patchapplication/octet-stream; name=0006-Split-tuplesortvariants.c-from-tuplesort.c-v4.patchDownload+1784-1746
tuplesort_patch_test.zipapplication/zip; name=tuplesort_patch_test.zipDownload+0-1
#17Alexander Korotkov
aekorotkov@gmail.com
In reply to: Alexander Korotkov (#16)
Re: Custom tuplesorts for extensions

On Sun, Jul 24, 2022 at 3:24 PM Alexander Korotkov <aekorotkov@gmail.com> wrote:

Also, my OrioleDB colleagues Ilya Kobets and Tatsiana Yaumenenka run
tests to check if the patchset causes a performance regression. The
script and results are present in the "tuplesort_patch_test.zip"
archive. The final comparison is given in the result/final_table.txt.
In short, they repeat each test 10 times and there is no difference
exceeding the random variation.

I see the last revision passed cfbot without warnings. I've added the
meta information to commit messages. Also, I've re-run through the
thread and it seems all the comments are addressed. I'm going to push
this if there are no objections.

------
Regards,
Alexander Korotkov

Attachments:

0004-Move-memory-management-away-from-writetup-and-tup-v5.patchapplication/x-patch; name=0004-Move-memory-management-away-from-writetup-and-tup-v5.patchDownload+33-46
0002-Add-new-Tuplesortstate.removeabbrev-function-v5.patchapplication/x-patch; name=0002-Add-new-Tuplesortstate.removeabbrev-function-v5.patchDownload+103-54
0005-Split-TuplesortPublic-from-Tuplesortstate-v5.patchapplication/x-patch; name=0005-Split-TuplesortPublic-from-Tuplesortstate-v5.patchDownload+471-350
0001-Remove-Tuplesortstate.copytup-function-v5.patchapplication/x-patch; name=0001-Remove-Tuplesortstate.copytup-function-v5.patchDownload+132-199
0003-Put-abbreviation-logic-into-puttuple_common-v5.patchapplication/x-patch; name=0003-Put-abbreviation-logic-into-puttuple_common-v5.patchDownload+56-167
0006-Split-tuplesortvariants.c-from-tuplesort.c-v5.patchapplication/x-patch; name=0006-Split-tuplesortvariants.c-from-tuplesort.c-v5.patchDownload+1784-1746
#18Justin Pryzby
pryzby@telsasoft.com
In reply to: Alexander Korotkov (#17)
Re: Custom tuplesorts for extensions

Note that 0001+0002 (without the others) incurs warnings:

$ time { make -j4 clean; make -j4; } >/dev/null
tuplesort.c:1883:9: warning: unused variable 'i' [-Wunused-variable]
tuplesort.c:1955:10: warning: unused variable 'i' [-Wunused-variable]
tuplesort.c:2026:9: warning: unused variable 'i' [-Wunused-variable]
tuplesort.c:2103:10: warning: unused variable 'i' [-Wunused-variable]

(I wondered in the past if cfbot should try to test for clean builds of subsets
of patchsets, and it came up recently with the JSON patches.)

Also, this comment has some bad indentation:

* Set state to be consistent with never trying abbreviation.

--
Justin

#19Alexander Korotkov
aekorotkov@gmail.com
In reply to: Justin Pryzby (#18)
Re: Custom tuplesorts for extensions

Hi Justin!

On Mon, Jul 25, 2022 at 2:23 AM Justin Pryzby <pryzby@telsasoft.com> wrote:

Note that 0001+0002 (without the others) incurs warnings:

$ time { make -j4 clean; make -j4; } >/dev/null
tuplesort.c:1883:9: warning: unused variable 'i' [-Wunused-variable]
tuplesort.c:1955:10: warning: unused variable 'i' [-Wunused-variable]
tuplesort.c:2026:9: warning: unused variable 'i' [-Wunused-variable]
tuplesort.c:2103:10: warning: unused variable 'i' [-Wunused-variable]

(I wondered in the past if cfbot should try to test for clean builds of subsets
of patchsets, and it came up recently with the JSON patches.)

Also, this comment has some bad indentation:

* Set state to be consistent with never trying abbreviation.

Thank you for caching this. Fixed in the revision attached.

Testing subsets of patchsets in cfbot looks like a good idea to me.
However, I'm not sure if we always require subsets to be consistent.

------
Regards,
Alexander Korotkov

Attachments:

0004-Move-memory-management-away-from-writetup-and-tup-v6.patchapplication/octet-stream; name=0004-Move-memory-management-away-from-writetup-and-tup-v6.patchDownload+33-46
0001-Remove-Tuplesortstate.copytup-function-v6.patchapplication/octet-stream; name=0001-Remove-Tuplesortstate.copytup-function-v6.patchDownload+133-199
0002-Add-new-Tuplesortstate.removeabbrev-function-v6.patchapplication/octet-stream; name=0002-Add-new-Tuplesortstate.removeabbrev-function-v6.patchDownload+89-49
0003-Put-abbreviation-logic-into-puttuple_common-v6.patchapplication/octet-stream; name=0003-Put-abbreviation-logic-into-puttuple_common-v6.patchDownload+56-159
0005-Split-TuplesortPublic-from-Tuplesortstate-v6.patchapplication/octet-stream; name=0005-Split-TuplesortPublic-from-Tuplesortstate-v6.patchDownload+471-350
0006-Split-tuplesortvariants.c-from-tuplesort.c-v6.patchapplication/octet-stream; name=0006-Split-tuplesortvariants.c-from-tuplesort.c-v6.patchDownload+1784-1746
#20Pavel Borisov
pashkin.elfe@gmail.com
In reply to: Alexander Korotkov (#19)
Re: Custom tuplesorts for extensions

Thank you for caching this. Fixed in the revision attached.

Testing subsets of patchsets in cfbot looks like a good idea to me.
However, I'm not sure if we always require subsets to be consistent.

Hi, hackers!

I've looked through a new v6 of a patchset and find it ok. When applied
0001+0002 only I don't see warnings anymore. Build and tests are successful
and Cfbot also looks good. I've marked the patch as RfC.

--
Best regards,
Pavel Borisov

#21Alexander Korotkov
aekorotkov@gmail.com
In reply to: Pavel Borisov (#20)