About Custom Aggregates, C Extensions and Memory

Started by Marthin Laubscher 5 months ago; 7 messages
#1Marthin Laubscher
postgres@lobeshare.co.za

Hi,

I’ll skip over context (until someone asks or I can showcase the results) and cut to the chase:

A custom aggregate seems the best vehicle for what I seek to implement. Given the processing involved, it’s probably best written in C.

That makes the aggregate an opaque value, encoded and compressed into an internal format that allows direct equality testing and comparison. For everything else it needs to be decoded into memory, worked on, and then re-encoded into a value as expected by the database ecosystem.

The challenge is that decoding and encoding present a massive overhead (easily two orders of magnitude or more) compared to the lightning-fast in-memory operations of, e.g., adding or removing a value from the aggregate, killing performance and limiting potential.

Naturally I’m looking for feasible options to retain and reuse accumulator values decoded in memory, at least between successive calls while aggregating a set of values, and ideally also when the aggregate is used again later in the same session or query, within reason of course.

I’ve tried, but failed, to gain a sufficient understanding from the documentation of the lifetime and limits of palloc/palloc0 memory in the aggregate and C-extension context, so I haven’t been able to reformulate my algorithms around whatever opportunities might exist.

My project needs it, and I cannot postpone much longer. But I need a sherpa: someone who knows the terrain, pathways and pitfalls, and who’ll engage with me for a little while to alleviate my ignorance and uncertainties. If I knew in advance what all the questions were, they’d no longer be questions, but I don’t. However async and terse you like, I need a conversation that can discard unworkable alternatives early and focus instead on whatever is compatible with the environment.

When we’re done, I’d be happy to write up and suggest the documentation updates I believe would have circumvented this cry for help.

Who’s willing to help me find my way up this mountain or turn it into a mole-hill?

Regards,

Marthin Laubscher

#2Tom Lane
tgl@sss.pgh.pa.us
In reply to: Marthin Laubscher (#1)
Re: About Custom Aggregates, C Extensions and Memory

Marthin Laubscher <postgres@lobeshare.co.za> writes:

A custom aggregate seems the best vehicle for what I seek to implement. Given the processing involved, it’s probably best written in C.
That makes the aggregate an opaque value, encoded and compressed into an internal format that allows direct equality testing and comparison. For everything else it needs to be decoded into memory, worked on, and then re-encoded into a value as expected by the database ecosystem.
The challenge is that decoding and encoding present a massive overhead (easily two orders of magnitude or more) compared to the lightning-fast in-memory operations of, e.g., adding or removing a value from the aggregate, killing performance and limiting potential.

Yeah. What you want is to declare the aggregate as having transtype
"internal" (which basically means that ExecAgg will store a pointer
for you) and make that pointer point to a data structure kept in the
"aggcontext", which will have a suitable lifespan. json_agg() might
be a suitable example to look at. Keep in mind that the finalfn
mustn't modify the stored state, as there are optimizations where
it'll be applied more than once.

regards, tom lane
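
For concreteness, a minimal sketch of the pattern Tom describes, using hypothetical names (json_agg's own source remains the authoritative example). The matching declaration would be along the lines of CREATE AGGREGATE my_agg(int8) (sfunc = my_agg_trans, stype = internal, finalfunc = my_agg_final):

#include "postgres.h"
#include "fmgr.h"

typedef struct MyAggState
{
    int64  *items;          /* decoded, in-memory representation */
    int     nitems;
    int     capacity;
} MyAggState;

PG_FUNCTION_INFO_V1(my_agg_trans);

Datum
my_agg_trans(PG_FUNCTION_ARGS)
{
    MemoryContext aggcontext;
    MyAggState *state;

    /* Refuse to run outside an aggregate call; this also hands us
     * the aggcontext, whose lifespan matches the aggregation. */
    if (!AggCheckCallContext(fcinfo, &aggcontext))
        elog(ERROR, "my_agg_trans called in non-aggregate context");

    if (PG_ARGISNULL(0))
    {
        /* First row of the group: allocate the state in aggcontext
         * so that it survives across transition-function calls. */
        MemoryContext oldcxt = MemoryContextSwitchTo(aggcontext);

        state = (MyAggState *) palloc(sizeof(MyAggState));
        state->capacity = 64;
        state->nitems = 0;
        state->items = (int64 *) palloc(state->capacity * sizeof(int64));
        MemoryContextSwitchTo(oldcxt);
    }
    else
        state = (MyAggState *) PG_GETARG_POINTER(0);

    if (!PG_ARGISNULL(1))
    {
        if (state->nitems >= state->capacity)
        {
            /* repalloc keeps the chunk in its original context */
            state->capacity *= 2;
            state->items = (int64 *) repalloc(state->items,
                                              state->capacity * sizeof(int64));
        }
        state->items[state->nitems++] = PG_GETARG_INT64(1);
    }

    PG_RETURN_POINTER(state);
}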

#3Marthin Laubscher
postgres@lobeshare.co.za
In reply to: Tom Lane (#2)
Re: About Custom Aggregates, C Extensions and Memory

Tom Lane <tgl@sss.pgh.pa.us> writes:

Yeah. What you want is to declare the aggregate as having transtype
"internal" (which basically means that ExecAgg will store a pointer
for you) and make that pointer point to a data structure kept in the
"aggcontext", which will have a suitable lifespan. json_agg() might
be a suitable example to look at. Keep in mind that the finalfn
mustn't modify the stored state, as there are optimizations where
it'll be applied more than once.

Thank you.

I'm stunned, once more, by the incredible depth of prior art in the Postgres ecosystem, so allow me to ask before I underestimate it again.

I was fixing on (ab)using user-defined aggregates for a concept that would, on second thought, be better described as a key-only table persisted and manipulated as a single opaque value that is only ever expanded to a table/set/collection in memory. Initial focus is on storing 8-byte integer IDs but bigger UUIDs might be needed down the line.

I've leaned away from involving any composite or table semantics, for no better reason than an inadequate understanding of in-memory table opportunities and limitations. I ended up planning a user-defined type based on the last base type on the list, the pseudo-types, which include "internal", and defining the aggregate to produce values of that type.

Then I read your mail, some more documentation, and discovered
a) Moving-Aggregate Mode,
b) which has a separate implementation to account for forward and inverse transitions (relevant to my use case; see the sketch after this list),
c) is accessible mainly via window functions (which are not that relevant to my use case),
d) doesn't cover my other operations, including set operations like intersection, difference and union,
e) all of which would benefit from being able to reuse the same decoded-into-memory representation as the aggregate.
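
Continuing the earlier hypothetical MyAggState sketch, an inverse transition function for moving-aggregate mode (declared via minvfunc, with mstype = internal) would remove one value from the in-memory state instead of rebuilding it:

PG_FUNCTION_INFO_V1(my_agg_inv_trans);

Datum
my_agg_inv_trans(PG_FUNCTION_ARGS)
{
    MyAggState *state = (MyAggState *) PG_GETARG_POINTER(0);

    if (!PG_ARGISNULL(1))
    {
        int64   value = PG_GETARG_INT64(1);
        int     i;

        /* Remove one occurrence of value; a linear scan will do
         * for a sketch. */
        for (i = 0; i < state->nitems; i++)
        {
            if (state->items[i] == value)
            {
                state->items[i] = state->items[--state->nitems];
                break;
            }
        }
    }

    PG_RETURN_POINTER(state);
}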

Could PostgreSQL be so brilliantly put together that the "internal" pseudo base type for a user-defined type and the "internal" transtype in a user-defined aggregate are one and the same thing, essentially extending the lifespan and utility of the in-memory version further towards what I really need?

What would be your thoughts and advice here? Are there more prior-art gems that tamed this wilderness before? Am I justifiably or too easily scared by in-memory table semantics and implications, when what I'm dealing with really amounts to sets rather than (opaque) scalar values? I definitely want to store values representing an entire set at a time in a column value; there's no place for names, because the opaque value IS the identity of the set (a mathematical bijection), which runs orthogonal to the original Sequel, now SQL. But are there other facilities around that I'm duplicating here, or that have enough in common to leverage?

Thanks again for your time and your utterly unique and dedicated approach to keeping such firm yet friendly control of PostgreSQL. It is truly appreciated and noticed, especially in the chaotic context of open source. Eight years your junior, I spent a career as a design authority, but in a corporate setting, meaning I find dealing with open-source people worse than herding cats, so I think you're doing incredibly well. Thank you.

Regards,
Marthin Laubscher

#4Tom Lane
tgl@sss.pgh.pa.us
In reply to: Marthin Laubscher (#3)
Re: About Custom Aggregates, C Extensions and Memory

Marthin Laubscher <postgres@lobeshare.co.za> writes:

I was fixing on (ab)using user-defined aggregates for a concept that would, on second thought, be better described as a key-only table persisted and manipulated as a single opaque value that is only ever expanded to a table/set/collection in memory. Initial focus is on storing 8-byte integer IDs but bigger UUIDs might be needed down the line.

Hm. We do not have in-memory tables, although in some cases a
temporary table is close enough. But there is one other pre-existing
mechanism that might help you: "expanded objects". The idea there is
that you have some "flat" representation of your data type that can
go into a table at need, but you also have an in-memory representation
that is better suited to computation, and most of your complicated
operations prefer to work on the expanded representation. The name of
the game then becomes how to minimize the number of times a value gets
flattened and expanded as you push it around in a computation.
As of v18, we have a pretty decent story for that when it comes to
values you are manipulating within pl/pgsql functions, although less
so if you need to use other languages. (The expanded-object
functionality exists before v18, but it's hard for user-defined types
to avoid excess flattening in earlier versions.)

If that sounds promising, see

src/include/utils/expandeddatum.h
src/backend/utils/adt/expandeddatum.c

as well as the two in-core uses of the concept:

src/backend/utils/adt/array_expanded.c
src/backend/utils/adt/expandedrecord.c

This thread that led to the v18 improvements might be useful too:

/messages/by-id/CACxu=vJaKFNsYxooSnW1wEgsAO5u_v1XYBacfVJ14wgJV_PYeg@mail.gmail.com

regards, tom lane
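
To make the shape of that API concrete, a minimal, hypothetical expanded representation of the set type under discussion could look like this (array_expanded.c and expandedrecord.c do the same things for real; on v16+ the VAR* macros also need varatt.h):

#include "postgres.h"
#include "fmgr.h"
#include "utils/expandeddatum.h"
#include "utils/memutils.h"

typedef struct ExpandedIntSet
{
    ExpandedObjectHeader hdr;   /* standard header, must be first */
    int64  *items;
    int     nitems;
} ExpandedIntSet;

static Size
intset_get_flat_size(ExpandedObjectHeader *eohptr)
{
    ExpandedIntSet *eis = (ExpandedIntSet *) eohptr;

    return VARHDRSZ + eis->nitems * sizeof(int64);
}

static void
intset_flatten_into(ExpandedObjectHeader *eohptr,
                    void *result, Size allocated_size)
{
    ExpandedIntSet *eis = (ExpandedIntSet *) eohptr;

    SET_VARSIZE(result, allocated_size);
    memcpy(VARDATA(result), eis->items, eis->nitems * sizeof(int64));
}

static const ExpandedObjectMethods intset_methods = {
    intset_get_flat_size,
    intset_flatten_into
};

/* Expand a flat value into its own small context; from here,
 * EOHPGetRWDatum(&eis->hdr) yields a read-write expanded Datum. */
static ExpandedIntSet *
expand_intset(Datum flat, MemoryContext parentcontext)
{
    MemoryContext objcxt;
    ExpandedIntSet *eis;
    bytea  *val;

    objcxt = AllocSetContextCreate(parentcontext,
                                   "expanded intset",
                                   ALLOCSET_DEFAULT_SIZES);
    eis = (ExpandedIntSet *) MemoryContextAlloc(objcxt,
                                                sizeof(ExpandedIntSet));
    EOH_init_header(&eis->hdr, &intset_methods, objcxt);

    val = (bytea *) PG_DETOAST_DATUM(flat);
    eis->nitems = (VARSIZE(val) - VARHDRSZ) / sizeof(int64);
    eis->items = (int64 *) MemoryContextAlloc(objcxt,
                                              eis->nitems * sizeof(int64));
    memcpy(eis->items, VARDATA(val), eis->nitems * sizeof(int64));

    return eis;
}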

#5Marthin Laubscher
postgres@lobeshare.co.za
In reply to: Tom Lane (#4)
Re: About Custom Aggregates, C Extensions and Memory

Tom Lane <tgl@sss.pgh.pa.us> wrote:

Hm. We do not have in-memory tables, although in some cases a temporary table is close enough.

Yay, I didn't somehow overlook them.

But there is one other pre-existing mechanism that might help you: "expanded objects". The idea there is that you have some "flat" representation of your data type that can go into a table at need, but you also have an in-memory representation that is better suited to computation, and most of your complicated operations prefer to work on the expanded representation. The name of the game then becomes how to minimize the number of times a value gets flattened and expanded as you push it around in a computation.

As of v18, we have a pretty decent story for that when it comes to values you are manipulating within pl/pgsql functions, although less so if you need to use other languages. (The expanded-object functionality exists before v18, but it's hard for user-defined types to avoid excess flattening in earlier versions.)

If that sounds promising, see...

I don't yet command the internals well enough to engage with the complexities of Expanded Objects, which I thought formed part of the TOAST domain I was trying not to get too caught up in just yet.

I suspect that putting emphasis on the opacity of the aggregate value made it unclear that I'd like values of my UDT to look a lot like ordinary variable-length binary values, to which the <, > and = operators apply with no decoding required. I only need the sticky decoding for more involved operations: aggregation, various set-type operations (including simply adding or removing a scalar value, rather than a set of size one), and testing set membership of scalar index values with whichever of IN or ANY is more appropriate. I implemented the first trivial/naïve attempts in plpgsql, but that environment really wasn't conducive to writing the real functions. I once spoke C better than English (not a joke), so I'm pretty confident I can and should do it there.

Another, who shall not be named, suggested I take a look at the different memory contexts, so I had a look. If I understood correctly, the user-defined type itself does store something analogous to the aggcontext in an aggregate, but in the input and output functions, as well as other operators and functions on the UDT, I could define an internal memory structure, switch to a memory context with a suitable lifespan, allocate and manage memory using the appropriate functions along the lines of StringInfo, and remember to switch back to the original memory context each function gets called with. There's even a suggestion that UDT-specific custom memory contexts may be created in TopMemoryContext using AllocSetContextCreate with some name or identifier. I'd have to figure out what to use for such an identifier; maybe the UDT instance already has a database or tuple ID that can be adopted for that purpose. It looks like I'd have to find a way to broker between "normal function memory contexts" and per-tuple aggregate memory contexts, but there is language in the comments around that, about what is and isn't kosher, that I don't know how to interpret.
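
As I read it, that suggestion amounts to something like the following sketch (hypothetical names, and quite possibly exactly the kind of thing the comments call not kosher):

#include "postgres.h"
#include "utils/memutils.h"

static MemoryContext intset_cache_cxt = NULL;

/* Allocate in a session-lifespan context hung off TopMemoryContext.
 * Nothing ever frees this automatically; that is the hazard. */
static void *
intset_cache_alloc(Size size)
{
    MemoryContext oldcxt;
    void   *ptr;

    if (intset_cache_cxt == NULL)
        intset_cache_cxt = AllocSetContextCreate(TopMemoryContext,
                                                 "intset decoded-value cache",
                                                 ALLOCSET_DEFAULT_SIZES);

    oldcxt = MemoryContextSwitchTo(intset_cache_cxt);
    ptr = palloc(size);
    MemoryContextSwitchTo(oldcxt);

    return ptr;
}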

I don't particularly trust the source. I've paraphrased the above, but not all the terms mentioned could be found in the GitHub code base, so it could be utter bollocks, outdated information, never meant for public consumption, or entirely accurate and useful.

Does hearing me mention such things cause you nightmares, either for the perils I'd be facing or the mess it is likely to cause in your beloved database, or could it be the start of a plan that just might work?

Essentially the aggregate functions would still be front and centre as defined for the user-defined type. Though the UDT itself would be largely unaware of it, all the individual functions that manipulate values of the UDT would go through the same process: get access to the value in decoded form if it already exists, and call the decoding routines if it doesn't. If I choose the right memory context, would that simply age out when the session, transaction, query or aggregate is done, or what else would know we're done with the memory so we can let go of it? As for transitioning between the per-tuple aggregate context and the normal function context, a plan can be devised, provided the transition points can be detected, to copy stuff across in memory. It should be possible, for example, to do that in a final function, even without disturbing the value on the aggregate side of things, as advised before.
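
Continuing the hypothetical sketch from #2, such a final function would serialize into the caller's context and leave the transition state untouched, since it can be applied more than once:

PG_FUNCTION_INFO_V1(my_agg_final);

Datum
my_agg_final(PG_FUNCTION_ARGS)
{
    MyAggState *state;
    bytea  *result;
    Size    size;

    /* NULL state means no rows were aggregated */
    if (PG_ARGISNULL(0))
        PG_RETURN_NULL();
    state = (MyAggState *) PG_GETARG_POINTER(0);

    /* Build the flat value in the caller's per-call context,
     * without modifying the state stored in the aggcontext. */
    size = VARHDRSZ + state->nitems * sizeof(int64);
    result = (bytea *) palloc(size);
    SET_VARSIZE(result, size);
    memcpy(VARDATA(result), state->items, state->nitems * sizeof(int64));

    PG_RETURN_BYTEA_P(result);
}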

You're forgiven for thinking I'm crazy to consider manually manipulating memory contexts simpler than the high-level support functions created to toast and detoast values at the plpgsql level of abstraction. I'm a big fan of failing early and hard when things (besides user input) aren't as expected, and also a big fan of explicit programming, i.e. as few magical side effects as possible.

The SuiteSparse:GraphBLAS stuff that sparked the conversation you referred to is very likely to play a role in my project somewhere down the line. They don't know the first thing about what I'm doing and why, I might not always see eye to eye with all the players from the domains where their work has found fertile ground, and I'm far from ready for such complications, but it seems inevitable that our paths will cross at some point. Thanks for the reference, however unintentional.

Regards,
Marthin Laubscher

#6Tom Lane
tgl@sss.pgh.pa.us
In reply to: Marthin Laubscher (#5)
Re: About Custom Aggregates, C Extensions and Memory

Marthin Laubscher <postgres@lobeshare.co.za> writes:

Essentially the aggregate functions would still be front and centre as defined for the user-defined type. Though the UDT itself would be largely unaware of it, all the individual functions that manipulate values of the UDT would go through the same process: get access to the value in decoded form if it already exists, and call the decoding routines if it doesn't. If I choose the right memory context, would that simply age out when the session, transaction, query or aggregate is done, or what else would know we're done with the memory so we can let go of it?

Well, yeah, that's the problem. You can certainly maintain your own
persistent data structure somewhere, but then it's entirely on your
head to manage it and avoid memory leakage/bloating as you process
more and more data. The mechanisms I pointed you at provide a
structure that makes sure space gets reclaimed when there's no longer
a reference to it, but if you go the roll-your-own route then it's
a lot messier.

A mechanism that might work well enough is a transaction-lifespan
hash table. You could look at, for example, uncommitted_enum_types
in pg_enum.c for sample code.

regards, tom lane
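
A sketch of what that might look like with dynahash; the uint32 key scheme here is hypothetical, and pg_enum.c shows the real-world pattern, including its end-of-transaction cleanup:

#include "postgres.h"
#include "utils/hsearch.h"
#include "utils/memutils.h"

typedef struct IntSetCacheEntry
{
    uint32  key;            /* hypothetical: hash of the flat value */
    void   *decoded;        /* decoded in-memory representation */
} IntSetCacheEntry;

static HTAB *intset_cache = NULL;

static void
init_intset_cache(void)
{
    HASHCTL ctl;

    memset(&ctl, 0, sizeof(ctl));
    ctl.keysize = sizeof(uint32);
    ctl.entrysize = sizeof(IntSetCacheEntry);
    ctl.hcxt = TopTransactionContext;   /* freed at commit/abort */

    intset_cache = hash_create("intset decoded-value cache",
                               64,      /* initial size estimate */
                               &ctl,
                               HASH_ELEM | HASH_BLOBS | HASH_CONTEXT);

    /* The table's memory vanishes with the context, so intset_cache
     * must be reset to NULL at transaction end, e.g. from a
     * RegisterXactCallback hook (in-core code such as pg_enum.c uses
     * dedicated AtEOXact_* calls instead). */
}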

#7Marthin Laubscher
postgres@lobeshare.co.za
In reply to: Tom Lane (#6)
Re: About Custom Aggregates, C Extensions and Memory

Tom Lane <tgl@sss.pgh.pa.us> wrote:

Well, yeah, that's the problem. You can certainly maintain your own persistent data structure somewhere, but then it's entirely on your head to manage it and avoid memory leakage/bloating as you process more and more data. The mechanisms I pointed you at provide a structure that makes sure space gets reclaimed when there's no longer a reference to it, but if you go the roll-your-own route then it's a lot messier.

Of course, I'm sole owner and admin of every instance of PostgreSQL involved, but a great many things have always been and will remain on my head if I "gots it wrongly", so yes, rolling my own would be an option of last resort. I was able to "see" how the aggregate memory mechanism worked but confess that I must still connect the dots in the expanded datum case. I know you're seeing something there I don't yet recognise, so I'll look again.

A mechanism that might work well enough is a transaction-lifespan hash table. You could look at, for example, uncommitted_enum_types in pg_enum.c for sample code.

And at that, naturally.

Trying hard not to get too far ahead of myself here, perhaps I should try running into the foreseen performance issue before addressing it. How about I make a trivial implementation (skipping over the complex optimised processing, compression and advanced logic ensuring canonical compressed values) of the aggregate and the user-defined type the aggregate would calculate, and put that up for review first?

If all goes to script, the result ought to be that the aggregate would nicely reuse the decoded version of the aggregate state in memory and forget it existed once the final function has run. But then each user-defined-type function that gets called would decode the stored byte array, do its bit on the data, and encode it to a byte array again.

It won't be optimal, but it will be simple, and it will go a long way towards ensuring I've got the whole type-and-aggregate ecosystem set up as it should be. It will also settle any doubts as to whether membership tests would use "IN" or "ANY" semantics.
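
In sketch form, each such naive UDT function would take this shape (hypothetical names; the real encoding would be the canonical compressed form rather than a raw array):

#include "postgres.h"
#include "fmgr.h"

PG_FUNCTION_INFO_V1(intset_add);

/* Naive version: decode the flat value, modify it, re-encode on
 * every single call. */
Datum
intset_add(PG_FUNCTION_ARGS)
{
    bytea  *flat = PG_GETARG_BYTEA_PP(0);
    int64   value = PG_GETARG_INT64(1);
    int     n = VARSIZE_ANY_EXHDR(flat) / sizeof(int64);
    Size    newsize = VARHDRSZ + (n + 1) * sizeof(int64);
    bytea  *result;

    /* "Decoding" is a straight copy here; the real type would expand
     * its compressed canonical form into a working structure first. */
    result = (bytea *) palloc(newsize);
    SET_VARSIZE(result, newsize);
    memcpy(VARDATA(result), VARDATA_ANY(flat), n * sizeof(int64));
    memcpy(VARDATA(result) + n * sizeof(int64), &value, sizeof(int64));

    PG_RETURN_BYTEA_P(result);
}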

The next step would be to use, say, the transaction memory context to retain the decoded version in memory between calls to different (non-aggregate) UDT functions made in the same transaction. Beyond finding the appropriate context to use, the challenge, I understand, would be to identify the particular instance of the UDT in memory, which is where you're suggesting a hash table. With fewer outstanding vagaries and uncertainties the way forward might be quite obvious by then, but from where I stand right now I can only think of worse ways than hashing, so it will most likely be hashing.

That said, the two most important reality checks we'd have to consider at that point would be:

a) whether there are ever enough of these values (as in more than three) present in the same memory context to warrant the hash-calculation overhead, since a full comparison will be required anyway, and

b) whether the special characteristic of my UDT values, where identical values by definition have identical structure in memory, offers enough opportunity to use a simple reference count to decide whether a value can/should be changed in place, whether the changed value should be written to its own fresh slot, or whether it just needs an increased reference count on memory that already represents that value.

Once we've cracked the intra-function nut, we could look for a way to share memory between the aggregate and other UDT functions.

Either way, I've got a boat load more reading and writing to do. Thank you ever so much for your time and attention. It is a great kindness, well appreciated.

Regards,
Marthin Laubscher