Add Postgres module info
Hi,
I would like to propose the module_info structure, which aims to let
extension maintainers sew some data into the binary file. Being included
in the module code, this information remains unchanged and is available
for reading by a backend.
As I see, this question was already debated during the introduction of
PG_MODULE_MAGIC [1]/messages/by-id/20060507211705.GB3808@svana.org, and developers didn't add such data because the
practical case wasn't obvious. Right now, when multiple extensions are
maintained and used in installations constantly, we have enough
information to continue this discussion.
The idea was initially born with the support of the extension, which had
a stable UI and frequently changing library code. To answer customer
requests, it was necessary to know the specific version of the code to
reproduce the situation in a test environment. That required introducing
a file naming rule and a specific exported variable into the module
code. However, this is not a sufficient guarantee of version
determination and complicates the technical process when supporting many
extensions obtained from different sources.
Another example is a library without an installation script at all - see
auto_explain. Without a 'CREATE EXTENSION' call, we don't have an option
to find out the library's version.
It would be much easier if the Postgres catalogue contained a function,
for example, module_info(module_name), which would allow you to
determine the file's full path and name containing the desired module in
the psql console and its version.
On the other hand, the Omnigres project (author Yurii Rashkovski) also
came up with the idea of module versioning, although it does this
externally out of the Postgres core. When designing this code, I also
adopted ideas from this repository.
So, let me propose a patch that introduces this tiny feature: the
maintainer can add the PG_MODULE_INFO macro to the library code, and
Postgres reads it on the module's load.
There is a question of how much information makes sense to add to the
module. For now, each time I prepare extensions to release, I have to
add the extension name (to avoid issues with file naming/renaming) and
the version. Format of the version storage? Do we need a separate minor
version number? It is a subject to debate.
[1]: /messages/by-id/20060507211705.GB3808@svana.org
/messages/by-id/20060507211705.GB3808@svana.org
--
regards, Andrei Lepikhov
Attachments:
v0-0001-Introduce-MODULE_INFO-macro.patchtext/plain; charset=UTF-8; name=v0-0001-Introduce-MODULE_INFO-macro.patchDownload+117-3
Andrei Lepikhov <lepihov@gmail.com> writes:
I would like to propose the module_info structure, which aims to let
extension maintainers sew some data into the binary file. Being included
in the module code, this information remains unchanged and is available
for reading by a backend.
I don't have much of an opinion one way or the other about the
usefulness of these additional info fields. But I would like to
object to the way you've gone about it, namely to copy-and-paste
the magic-block mechanism. That doesn't scale: the next time
somebody else wants some more fields, will we have three such
structs?
The approach we foresaw using was that we could simply add more
fields to Pg_magic_struct (obviously, only in a major version).
That's happened at least once already - abi_extra was not there
to begin with.
There are a couple of ways that we could deal with the API
seen by module authors:
1. The PG_MODULE_MAGIC macro keeps the same API and leaves the
additional field(s) empty. Authors who want to fill the
extra field(s) use a new macro, say PG_MODULE_MAGIC_V2.
2. PG_MODULE_MAGIC gains some arguments, forcing everybody
to change their code. While this would be annoying, it'd be
within our compatibility rules for a major version update.
I wouldn't do it though unless there were a compelling reason
why everybody should fill these fields.
3. Maybe we could do something with making PG_MODULE_MAGIC
variadic, but I've not thought hard about what that could
look like. In any case it'd only be a cosmetic improvement
over the above ways.
4. The extra fields are filled indirectly by macros that
extension authors can optionally provide (a variant on the
FMGR_ABI_EXTRA mechanism). This would be code-order-sensitive
so I'm not sure it's really a great idea.
5. Something I didn't think of?
With any of these except #4, authors who want their source code to
support multiple PG major versions would be forced into using #if
tests on CATALOG_VERSION_NO to decide what to write. That's a
bit annoying but hardly unusual.
regards, tom lane
Hi,
On 2024-12-11 13:21:03 -0500, Tom Lane wrote:
Andrei Lepikhov <lepihov@gmail.com> writes:
I would like to propose the module_info structure, which aims to let
extension maintainers sew some data into the binary file. Being included
in the module code, this information remains unchanged and is available
for reading by a backend.I don't have much of an opinion one way or the other about the
usefulness of these additional info fields.
FWIW, Id like to have some more information in there, without commenting on
the specifics.
But I would like to object to the way you've gone about it, namely to
copy-and-paste the magic-block mechanism. That doesn't scale: the next time
somebody else wants some more fields, will we have three such structs?
I agree with that.
The approach we foresaw using was that we could simply add more
fields to Pg_magic_struct (obviously, only in a major version).
That's happened at least once already - abi_extra was not there
to begin with.There are a couple of ways that we could deal with the API
seen by module authors:1. The PG_MODULE_MAGIC macro keeps the same API and leaves the
additional field(s) empty. Authors who want to fill the
extra field(s) use a new macro, say PG_MODULE_MAGIC_V2.2. PG_MODULE_MAGIC gains some arguments, forcing everybody
to change their code. While this would be annoying, it'd be
within our compatibility rules for a major version update.
I wouldn't do it though unless there were a compelling reason
why everybody should fill these fields.
I'd like to avoid needing to do this again if / when we invent the next set of
optional arguments. So just having a different macro with a hardcoded set of
arguments or changing PG_MODULE_MAGIC to have a hardcoded set of arguments
doesn't seem great.
To be future proof, I think it'd be good to declare the arguments as
designated initializers. E.g. like
PG_MODULE_MAGIC_EXT(
.version = 10000,
.threadsafe = 1
);
where the macro would turn the arguments into a struct initializer inside
Pg_magic_struct.
That way we can add/remove arguments and only extensions that use
removed arguments need to change.
3. Maybe we could do something with making PG_MODULE_MAGIC
variadic, but I've not thought hard about what that could
look like. In any case it'd only be a cosmetic improvement
over the above ways.
Yea, it'd be nice to avoid needing an _EXT or _V2. But I can't immediately
think of a way that allows a macro with no arguments and and an argument.
4. The extra fields are filled indirectly by macros that
extension authors can optionally provide (a variant on the
FMGR_ABI_EXTRA mechanism). This would be code-order-sensitive
so I'm not sure it's really a great idea.
Agreed.
With any of these except #4, authors who want their source code to
support multiple PG major versions would be forced into using #if
tests on CATALOG_VERSION_NO to decide what to write. That's a
bit annoying but hardly unusual.
#2 would be bit more annoying than #1, I'd say, because it'd affect every
single extension, even ones not interested in any of this.
Greetings,
Andres Freund
Andres Freund <andres@anarazel.de> writes:
On 2024-12-11 13:21:03 -0500, Tom Lane wrote:
There are a couple of ways that we could deal with the API
seen by module authors:
To be future proof, I think it'd be good to declare the arguments as
designated initializers. E.g. like
PG_MODULE_MAGIC_EXT(
.version = 10000,
.threadsafe = 1
);
Yeah, I'd come to pretty much the same conclusion after sending
my email. That looks like it should work and be convenient
to extend further.
The other possibly-non-obvious bit is that we should probably
invent a sub-structure holding the ABI-related fields, so as to
minimize the amount of rewriting needed in dfmgr.c.
regards, tom lane
On Wed, Dec 11, 2024, at 4:26 PM, Andres Freund wrote:
On 2024-12-11 13:21:03 -0500, Tom Lane wrote:
Andrei Lepikhov <lepihov@gmail.com> writes:
I would like to propose the module_info structure, which aims to let
extension maintainers sew some data into the binary file. Being included
in the module code, this information remains unchanged and is available
for reading by a backend.I don't have much of an opinion one way or the other about the
usefulness of these additional info fields.FWIW, Id like to have some more information in there, without commenting on
the specifics.
+1 for the general idea. I received some reports like [1]https://github.com/eulerto/wal2json/issues/181 related to wal2json
that people wants to obtain the output plugin version. Since it is not installed
via CREATE EXTENSION, it is not possible to detect what version is installed,
hence, some tools cannot have some logic to probe the module version. In a
managed environment, it is hard to figure out the exact version for
non-CREATE-EXTENSION modules, unless it is explicitly informed by the vendor.
[1]: https://github.com/eulerto/wal2json/issues/181
--
Euler Taveira
EDB https://www.enterprisedb.com/
On 12/12/2024 01:21, Tom Lane wrote:
Andrei Lepikhov <lepihov@gmail.com> writes:
I would like to propose the module_info structure, which aims to let
extension maintainers sew some data into the binary file. Being included
in the module code, this information remains unchanged and is available
for reading by a backend.I don't have much of an opinion one way or the other about the
usefulness of these additional info fields. But I would like to
object to the way you've gone about it, namely to copy-and-paste
the magic-block mechanism. That doesn't scale: the next time
somebody else wants some more fields, will we have three such
structs?
It makes sense. But I want to clarify that I avoided changing
PG_MODULE_MAGIC because the newly introduced structure has a totally
different purpose and usage logic: the struct is designed to check
compatibility, but module info isn't connected to the core version at
all: a single version of the code may be built for multiple PG versions.
At the same time, various versions of the same library may be usable
with the same core.
From the coding point of view, I agree that your approach is more
laconic and reasonable. I will rewrite the code using this approach.
--
regards, Andrei Lepikhov
"Euler Taveira" <euler@eulerto.com> writes:
+1 for the general idea. I received some reports like [1] related to wal2json
that people wants to obtain the output plugin version. Since it is not installed
via CREATE EXTENSION, it is not possible to detect what version is installed,
hence, some tools cannot have some logic to probe the module version. In a
managed environment, it is hard to figure out the exact version for
non-CREATE-EXTENSION modules, unless it is explicitly informed by the vendor.
What would you foresee as the SQL API for inspecting a module that's
not tied to an extension?
regards, tom lane
Andrei Lepikhov <lepihov@gmail.com> writes:
It makes sense. But I want to clarify that I avoided changing
PG_MODULE_MAGIC because the newly introduced structure has a totally
different purpose and usage logic: the struct is designed to check
compatibility, but module info isn't connected to the core version at
all: a single version of the code may be built for multiple PG versions.
At the same time, various versions of the same library may be usable
with the same core.
Surely. But I don't see a need for two separately-looked-up
physical structures. Seems to me it's sufficient to put the
ABI-checking fields into a sub-struct within the magic block.
regards, tom lane
On Wed, Dec 11, 2024 at 08:34:28PM -0500, Tom Lane wrote:
"Euler Taveira" <euler@eulerto.com> writes:
+1 for the general idea. I received some reports like [1] related to wal2json
that people wants to obtain the output plugin version. Since it is not installed
via CREATE EXTENSION, it is not possible to detect what version is installed,
hence, some tools cannot have some logic to probe the module version. In a
managed environment, it is hard to figure out the exact version for
non-CREATE-EXTENSION modules, unless it is explicitly informed by the vendor.What would you foresee as the SQL API for inspecting a module that's
not tied to an extension?
Rather than a function that can be called with a specific module name
in input, invent a new system SRF function that would report back for
a process all the libraries that have been loaded in it? Presumably,
the extra tracking can be done in dfmgr.c with more fields added to
DynamicFileList to track the information involved.
Being able to print the information of DynamicFileList can be argued
as useful on its own as long as its execution can be granted, with
superuser right by default.
--
Michael
Michael Paquier <michael@paquier.xyz> writes:
On Wed, Dec 11, 2024 at 08:34:28PM -0500, Tom Lane wrote:
What would you foresee as the SQL API for inspecting a module that's
not tied to an extension?
Rather than a function that can be called with a specific module name
in input, invent a new system SRF function that would report back for
a process all the libraries that have been loaded in it?
Yeah, that could work.
Presumably,
the extra tracking can be done in dfmgr.c with more fields added to
DynamicFileList to track the information involved.
I wouldn't add any overhead to the normal case for this. Couldn't
we walk the list and re-fetch each module's magic block inside
this new function?
regards, tom lane
On Wed, Dec 11, 2024 at 10:39:38PM -0500, Tom Lane wrote:
Michael Paquier <michael@paquier.xyz> writes:
Presumably,
the extra tracking can be done in dfmgr.c with more fields added to
DynamicFileList to track the information involved.I wouldn't add any overhead to the normal case for this. Couldn't
we walk the list and re-fetch each module's magic block inside
this new function?
Depends on how much we should try to cache to make that less expensive
on repeated calls because we cannot unload libraries, but sure, I
don't see why we could not that for each SQL function call to simplify
the logic and the structures in place.
--
Michael
On 12/12/24 10:44, Michael Paquier wrote:
On Wed, Dec 11, 2024 at 10:39:38PM -0500, Tom Lane wrote:
Michael Paquier <michael@paquier.xyz> writes:
Presumably,
the extra tracking can be done in dfmgr.c with more fields added to
DynamicFileList to track the information involved.I wouldn't add any overhead to the normal case for this. Couldn't
we walk the list and re-fetch each module's magic block inside
this new function?Depends on how much we should try to cache to make that less expensive
on repeated calls because we cannot unload libraries, but sure, I
don't see why we could not that for each SQL function call to simplify
the logic and the structures in place.
I want to say that 'cannot unload libraries' is a negative outcome of
the architecture. It would be better to invent something like
PG_unregister, allowing libraries to at least return a hook routine call
back to the system.
So, maybe it makes sense to design this feature with re-fetching
libraries, supposing it is already implemented somehow and elements of
the DynamicFileList may be removed.
--
regards, Andrei Lepikhov
On 12/12/24 08:36, Tom Lane wrote:
Andrei Lepikhov <lepihov@gmail.com> writes:
It makes sense. But I want to clarify that I avoided changing
PG_MODULE_MAGIC because the newly introduced structure has a totally
different purpose and usage logic: the struct is designed to check
compatibility, but module info isn't connected to the core version at
all: a single version of the code may be built for multiple PG versions.
At the same time, various versions of the same library may be usable
with the same core.Surely. But I don't see a need for two separately-looked-up
physical structures. Seems to me it's sufficient to put the
ABI-checking fields into a sub-struct within the magic block.
Okay, I've rewritten the patch to understand how it works. It seems to
work pretty well. I added separate fields for minor and major versions.
--
regards, Andrei Lepikhov
Attachments:
v1-0001-Introduce-PG_MODULE_MAGIC_EXT-macro.patchtext/x-patch; charset=UTF-8; name=v1-0001-Introduce-PG_MODULE_MAGIC_EXT-macro.patchDownload+116-8
On Thu, Dec 12, 2024 at 3:41 PM Andrei Lepikhov <lepihov@gmail.com> wrote:
On 12/12/24 08:36, Tom Lane wrote:
Andrei Lepikhov <lepihov@gmail.com> writes:
It makes sense. But I want to clarify that I avoided changing
PG_MODULE_MAGIC because the newly introduced structure has a totally
different purpose and usage logic: the struct is designed to check
compatibility, but module info isn't connected to the core version at
all: a single version of the code may be built for multiple PG versions.
At the same time, various versions of the same library may be usable
with the same core.Surely. But I don't see a need for two separately-looked-up
physical structures. Seems to me it's sufficient to put the
ABI-checking fields into a sub-struct within the magic block.Okay, I've rewritten the patch to understand how it works. It seems to
work pretty well. I added separate fields for minor and major versions.
I am keenly interested in helping in this area; as you have mentioned, I've
done similar work using an extension.
Some thoughts/questions:
1. Do we need to latch onto the "magic" structure here? Have we considered
an opportunity to create a separate metadata slot that looks something like
`PG_MODULE_INFO(.version = ...)`. My impression of module magic was that it
should rather be populated during the build – to provide build-time
information. MODULE_INFO would be a rather informational section supplied
by the developer.
2. Any reasons to dictate MAJ.MIN format? With semantic versioning abound,
it's rather common to use MAJ.MIN.PATCH. There are also other extensions to
it (like pre-releases, builds, etc.). All of these indicate distinct
versions. The differences between them can be figured out using semver or
other parsers. Pure PL/pgSQL implementations of that exist [1]https://github.com/bigsmoke/pg_text_semver.
3. In my work, I also introduced the concept of stable module identity – a
unique string (for example, UUID) that represents the identity of the
module even if its name is going to change. Admittedly, this is not _the
most common_ type of problem, but I anticipate it becoming more of an issue
with the growth of the extension ecosystem, potential name clashes, and
renamings. With this approach, developers assign this unique string to a
module once at the beginning and never change it. Have you considered this?
Hi,
On 2024-12-12 11:35:56 +0700, Andrei Lepikhov wrote:
On 12/12/24 10:44, Michael Paquier wrote:
On Wed, Dec 11, 2024 at 10:39:38PM -0500, Tom Lane wrote:
Michael Paquier <michael@paquier.xyz> writes:
Presumably,
the extra tracking can be done in dfmgr.c with more fields added to
DynamicFileList to track the information involved.I wouldn't add any overhead to the normal case for this. Couldn't
we walk the list and re-fetch each module's magic block inside
this new function?Depends on how much we should try to cache to make that less expensive
on repeated calls because we cannot unload libraries, but sure, I
don't see why we could not that for each SQL function call to simplify
the logic and the structures in place.I want to say that 'cannot unload libraries' is a negative outcome of the
architecture. It would be better to invent something like PG_unregister,
allowing libraries to at least return a hook routine call back to the
system.
So, maybe it makes sense to design this feature with re-fetching libraries,
supposing it is already implemented somehow and elements of the
DynamicFileList may be removed.
I am quite certain we'll not support unloading libraries anytime soon. We used
to support it and it caused problems... Changing anything about how exactly
things are tracked in dfmgr.c will be the smallest part of supporting
unloading libraries again.
Greetings,
Andres Freund
Andres Freund <andres@anarazel.de> writes:
On 2024-12-12 11:35:56 +0700, Andrei Lepikhov wrote:
I want to say that 'cannot unload libraries' is a negative outcome of the
architecture. It would be better to invent something like PG_unregister,
allowing libraries to at least return a hook routine call back to the
system.
I am quite certain we'll not support unloading libraries anytime soon. We used
to support it and it caused problems... Changing anything about how exactly
things are tracked in dfmgr.c will be the smallest part of supporting
unloading libraries again.
Indeed. However, I don't see what that has to do with the current
discussion anyway. The proposed SRF would iterate through whatever
is in the DynamicFileList. What does it care whether there's a way
to add or remove entries?
regards, tom lane
On 12/12/24 21:02, Yurii Rashkovskii wrote:
On Thu, Dec 12, 2024 at 3:41 PM Andrei Lepikhov <lepihov@gmail.com
<mailto:lepihov@gmail.com>> wrote:On 12/12/24 08:36, Tom Lane wrote:
Andrei Lepikhov <lepihov@gmail.com <mailto:lepihov@gmail.com>>
writes:
It makes sense. But I want to clarify that I avoided changing
PG_MODULE_MAGIC because the newly introduced structure has a totally
different purpose and usage logic: the struct is designed to check
compatibility, but module info isn't connected to the coreversion at
all: a single version of the code may be built for multiple PG
versions.
At the same time, various versions of the same library may be usable
with the same core.Surely. But I don't see a need for two separately-looked-up
physical structures. Seems to me it's sufficient to put the
ABI-checking fields into a sub-struct within the magic block.Okay, I've rewritten the patch to understand how it works. It seems to
work pretty well. I added separate fields for minor and major versions.I am keenly interested in helping in this area; as you have mentioned,
I've done similar work using an extension.Some thoughts/questions:
1. Do we need to latch onto the "magic" structure here? Have we
considered an opportunity to create a separate metadata slot that looks
something like `PG_MODULE_INFO(.version = ...)`. My impression of module
magic was that it should rather be populated during the build – to
provide build-time information. MODULE_INFO would be a rather
informational section supplied by the developer.
It has already been debated above. I may agree with colleagues that
maintainer-provided information should be stored in the magic field to
reduce noise.
At the same time, we use a single code part to load all that data into
the DynamicFileList. That looks pretty well, isn't it?
2. Any reasons to dictate MAJ.MIN format? With semantic versioning
abound, it's rather common to use MAJ.MIN.PATCH. There are also other
extensions to it (like pre-releases, builds, etc.). All of these
indicate distinct versions. The differences between them can be figured
out using semver or other parsers. Pure PL/pgSQL implementations of that
exist [1].
Okay, thanks; that's a good catch. I wonder how to follow these rules
with a static fixed-sized structure. I would like to read about any
suggestions and implementation examples.
3. In my work, I also introduced the concept of stable module identity –
a unique string (for example, UUID) that represents the identity of the
module even if its name is going to change. Admittedly, this is not _the
most common_ type of problem, but I anticipate it becoming more of an
issue with the growth of the extension ecosystem, potential name
clashes, and renamings. With this approach, developers assign this
unique string to a module once at the beginning and never change it.
Have you considered this?
This option just needs some live examples. I think, if necessary, it
could be added later.
--
regards, Andrei Lepikhov
Andrei Lepikhov <lepihov@gmail.com> writes:
On 12/12/24 21:02, Yurii Rashkovskii wrote:
2. Any reasons to dictate MAJ.MIN format? With semantic versioning
abound, it's rather common to use MAJ.MIN.PATCH.
Okay, thanks; that's a good catch. I wonder how to follow these rules
with a static fixed-sized structure. I would like to read about any
suggestions and implementation examples.
There's nothing stopping a field of the magic block from being
a "const char *" pointer to a string literal.
regards, tom lane
On 12/13/24 10:17, Tom Lane wrote:
Andrei Lepikhov <lepihov@gmail.com> writes:
On 12/12/24 21:02, Yurii Rashkovskii wrote:
2. Any reasons to dictate MAJ.MIN format? With semantic versioning
abound, it's rather common to use MAJ.MIN.PATCH.Okay, thanks; that's a good catch. I wonder how to follow these rules
with a static fixed-sized structure. I would like to read about any
suggestions and implementation examples.There's nothing stopping a field of the magic block from being
a "const char *" pointer to a string literal.
Ok, See v.2 in attachment.
--
regards, Andrei Lepikhov
Attachments:
v2-0001-Introduce-PG_MODULE_MAGIC_EXT-macro.patchtext/x-patch; charset=UTF-8; name=v2-0001-Introduce-PG_MODULE_MAGIC_EXT-macro.patchDownload+110-8
On Dec 11, 2024, at 19:49, Euler Taveira <euler@eulerto.com> wrote:
FWIW, Id like to have some more information in there, without commenting on
the specifics.+1 for the general idea.
Same.
I received some reports like [1] related to wal2json
that people wants to obtain the output plugin version. Since it is not installed
via CREATE EXTENSION, it is not possible to detect what version is installed,
hence, some tools cannot have some logic to probe the module version.
I’m all for additional metadata for native extensions, but I’d also like to draw attention to the “Future” section my proposal[1]https://justatheory.com/2024/11/rfc-extension-packaging-lookup/#future-deprecate-load to require that module-only extensions also include a control file and be loadable via CREATE EXTENSION (and proposed *_preload_extensions GUCs[2]https://justatheory.com/2024/11/rfc-extension-packaging-lookup/#extension-preloading). This would unify how all types of extensions are added to a database, and would include version information as for all other CREATE EXTENSION extensions.
Not a mutually-exclusive proposal, of course; I think it makes sense to have metadata included in the binary itself. Would be useful to compare against what CREATE EXTENSION thinks is the version and raising an error or warning when they diverge.
Best,
David
[1]: https://justatheory.com/2024/11/rfc-extension-packaging-lookup/#future-deprecate-load
[2]: https://justatheory.com/2024/11/rfc-extension-packaging-lookup/#extension-preloading