Re: Review: extension template

Started by Hannu Krosing over 12 years ago, 6 messages

hannu@krosing.net

over 12 years ago

On 07/08/2013 09:26 AM, Heikki Linnakangas wrote:

On 08.07.2013 00:48, Markus Wanner wrote:

On 07/07/2013 09:51 PM, Dimitri Fontaine wrote:

The design we found to address that is
called "Extension Templates" and is implemented in the current patch.

I placed my concerns with the proposed implementation. It's certainly
not the only way Postgres can manage its extensions. And I still
hope we can come up with something that's simpler to use and easier to
understand.

I'm just now dabbling back to this thread after skipping a lot of
discussion, and I'm disappointed to see that this still seems to be
running in circles on the same basic question: What exactly is the
patch trying to accomplish?

The whole point of extensions, as they were originally implemented, is
to allow them to be managed *outside* the database. In particular,
they are not included in pg_dump. If you do want them to be included
in pg_dump, why create it as an extension in the first place? Why not
just run the create script and create the functions, datatypes etc.
directly, like you always did before extensions were even invented.

I think the reason is that extensions provide some handy packaging of
the functions etc, so that you can just do "DROP EXTENSION foo" to get
rid of all of them. Also, pg_extension table keeps track of the
currently installed version. Perhaps we need to step back and invent
another concept that is totally separate from extensions, to provide
those features. Let's call them "modules". A module is like an
extension, in that all the objects in the module can be dropped with a
simple "DROP MODULE foo" command. To create a module, you run "CREATE
MODULE foo AS <SQL script to create the objects in the module>".

I believe that would be pretty much exactly what Dimitri's original
inline extension patches did, except that it's not called an
extension, but a module. I think it's largely been the naming that has
been the problem with this patch from the very beginning. We came up
with the concept of templates after we had decided that the originally
proposed behavior was not what we want from something called
extensions. But if you rewind to the very beginning, the problem was
just with the name. The concept was useful, but not something we want
to call an extension, because the distinguishing feature of an
extension is that it lives outside the database and is *not* included
in pg_dump.

Either MODULE or PACKAGE would be good name candidates.

Still, getting this functionality in seems more important than exact
naming, though naming them "right" would be nice.

- Heikki

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Dimitri Fontaine

dimitri@2ndQuadrant.fr

over 12 years ago

In reply to: Hannu Krosing (#1)

On 07/08/2013 09:26 AM, Heikki Linnakangas wrote:

I'm just now dabbling back to this thread after skipping a lot of
discussion, and I'm disappointed to see that this still seems to be
running in circles on the same basic question: What exactly is the
patch trying to accomplish?

Bypassing the file system entirely in order to install an extension. As
soon as I figure out how to, including C-coded extensions.

I think the reason is that extensions provide some handy packaging of
the functions etc, so that you can just do "DROP EXTENSION foo" to get
rid of all of them. Also, pg_extension table keeps track of the
currently installed version. Perhaps we need to step back and invent
another concept that is totally separate from extensions, to provide

The main feature of the extensions system is its ability to have a clean
pg_restore process even when you use some extensions. That has been the
only goal of the whole feature development.

Let me stress that the most important value in that behavior is to be
able to pg_restore using a newer version of the extension, the one that
works with the target major version. When upgrading from 9.2 to 9.3 if
you depend on keywords that now are reserved you need to install the
newer version of the extension at pg_restore time.

The main features I'm interested in besides a clean pg_restore are
UPDATE scripts for extensions and dependency management, even if that
still needs improvements. Those improvements will be relevant for both
ways to make extensions available for your system.

those features. Let's call them "modules". A module is like an
extension, in that all the objects in the module can be dropped with a
simple "DROP MODULE foo" command. To create a module, you run "CREATE
MODULE foo AS <SQL script to create the objects in the module>".

Not again the naming. A module is already documented as a shared object
library (.so, .dll or .dylib) that PostgreSQL will LOAD for you. A patch
has already been proposed to track which module is loaded in a session
and offer that in a new system's view, pg_module.

We cannot use the name "module" for anything else, IMNSHO.

just with the name. The concept was useful, but not something we want
to call an extension, because the distinguishing feature of an
extension is that it lives outside the database and is *not* included
in pg_dump.

The main goal here is not to have the extension live inside the database
but rather to be able to bypass using the server's filesystem in order
to be able to CREATE EXTENSION foo; and then to still have pg_restore do
the right thing on its own.

If you want to scratch the new catalogs part, then just say that it's
expected to be really complex to pg_restore a database using extensions,
back to exactly how it was before 9.1: create the new database, create
the extensions your dump depends on in that new database, now pg_restore
your backup manually filtering away the extensions' objects or ignoring
the errors when pg_restore tries to duplicate functions you already
installed in the previous step. No fun.

Hannu Krosing <hannu@krosing.net> writes:

Either MODULE or PACKAGE would be good name candidates.

The name "package" is even worse than "module", because lots of people
think they know exactly what a package is from having used a closed
source product that you might have heard of: they are trying to
cope with our ability to implement new features on a yearly basis while
not breaking anything we already have.

Still, getting this functionality in seems more important than exact
naming, though naming them "right" would be nice.

Of course we want to do it right™.

Regards,
--
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support

Markus Wanner

markus@bluegap.ch

over 12 years ago

In reply to: Hannu Krosing (#1)

On 06/10/2013 09:43 PM, Hannu Krosing wrote:

On 07/08/2013 09:26 AM, Heikki Linnakangas wrote:

The concept was useful, but not something we want
to call an extension, because the distinguishing feature of an
extension is that it lives outside the database and is *not* included
in pg_dump.

Either MODULE or PACKAGE would be good name candidates.

Still, getting this functionality in seems more important than exact
naming, though naming them "right" would be nice.

Remember that we already have quite a lot of extensions. And PGXN. Are
we really so wedded to the idea of extensions "living" outside of the
database that we need to come up with something different and incompatible?

Or do you envision modules or packages to be compatible with extensions?
Just putting another label on it so we can still claim extensions are
strictly external to the database? Sorry, I don't get the idea, there.

From a user's perspective, I want extensions, modules, or packages to be
managed somehow. Including upgrades, migrations (i.e. dump & restore)
and removal. The approach of letting the distributors handle that
packaging clearly has its limitations. What's so terribly wrong with
Postgres itself providing better tools to manage those?

Inventing yet another type of extension, module or package (compatible
or not) doesn't help, but increases confusion even further. Or how do
you explain to an author of an existing extension, whether or not he
should convert his extension to a module (if you want those to be
incompatible)?

If it's the same thing, just with different loading mechanisms, please
keep calling it the same: an extension. (And maintain compatibility
between the different ways to load it.)

I fully agree with the fundamental direction of Dimitri's patch. I think
Postgres needs to better manage its extensions itself. Including dump
and restore cycles. However, I think the implementation isn't optimal,
yet. I pointed out a few usability issues and gave reasons why
"template" is a misnomer (with the proposed implementation). "Extension"
is not.

(I still think "template" would be a good mental model. See my other
thread...
http://archives.postgresql.org/message-id/51D72C1D.7010609@bluegap.ch)

Regards

Markus Wanner

Markus Wanner

markus@bluegap.ch

over 12 years ago

In reply to: Dimitri Fontaine (#2)

On 07/08/2013 10:20 AM, Dimitri Fontaine wrote:

Bypassing the file system entirely in order to install an extension. As
soon as I figure out how to, including C-coded extensions.

Do I understand correctly that you want to keep the extensions (or their
templates) out of the dump and require the user to "upload" it via libpq
prior to the restore, instead of him having to install them via .deb or .rpm?

This would explain why you keep the CREATE TEMPLATE FOR EXTENSION as a
separate step from CREATE EXTENSION. And why you, too, insist on wanting
templates, and not just a way to create an extension via libpq.

However, why don't you follow the template model more closely? Why
should the user be unable to create a template, if there already exists
an extension of the same name? That's an unneeded and disturbing
limitation, IMO.

My wish: Please drop the pg_depend link between template and extension
and make the templates shared across databases. So I also have to
install the template only once per cluster. Keep calling them templates,
then. (However, mind that file-system extension templates are templates
as well. In-line vs. out-of-line templates, if you want.)

I think you could then safely allow an upgrade of an extension that has
been created from an out-of-line template by an upgrade script that
lives in-line. And vice-versa. Just as an example. It all gets nicer and
cleaner, if the in-line thing better matches the out-of-line one, IMO.

An extension should look and behave exactly the same, independent of
what kind of template it has been created from. And as we obviously
cannot add a pg_depend link to a file on the file system, we had better
not do that for the in-line variant, either, to maintain the symmetry.

The main feature of the extensions system is its ability to have a clean
pg_restore process even when you use some extensions. That has been the
only goal of the whole feature development.

Great! Very much appreciated.

Let me stress that the most important value in that behavior is to be
able to pg_restore using a newer version of the extension, the one that
works with the target major version. When upgrading from 9.2 to 9.3 if
you depend on keywords that now are reserved you need to install the
newer version of the extension at pg_restore time.

The main features I'm interested in besides a clean pg_restore are
UPDATE scripts for extensions and dependency management, even if that
still needs improvements. Those improvements will be relevant for both
ways to make extensions available for your system.

We cannot use the name "module" for anything else, IMNSHO.

Agreed.

The main goal here is not to have the extension live inside the database
but rather to be able to bypass using the server's filesystem in order
to be able to CREATE EXTENSION foo; and then to still have pg_restore do
the right thing on its own.

Note that with the current, out-of-line approach, the *extension*
already lives inside the database. It's just the *template*, that
doesn't. (Modulo DSO, but the patch doesn't handle those either, yet. So
we're still kind of excluding those.)

Allowing for templates to live inside the database as well is a good
thing, IMO.

If you want to scratch the new catalogs part, then just say that it's
expected to be really complex to pg_restore a database using extensions,
back to exactly how it was before 9.1: create the new database, create
the extensions your dump depends on in that new database, now pg_restore
your backup manually filtering away the extensions' objects or ignoring
the errors when pg_restore tries to duplicate functions you already
installed in the previous step. No fun.

Definitely not. Nobody wants to go back there. (And as Heikki pointed
out, if you absolutely want to, you can even punish yourself that way.)

Regards

Markus Wanner

Peter Eisentraut

peter_e@gmx.net

over 12 years ago

In reply to: Dimitri Fontaine (#2)

On 7/8/13 4:20 AM, Dimitri Fontaine wrote:

Let me stress that the most important value in that behavior is to be
able to pg_restore using a newer version of the extension, the one that
works with the target major version. When upgrading from 9.2 to 9.3 if
you depend on keywords that now are reserved you need to install the
newer version of the extension at pg_restore time.

I think there is an intrinsic conflict here. You have things inside the
database and outside. When they depend on each other, it gets tricky.
Extensions were invented to cope with that. They do the job, more or
less. Now you want to take the same mechanism and apply it entirely
inside the database. But that wasn't the point of extensions! That's
how you get definitional issues like, should extensions be dumped or not.

I don't believe the above use case. (Even if I did, it's marginal.)
You should always be able to arrange things so that an upgrade of an
inside-the-database-package is possible before or after pg_restore.
pg_dump and pg_restore are interfaces between the database and the
outside. They should have nothing to do with upgrading things that live
entirely inside the database.

There would be value to inside-the-database package management, but it
should be a separate concept.

Markus Wanner

markus@bluegap.ch

over 12 years ago

In reply to: Peter Eisentraut (#5)

Peter,

On 07/09/2013 11:04 PM, Peter Eisentraut wrote:

I think there is an intrinsic conflict here. You have things inside the
database and outside. When they depend on each other, it gets tricky.
Extensions were invented to cope with that. They do the job, more or
less.

I agree. And to extend upon that, I think it's important to distinguish
between the created extension and the available one, i.e. the template.
Only the template lives outside. The created extension itself is firmly
sitting in the database, possibly with multiple dependencies from other
objects. It does not depend on anything outside of the database
(assuming the absence of a DSO of the extension, which does not follow
that template concept).

And yes, we decided the objects that are part of the extension should
not get dumped with pg_dump. Nobody argues to change that. Note,
however, that this very decision is what raises the "intrinsic conflict"
for pg_restore, because CREATE EXTENSION in the dump depends on the
outside extension. If anything, Dimitri's patch solves that.
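To make that dependency concrete, here is a minimal sketch in Python. It
is a toy model, not pg_dump's actual logic: the DbObject class and the
toy_dump function are hypothetical names made up for illustration, the
DDL strings are placeholders, and real pg_dump orders its output by
dependencies rather than by input order. It only shows that member
objects of an extension are left out of the dump and replaced by a
single CREATE EXTENSION command, which the target system must then be
able to satisfy.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DbObject:
    """A database object; `extension` names the owning extension, if any."""
    name: str
    ddl: str
    extension: Optional[str] = None


def toy_dump(objects):
    """Return dump lines: one CREATE EXTENSION per extension, DDL for the rest."""
    lines = []
    seen = set()
    for obj in objects:
        if obj.extension is None:
            # Plain objects are dumped with their own DDL.
            lines.append(obj.ddl)
        elif obj.extension not in seen:
            # Extension members are skipped; the extension is named once.
            seen.add(obj.extension)
            lines.append(f"CREATE EXTENSION {obj.extension};")
    return lines


objects = [
    DbObject("hstore", "CREATE TYPE hstore ...;", extension="hstore"),
    DbObject("akeys", "CREATE FUNCTION akeys(hstore) ...;", extension="hstore"),
    DbObject("t", "CREATE TABLE t (kv hstore);"),
]
dump = toy_dump(objects)
print("\n".join(dump))
```

The table definition survives in the dump, but the type it relies on is
only reachable through CREATE EXTENSION at restore time.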

Now you want to take the same mechanism and apply it entirely
inside the database. But that wasn't the point of extensions! That's
how you get definitional issues like, should extensions be dumped or not.

IMO the point of extensions is to extend Postgres (with code that's not
part of core). Whether their templates (SQL sources, if you want) are
stored on the file system (outside) or within the database is irrelevant
to the concept.

Think of it that way: Take one of those FUSE-Postgres-FS things [1],
which uses Postgres as the backend for a file system and allows you to
store arbitrary files. Mount that to the extensions directory of your
Postgres instance and make your extension templates available there
(i.e. copy them there). CREATE EXTENSION would just work, reading the
templates for the extension to create from itself, via that fuse
wrapper. (If the FUSE wrapper itself was using an extension, you'd get
into an interesting chicken and egg problem, but even that would be
resolvable, because the installed extension doesn't depend on the
template it was created from.)

Think of Dimitri's patch as a simpler and more elegant way to achieve
the very same thing. (Well, modulo our disagreement about the dependency
between extension and templates.) And as opposed to the file system or
fuse approach, you'd even gain transactional safety, consistency (i.e. a
constraint can enforce a full version exists as the basis for an upgrade
script), etc... But who am I to tell you the benefits of storing data in
a database?

Of course, you then also want to be able to backup your templates (not
the extensions) stored in the database. Just like you keep a backup of
your file-system templates. Either by simply making a copy, or maybe by
keeping an RPM or DEB package of it available. Thus, of course,
templates for extensions need to be dumped as well.

I don't believe the above use case. (Even if I did, it's marginal.)
You should always be able to arrange things so that an upgrade of an
inside-the-database-package is possible before or after pg_restore.

Dimitri's scenario assumes an old and a new version of an extension as
well as an old and a new Postgres major version. Where the old extension
is not compatible with the new Postgres major version. Which certainly
seems like a plausible scenario to me (postgis-2.0 is not compatible
with Postgres-9.3, for example - granted, it carries a DSO, so it's not
really a good example).

Given how extensions work, to upgrade to the new Postgres major version
*and* to the required new version of the extension, you don't ever need
to "upgrade an inside-the-database-package". Instead, you need to:

createdb -> provide templates -> CREATE EXTENSION -> restore data

Now, CREATE EXTENSION and restoring your data has effectively been
merged for the user, as pg_dump emits proper CREATE EXTENSION commands.
"Providing templates" so far meant installing an RPM or DEB. Or copying
the files manually.

But in fact, how and where you provide templates for the extension is
irrelevant to that order. And the possibility to merge the second step
into the 'restore data' step certainly sounds appealing to me.
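The ordering can be sketched as a toy model in Python. Everything here
is hypothetical and for illustration only: ToyCluster, provide_template
and create_extension are made-up names, and the sketch assumes that the
CREATE EXTENSION command in the dump does not pin a version, so the
target installs whichever version its template provides.

```python
class ToyCluster:
    """Toy model: templates that are available, extensions that are created."""

    def __init__(self):
        self.templates = {}   # extension name -> version the template provides
        self.extensions = {}  # extension name -> version actually created

    def provide_template(self, name, version):
        # Stands in for installing an RPM or DEB, copying files manually,
        # or uploading an in-database template.
        self.templates[name] = version

    def create_extension(self, name):
        # Fails unless a template was provided first; otherwise installs
        # the version available on this cluster.
        if name not in self.templates:
            raise LookupError(f'extension "{name}" is not available')
        self.extensions[name] = self.templates[name]
        return self.extensions[name]


# Old major version: extension foo was created from the 1.0 template.
old = ToyCluster()
old.provide_template("foo", "1.0")
old.create_extension("foo")

# New major version: restoring before providing a template fails.
new = ToyCluster()
try:
    new.create_extension("foo")
    failed = False
except LookupError:
    failed = True

# Provide the newer template, then replay CREATE EXTENSION from the dump.
new.provide_template("foo", "1.1")
restored_version = new.create_extension("foo")
print(failed, restored_version)
```

Where the template comes from does not change the order of the steps;
it only changes how the "provide templates" step is carried out.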

pg_dump and pg_restore are interfaces between the database and the
outside. They should have nothing to do with upgrading things that live
entirely inside the database.

I don't get your point here. In my view, libpq is intended to modify the
things that live inside the database, including extensions (i.e. ALTER
EXTENSION ADD FUNCTION). Or what kind of "things that live entirely
inside the database" do you have in mind?

There would be value to inside-the-database package management, but it
should be a separate concept.

Anything that's incompatible with extensions is not gonna fly. There are
too many of them available, already. We need to ease management of
those, not come up with yet another concept.

Regards

Markus Wanner

[1]: for example, pgfuse: database in user-space filesystem accessing a
Postgresql database: https://github.com/andreasbaumann/pgfuse
