Package support for Postgres
Zembu has decided to release the result of a recent Postgres development
project to the Postgres project. This project (for which I was the lead
developer) adds Oracle-like package support to Postgres. I'm in the
process of making a version of the patch relative to the current
CVS tree. The change is fairly encompassing, weighing in at around
800k of unified diffs, of which about 200k are the real meat. Before I
send it in, though, I thought I'd see what people think of the idea. Oh,
this feature would definitely be a 7.3 feature; 7.2 is too close to the
door for this to go in. :-)
This message is rather long. I've divided it into sections which start
with "**".
** What are Packages
So what are packages? In Oracle, they are a feature which helps developers
make stored procedures and functions. They provide a name space for
functions local to the package, session-specific package variables, and
initialization routines which are run before any other routine in the
package. Also, all parts of a package are loaded and unloaded at once -
you can't have it partially installed.
All of these features make life much easier for stored-procedure
developers. The name space feature means that you can name the routines in
your package whatever you want, and they won't conflict with the names
either in other packages or with functions not in a package. All you need
to do is ensure that no other package has the same name as yours.
** What did I do, and what does a package declaration look like?
What I've done is implement Oracle packages with a Postgres flair. There
is a new command, CREATE PACKAGE <name> AS which defines a package. For
those of you up on Oracle packages, this command duplicates the Oracle
CREATE PACKAGE BODY command - there is no Postgres equivalent of the
Oracle CREATE PACKAGE command.
Packages are listed in a new system table, pg_package, and are referenced
in other tables by the oid of the row in pg_package.
There are seven different components which can be present in a package,
and so a CREATE PACKAGE command contains seven stanza types. A package can
be made up of functions, types, operators, aggregates, package-global
variables, initialization routines, and functions usable for type
declarations. Four of the stanzas are easy to understand; to create a
function, a type, an aggregate, or an operator, you include a stanza which
is the relevant CREATE command without the CREATE keyword. Thus the
FUNCTION stanza creates a function, the TYPE stanza creates a type,
the AGGREGATE stanza an aggregate, and the OPERATOR stanza an operator.
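For example, a minimal package containing a single plpgsql function might
look roughly like this (the package name, function, and exact stanza layout
here are only illustrative - see the attached samples for real examples):
CREATE PACKAGE demo AS
    FUNCTION double_it(int4) RETURNS int4 AS '
        BEGIN
            RETURN $1 * 2;
        END;
    ' LANGUAGE 'plpgsql'
;
Once loaded, you'd call it as demo.double_it(21).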
The initializer routines and package-global variables are done a bit
differently than in Oracle, reflecting Postgres's strength at adding
languages. Postgres supports six procedural languages (plpgsql, pltcl,
pltclu, plperl, plperlu, and plpython) whereas I think Oracle only
supports two (PL/SQL, and I heard they added a Java PL). The main
difference is that the variables and the initializer routines are language
specific. So you can have different variables for plpgsql than for pltcl.
Likewise for initializers.
Package-global variables are defined as:
DECLARE <variable name> '<variable type>' [, <next name> '<next type>' ]
LANGUAGE 'langname'
The type is delimited by single quotes so that the Postgres parser doesn't
have to learn the syntax of each procedural language's variable types.
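For instance, a plpgsql package might declare a couple of globals like this
(the names and types are just illustrative):
DECLARE counter 'int4', last_name 'text' LANGUAGE 'plpgsql'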
Initializer routines are declared like normal functions, except the
function name and signature (number & type of arguments and return type)
are not given. The name is automatically generated (it is __packinit_
followed by the language name) and the function signature should not be
depended on. It is to take no parameters and return an int4 for now, but
that should probably change whenever PG supports true procedures.
Initializer routines are declared as:
BODY AS 'function body' LANGUAGE 'langname' [WITH <with options>]
I'm attaching a sample showing a package initialization routine and global
variable declaration. There's a syntax error in it, which I asked about in
another email.
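As a rough illustration (not the attachment itself), an initializer for a
plpgsql package that sets the 'counter' global from the DECLARE example
above might look like:
BODY AS '
    BEGIN
        counter := 0;
        RETURN 1;
    END;
' LANGUAGE 'plpgsql'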
The last components of a package are the functions usable for type
declarations. They are declared as:
BEFORE TYPE FUNCTION <standard package function declaration>
They are needed because the normal functions in a package are created
after the types, so that those functions can use a type newly defined in
the package. That ordering is fine, except that to define a type you have
to give an input and an output function. BEFORE TYPE FUNCTIONs are used to define
those functions. Other than exactly when they are created in package
loading, they are just like other functions in the package.
I'm attaching an example which defines the type 'myint4' (using the
internal int4 routines) and proceeds to declare routines using the new
type.
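In rough outline, that example goes something like this (a sketch in the
spirit of the attachment, not the attachment verbatim; the TYPE options
follow the usual 7.x CREATE TYPE syntax, and the extra function is made up):
CREATE PACKAGE typetest AS
    BEFORE TYPE FUNCTION myint4in(opaque) RETURNS myint4
        AS 'int4in' LANGUAGE 'internal'
    BEFORE TYPE FUNCTION myint4out(myint4) RETURNS opaque
        AS 'int4out' LANGUAGE 'internal'
    TYPE myint4 (internallength = 4, input = myint4in,
        output = myint4out, passedbyvalue)
    FUNCTION identity(myint4) RETURNS myint4 AS '
        BEGIN
            RETURN $1;
        END;
    ' LANGUAGE 'plpgsql'
;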
** So how do I use things in a package?
You don't have to do anything special to use a type or an operator defined
in a package - you just use it. Getting technical, operators and types in
packages are in the same name space as are types and operators not in
packages. To follow along with the example I attached above, the 'myint4'
type is usable in the typetest package, in tables, in other packages, and
in "normal" functions.
For functions and aggregates, things are a little more complicated. First
off, there is a package called "standard" which contains all types,
aggregates, operators, and functions which aren't in a specific package.
This includes all of the standard Postgres routines, and anything created
with CREATE FUNCTION, CREATE AGGREGATE, CREATE OPERATOR, and CREATE TYPE.
Secondly, parsing is always done in terms of a specified package context.
If we are parsing an equation in a routine inside of a package, then the
package context is that package. If we are just typing along in psql, then
the package context is "standard".
When you specify a function or aggregate, you have two choices. One is to
specify a package, and a function in that package, like
"nametest.process" to specify the "process" function in the "nametest"
package.
The other choice is to just give the function's name. The first place
Postgres will look is in the package context used for parsing. If it's not
there (and that context wasn't "standard"), then it will look in
"standard". So for example in the type declaration example attached, the
type stanza uses "myint4in" and "myint4out" as the input and output
routines, and finds the ones declared as part of the package.
I've attached a sample showing off namespaces. It has two non-package
routines, and one package named "nametest".
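The setup is roughly the following (a reconstruction, not the attachment
verbatim - in particular the function bodies and where changer() lives are
guesses):
CREATE FUNCTION process(int4) RETURNS text AS '
    BEGIN RETURN ''I am in standard''; END;
' LANGUAGE 'plpgsql';
CREATE FUNCTION changer(int4) RETURNS int2 AS '
    BEGIN RETURN $1; END;
' LANGUAGE 'plpgsql';
CREATE PACKAGE nametest AS
    FUNCTION process(int4) RETURNS text AS '
        BEGIN RETURN ''I am in the package''; END;
    ' LANGUAGE 'plpgsql'
    FUNCTION docheck() RETURNS text AS '
        BEGIN RETURN process(changer(4)); END;
    ' LANGUAGE 'plpgsql'
;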
Here's a sample session:
testing=# select standard.process(4);
process
------------------
I am in standard
(1 row)
testing=# select nametest.process(4);
process
---------------------
I am in the package
(1 row)
testing=# select nametest.docheck();
docheck
---------------------
I am in the package
(1 row)
First we see that the standard.process() routine says it is in the
"standard" package, and that the nametest.process() routine says it is in
the package. Then we call the nametest.docheck() routine.
It evaluates "process(changer(4));" in the context of the nametest
package. We find the process() routine in the package, and use it.
The changer routine is there to test how typecasting works. It verifies
that Postgres would typecast the return of changer into a different
integer and call the process() routine in the package rather than call the
process() routine in standard. This behavior matches Oracle's.
The other routines in the package show some examples of how SQL gets
parsed according to the above rules.
Initialization routines:
There is only one recommended way to use them: call a function written in
the same PL in the package. That will cause the initialization routine to
be run. Assuming there are no errors, the routine you call won't be
executed until after the initialization routine finishes.
Of course the non-recommended way is to manually call __packinit_<langname>
directly. The problem with that is that you are depending on
implementation details which might change - like exactly how the name is
generated (which probably won't change) and the calling convention (which
hopefully will change if procedures are ever supported).
Package-global variables:
Just use them. Assuming that the procedural language supports global
variables, they just work. Note that as with Oracle, each backend will get
its own set of variables. No effort is made to coordinate values across
backends. But chances are you don't want to do that, and if you did, just
make a table. :-)
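For example, with the 'counter' global declared earlier, a plpgsql function
in the same package could reference it directly (again, the names here are
only illustrative):
FUNCTION bump() RETURNS int4 AS '
    BEGIN
        counter := counter + 1;
        RETURN counter;
    END;
' LANGUAGE 'plpgsql'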
** So what is the state of the diffs?
The diffs contain changes to last week's current (I'll cvs update before
sending out) which add package support to the backend, plpgsql, the SPI
interface, initdb, and pg_dump. The changes also include modifying the
system schema to support packages (pg_package, which lists packages,
pg_packglobal, which lists global variables, and a package identifier added
to pg_aggregate, pg_operator, pg_proc, and pg_type).
The big things missing are documentation, and regression tests which
explicitly test packages.
Also, plpgsql is the only PL with package support. Adding package support
doesn't make sense for the 'C' and 'internal' languages, as you can
manually add "global" variables and initialization routines yourself. It
also doesn't make sense for 'sql' as SQL doesn't support variables. The
other languages need to gain package support, and I'd appreciate help
from their authors. :-)
So I'd better wrap up here. Monday I'll send the diffs to the patches
list, and also send a message talking about more of the details of the
changes.
What do folks think?
Take care,
Bill
Hello Bill,
The community has been waiting for packages for a long time. I can't
believe you did it!!!
IMHO most applications do not fully benefit from the power of PostgreSQL
because transactions are performed at the application level
(PHP/asp/Java/Application server). Sometimes, libraries are mapped to
database structure, which is nonsense when a simple view with left joins
can solve a problem.
Most applications should be developed/ported at PostgreSQL level using the
full range of available tools (transactions, triggers, views, foreign keys,
rules and of course PL/pgSQL). This is much easier and more powerful. Then, all
you need is to display information using a good object-oriented language
(Java/PHP).
With the help of packages, a lot of developers will probably release GPL
libraries and PostgreSQL will become the #1 database in the world.
On the pgAdmin team, we were thinking of developing packages at the client
level. After reading your paper, that seems like nonsense. The ability to
define context levels is a great feature. Question: how do you map packages
to PostgreSQL
objects (tables, views, triggers)? Is there any possibility of defining
templates? Can this be added to packages in the future with little impact
on PostgreSQL internals?
Now, we can only thank you for bringing Packages to PostgreSQL.
Best regards,
Jean-Michel POURE
pgAdmin Team
Bill Studenmund <wrstuden@netbsd.org> writes:
... operators and types in
packages are in the same name space as are types and operators not in
packages.
For functions and aggregates, things are a little more complicated. First
off, there is a package called "standard" which contains all types,
aggregates, operators, and functions which aren't in a specific package.
This includes all of the standard Postgres routines, and anything created
with CREATE FUNCTION, CREATE AGGREGATE, CREATE OPERATOR, and CREATE TYPE.
Secondly, parsing is always done in terms of a specified package context.
If we are parsing an equation in a routine inside of a package, then the
package context is that package. If we are just typing along in psql, then
the package context is "standard".
When you specify a function or aggregate, you have two choices. One is to
specify a package, and a function in that package, like
"nametest.process" to specify the "process" function in the "nametest"
package.
The other choice is to just give the function's name. The first place
Postgres will look is in the package context used for parsing. If it's not
there (and that context wasn't "standard"), then it will look in
"standard".
Hmm. How does/will all of this interact with SQL-style schemas?
The reason I'm concerned is that if we want to retain the present
convention that the rowtype of a table has the same name as the table,
I think we are going to have to make type names schema-local, just
like table names will be. And if type names are local to schemas
then so must be the functions that operate on those types, and therefore
also operators (which are merely syntactic sugar for functions).
This seems like it will overlap and possibly conflict with the decisions
you've made for packages. It also seems possible that a package *is*
a schema, if schemas are defined that way --- does a package bring
anything more to the table?
I also wonder how the fixed, single-level namespace search path you
describe interacts with the SQL rules for schema search. (I don't
actually know what those rules are offhand; haven't yet read the schema
parts of the spec in any detail...)
Also, both operators and functions normally go through ambiguity
resolution based on the types of their inputs. How does the existence
of a name search path affect this --- are candidates nearer the front
of the search path preferred? Offhand I'm not sure if they should get
any preference or not.
I'd like to see schemas implemented per the spec in 7.3, so we need to
coordinate all this stuff.
regards, tom lane
Bill Studenmund writes:
So what are packages? In Oracle, they are a feature which helps developers
make stored procedures and functions.
I think you have restricted yourself too much to functions and procedures.
A package could/should also be able to contain views, tables, and such.
They provide a name space for functions local to the package,
Namespacing is the task of schemas. I think of packages as a bunch of
objects that can be addressed under a common name (think RPMs).
But it seems like some of this work could be used to implement schema
support.
session-specific package variables,
I think this is assuming a little too much about how a PL might operate.
Some PLs already support this in their own language-specific way, with or
without packages. Thus, I don't think packages should touch this.
Actually, I think you could easily set up session variables in the package
initializer function.
The last component of a package are the functions usable for type
declarations. They are declared as:
BEFORE TYPE FUNCTION <standard package function declaration>
They are useful as the normal functions in a package are declared after
the types are declared, so that they can use a type newly-defined in a
package.
I think it would make much more sense to allow the creation of objects in
the CREATE PACKAGE command in any order. PostgreSQL has not so far had a
concept of "functions suitable for type declarations" and we shouldn't add
one.
--
Peter Eisentraut peter_e@gmx.net http://funkturm.homeip.net/~peter
On Sat, 13 Oct 2001, Tom Lane wrote:
Bill Studenmund <wrstuden@netbsd.org> writes:
For functions and aggregates, things are a little more complicated. First
off, there is a package called "standard" which contains all types,
aggregates, operators, and functions which aren't in a specific package.
This includes all of the standard Postgres routines, and anything created
with CREATE FUNCTION, CREATE AGGREGATE, CREATE OPERATOR, and CREATE TYPE.
Secondly, parsing is always done in terms of a specified package context.
If we are parsing an equation in a routine inside of a package, then the
package context is that package. If we are just typing along in psql, then
the package context is "standard".
When you specify a function or aggregate, you have two choices. One is to
specify a package, and a function in that package, like
"nametest.process" to specify the "process" function in the "nametest"
package.
The other choice is to just give the function's name. The first place
Postgres will look is in the package context used for parsing. If it's not
there (and that context wasn't "standard"), then it will look in
"standard".Hmm. How does/will all of this interact with SQL-style schemas?
Independent as I understand it. Schemas (as I understand Oracle schemas)
operate at a level above the level where packages operate.
The reason I'm concerned is that if we want to retain the present
convention that the rowtype of a table has the same name as the table,
I think we are going to have to make type names schema-local, just
like table names will be. And if type names are local to schemas
then so must be the functions that operate on those types, and therefore
also operators (which are merely syntactic sugar for functions).This seems like it will overlap and possibly conflict with the decisions
you've made for packages. It also seems possible that a package *is*
a schema, if schemas are defined that way --- does a package bring
anything more to the table?
I don't think it conflicts. My understanding of schemas is rather
simplistic and practical. As I understand it, they correspond roughly to
databases in PG. So with schema support, one database can essentially
reach into another one. Package support deals with the functions (and
types and in this case aggregates and operators) that schema support would
find in the other schemas/databases.
I also wonder how the fixed, single-level namespace search path you
describe interacts with the SQL rules for schema search. (I don't
actually know what those rules are offhand; haven't yet read the schema
parts of the spec in any detail...)
Should be independent. The searching only happens when you are not in the
"standard" package, and you give just a function name for a function.
The searching would only happen in the current schema. If
you give a schema name, then I'd expect PG to look in that schema, in
standard, for that function. If you give both a schema and package name,
then PG would look in that package in that schema.
Also, both operators and functions normally go through ambiguity
resolution based on the types of their inputs. How does the existence
of a name search path affect this --- are candidates nearer the front
of the search path preferred? Offhand I'm not sure if they should get
any preference or not.
There is no name spacing for operators in my implementation, as to have one
strikes me as reducing the utility of having types and operators in a
package. For functions (and aggregates), I tried to touch on that in the
latter part of my message; that's what the example with
"process(changer(4))" was about. PG will try to type coerce a function in
the current package before it looks in standard. So yes, candidates nearer
the front are preferred.
I'd like to see schemas implemented per the spec in 7.3, so we need to
coordinate all this stuff.
Sounds good. I don't think it will be that hard, though. :-)
Take care,
Bill
On Sat, 13 Oct 2001, Jean-Michel POURE wrote:
What do folks think?
Take care,
Bill
Hello Bill,
The community have been waiting for packages for a long time. I don't
believe you did it!!!
IMHO most applications do not fully benefit from the power of PostgreSQL
because transactions are performed at application lever
(PHP/asp/Java/Application server). Sometimes, libraries are mapped to
database structure, which is nonsense when a simple view with left joins
can solve a problem.
Most applications should be developed/ported at PostgreSQL level using the
full range of available tools (transactions, triggers, views, foreign keys,
rules and off course PL/pgSQL). This is much easier and powerful. Then, all
you need is to display information using a good object-oriented language
(Java/PHP).
With the help of packages, a lot of developers will probably release GPL
libraries and PostgreSQL will become the #1 database in the world.
Yep. PostgreSQL is within reach of really challenging the commercial
databases. I think the core developers are working on the changes needed
to challenge the commercial DBs in terms of speed and performance for big
datastores (WAL, working to prevent OID rollover, etc.). Packages address
a different side of what will be needed to challenge the big boys - better
stored procedure support. :-)
At pgAdmin team, we were thinking of developing packages at client level.
This is nonsense when reading your paper. The ability of defining context
levels is a great feature. Question: how do you map package to PostgreSQL
objects (tables, views, triggers)? Is there any possibility of defining
templates? Can this be added to packages in the future with little impact
on PostgreSQL internals?
Packages don't really map to DB objects (tables, views, triggers) at the
moment. Have you used Oracle much? These packages are a direct translation
of Oracle packages, with a few PostgreSQL extensions thrown in (Oracle
doesn't have PostgreSQL's ability to add aggregates, operators, and system
types AFAIK, so their packages likewise don't, and types in packages AFAIK
are package-specific).
I forget who said it, but operators (and aggregates) are basically just
sugar wrapped around functions; these packages are another form of sugar
wrapped around functions. To start adding views and tables and triggers
makes packages more than just special sugar around functions.
Also, my big concern is that if we start adding tables and views and
triggers to packages, pg_dump becomes a nightmare.
Now, we can only thank you for bringing Packages to PostgreSQL.
You're welcome.
Take care,
Bill
On Sat, 13 Oct 2001, Bill Studenmund wrote:
On Sat, 13 Oct 2001, Tom Lane wrote:
I also wonder how the fixed, single-level namespace search path you
describe interacts with the SQL rules for schema search. (I don't
actually know what those rules are offhand; haven't yet read the schema
parts of the spec in any detail...)
Should be independent. The searching only happens when you are not in the
"standard" package, and you give just a function name for a function.
The searching would only happen in the current schema. If
you give a schema name, then I'd expect PG to look in that schema, in
standard, for that function. If you give both a schema and package name,
then PG would look in that package in that schema.
My description of namespaces seems to have caused a fair bit of confusion.
Let me try again.
The ability of the package changes to automatically check standard when
you give an ambiguous function name while in a package context is a
convenience for the procedure author. Nothing more.
It means that when you want to use one of the built in functions
(date_part, abs, floor, sqrt etc.) you don't have to prefix it with
"standard.". You can just say date_part(), abs(), floor(), sqrt(), etc.
The only time you need to prefix a call with "standard." is if you want to
exclude any so-named routines in your own package.
I've attached a copy of a package I wrote as part of testing package
initializers and package global variables. It is an adaptation of the
Random package described in Chapter 8 of _Oracle8 PL/SQL Programming_ by
Scott Urman. Other than adapting it to PostgreSQL, I also tweaked the
RandMax routine to give a flat probability.
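In rough outline, the package goes something like this (a sketch only, not
the attachment verbatim; the seed handling and constants are illustrative):
CREATE PACKAGE random AS
    DECLARE seed 'int4' LANGUAGE 'plpgsql'
    BODY AS '
        BEGIN
            seed := date_part(''second'', now());
            RETURN 1;
        END;
    ' LANGUAGE 'plpgsql'
    FUNCTION rand() RETURNS int4 AS '
        BEGIN
            seed := (seed * 125 + 1) % 2796203;
            RETURN seed;
        END;
    ' LANGUAGE 'plpgsql'
    FUNCTION randmax(int4) RETURNS int4 AS '
        BEGIN
            RETURN (rand() % $1) + 1;
        END;
    ' LANGUAGE 'plpgsql'
;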
Note the use of date_part() in the BODY AS section, and the use of rand()
in randmax(). Both of these uses are the ambiguous sort of function naming
which can trigger the multiple searching. Since they are in plpgsql code,
they get parsed in the context of the random package. So when each of them
gets parsed, parse_func first looks in the random package. For rand(), it
will find the rand() function and use it. But for date_part(), since there
isn't a date_part function in the package, we use the one in standard.
If we didn't have this ability, one of the two calls would need to have
had an explicit package with it. There are two choices (either "standard."
would be needed for date_part(), or "random." for rand()), but I think
both would lead to problems. Either choice makes the syntax heavy, for
little gain. Also, if we scatter the package name throughout the package
and ever want to change it, we have more occurrences to change.
Does that make more sense?
Take care,
Bill
On Sat, 13 Oct 2001, Tom Lane wrote:
Bill Studenmund <wrstuden@netbsd.org> writes:
The other choice is to just give the function's name. The first place
Postgres will look is in the package context used for parsing. If it's not
there (and that context wasn't "standard"), then it will look in
"standard".Hmm. How does/will all of this interact with SQL-style schemas?
The reason I'm concerned is that if we want to retain the present
convention that the rowtype of a table has the same name as the table,
I think we are going to have to make type names schema-local, just
like table names will be. And if type names are local to schemas
then so must be the functions that operate on those types, and therefore
also operators (which are merely syntactic sugar for functions).
Ahhh... There's the operators == sugar comment.
I agree with you above; types and functions need to be schema-specific.
This seems like it will overlap and possibly conflict with the decisions
you've made for packages. It also seems possible that a package *is*
a schema, if schemas are defined that way --- does a package bring
anything more to the table?
I'm repeating myself a little. :-)
Packages aren't schemas. What they bring to the table is they facilitate
making stored procedures (functions). You can have twelve different
developers working on twenty different packages, with no fear of name
conflicts. The package names will have to be different, so there can be
functions with the same names in different packages.
This ability isn't that important in small development projects, but is
really important for big ones. Think about big db applications, like
Clarify. Any project with multiple procedure authors. Without something
like packages, you'd need to spend a lot of effort coordinating names &
such so that they didn't conflict. With packages, it's rather easy.
Also, I think PostgreSQL can challenge the commercial databases for these
applications. But to do so, changing over to PG will need to be easy.
Having packages there will greatly help.
I'd like to see schemas implemented per the spec in 7.3, so we need to
coordinate all this stuff.
For the most part, I think packages and schemas are orthogonal. I'm taking
a cue from Oracle here. Oracle considers packages to be a schema-specific
object.
Take care,
Bill
On Sat, 13 Oct 2001, Peter Eisentraut wrote:
Bill Studenmund writes:
So what are packages? In Oracle, they are a feature which helps developers
make stored procedures and functions.
I think you have restricted yourself too much to functions and procedures.
A package could/should also be able to contain views, tables, and such.
I disagree. Views and tables are the purview of schemas, which as I
mentioned to Tom, strike me as being different from packages. Packages
basically are modules which make life easier for functions (and types and
aggregates and operators).
If we really want to make tables and views and triggers part of packages,
we can. My big concern is that it then makes pg_dump harder. I'll go into
that more below.
They provide a name space for functions local to the package,
Namespacing is the task of schemas. I think of packages as a bunch of
objects that can be addressed under a common name (think RPMs).
Regrettably, Oracle beat you to it; "packages" already mean something
specific in Oracle terms, and I suspect in the minds of many DBAs as well.
I also think that you and Tom have something different in mind about the
namespacing in packages. It is purely a convenience for the package
developer; whenever you want to use a function built into the database,
you _don't_ have to type "standard." everywhere. Think what a PITA it
would be to have to say "standard.abs(" instead of "abs(" in your
functions! I'm sorry if my explanation went abstract too quickly and made
that unclear.
But it seems like some of this work could be used to implement schema
support.
I think the big boost this will give schema support is that it shows
how to make a far-reaching change to PostgreSQL. :-) It's a change to the
system schema and more, just as schema support will be.
session-specific package variables,
I think this is assuming a little too much about how a PL might operate.
Some PLs already support this in their own language-specific way, with or
without packages. Thus, I don't think packages should touch this.
Actually, I think you could easily set up session variables in the package
initializer function.
I agree that some PLs might do things their own way and so package
variables won't be as useful. If these variables are not appropriate to a
PL, it can ignore them.
PL/pgSQL is a counter-example, though, showing that something needs to be
done. It is not set up to support global variables; each code block
generates its own namespace, and removes it on the way out. Thus I cannot
see a clean way to add package global variables to, say, the
initialization routine - this routine's exit code would need to not
destroy the context. That strikes me as a mess.
The last component of a package are the functions usable for type
declarations. They are declared as:
BEFORE TYPE FUNCTION <standard package function declaration>
They are useful as the normal functions in a package are declared after
the types are declared, so that they can use a type newly-defined in a
package.
I think it would make much more sense to allow the creation of objects in
the CREATE PACKAGE command in any order. PostgreSQL has not so far had a
concept of "functions suitable for type declarations" and we shouldn't add
one.
I think you misread me slightly. BEFORE TYPE FUNCTION functions are
"usable" for type declarations, not "suitable" for them. Also, I didn't
say one key clause, "in this package". The main difference is when in the
creation of the package the functions are created; they get created before
the types, rather than after.
This concept is new to PostgreSQL because PostgreSQL has never before
chained creations together like this.
Thinking about it, though, it would be feasible to scan the list of types in
the package, and see if there are references to functions declared in that
package, and if so to create them before the types get declared. That
would remove the need for BEFORE TYPE FUNCTION and also make pg_dump a
little simpler.
Take care,
Bill
On Mon, 15 Oct 2001, Tom Lane wrote:
Bill Studenmund <wrstuden@netbsd.org> writes:
For the most part, I think packages and schemas are orthogonal. I'm taking
a cue from Oracle here. Oracle considers packages to be a schema-specific
object.
Nonetheless, it's not clear to me that we need two independent concepts.
Given a name search path that can go through multiple schemas, it seems
to me that you could get all the benefits of a package from a schema.
I'm not necessarily averse to accepting Oracle's syntax for declaring
packages --- if we can make it easier for Oracle users to port to Postgres,
that's great. But I'm uncomfortable with the notion of implementing two
separate mechanisms that seem to do the exact same thing, ie, control
name visibility.
I'm at a loss as to what to say. I think that what packages do and what
schemas do are different - they are different kinds of namespaces. That's
why they should have different mechanisms. Packages are for making it
easier to write stored procedures for large programming projects or for
code reuse. Schemas, well, I need to learn more. But they strike me more
as a tool to partition entire chunks of a database.
Also, packages have a whole concept of initialization routines and global
variables, which strike me as having no place alongside tables and views.
Take care,
Bill
On Sun, 14 Oct 2001, Peter Eisentraut wrote:
I have been pondering a little about something I called "package",
completely independent of anything previously implemented. What I would
like to get out of a package is the same thing I get out of package
systems on operating systems, namely that I can remove all the things that
belong to the package with one command. Typical packages on PostgreSQL
could be the PgAccess admin tables or the ODBC catalog extensions.
One might think that this could also be done with schemas. I'm thinking
using schemas for this would be analogous to installing one package per
directory. Now since we don't have to deal with command search paths or
file system mount points there might be nothing wrong with that.
Packages typically also have post-install/uninstall code, as does this
proposed implementation, so that would have to be fit in somewhere.
This is basically where my thinking has stopped... ;-)
Now I'm also confused as to what this package system really represents:
Is it a namespace mechanism -- but Oracle does have schemas; or is it a
package manager like I had in mind -- for that it does too many things
that don't belong there; or is it a mechanism to set up global variables
-- that already exists and doesn't need "packages".
It is an implementation of Oracle Packages for PostgreSQL, taking
advantage of some of PostgreSQL's abilities (the aggregates & operators in
a package bit is new). It is a tool to help developers create large
projects and/or reuse code.
It is not schema support; schema support operates on a level above package
support. It is also not the package support you had in mind. That support
is different. What you describe above is packaging which primarily helps
the admin, while this packaging primarily helps the procedure developer.
That difference in emphasis is why this package support does things an
administrator-focused package system wouldn't.
Also, please note that while many of PostgreSQL's procedure languages
might not need global variable support, PL/pgSQL does.
Take care,
Bill
On Mon, 15 Oct 2001, Tom Lane wrote:
Bill Studenmund <wrstuden@netbsd.org> writes:
For the most part, I think packages and schemas are orthogonal. I'm taking
a cue from Oracle here. Oracle considers packages to be a schema-specific
object.
Nonetheless, it's not clear to me that we need two independent concepts.
Given a name search path that can go through multiple schemas, it seems
to me that you could get all the benefits of a package from a schema.
About the best response to this I can come up with is that in its present
implementation, types and operators are not scoped as package-specific. If
you declare a type in a package, that type is usable anywhere; you don't
have to say package.type. If we did packages via schemas, as I understand
it, you would (and should).
We both agree that types and the functions that operate on them should be
schema-specific. Thus operators should be schema-specific. If we did
packages via schemas, I don't see how we would get at operators in
packages. If you create a new integer type, would you really want to have
to type "3 packname.< table.attr" to do a comparison?
So I guess that's the reason; this package implementation creates types
and operators in the same namespace as built-in types and operators. As I
understand schemas, user types (and thus operators) should exist in a
schema-specific space.
I can see reasons for both, thus I think there is a place for two
independent concepts.
Take care,
Bill
Tom Lane writes:
This seems like it will overlap and possibly conflict with the decisions
you've made for packages. It also seems possible that a package *is*
a schema, if schemas are defined that way --- does a package bring
anything more to the table?
I have been pondering a little about something I called "package",
completely independent of anything previously implemented. What I would
like to get out of a package is the same thing I get out of package
systems on operating systems, namely that I can remove all the things that
belong to the package with one command. Typical packages on PostgreSQL
could be the PgAccess admin tables or the ODBC catalog extensions.
One might think that this could also be done with schemas. I'm thinking
using schemas for this would be analogous to installing one package per
directory. Now since we don't have to deal with command search paths or
file system mount points there might be nothing wrong with that.
Packages typically also have post-install/uninstall code, as does this
proposed implementation, so that would have to be fit in somewhere.
This is basically where my thinking has stopped... ;-)
Now I'm also confused as to what this package system really represents:
Is it a namespace mechanism -- but Oracle does have schemas; or is it a
package manager like I had in mind -- for that it does too many things
that don't belong there; or is it a mechanism to set up global variables
-- that already exists and doesn't need "packages".
--
Peter Eisentraut peter_e@gmx.net http://funkturm.homeip.net/~peter
On Tue, 16 Oct 2001, Peter Eisentraut wrote:
Bill Studenmund writes:
I disagree. Views and tables are the purview of schemas, which as I
mentioned to Tom, strike me as being different from packages.
Well, obviously schemas are a namespacing mechanism for tables and views.
And apparently the "packages" you propose are (among other things) a
namespacing mechanism for functions. But the fact is that schemas already
provide a namespacing mechanism for functions. (That's what SQL says and
that's how it's going to happen.) Now perhaps you want to have a
namespacing mechanism *below* schemas. But then I think this could be
done with nested schemas, since the sub-schemas would really be the same
concept as a top-level schema. That would be a much more general
mechanism.
Yes, I want a namespace below schemas.
The difference between packages and schemas is that schemas encapsulate
everything. As Tom pointed out, that includes types (and I'd assume
operators too). Packages do not encapsulate types and operators. That's
what makes them different from a sub-schema (assuming a sub-schema is a
schema within a schema).
Obviously there is a large number of ideas that "make life easier". But
I'm still missing a clear statement what exactly the design idea behind
these packages is. So far I understood namespace and global variables for
PL/pgSQL. For the namespace thing we've already got a different design.
For global variables, see below.
See above.
I agree that some PLs might do things their own way and so package
variables won't be as useful. If these variables are not appropriate to a
PL, it can ignore them.
PL/pgSQL is a counter-example, though, showing that something needs to be
done.
Then PL/pgSQL should be fixed. But that doesn't need such a large
Why is PL/pgSQL broken?
It has a very clean design element; you enter a code block, you get a new
namespace. You can declare variables in that namespace if you want. When
you use a variable name, PL/pgSQL looks in the current namespace, then the
parent, and so on. You exit a code block, the namespace goes away. That's
how C works, for instance.
concept as "packages". It could be as easy as
DECLARE GLOBAL
...
BEGIN
...
END
It is not set up to support global variables; each code block
generates its own namespace, and removes it on the way out. Thus I can
not see a clean way to add package global variables to say the
initialization routine - this routine's exit code would need to not
destroy the context. That strikes me as a mess.
The language handler should have no problem creating persistent storage --
I don't see that as a problem. If the language is so misdesigned that it
cannot be done (which I doubt, but consider the theoretical case) then the
language should be replaced by something better, but please keep in mind
that it's a PL/pgSQL problem only. Maybe if you're from an Oracle
background this separation is not quite as natural.
The problem is not creating persistent storage; the issue is that the
language was designed not to use it. What you're proposing could be done,
but would effectively be shoving the change in with a hammer. Also, any
other PLs which are based on languages with strict namespaces will have
the same problem.
Look at C for instance. What you're describing is the equivalent to
letting a function or procedure in C declare global variables. That's not
how the language works, and no one seems to mind. :-)
Right, that's why I suggested allowing the CREATE statements in any order
so you could order them yourself to have the function before the types or
whatever you want.
My concern with that is that then we have to make sure to dump it in the
same order you entered it. Right now, in general, pg_dump dumps objects in
stages; all of the languages are dumped, then all of the types, then the
functions, and so on. Functions needed for types and languages get dumped
right before the type or language which needs it.
If we go with strict package order mattering, then pg_dump needs to be
able to recreate that order. That means that it has to look in pg_proc,
pg_operator, pg_type, and pg_aggregate, sort things (in the package being
dumped) by oid, and dump things in order of increasing oid. Nothing else
in pg_dump works like that. I'd rather not start.
I have however come up with another way to make BEFORE TYPE FUNCTION go
away. I'll just scan the types in a package (I doubt there will be many),
get a set of candidate names, and scan the functions in the package for
them. If they are found, they get added before the types do. So then the
decision as to when a function should get added is implicit, rather than
explicit.
I'll see about adding this before I send in the patch (it is the only
thing left).
Take care,
Bill
Bill Studenmund <wrstuden@netbsd.org> writes:
For the most part, I think packages and schemas are orthogonal. I'm taking
a cue from Oracle here. Oracle considers packages to be a schema-specific
object.
Nonetheless, it's not clear to me that we need two independent concepts.
Given a name search path that can go through multiple schemas, it seems
to me that you could get all the benefits of a package from a schema.
I'm not necessarily averse to accepting Oracle's syntax for declaring
packages --- if we can make it easier for Oracle users to port to Postgres,
that's great. But I'm uncomfortable with the notion of implementing two
separate mechanisms that seem to do the exact same thing, ie, control
name visibility.
regards, tom lane
On Sat, 13 Oct 2001, Peter Eisentraut wrote:
Bill Studenmund writes:
session-specific package variables,
I think this is assuming a little too much about how a PL might operate.
Some PLs already support this in their own language-specific way, with or
without packages. Thus, I don't think packages should touch this.
Actually, I think you could easily set up session variables in the package
initializer function.
Could you please give me an example of how to do this, say for plperl or
plpython? Just showing how two functions made with CREATE FUNCTION can use
global variables will be fine. This example will help me understand how
they work.
Take care,
Bill
On Wed, 17 Oct 2001, Peter Eisentraut wrote:
Bill Studenmund writes:
Yes, I want a namespace below schemas.
The difference between packages and schemas is that schemas encapsulate
everything. As Tom pointed out, that includes types (and I'd assume
operators too). Packages do not encapsulate types and operators.
Of course nobody is forcing you to put types into subschemas. But the
user would have the freedom to spread things around as he sees fit.
???
Then PL/pgSQL should be fixed. But that doesn't need such a large
Why is PL/pgSQL broken?
Maybe read "fixed" as "enhanced".
The problem is not creating persistent storage; the issue is that the
language was designed not to use it. What you're proposing could be done,
but would effectively be shoving the change in with a hammer. Also, any
other PLs which are based on languages with strict namespaces will have
the same problem.
Other PLs have shown that storing global data in a language-typical way
*is* possible. I read your argumentation as "PL/pgSQL is not designed to
have global variables, so I'm going to implement 'packages' as a way to
make some anyway". Either PL/pgSQL is not designed for it, then there
should not be any -- at all. Or it can handle them after all, but then
it's the business of the language handler to deal with it.
Do you really think that my employer paid me for three months to come up
with an 800k diff _just_ to add global variables to PL/pgSQL? While part
of it, global variables are only one part of the work. I would actually
say it is a minor one.
Honestly, I do not understand why "global variables" have been such a sore
point for you. PLs for which they don't make sense like this don't have to
do it, and Oracle, whose PL/SQL our PL/pgSQL was based on, thinks that
they make perfect sense for the language we copied.
Also, remember that this is an implementation of Oracle packages for
Postgres. One of our goals was to make it so that you can mechanically
transform an Oracle package into a Postgres one, and vice versa. This
implementation does a good job of that. To make the change you suggest
would not.
My concern with that is that then we have to make sure to dump it in the
same order you entered it.
pg_dump can do dependency ordering if you ask it nicely. ;-) When we
implement schemas we'll have to make sure it works anyway. Thinking about
pg_dump when designing backend features is usually not worthwhile.
The thing is what you're talking about is more than just dependency
ordering (which I taught pg_dump to do for packages). "doing things in the
order you list" to me means that things get dumped in the exact same
order. Say you added some functions and then some operators and then some
functions. If order matters, then the operators should get generated in
the dump before the functions, even though there's no dependency-reason to
do so.
Maybe I'm taking that a bit more literally than you mean, but how it comes
across to me is unnecessarily difficult. We can achieve the same thing in
other ways.
I did however take your point that BEFORE TYPE FUNCTION should go away;
the patch I sent in does not have it. In the patch, stanzas in the CREATE
PACKAGE command are gathered, and done in sequence according to kind.
First the global variables are defined, then the initialization routines,
then functions which are needed for types in the package, then types, then
functions (other than the ones already done), aggregates, and operators.
Take care,
Bill
Bill Studenmund writes:
I disagree. Views and tables are the purview of schemas, which as I
mentioned to Tom, strike me as being different from packages.
Well, obviously schemas are a namespacing mechanism for tables and views.
And apparently the "packages" you propose are (among other things) a
namespacing mechanism for functions. But the fact is that schemas already
provide a namespacing mechanism for functions. (That's what SQL says and
that's how it's going to happen.) Now perhaps you want to have a
namespacing mechanism *below* schemas. But then I think this could be
done with nested schemas, since the sub-schemas would really be the same
concept as a top-level schema. That would be a much more general
mechanism.
Packages basically are modules which make life easier for functions
(and types and aggregates and operators).
Obviously there is a large number of ideas that "make life easier". But
I'm still missing a clear statement what exactly the design idea behind
these packages is. So far I understood namespace and global variables for
PL/pgSQL. For the namespace thing we've already got a different design.
For global variables, see below.
If we really want to make tables and views and triggers part of packages,
we can. My big concern is that it then makes pg_dump harder. I'll go into
that more below.
That has never stopped us from doing anything. ;-)
Regrettably, Oracle beat you to it with what "packages" are in terms of
Oracle, and I suspect also in the minds of many DBAs.
Oracle appears to have beaten us to define the meaning of quite a few
things, but that doesn't mean we have to accept them. We don't
re-implement Oracle here. And exactly because all Oracle has is
procedures and PL/SQL, whereas PostgreSQL has operators, types, and such,
and user-defined procedural languages, designs may need to be changed or
thrown out. It wouldn't be the first time.
I agree that some PLs might do things their own way and so package
variables won't be as useful. If these variables are not appropriate to a
PL, it can ignore them.
PL/pgSQL is a counter-example, though, showing that something needs to be
done.
Then PL/pgSQL should be fixed. But that doesn't need such a large
concept as "packages". It could be as easy as
DECLARE GLOBAL
...
BEGIN
...
END
It is not set up to support global variables; each code block
generates its own namespace, and removes it on the way out. Thus I can
not see a clean way to add package global variables to say the
initialization routine - this routine's exit code would need to not
destroy the context. That strikes me as a mess.
The language handler should have no problem creating persistent storage --
I don't see that as a problem. If the language is so misdesigned that it
cannot be done (which I doubt, but consider the theoretical case) then the
language should be replaced by something better, but please keep in mind
that it's a PL/pgSQL problem only. Maybe if you're from an Oracle
background this separation is not quite as natural.
I think you misread me slightly. BEFORE TYPE FUNCTION functions are
"usable" for type declarations, not "suitable" for them. Also, I didn't
say one key clause, "in this package". The main difference is when in the
creation of the package the functions are created; they get created before
the types, rather than after.
Right, that's why I suggested allowing the CREATE statements in any order
so you could order them yourself to have the function before the types or
whatever you want.
This concept is new to PostgreSQL because PostgreSQL has never before
chained creations together like this.
Externally perhaps not, but internally these things happen all the time.
--
Peter Eisentraut peter_e@gmx.net http://funkturm.homeip.net/~peter
On Sun, 14 Oct 2001, Bill Studenmund wrote:
On Mon, 15 Oct 2001, Tom Lane wrote:
Bill Studenmund <wrstuden@netbsd.org> writes:
For the most part, I think packages and schemas are orthogonal. I'm taking
a cue from Oracle here. Oracle considers packages to be a schema-specific
object.
Nonetheless, it's not clear to me that we need two independent concepts.
Given a name search path that can go through multiple schemas, it seems
to me that you could get all the benefits of a package from a schema.
I've been thinking about this. I've changed my mind. Well, I've come to
realize that you can have multiple schemas in one db, so that multiple
schema support != one db reaching into another.
I still think that schemas and packages are different, but I now think
they are interrelated. And that it shouldn't be too hard to leverage the
package work into schema support. Still a lot of work, but the package
work has shown how to go from one to two in a number of ways. :-)
First off, do you (Tom) have a spec for schema support? I think that would
definitely help things.
Second, can you help me with gram.y? I'm trying to get gram.y to deal with
figuring out if you've typed in packagename.function name, rather than
relying on the lexer to notice you've typed ${identifier}\.${identifier}
where the first identifier is a package name & send a terminal saying so.
Twelve reduce/reduce conflicts. They involve a conflict between ColId and something
else, and focus on not knowing what reduction to take when seeing a '[',
',', or ')'. Thoughts?
Take care,
Bill
On Thu, 18 Oct 2001, Peter Eisentraut wrote:
Bill Studenmund writes:
Could you please give me an example of how to do this, say for plperl or
plpython? Just showing how two functions made with CREATE FUNCTION can use
global variables will be fine. This example will help me understand how
they work.
For PL/Tcl you use regular Tcl global variables:
create function produce(text) returns text as '
    global foo; set foo $1;
' language pltcl;
create function consume() returns text as '
    global foo; return $foo;
' language pltcl;
There is also a mechanism for one procedure to save private data across
calls.
For PL/Python you use a global dictionary:
create function produce(text) returns text as '
    GD["key"] = args[0]
' language plpython;
create function consume() returns text as '
    return GD["key"]
' language plpython;
There is also a dictionary for private data.
Private to what?
For PL/Perl I'm not sure if something has been implemented. In C you can
use shared memory, and for PL/sh you would use temp files of course. ;-)
Thank you. I can now experiment with them to see how they do.
I've never thought of adding package variables for C routines; there are
other options open. :-)
Oh, by shared memory, do you mean SYSV Shared Memory (like how the
backends talk) or just memory shared between routines? I ask as part of
the idea with these variables is that they are backend-specific. So C
routines actually should NOT use SYSV shared memory. :-)
Take care,
Bill
Bill Studenmund writes:
Yes, I want a namespace below schemas.
The difference between packages and schemas is that schemas encapsulate
everything. As Tom pointed out, that includes types (and I'd assume
operators too). Packages do not encapsulate types and operators.
Of course nobody is forcing you to put types into subschemas. But the
user would have the freedom to spread things around as he sees fit.
I agree that some PLs might do things their own way and so package
variables won't be as useful. If these variables are not appropriate to a
PL, it can ignore them.
PL/pgSQL is a counter-example, though, showing that something needs to be
done.
Then PL/pgSQL should be fixed. But that doesn't need such a large
Why is PL/pgSQL broken?
Maybe read "fixed" as "enhanced".
The problem is not creating persistent storage; the issue is that the
language was designed not to use it. What you're proposing could be done,
but would effectively be shoving the change in with a hammer. Also, any
other PLs which are based on languages with strict namespaces will have
the same problem.
Other PLs have shown that storing global data in a language-typical way
*is* possible. I read your argumentation as "PL/pgSQL is not designed to
have global variables, so I'm going to implement 'packages' as a way to
make some anyway". Either PL/pgSQL is not designed for it, then there
should not be any -- at all. Or it can handle them after all, but then
it's the business of the language handler to deal with it.
Look at C for instance. What you're describing is the equivalent to
letting a function or procedure in C declare global variables. That's not
how the language works, and no one seems to mind. :-)
What you're describing is the equivalent of declaring global variables in
C but inventing a whole new mechanism in the operating system for it
called "package" (which also happens to be a namespace mechanism as a
second job). Now, there are ways to exchange data between separate
programs or modules in C, such as message queues or shared memory. (If
you think about it, what PL/Python is doing with its global dictionary is
just the same as shared memory.) This works fine, but it doesn't affect
PostgreSQL proper in any way.
So, do something with PL/pgSQL. Implement a global dictionary, or shared
memory, or some other way to share data between functions. Call it
"package" if you like. (Java has packages which are somewhat like that.)
My concern with that is that then we have to make sure to dump it in the
same order you entered it.
pg_dump can do dependency ordering if you ask it nicely. ;-) When we
implement schemas we'll have to make sure it works anyway. Thinking about
pg_dump when designing backend features is usually not worthwhile.
--
Peter Eisentraut peter_e@gmx.net http://funkturm.homeip.net/~peter
On Thu, 18 Oct 2001, Peter Eisentraut wrote:
Bill Studenmund writes:
Honestly, I do not understand why "global variables" have been such a sore
point for you.
My point is that the proposed "package support" introduces two features
that are a) independent, and b) already exist, at least in design.
Schemas are already planned as a namespace mechanism. Global variables in
PLs already exist in some PLs. Others can add it if they like. There
aren't any other features introduced by "package support" that I can see
or that you have explicitly pointed out.
Then my explanations didn't click. Please let me try again.
The main feature of package support is that it greatly facilitates
developing large, complicated db applications. Like ones which require
multiple full-time developers to develop. I think PostgreSQL has the
internals to run these apps, and it should provide a development
environment to encourage them.
That's what packages are about.
I have never developed an application that large. But I have talked to our
DBAs who have worked with such things in Oracle, and a few who have worked
on (developed) such large applications. They all have agreed that
something akin to packages is needed to make it work.
The separate namespaces (for function names and for variables) mean that
different programmers don't have to coordinate the names of functions, or
give the names some disambiguating prefix to keep them distinct. All that
has to happen is that different packages have different names. When you
throw in the idea of developers releasing libraries (packages) on the net,
the minimality of coordination is even more important.
The fact (for PostgreSQL, i.e. this implementation) that types and
operators aren't namespaced off means that they effectively leak into the
enclosing database (or schema when we have them) so that making and
supporting new types can be the aim/result of the package.
For comparison with other languages, packages strike me as comparable to
libraries (in C) or modules (say in Perl or Python). Neither libraries nor
modules really do anything that can't be achieved otherwise in the
language. Yet they are a preferred method of developing code, especially
reused code. When you're making a program/application, you don't need to
concern yourself with (many) details about the code; you use the module
and that's it. Likewise here, an application developer/integrator need
only load a module, and then all the routines in it are available. You
don't, for instance, have to worry if the routines have names which overlap
ones you were using, or, worse yet, ones used by another set of routines you
want to use.
I think Jean-Michel's comments were right. While I'm not sure if things
will be as overwhelming as he predicted, packages (even as implemented in
my patch) will help people develop code libraries for PostgreSQL. And that
will make developing PostgreSQL applications easier.
Also, as I've come to understand what schemas are and aren't, I've
realized that packages can be readily leveraged to help with schema support.
Schemas, at least according to the SQL92 spec I have looked at (I'd love
to see a later spec), are namespaces only for tables and views (and
character sets and a number of other things which PostgreSQL doesn't
support). They don't touch on functions. Sure, PostgreSQL could decide to
do something with functions, but if we do, we're improvising, and I see no
reason to improvise differently than other DBMSs have done. There may be
one, but I don't see it.
Also, as I understand schemas (which could be wrong), there is a
difference in emphasis between schemas and packages. Schemas are a way to
partition your database, so that different parts of an application see
only a subsection of the whole database. You can have some parts only able
to access one or another schema, while other parts can access multiple
schemas. Packages however are designed to help you build the tools to make
the applications work (providing toolchests of code for instance). It's
like schemas are a more top-down design element, and packages are
bottom-up.
Where I see the interaction is that if we want different schemas to have
schema-specific functions, we can just have a package implicitly associated
with each schema which contains the traditional functions and aggregates
(and types and operators) of that schema.
So the two questions I ask myself are:
1. Are package namespaces "better" than schemas? The answer to that is
no, because schemas are more standard and more general.
See above; I never said packages were better than schemas (nor worse). I
said they were different parts of the puzzle. I think they are both
important and valuable.
2. Are global variables via packages "better" than the existing setups?
My answer to that is again no, because the existing setups respect
language conventions, maintain the separation of the backend and the
language handlers, and of course they are already there and used.
To the backend, package variables are just entries in a table,
pg_packglobal, provided for the convenience of the language handler. If
the handler doesn't want to do anything with them, then it doesn't and
that's no loss.
So as a consequence we have to ask ourselves,
3. Do "packages" add anything more to the table than those two elementary
features? Please educate us.
See above.
4. Would it make sense to provide "packages" alongside the existing
mechanisms that accomplish approximately the same thing. That could be
debated, in case we agree that they are approximately the same thing.
I don't agree that they are approximately the same thing, though I agree
that many of the things packages do can be cobbled together (more
painfully) without them.
Hopefully these explanations will come across clearer.
Take care,
Bill
On 19 Oct 2001, Gunnar Rønning wrote:
* Bill Studenmund <wrstuden@netbsd.org> wrote:
|
| Packages aren't schemas. What they bring to the table is they facilitate
| making stored procedures (functions). You can have twelve different
| developers working on twenty different packages, with no fear of name
| conflicts. The package names will have to be different, so there can be
| functions with the same names in different packages.
Hmm. But if we had schema support can't we just package those procedures
into a schema with a given name ? Maybe my stored procedures needs some other
resources as well that should not conflict with other packages, like temp
tables or such. It then seems to me that using schemas can solve everything
that packages do and more ?
Assuming that schema support covers functions (which Tom, I, evidently
you, and Oracle think it should but which isn't mentioned at least in
SQL92), you could do that. And if you're adding tables, you probably
should.
But a lot of times you don't need to go to the effort of namespacing off a
whole new schema, and I can think of some cool things to do when you
don't.
One example is a large, complicated db app with multiple programmers. For
each general area of the app, you can create a package. That way you
modularize the code into more manageable pieces. But since they are all in
the same schema, they can maintain/interact with the same tables.
So that's an argument for packages/subschemas.
| For the most part, I think packages and schemas are orthogonal. I'm taking
| a cue from Oracle here. Oracle considers packages to be a schema-specific
| object.
What is really the difference functionality wise of making a subschema and
package ? In both cases you deal with the namespace issues.
A matter of what is subspaced. I'd assume that a subschema namespaces off
everything a schema does. A package however only namespaces off functions
and aggregates. Packages, at least as I've implemented them, do *not*
namespace off the types or operators they contain.
Technically, the package oid is a key in the name index for pg_proc and
pg_aggregate, while it is not for pg_type and pg_operator.
I admit, I took a minor liberty here. Oracle packages do have types, but
Oracle types are not as rich as PostgreSQL's. So when I was translating
packages, I made the types in them match PostgreSQL's. Also, since I'd
added aggregates and types, adding operators seemed like a reasonable
thing. Both from the point of view of the parser (they are all done about
the same way), and from the point of view of utility. PostgreSQL's ability to add
types is really cool, and the ability to add operators makes new types
convenient to use. If packages could add types and support functions but
not operators, that'd seem lame.
The reason that packages don't namespace off types and operators is I
think it makes them more useful. Think about the complex number example in
the programmer's guide. I can think of scientific applications which could
use them. But having to say package.complex for the type would be
cumbersome. And even worse, having to say package.+ or package.- would be
bad. And package.* might be ambiguous to the parser!
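To make that concrete, here is a sketch of what the difference looks like
in practice. The package name "cpx", the table, and the literal syntax are
all invented for illustration:

CREATE TABLE samples (id int4, val complex);

-- with the type and operators visible in the enclosing schema, queries stay clean:
SELECT id, val + '(1,0)'::complex FROM samples;

-- if they were namespaced by the package, every use would need the prefix:
--   SELECT id, val cpx.+ '(1,0)'::cpx.complex FROM samples;
-- which is exactly the cumbersomeness (and cpx.* ambiguity) described above.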
So that's why I made packages not be subschemas. Packages were designed to
help with writing stored procedures, and to do it well. :-)
Take care,
Bill
On Fri, 19 Oct 2001, Tom Lane wrote:
Yeah. I am wondering whether we couldn't support Oracle-style packages
as a thin layer of syntactic sugar on top of schemas. I am concerned
about the prospect that "foo.bar" might mean either "object bar in
schema foo" or "object bar in package foo".
See my note to Gunnar for why I think packages should be inside of schemas
rather than renamed schemas. Types and especially operators would be much
more useful to the enclosing schema that way (I think).
Yes, there is an ambiguity between schema foo and package foo. I can think
of a few ways to deal with this.
1) Do whatever Oracle does, assuming it's not grotesque. Yes, I've said
that a lot. But I think PostgreSQL can really take some applications away
from the commercial DBMSs, and Oracle is #1 in that market. So Oracle
represents Prior Art of least surprise. :-)
2) If there is both a schema named foo and a package named foo, then
foo.bar should always take foo to be the schema. If we let a package in
the local schema named foo be found before the schema foo, then we would
get different results in said schema and another one (which didn't have a
package named foo in it).
3) Don't let schemas and packages have the same name. I actually believe
this is what Oracle does, though I haven't checked. I _have_ checked that
packages and tables can't have the same name, and built that into the
packages patches. I think requiring schemas to have names different from
tables and packages is a good thing, and would reduce ambiguity.
As an aside the reason I suspect this is what Oracle does is that Oracle
has a system table which contains a list of named objects. Tables and
packages show up as entries in this table, and I'd expect schemas would
too.
Take care,
Bill
On 19 Oct 2001, Gunnar Rønning wrote:
* Tom Lane <tgl@sss.pgh.pa.us> wrote:
|
| Yeah. I am wondering whether we couldn't support Oracle-style packages
| as a thin layer of syntactic sugar on top of schemas. I am concerned
| about the prospect that "foo.bar" might mean either "object bar in
| schema foo" or "object bar in package foo".
Agreed, and in Sybase you may declare a procedure in a schema (or
database, which is the Sybase term). If you want it global you declare it
in the "master" schema.
Oh cool. I knew that Oracle used "standard" for the name of the built-in
package, but I didn't know a name for the built-in schema. "master" sounds
good.
Take care,
Bill
Bill Studenmund writes:
Could you please give me an example of how to do this, say for plperl or
plpython? Just showing how two functions made with CREATE FUNCTION can use
global variables will be fine. This example will help me understand how
they work.
For PL/Tcl you use regular Tcl global variables:
create function produce(text) returns text as '
global foo; set foo $1;
' language pltcl;
create function consume() returns text as '
global foo; return $foo;
' language pltcl;
There is also a mechanism for one procedure to save private data across
calls.
For PL/Python you use a global dictionary:
create function produce(text) returns text as '
GD["key"] = args[0]
' language plpython;
create function consume() returns text as '
return GD["key"]
' language plpython;
There is also a dictionary for private data.
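A minimal sketch of using it, assuming the private dictionary is exposed
to the function as SD in the same way GD is above (the SD name is my
assumption here):

create function counter() returns int4 as '
SD["n"] = SD.get("n", 0) + 1
return SD["n"]
' language plpython;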
For PL/Perl I'm not sure if something has been implemented. In C you can
use shared memory, and for PL/sh you would use temp files of course. ;-)
--
Peter Eisentraut peter_e@gmx.net http://funkturm.homeip.net/~peter
On Tue, 16 Oct 2001, Bill Studenmund wrote:
I still think that schemas and packages are different, but I now think
they are interrelated. And that it shouldn't be too hard to leverage the
package work into schema support. Still a lot of work, but the package
work has shown how to go from one to two in a number of ways. :-)
First off, do you (Tom) have a spec for schema support? I think that would
definitely help things.
I found an on-line copy of the SQL92 spec, and I've been looking at it.
I think it wouldn't be _that_ much more work to add schema support to what
I've done for packages. Not trivial, but certainly not double the work.
But I have some questions.
The big one for now is how should you log into one schema or another?
psql database.schema ?
Here's a plan for schema support. But first let me review what packages
have.
Right now (in my implementation), packages have added a "standard" package
(oid 10) which contains all of the built-in procedures, aggregates, types,
and operators. Whenever you use the normal CREATE commands, you add a
procedure, aggregate, operator, or type in the "standard" package.
There is a new table, pg_package, which lists the name of each installed
package and its owner. "standard" is owned by PGUID. Packages are
referenced by the oid of the row describing the package in this table.
Whenever you look up a function or aggregate, you give the oid of the
package to look in in addition to the name (and types). Having the package
id in the index provides the namespacing.
Whenever you look up a type or operator, you don't have to give a package
id.
Whenever you call the parser to parse a command, you pass it the package
context (oid) in which the parsing takes place. If you are typing in
commands in psql, that package id is 10, or "standard". Likewise for sql
or plpgsql routines not in a package. If you are in an sql or plpgsql
routine which is in a package, the package's oid is passed in. That's what
makes package routines look in the package first.
The parser also notes if you gave a package id or not (package.foo vs
foo). If you were in a package context and were not exact (foo in a
procedure in a package for instance), then all of the places which look up
functions will try "standard" if they don't find a match.
There is a table, pg_packglobal, which contains package globals for the
different PLs. It contains 5 columns. The first three are the package oid,
the language oid, and a sequence number. They are indexed. The two others
are variable name and variable type (of PostgreSQL type name and text
respectively). PLs for which these variables don't make sense are free to
ignore them.
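For illustration, here is the kind of lookup a handler could do against
that table (the column names are guesses based on the description above,
not necessarily the ones used in the patch):

SELECT varname, vartype
FROM pg_packglobal
WHERE packoid = 10        -- the package's oid ("standard" in this example)
AND langoid = 1234        -- the oid of the handler's language (made up)
ORDER BY seqno;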
Extending this for schema support.
Executive summary: all of the above becomes the infrastructure to let
different schemas have schema-private functions and aggregates.
We add a new table, pg_schema, which lists the schemas in this database.
It would contain a name column, an owner column, something to indicate
character set (?), and other stuff I don't know of. Schemas are referenced
internally by the oid of the entry in this table.
There is a built-in schema, "master". It will have a fixed oid, probably 9
or 11.
The "master" schema will own the "standard" package oid 10, which contains
all of the built-in functions, and ones added by create function/etc.
Each new schema starts life with a "standard" package of its own. This
package is the one which holds functions & aggregates made with normal
commands (create function, create aggregate) when you're logged into that
schema.
pg_package grows two more columns. One references the schema containing
the package. The other contains the oid of the "parent" package. The idea
is this oid is the next oid to look in when you are doing an inexact oid
search. It's vaguely like ".." on a file system.
For master.standard, this column is 0, indicating no further searching.
For say foo.standard (foo is a schema), it would be the oid of
master.standard (10). Likewise for a package baz in the master schema, it
would be master.standard. For a package in a schema, it would be the oid
of the "standard" package of the schema. As an example, say the foo schema
had a package named bup. For baz.bup, this column would have the oid of
baz.standard.
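To make the chain concrete, here is a toy illustration using an ordinary
table instead of the real pg_package (the column names and oids are made
up for the example, not the patch's actual layout):

CREATE TABLE pak_demo (pakoid int4, pakname text, schemaname text, parentoid int4);
INSERT INTO pak_demo VALUES (10,  'standard', 'master', 0);
INSERT INTO pak_demo VALUES (100, 'standard', 'foo',    10);
INSERT INTO pak_demo VALUES (101, 'bup',      'foo',    100);
-- an inexact lookup made from inside foo.bup searches pakoid 101 first,
-- then its parent 100 (foo.standard), then 10 (master.standard), and
-- stops when parentoid is 0.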
Right now I'm in the process of redoing the parser changes I made so that
the scanner doesn't need to recognize package names. When this is done,
the parser will be able to deal with schema.function and package.function.
Oh, also schema.table.attr too. schema.package.function won't be hard, but
it will be messy.
The only other part (which is no small one) is to add namespacing to the
rest of the backend. I expect that will mean adding a schema column to
pg_class, pg_type, and pg_operator.
Hmmm... We probably also need a command to create operator classes, and
the tables it touches would need a schema column too, and accesses will
need to be schema savvy.
Well, that's a lot for now. Thoughts?
Take care,
Bill
Bill Studenmund writes:
Honestly, I do not understand why "global variables" have been such a sore
point for you.
My point is that the proposed "package support" introduces two features
that are a) independent, and b) already exist, at least in design.
Schemas are already planned as a namespace mechanism. Global variables in
PLs already exist in some PLs. Others can add it if they like. There
aren't any other features introduced by "package support" that I can see
or that you have explicitly pointed out.
So the two questions I ask myself are:
1. Are package namespaces "better" than schemas? The answer to that is
no, because schemas are more standard and more general.
2. Are global variables via packages "better" than the existing setups?
My answer to that is again no, because the existing setups respect
language conventions, maintain the separation of the backend and the
language handlers, and of course they are already there and used.
So as a consequence we have to ask ourselves,
3. Do "packages" add anything more to the table than those two elementary
features? Please educate us.
4. Would it make sense to provide "packages" alongside the existing
mechanisms that accomplish approximately the same thing. That could be
debated, in case we agree that they are approximately the same thing.
--
Peter Eisentraut peter_e@gmx.net http://funkturm.homeip.net/~peter
* Bill Studenmund <wrstuden@netbsd.org> wrote:
|
| Packages aren't schemas. What they bring to the table is they facilitate
| making stored procedures (functions). You can have twelve different
| developers working on twenty different packages, with no fear of name
| conflicts. The package names will have to be different, so there can be
| functions with the same names in different pacakges.
Hmm. But if we had schema support can't we just package those procedures
into a schema with a given name ? Maybe my stored procedures needs some other
resources as well that should not conflict with other packages, like temp
tables or such. It then seems to me that using schemas can solve everything
that packages do and more ?
| For the most part, I think packages and schemas are orthogonal. I'm taking
| a cue from Oracle here. Oracle considers packages to be a schema-specific
| object.
What is really the difference functionality wise of making a subschema and
package ? In both cases you deal with the namespace issues.
--
Gunnar Rønning - gunnar@polygnosis.com
Senior Consultant, Polygnosis AS, http://www.polygnosis.com/
Gunnar Rønning <gunnar@polygnosis.com> writes:
Hmm. But if we had schema support can't we just package those procedures
into a schema with a given name ? Maybe my stored procedures needs some other
resources as well that should not conflict with other packages, like temp
tables or such. It then seems to me that using schemas can solve everything
that packages do and more ?
Yeah. I am wondering whether we couldn't support Oracle-style packages
as a thin layer of syntactic sugar on top of schemas. I am concerned
about the prospect that "foo.bar" might mean either "object bar in
schema foo" or "object bar in package foo".
regards, tom lane
* Tom Lane <tgl@sss.pgh.pa.us> wrote:
|
| > resources as well that should not conflict with other packages, like temp
| > tables or such. It then seems to me that using schemas can solve everything
| > that packages do and more ?
|
| Yeah. I am wondering whether we couldn't support Oracle-style packages
| as a thin layer of syntactic sugar on top of schemas. I am concerned
| about the prospect that "foo.bar" might mean either "object bar in
| schema foo" or "object bar in package foo".
Agreed, and in Sybase you may declare a procedure in a schema (or
database, which is the Sybase term). If you want it global you declare it
in the "master" schema.
--
Gunnar Rønning - gunnar@polygnosis.com
Senior Consultant, Polygnosis AS, http://www.polygnosis.com/
On Sat, 20 Oct 2001, Peter Eisentraut wrote:
Yes, you're right. Actually, sharing data across PostgreSQL C functions
is trivial because you can just use global variables in your dlopen
modules.
Exactly. That's why I never envisioned "C" or "internal" functions using
package global variables. :-)
Take care,
Bill
On Sat, 20 Oct 2001, Rod Taylor wrote:
But what if you want a C function to set a variable which can be
accessed using an SQL, perl, PLpgSQL or other function type?
Shouldn't a global variable be global between all types of functions?
No. Doing that requires that all languages have the same internal storage
of variables. And it's more than just that an int4 takes up 4 bytes. Look in
the plpgsql source, at struct PLpgSQL_var. There is a fair amount of info
about a variable.
While we could harmonize the info storage, making globals global across
all languages would also mean breaking down a lot of the isolation
between PLs. Right now they are their own independent entities. To tie
them together like this would, in my opinion, make them
fragilely interconnected.
My suggestion is to just add a get and a set routine in one language, and
have it store the global. :-)
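For example, a minimal sketch of such a get/set pair, reusing the PL/Tcl
global variable mechanism quoted elsewhere in this thread (the function
names are made up):

create function set_shared(text) returns text as '
global shared_val
set shared_val $1
' language pltcl;
create function get_shared() returns text as '
global shared_val
return $shared_val
' language pltcl;
-- any other PL (or a C function issuing a query) can then reach the same
-- per-backend value with SELECT set_shared(...) or SELECT get_shared()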
Take care,
Bill
Bill Studenmund writes:
create function produce(text) returns text as '
GD["key"] = args[0]
' language plpython;
create function consume() returns text as '
return GD["key"]
' language plpython;
There is also a dictionary for private data.
Private to what?
Private to the procedure, but saved across calls (during one session).
Oh, by shared memory, do you mean SYSV Shared Memory (like how the
backends talk) or just memory shared between routines? I ask because part of
the idea with these variables is that they are backend-specific. So C
routines actually should NOT use SYSV Shared Mem. :-)
Yes, you're right. Actually, sharing data across PostgreSQL C functions
is trivial because you can just use global variables in your dlopen
modules.
--
Peter Eisentraut peter_e@gmx.net http://funkturm.homeip.net/~peter
I think Jean-Michel's comments were right. While I'm not sure if things
will be as overwhelming as he predicted, packages (even as implemented in
my patch) will help people develop code libraries for PostgreSQL. And that
will make developing PostgreSQL applications easier.
PostgreSQL is a fantastic tool which lacks a few features to become #1.
IMHO, these features are :
Beginners: ability to drop and reorganize columns. I know this sounds
stupid for hackers, but this is the #1 need when migrating from beginner tools
such as MySQL or Access. Candidates?
Advanced users: PACKAGE support to create and distribute software
libraries. CREATE OR REPLACE VIEW, CREATE OR REPLACE TRIGGER, etc...
PL/pgSQL installation by default with infinite loop protection.
Professional users: PostgreSQL does not lack many things. Maybe
server-side Java would be great in terms of object/inheritance approach. I
run several databases, one being hosted on a double Pentium Linux box with
U2W discs. When using triggers, views, rules and PL/pgSQL, applications can
be optimized so much that you "hardly" reach the hardware limits.
Power users: load balancing, replication, tablespace. I can't really say.
I first discovered PostgreSQL when localizing Oracle8i to French. We asked
Oracle if I could use their software to help us during the translation
process. They answered "OK, but you have to pay $xx.xxx because you have a
double processor box". This was about twice the price we were getting paid.
That day, I understood Oracle did not care about its users and was only
interested in fast, short term profit.
Cheers,
Jean-Michel
----- Original Message -----
From: Bill Studenmund <wrstuden@netbsd.org>
Sent: Sunday, October 14, 2001 8:22 AM
My description of namespaces seems to have caused a fair bit of confusion.
Let me try again.
The ability of the package changes to automatically check standard when
you give an ambiguous function name while in a package context is a
convenience for the procedure author. Nothing more.
It means that when you want to use one of the built-in functions
(date_part, abs, floor, sqrt etc.) you don't have to prefix it with
"standard.". You can just say date_part(), abs(), floor(), sqrt(), etc.
The only time you need to prefix a call with "standard." is if you want to
exclude any so-named routines in your own package.
Quick question: would it be possible then to create a 'system' package
and 'system' (or 'master' if you will) schema (when it's implemented),
move over all the system tables (pg_*) into the master schema
and functions into the 'system' package, so that no name conflicts will arise
when creating types, functions, tables, etc with the same names as system ones?
--
S.
On Sun, 21 Oct 2001, Serguei Mokhov wrote:
----- Original Message -----
From: Bill Studenmund <wrstuden@netbsd.org>
Sent: Friday, October 19, 2001 2:04 PM
Quick question: would it be possible then to create a 'system' package
and 'system' (or 'master' if you will) schema (when it's implemented),
move over all the system tables (pg_*) into the master schema
and functions into the 'system' package, so that no name conflicts will arise
when creating types, functions, tables, etc with the same names as system ones?
Yes. That is part of my plan actually. :-)
Oh, one reason that needs to happen is that everything needs to be in a
package or a schema; for the tables where they do namespacing, the schema
or package is part of the primary key.
Hmm. I see. Then there won't be a problem of creating any DB object
with the system name.
It will work, though if you start creating tables named "pg_class", I
think you might make your head hurt. Also, your own int4 type might not be
such a good idea...
In the patch I sent in last week,
Yeah, I remember that one. Took me a couple of minutes
to download. You know, it never hurts to compress things:
then the patch would be ~10 times less in size, and you wouldn't
have to worry about PINE messing up with your code in the message body... :)
And that would reduce the bounce rate too.
Just a kind and gentle cry to reduce the size of patches sent to
my mailbox and save some bandwidth on the way :)
Ok. :-) Next time I will either compress it or I'll mail in a URL.
all of the built-in functions and
aggregates are in the "standard" package, and you can in fact reference
them as standard.foo.
When you refer to it as just foo(), and you have foo() defined
in more than one package, how do you resolve this? Do you also have
a notion of a global package and sub-packages?
There is a very simple search path system. If you are in a package (in a
function that is part of a package), you look for foo in that package. If
you don't find it there, you look in standard. If it's not there, you don't
find it. To look in other packages than the one you're in, you have to say
which one it is. With schemas, if your package is not in "master" or
whatever it is called, you look first in your package, then in
your_schema.standard, then in master.standard.
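To make the lookup order concrete (the package name and its function are
invented for the example):

-- an unqualified call made from inside a package named geo:
--   area(4.0)  ->  1. geo.area(float8)       the package itself, first
--                  2. standard.area(float8)  then the built-ins
-- from outside the package, the routine must be qualified:
--   SELECT geo.area(4.0);
-- built-ins never need the prefix, though the patch lets you give it:
SELECT sqrt(4.0);
SELECT standard.sqrt(4.0);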
Take care,
Bill
On Sun, 21 Oct 2001, Peter Eisentraut wrote:
Bill Studenmund writes:
The big one for now is how should you log into one schema or another?
psql database.schema ?
Each user has a default schema, which is by default the schema with the
same name as the user name, or if no such schema exists, it's the DEFAULT
schema (which I believe is what Oracle calls it). Then there should be
something like set schema path. I don't think schemas should be a
connection parameter. -- That would be my ideas anyway.
I can see advantages for both; if you just connect to a database that has
schemas, you get a schema with your name if it's there, and a default
otherwise. But I can see definite advantages to being able to specify.
Whenever you look up a function or aggregate, you give the oid of the
package to look in in addition to the name (and types). Having the package
id in the index provides the namespacing.
Whenever you look up a type or operator, you don't have to give a package
id.
While I understand that package.+ is silly, anything that makes operators
and functions work fundamentally differently is suspicious. A common
search mechanism that works for everything in packages (or subschemas,
which I'd prefer) would/should/could allow you to do without those
prefixes.
Why? Operators are used differently than functions. That strikes me as a
good reason to namespace them differently.
Conceptually the main determiner of what function you want is the name, at
least as far as I can tell from talking with all the programmers
I know. Yes, we make sure the types match (are part of the primary key),
but the name is the main concept. Operators, however, are more
intent-based. The '+' operator means I want these two things added
together. I don't care so much what types are involved, I want adding to
happen. That's a difference of intent. And that's the reason that I think
different namespacing rules make sense.
Part of it is that I only expect a package to add operators for types it
introduced. So to be considering them, you had to have done something that
ties in the type in the package. Like you had to make a column in a table
using it.
Another take on that is that I expect the main user of (direct) function
calls calling package functions will be other functions in that package,
while the main users of operators will be places which have used a type
from said package. Like queries pulling things out of tables using that
type. So the function namespacing is a convenience/tool primarily for the
package developer, while the operator and type namespacing is more a
convenience for the end application developer.
Also, you seem to be wanting a path-search ability that is something like
the PATH environment variable. This pathing is fundamentally different; to
use unix terms, it is ".:..". The fundamental difference is that there are
no "absolute" paths. The searching is totally location (of routine)
dependent.
To add something like an absolute path would totally break the whole
motivation for packages. The idea is to give a developer an area over which
s/he has total name control, but if s/he needs built-in routines, s/he
doesn't need to say "standard." to get at them.
If we allow something like "absolute paths" in the package namespacing,
then we totally destroy that. Because a package developer can't be sure
what pathing is going on, s/he really has no clue what packages will get
found in what order. So then you have to be explicit in the name of all
the functions you use (otherwise if a user essentially puts something
other than "." at the head of the path, then you don't get routines in
your own package), or run the risk of getting all sorts of run-time
errors. A feature designed to make writing packages easier now makes them
harder. That strikes me as a step backwards.
There is a built-in schema, "master". It will have a fixed oid, probably 9
or 11.
The built-in schema is called DEFINITION_SCHEMA.
Why is it different from the "DEFAULT" you get when you log into a
database which doesn't have a schema whose name matches your username?
The only other part (which is no small one) is to add namespacing to the
rest of the backend. I expect that will mean adding a schema column to
pg_class, pg_type, and pg_operator.
Yup. But you can replace the owner package with the schema column,
because the owner property will be transferred to the schema.
Not necessarily. A user other than the one who owns the schema can add a
package to it. It's the same thing as why we keep track of who added a
function. :-)
Take care,
Bill
But what if you want a C function to set a variable which can be
accessed using an SQL, perl, PLpgSQL or other function type?
Shouldn't a global variable be global between all types of functions?
--
Rod Taylor
There are always four sides to every story: your side, their side, the
truth, and what really happened.
----- Original Message -----
From: "Bill Studenmund" <wrstuden@netbsd.org>
To: "Peter Eisentraut" <peter_e@gmx.net>
Cc: "PostgreSQL Development" <pgsql-hackers@postgresql.org>
Sent: Friday, October 19, 2001 1:59 PM
Subject: Re: [HACKERS] Package support for Postgres
On Sat, 20 Oct 2001, Peter Eisentraut wrote:
Yes, you're right. Actually, sharing data across PostgreSQL C
functions
is trivial because you can just use global variables in your
dlopen
modules.
Exactly. That's why I never envisioned "C" or "internal" functions
using
package global variables. :-)
Take care,
Bill
Bill Studenmund writes:
The big one for now is how should you log into one schema or another?
psql database.schema ?
Each user has a default schema, which is by default the schema with the
same name as the user name, or if no such schema exists, it's the DEFAULT
schema (which I believe is what Oracle calls it). Then there should be
something like set schema path. I don't think schemas should be a
connection parameter. -- That would be my ideas anyway.
Whenever you look up a function or aggregate, you give the oid of the
package to look in in addition to the name (and types). Having the package
id in the index provides the namespacing.
Whenever you look up a type or operator, you don't have to give a package
id.
While I understand that package.+ is silly, anything that makes operators
and functions work fundamentally differently is suspicious. A common
search mechanism that works for everything in packages (or subschemas,
which I'd prefer) would/should/could allow you to do without those
prefixes.
There is a built-in schema, "master". It will have a fixed oid, probably 9
or 11.
The built-in schema is called DEFINITION_SCHEMA.
The only other part (which is no small one) is to add namespacing to the
rest of the backend. I expect that will mean adding a schema column to
pg_class, pg_type, and pg_operator.
Yup. But you can replace the owner package with the schema column,
because the owner property will be transferred to the schema.
Hmmm... We probably also need a command to create operator classes, and
the tables it touches would need a schema column too, and accesses will
need to be schema savvy.
Well, that's a lot for now. Thoughts?
That "lot" was sort of the problem with tackling this until now. ;-)
--
Peter Eisentraut peter_e@gmx.net http://funkturm.homeip.net/~peter
On Tue, 23 Oct 2001, Thomas Lockhart wrote:
(I've been following the thread, at least casually ;)
intent-based. The '+' operator means I want these two things added
together. I don't care so much what types are involved, I want adding to
happen. That's a difference of intent. And that's the reason that I think
different namespacing rules make sense.
But operators *are* functions underneath the covers. So different
namespacing rules seem like a recipe for missed associations and
unexpected results.
Underneath the covers, yes. But "those covers" make the difference in what
I'm thinking of. An operator isn't just one function call, it can be
multiple ones. And not just multiple iterations, but multiple different
ones depending on what the optimizer is doing. That's why you can give an
operator more than just its procedure. You also give it a join
proc and a restrict proc, and you tie it in with a commutator, negator,
and two sort operators. When you use an operator, you're specifying an
intent, and all of these parts of an operator's definition help make that
intent happen.
The problem though is that if operators are namespaced the same as
functions, then we destroy one of the benefits of packages - a separate
namespace for functions.
Can you think of a specific example where this namespacing causes
problems? The functions and aggregates are namespaced off of the
containing schema, but the types and operators aren't. Inside the package,
you have access to everything in the package. In the enclosing schema, you
have immediate access to the types and operators, and can get at the
functions and aggregates by "packname.".
Part of it is that I only expect a package to add operators for types it
introduced. So to be considering them, you had to have done something that
ties in the type in the package. Like you had to make a column in a table
using it.
I'd expect schemas/packages to have operators and functions for existing
types, not just new ones. That is certainly how our extensibility
features are used; we are extensible in several dimensions (types,
functions, operators) and they do not all travel together. We can't
guess at the future intent of a package developer, and placing
limitations or assumptions about what *must* be in a package just limits
future (unexpected or surprising) uses.
Please play with the patch and try it.
There is no restriction in the patch that operators (and functions &
aggregates) can only be for types new to the package. You can add
operators for built-in types, and you can even add operators for other
user-specified types too. And pg_dump will make sure that a user-defined
type used in a package will get dumped before the package.
The "absolute path" scoping and lookup scheme is defined in SQL99. I'm
not sure I understand the issue in the last paragraph: you seem to be
making the point that absolute paths are Bad because package developers
don't know what those paths might be. But otoh allowing absolute paths
(and/or embedding them into a package) gives the developer *precise*
control over what their package calls and what the behaviors are. istm
that if a package developer needs to specify precisely the resources his
package requires, then he can do that. And if he wants to leave it
flexible and determined by scoping and pathing rules, then he can do
that too.
afaik relative pathing is not specified in the standard, but we might
want to consider how we would implement that as an extension and whether
that gives more power to the packager or developer.
I've found the spec, and am still studying it. Though what I've found so
far is a schema search path. My main interest is for the package itself to
be the first thing searched. After that, whatever search path is
appropriate for the schema seems like the right thing to do. So, besides
the fact I think we should do schemas as per the spec, I think using the
schema search path is the right thing to do.
The built-in schema is called DEFINITION_SCHEMA.
Why is it different from the "DEFAULT" you get when you log into a
database which doesn't have a schema whose name matches your username?
It may not be. But SQL99 specifies the name.
Actually, the most interesting thing I saw was in the start of chapter 21
(the chapter on the DEFINITION_SCHEMA) at the bottom of section 21.1.
"The specification provides only a model of the base tables that are
required, and does not imply that an SQL-implementation shall provide the
functionality in the manner described in this clause."
As I understand that, we are free to implement things as we wish. We just
need to have the pieces/functionality described therein. We *don't* have
to use the names or exact formats used in the spec.
As a concrete example of what I mean, I believe that we are free to leave
pg_attribute as it is and still comply with section 21.7 ATTRIBUTES base
table. UDT_NAME we implement with attrelid, ATTRIBUTE_NAME we do with
attname, ORDINAL_POSITION -> attnum, ATTRIBUTE_DEFAULT we do, IS_NULLABLE
-> attnotnull (we twist the sense, but the functionality is there),
ATTRIBUTES_PRIMARY_KEY we agree (to the extent we don't support schemas
or catalogs yet).
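A rough sketch of the kind of view that mapping suggests (the view name
and the details are guesses, not a spec-complete definition):

CREATE VIEW attributes_sketch AS
SELECT attrelid AS udt_name,        -- an oid here, where the spec wants a name
       attname  AS attribute_name,
       attnum   AS ordinal_position,
       CASE WHEN attnotnull THEN 'NO' ELSE 'YES' END AS is_nullable
FROM pg_attribute
WHERE attnum > 0;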
The main thing is that we eventually have an INFORMATION_SCHEMA full of
views which turn the system tables into what the standards want.
Take care,
Bill
On Tue, 23 Oct 2001, Peter Eisentraut wrote:
Bill Studenmund writes:
Why? Operators are used differently than functions.
I don't think so. Operators are a syntactic convenience for functions.
That's what they always have been and that's what they should stay.
How does what you say disagree with what I said?
Operators certainly have a lot more structure to them than a function call
does. That's why you give the restriction and join functions, and you hand
them commutation and negation operators. The optimizer uses all of these
tools to make what you want to have happen (adding for instance) happen as
efficiently as possible. It will re-write what you said in a different
manner, if that different manner holds the same intent yet is more
efficient.
Conceptually the main determiner of what function you want is the name, at
least as far as I can tell from talking with all the programmers
I know. Yes, we make sure the types match (are part of the primary key),
but the name is the main concept. Operators, however, are more
intent-based. The '+' operator means I want these two things added
together. I don't care so much what types are involved, I want adding to
happen. That's a difference of intent. And that's the reason that I think
different namespacing rules make sense.
Naive developers all program by "intent". If I invoke a + operator then I
expect it to add. If I call a sqrt() function then I expect it to
calculate the square root. If I execute an INSERT statement then I would
prefer that I did not delete anything. Designing systems to work by
"intent" can be construed as an aspect of "user-friendliness".But the more knowledgeable programmer is mildly aware of what's going on
behind the scenes: Both "+" and "sqrt" are just names for function code
that may or may not do what you think they do. So this applies to both
functions and operators.
So I am a "naive" programmer because I mention intent above? That is very
condescending, Peter, and strikes me as inappropriate. Are you really so
out of things to say that you have to resort to condescension?
At what point have you tried to determine how experienced a programmer I
am? You've never asked me for my resume, or what projects I've worked on
before this. Your comments indicate to me that you have not yet tried the
patch I sent in, and if you have, I really doubt you've made packages with
it. So how can you judge? The fact I disagree with you?
Also, in your "naive" vs "more knowledgeable" programmer comparison, you
mention "user-friendliness" in the "naive" part, the "bad" part. Do you
really think that "user-friendliness" is a bad thing? I hope not.
I think that "user-friendliness" is an important part of programming. It
means that your tools or programmatic interfaces have (or lack) an
appropriateness to the task at hand. It's not just for command lines or
GUIs. I've worked with different libraries and programming packages, and I
have experienced the ones where the design and layout make the
library/package useful, and ones where the design and layout get in the
way.
Hmmm... Thinking about it, I now think you're right, that we should have a
way to handle pathing. A package author should be able to set the path
used for routines in the package, though.
The built-in schema is called DEFINITION_SCHEMA.
Why is it different from the "DEFAULT" you get when you log into a
database which doesn't have a schema whose name matches your username?
Because SQL says so.
Actually I'm not so sure. See the note to Thomas, especially the last
sentence of section 21.1. It seems that if we have an INFORMATION_SCHEMA
which contains all the views in the spec, and our system tables have
the right behaviors, then we are fine.
Actually, the text in section 20.1, "Introduction to Information Schema
and Definition Schema" is more direct:
"The views of the Information Schema are viewed tables defined in terms of
the base tables of the Definition Schema. The only purpose of the
Definition Schema is to provide a data model to support the Information
Schema and to assist understanding. An SQL-implementation need do no more
than simulate the existence of the Definition Schema as viewed through the
Information Schema views."
So if we have INFORMATION_SCHEMA with the right views in it, we are fine
doing whatever we want.
Not necessarily. A user other than the one who owns the schema can add a
package to it. It's the same thing as why we keep track of who added a
function. :-)
Blech, I meant "you can replace the owner column with the schema column".
That's actually what I thought you said. :-)
I still think we can't do that, since someone other than the schema owner
can add a package to a schema. :-) Or at least that's the assumption I'm
running on; we allow users other than PGUID to create functions (and
operators and aggregates and types) in the default (whatever it will be
called) schema, so why shouldn't they be allowed to add packages?
Take care,
Bill
On Wed, 24 Oct 2001, Peter Eisentraut wrote:
Bill Studenmund writes:
So I am a "naive" programmer because I mention intent above?
No.
Sorry, that's the way it came across. As you've said that was not your
intent, please disregard my response; I was responding to something you
did not mean.
So if we have INFORMATION_SCHEMA with the right views in it, we are fine
doing whatever we want.
I think some interpretation of the SQL standard can be used to prove that
a new schema should not contain any objects. So you're going to have to
stick to the two predefined schemas to put the system catalogs in. Then
again, other interpretations may be used to prove other things. But to me
the intent of the standard is clear that system catalogs are meant to go
into the definition schema, and I don't see a reason why this could not be
so.
I had been thinking that we could have the built-in objects (functions,
types, operators, etc.) in whatever was the "default.master" package, but
it looks like SQL99 doesn't like that. You're right that built-in things
have to be in a different schema than user-added things.
Section 10.4 contains text:
ii) If RN contains a <schema name> SN, then
Case:
1) If SN is INFORMATION_SCHEMA, then the single candidate routine of RI is
the built-in function identified by <routine name>.
Actually 4.24 is more exact. It defines a built-in function as a routine
which is returned from the query:
SELECT DISTINCT ROUTINE_NAME
FROM INFORMATION_SCHEMA.ROUTINES
WHERE SPECIFIC_SCHEMA = INFORMATION_SCHEMA
Actually, since we have to have an INFORMATION_SCHEMA, and
"INFORMATION_SCHEMA" gets thrown around a lot, I think it'd be easiest to
make "INFORMATION_SCHEMA" the schema containing built-in things. Otherwise
(among other things) we have to replace DEFINITION_SCHEMA with
INFORMATION_SCHEMA in the above-defined view (and in a lot of other
places).
Thoughts?
I still think we can't do that, since someone other than the schema owner
can add a package to a schema. :-) Or at least that's the assumption I'm
running on; we allow users other than PGUID to create functions (and
operators and aggregates and types) in the default (whatever it will be
called) schema, so why shouldn't they be allowed to add packages?
Because SQL says so. All objects in a schema belong to the owner of the
schema. In simple setups you have one schema per user with identical
names. This has well-established use patterns in other SQL RDBMS.
Then implementing schemas will cause a backwards-incompatible change
regarding who can add/own functions (and operators and ..).
Mainly because when we introduce schemas, all SQL transactions will have
to be performed in the context of *some* schema. I think "DEFAULT" was the
name you mentioned for when there was no schema matching the username. As
"DEFAULT" (or whatever we call it) will be made by the PG super user (it
will actually be added as part of initdb), then that means that only the
super user will own functions. That's not how things are now, and imposing
that on upgrading users will likely cause pain.
Think about a dump/restore upgrade from 7.2 to 7.3. Right now users other
than PGUID can own functions (and triggers, etc.). When you do the
restore, though, since your dump had no schema support, it all goes into
DEFAULT. Which will be owned by PGUID. So now we either have a schema with
things owned by a user other than the schema owner, or we have a broken
restore.
Or we have to special case the DEFAULT schema. Which strikes me as a bad
thing to do.
For now, I'd suggest letting users other than a schema owner own things in
a schema, and later on add controls over who can add things to a schema.
Then when you do a "CREATE SCHEMA" command, you will implicitly be adding
restrictions prohibiting someone other than the owner from adding things
(including packages/subschemas).
I agree that this might not be what everyone would want, but it seems
extensible. However, I feel we're trying to design too many things at
once. Let's do schemas first the way they're in the SQL standard, and
then we can try to tack on ownership or subschemas or package issues.
Well, the packages changes can easily be turned into schema support for
functions and aggregates, so we are part way there. Also, the packages
changes illustrate how to make system-wide internal schema changes of the
type adding SQL schemas will need. Plus, packages as they are now are
useful w/o schema support.
And there's the fact that schemas were wanted for 7.2, and didn't happen.
Without external agitation, will they happen for 7.3? Given the size of
the job, I understand why they didn't happen (the package changes so far
represent over 3 months of full-time programming). We've got some momentum
now, I'd say let's run with it. :-)
Take care,
Bill
(I've been following the thread, at least casually ;)
Why? Operators are used differently than functions. That strikes me as a
good reason to namespace them differently.
Conceptually the main determiner of what function you want is the name, at
least as far as I can tell from talking with all the programmers
I know. Yes, we make sure the types match (are part of the primary key),
but the name is the main concept. Operators, however, are more
intent-based. The '+' operator means I want these two things added
together. I don't care so much what types are involved, I want adding to
happen. That's a difference of intent. And that's the reason that I think
different namespacing rules make sense.
But operators *are* functions underneath the covers. So different
namespacing rules seem like a recipe for missed associations and
unexpected results.
Part of it is that I only expect a package to add operators for types it
introduced. So to be considering them, you had to have done something that
ties in the type in the package. Like you had to make a column in a table
using it.
I'd expect schemas/packages to have operators and functions for existing
types, not just new ones. That is certainly how our extensibility
features are used; we are extensible in several dimensions (types,
functions, operators) and they do not all travel together. We can't
guess at the future intent of a package developer, and placing
limitations or assumptions about what *must* be in a package just limits
future (unexpected or surprising) uses.
Another take on that is that I expect the main user of (direct) function
calls calling package functions will be other functions in that package,
while the main users of operators will be places which have used a type
from said package. Like queries pulling things out of tables using that
type. So the function namespacing is a convenience/tool primarily for the
package developer, while the operator and type namespacing is more a
convenience for the end application developer.
We are probably drawing too fine a distinction here.
Also, you seem to be wanting a path-search ability that is something like
the PATH environment variable. This pathing is fundamentally different; to
use unix terms, it is ".:..". The fundamental difference is that there are
no "absolute" paths. The searching is totally location (of routine)
dependent.
To add something like an absolute path would totally break the whole
motivation for packages. The idea is to give a developer an area over which
s/he has total name control, but if s/he needs built-in routines, s/he
doesn't need to say "standard." to get at them.
If we allow something like "absolute paths" in the package namespacing,
then we totally destroy that. Because a package developer can't be sure
what pathing is going on, s/he really has no clue what packages will get
found in what order. So then you have to be explicit in the name of all
the functions you use (otherwise if a user essentially puts something
other than "." at the head of the path, then you don't get routines in
your own package), or run the risk of getting all sorts of run-time
errors. A feature designed to make writing packages easier now makes them
harder. That strikes me as a step backwards.
The "absolute path" scoping and lookup scheme is defined in SQL99. I'm
not sure I understand the issue in the last paragraph: you seem to be
making the point that absolute paths are Bad because package developers
don't know what those paths might be. But otoh allowing absolute paths
(and/or embedding them into a package) gives the developer *precise*
control over what their package calls and what the behaviors are. istm
that if a package developer needs to specify precisely the resources his
package requires, then he can do that. And if he wants to leave it
flexible and determined by scoping and pathing rules, then he can do
that too.
afaik relative pathing is not specified in the standard, but we might
want to consider how we would implement that as an extension and whether
that gives more power to the packager or developer.
There is a built-in schema, "master". It will have a fixed oid, probably 9
or 11.
The built-in schema is called DEFINITION_SCHEMA.
Why is it different from the "DEFAULT" you get when you log into a
database which doesn't have a schema whose name matches your username?
It may not be. But SQL99 specifies the name.
- Thomas
Bill Studenmund writes:
Why? Operators are used differently than functions.
I don't think so. Operators are a syntactic convenience for functions.
That's what they always have been and that's what they should stay.
Conceptually the main determiner of what function you want is the name, at
least as far as I can tell from talking with all the programmers I know.
Yes, we make sure the types match (are part of the primary key),
but the name is the main concept. Operators, however, are more
intent-based. The '+' operator means I want these two things added
together. I don't care so much what types are involved, I want adding to
happen. That's a difference of intent. And that's the reason that I think
different namespacing rules make sense.
Naive developers all program by "intent". If I invoke a + operator then I
expect it to add. If I call a sqrt() function then I expect it to
calculate the square root. If I execute an INSERT statement then I would
prefer that I did not delete anything. Designing systems to work by
"intent" can be construed as an aspect of "user-friendliness".
But the more knowledgeable programmer is mildly aware of what's going on
behind the scenes: Both "+" and "sqrt" are just names for function code
that may or may not do what you think they do. So this applies to both
functions and operators.
Part of it is that I only expect a package to add operators for types it
introduced.
This is an arbitrary restriction that you might find reasonable, but other
developers might not.
Another take on that is that I expect the main users of direct calls to
package functions will be other functions in that package, while the main
users of operators will be places which have used a type from said package.
See above.
To add something like an absolute path would totally break the whole
motivation for packages.
Yes. If we add something like subschemas then something more
sophisticated than a Unix-style path would have to be engineered.
The built-in schemas is called DEFINITION_SCHEMA.
Why is it different from the "DEFAULT" you get when you log into a
database which doesn't have a schema whose name matches your username?
Because SQL says so.
The only other part (which is no small one) is to add namespacing to the
rest of the backend. I expect that will mean adding a schema column to
pg_class, pg_type, and pg_operator.
Yup. But you can replace the owner package with the schema column,
because the owner property will be transferred to the schema.
Not necessarily. A user other than the one who owns the schema can add a
package to it. It's the same thing as why we keep track of who added a
function. :-)
Blech, I meant "you can replace the owner column with the schema column".
--
Peter Eisentraut peter_e@gmx.net http://funkturm.homeip.net/~peter
On Thu, 25 Oct 2001, Peter Eisentraut wrote:
Bill Studenmund writes:
Mainly because when we introduce schemas, all SQL transactions will have
to be performed in the context of *some* schema. I think "DEFAULT" was the
name you mentioned for when there was no schema matching the username. As
"DEFAULT" (or whatever we call it) will be made by the PG super user (it
will actually be added as part of initdb), then that means that only the
super user will own functions.
If you want to own the function you should create it in your schema. If
you want to create a function and let someone else own it, then ask
someone else for write access to their schema. (This should be a rare
operation and I don't think SQL provides for it, so we can ignore it in
the beginning.) If there is no schema you have write access to then you
cannot create things. People have been dying for that kind of feature,
and schemas will enable us to have it.
I think I understand your descriptions of what you will be *able* to do
with schemas. And also that they may describe how you *should* do things
with schemas. I'm not disagreeing with you about that. But that's not the
angle I'm working.
I guess to get at my point, I can ask this question, "Will schema support
invalidate existing PostgreSQL database designs?"
I would like the answer to be no. I would like our users to be able to
dump a pre-schema-release db, upgrade, and then restore into a
schema-aware PostgreSQL. And have their restore work.
Since the admin is restoring a db which was made before schema support,
there are no CREATE SCHEMA commands in it (or certainly not ones which do
a real schema create - right now CREATE SCHEMA is a synonym for CREATE
DATABASE). So the restore will create everything in the "DEFAULT" schema
(the schema where creates done w/o a CREATE SCHEMA go).
But right now, we can have different users owning things in one database.
So there will be restores out there which will have different users owning
things in the same restored-to schema, which will be "DEFAULT".
So we have to have (or just retain) the ability to have different users
owning things in one schema.
Think about it this way: In its simplest implementation (which is in fact
the Entry Level SQL92, AFAIR), a schema can only have the name of the user
that owns it. I suspect that this is because SQL has no CREATE USER, so
CREATE SCHEMA is sort of how you become a user that can do things. At the
same time, schemas would space off the things each user creates, and if
you want to access someone else's stuff you have to prefix it with the
user's name <user>.<table>, sort of like ~user/file. The generic
"namespace" nature of schemas only comes from the fact that in higher
SQL92 levels a user can own more than one schema with different names.
(Interesting thesis: It might be that our users are in fact schemas
(minus the parser changes) and we can forget about the whole thing.)
Hmmm... I don't think so, but hmmm..
Now what does this spell for the cooperative development environments you
described? Difficult to tell, but perhaps some of these would do, none of
which are standard, AFAIK:
* schemas owned by groups/roles
I think that schemas owned by roles are part of SQL99.
* access privileges to schemas, perhaps some sort of sticky bit
functionality
Or we have to special case the DEFAULT schema. Which strikes me as a bad
thing to do.
I don't necessarily think of the DEFAULT schema as a real schema. It
might just be there so that *some* schema context is set if you don't have
one set otherwise, but you don't necessarily have write access to it.
But it might not be necessary at all.
While if we were starting over, we might be able to (maybe should have)
design(ed) things so we don't need it, I think a "DEFAULT" schema would
help give users of the schema-aware PostgreSQL an experience similar to
what they have now.
And getting back to where this all started, I think we do need to have the
ability to have users other than the schema owner own things in the
schema, so we should keep the owner id column in the pg_package table. I'm
not against, when things are all said and done, having the default be that
only the schema owner can add things. But that's a policy decision. :-)
Take care,
Bill
Bill Studenmund writes:
Why? Operators are used differently than functions.
I don't think so. Operators are a syntactic convenience for functions.
That's what they always have been and that's what they should stay.
How does what you say disagree with what I said?
Operators certainly have a lot more structure to them than a function call
does. That's why you give the restriction and join functions, and you hand
them commutation and negation operators.
These are just hints to the optimizer; they don't affect the invocation
interface.
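For what it's worth, here is a small self-contained illustration of that
point (the ==* operator and the texticeq function are invented for the
example): the operator is nothing more than a name bound to a function, and
the commutator/restrict/join clauses only feed the planner.

CREATE FUNCTION texticeq(text, text) RETURNS bool
    AS 'SELECT lower($1) = lower($2)' LANGUAGE 'sql';

CREATE OPERATOR ==* (
    leftarg = text,
    rightarg = text,
    procedure = texticeq,   -- this is the whole invocation interface
    commutator = ==*,       -- everything from here down is only a planner hint
    restrict = eqsel,
    join = eqjoinsel
);

-- 'Foo' ==* 'foo' and texticeq('Foo', 'foo') now do exactly the same thing.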
So I am a "naive" programmer because I mention intent above?
No.
So if we have INFORMATION_SCHEMA with the right views in it, we are fine
doing whatever we want.
I think some interpretation of the SQL standard can be used to prove that
a new schema should not contain any objects. So you're going to have to
stick to the two predefined schemas to put the system catalogs in. Then
again, other interpretations may be used to prove other things. But to me
the intent of the standard is clear that system catalogs are meant to go
into the definition schema, and I don't see a reason why this could not be
so.
Blech, I meant "you can replace the owner column with the schema column".
That's actually what I thought you said. :-)
I still think we can't do that, since someone other than the schema owner
can add a package to a schema. :-) Or at least that's the assumption I'm
running on; we allow users other than PGUID to create functions (and
operators and aggregates and types) in the default (whatever it will be
called) schema, so why shouldn't they be allowed to add packages?
Because SQL says so. All objects in a schema belong to the owner of the
schema. In simple setups you have one schema per user with identical
names. This has well-established use patterns in other SQL RDBMS.
I agree that this might not be what everyone would want, but it seems
extensible. However, I feel we're trying to design too many things at
once. Let's do schemas first the way they're in the SQL standard, and
then we can try to tack on ownership or subschemas or package issues.
--
Peter Eisentraut peter_e@gmx.net http://funkturm.homeip.net/~peter
On Tue, Oct 23, 2001 at 08:43:32AM -0700, Bill Studenmund wrote:
And there's the fact that schemas were wanted for 7.2, and didn't happen.
Without external agitation, will they happen for 7.3? Given the size of
the job, I understand why they didn't happen (the package changes so far
represent over 3 months of full-time programming). We've got some momentum
now, I'd say let's run with it. :-)
I feel much better about my unsuccessful attempt at a naive schema
implementation, last Christmas holidays: I had nowhere _near_ 3 months
of time in on that.
;-)
Ross
--
Ross Reedstrom, Ph.D. reedstrm@rice.edu
Executive Director phone: 713-348-6166
Gulf Coast Consortium for Bioinformatics fax: 713-348-6182
Rice University MS-39
Houston, TX 77005
On Fri, 26 Oct 2001, Peter Eisentraut wrote:
Bill Studenmund writes:
I guess to get at my point, I can ask this question, "Will schema support
invalidate existing PostgreSQL database designs?"
I would like the answer to be no. I would like our users to be able to
dump a pre-schema-release db, upgrade, and then restore into a
schema-aware PostgreSQL. And have their restore work.
I think this can work. Assume a database like this:
user1: CREATE TABLE foo ( );
user2: CREATE TABLE bar ( );
The dump of this would be something like:
\c - user1
CREATE TABLE foo ( );
\c - user2
CREATE TABLE bar ( );
So the tables would be created in the appropriate schema context for each
user. The remaining problem then is that the two schemas user1 and user2
would need to be created first, but we could make this implicit somewhere.
For instance, a user creation would automatically create a schema for the
user in template1. Or at least the dump could be automatically massaged
to this effect.
But right now, we can have different users owning things in one database.
So there will be restores out there which will have different users owning
things in the same restored-to schema, which will be "DEFAULT".
This would fundamentally undermine what an SQL schema is and doesn't help
interoperability a bit. If we want to implement our own namespace
mechanism we can call it NAMESPACE. But if we want something called
SCHEMA then we should implement it the way it's standardized, and there is
certainly a tight coupling between schemas and ownership. In fact, as
I've said already, a schema *is* the ownership; a user is just a weird
PostgreSQL invention.
Hmmm.... I've been looking into this, and you are right. All of the views
in INFORMATION_SCHEMA that I looked at contain text like
WHERE (SCHEMA_OWNER = CURRENT_USER OR SCHEMA_OWNER IN (SELECT ROLE_NAME
FROM ENABLED_ROLES) )
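Spelled out as a full view definition, the pattern is roughly the following
(a paraphrase of the SQL99 flavor, not the standard's exact text): the real
catalog tables live in DEFINITION_SCHEMA, and INFORMATION_SCHEMA exposes
views over them filtered down to what the current user owns or can reach
through a role.

CREATE VIEW INFORMATION_SCHEMA.SCHEMATA AS
    SELECT *
    FROM DEFINITION_SCHEMA.SCHEMATA
    WHERE SCHEMA_OWNER = CURRENT_USER
       OR SCHEMA_OWNER IN (SELECT ROLE_NAME FROM ENABLED_ROLES);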
So then we'll need a tool to massage old-style dumps to:
1) create the schema, and
2) path all of the schemas together by default.
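A massaged dump for the earlier foo/bar example might then look roughly
like this (a sketch only; the AUTHORIZATION clauses are my reading of the
SQL92 syntax, and how the schemas get "pathed together" was never nailed
down in this thread):

CREATE SCHEMA user1 AUTHORIZATION user1;
CREATE SCHEMA user2 AUTHORIZATION user2;
\c - user1
CREATE TABLE foo ( );
\c - user2
CREATE TABLE bar ( );
-- step 2 would need some additional, not-yet-specified command to put
-- user1 and user2 on everyone's default lookup path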
Well, at least a number of tables won't gain a new column as a result of
this; the owner column will become the schema_id column. :-)
Take care,
Bill
Bill Studenmund writes:
Because SQL says so. All objects in a schema belong to the owner of the
schema. In simple setups you have one schema per user with identical
names. This has well-established use patterns in other SQL RDBMS.
Then implementing schemas will cause a backwards-incompatible change
regarding who can add/own functions (and operators and ...).
Mainly because when we introduce schemas, all SQL transactions will have
to be performed in the context of *some* schema. I think "DEFAULT" was the
name you mentioned for when there was no schema matching the username. As
"DEFAULT" (or whatever we call it) will be made by the PG super user (it
will actually be added as part of initdb), then that means that only the
super user will own functions.
If you want to own the function you should create it in your schema. If
you want to create a function and let someone else own it, then ask
someone else for write access to their schema. (This should be a rare
operation and I don't think SQL provides for it, so we can ignore it in
the beginning.) If there is no schema you have write access to then you
cannot create things. People have been dying for that kind of feature,
and schemas will enable us to have it.
Think about it this way: In its simplest implementation (which is in fact
the Entry Level SQL92, AFAIR), a schema can only have the name of the user
that owns it. I suspect that this is because SQL has no CREATE USER, so
CREATE SCHEMA is sort of how you become a user that can do things. At the
same time, schemas would space off the things each user creates, and if
you want to access someone else's stuff you have to prefix it with the
user's name <user>.<table>, sort of like ~user/file. The generic
"namespace" nature of schemas only comes from the fact that in higher
SQL92 levels a user can own more than one schema with different names.
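For instance, with one schema per user, a user reads another user's table
by qualifying it with the schema (= user) name, much as ~user/file reaches
across home directories (user2 and bar are just illustrative names):

SELECT * FROM user2.bar;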
(Interesting thesis: It might be that our users are in fact schemas
(minus the parser changes) and we can forget about the whole thing.)
Now what does this spell for the cooperative development environments you
described? Difficult to tell, but perhaps some of these would do, none of
which are standard, AFAIK:
* schemas owned by groups/roles
* access privileges to schemas, perhaps some sort of sticky bit
functionality
Or we have to special case the DEFAULT schema. Which strikes me as a bad
thing to do.
I don't necessarily think of the DEFAULT schema as a real schema. It
might just be there so that *some* schema context is set if you don't have
one set otherwise, but you don't necessarily have write access to it.
But it might not be necessary at all.
--
Peter Eisentraut peter_e@gmx.net http://funkturm.homeip.net/~peter
* Bill Studenmund <wrstuden@netbsd.org> wrote:
| I would like the answer to be no. I would like our users to be able to
| dump a pre-schema-release db, upgrade, and then restore into a
| schema-aware PostgreSQL. And have their restore work.
Important point. Also having a standard is fine, but by limiting ourselves
to it we are ignoring issues that might be very useful. Draw the line.
--
Gunnar Rønning - gunnar@polygnosis.com
Senior Consultant, Polygnosis AS, http://www.polygnosis.com/
Bill Studenmund writes:
I guess to get at my point, I can ask this question, "Will schema support
invalidate existing PostgreSQL database designs?"
I would like the answer to be no. I would like our users to be able to
dump a pre-schema-release db, upgrade, and then restore into a
schema-aware PostgreSQL. And have their restore work.
I think this can work. Assume a database like this:
user1: CREATE TABLE foo ( );
user2: CREATE TABLE bar ( );
The dump of this would be something like:
\c - user1
CREATE TABLE foo ( );
\c - user2
CREATE TABLE bar ( );
So the tables would be created in the appropriate schema context for each
user. The remaining problem then is that the two schemas user1 and user2
would need to be created first, but we could make this implicit somewhere.
For instance, a user creation would automatically create a schema for the
user in template1. Or at least the dump could be automatically massaged
to this effect.
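In other words (a sketch of the idea, not settled syntax), user creation
would behave roughly as if it also created the matching schema:

CREATE USER user1;
-- implicitly also doing something like:
--   CREATE SCHEMA user1 AUTHORIZATION user1;   -- in template1, so that
--   every database created afterwards inherits the schema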
But right now, we can have different users owning things in one database.
So there will be restores out there which will have different users owning
things in the same restored-to schema, which will be "DEFAULT".
This would fundamentally undermine what an SQL schema is and doesn't help
interoperability a bit. If we want to implement our own namespace
mechanism we can call it NAMESPACE. But if we want something called
SCHEMA then we should implement it the way it's standardized, and there is
certainly a tight coupling between schemas and ownership. In fact, as
I've said already, a schema *is* the ownership; a user is just a weird
PostgreSQL invention.
I think that schemas owned by roles are part of SQL99.
Correct.
--
Peter Eisentraut peter_e@gmx.net http://funkturm.homeip.net/~peter
Based on this email, I am assuming we don't want to add Package support
but instead will do it with schemas, which Tom is already working on.
---------------------------------------------------------------------------
Peter Eisentraut wrote:
Bill Studenmund writes:
Honestly, I do not understand why "global variables" have been such a sore
point for you.
My point is that the proposed "package support" introduces two features
that are a) independent, and b) already exist, at least in design.
Schemas are already planned as a namespace mechanism. Global variables in
PLs already exist in some PLs. Others can add it if they like. There
aren't any other features introduced by "package support" that I can see
or that you have explicitly pointed out.
So the two questions I ask myself are:
1. Are package namespaces "better" than schemas? The answer to that is
no, because schemas are more standard and more general.
2. Are global variables via packages "better" than the existing setups?
My answer to that is again no, because the existing setups respect
language conventions, maintain the separation of the backend and the
language handlers, and of course they are already there and used.
So as a consequence we have to ask ourselves,
3. Do "packages" add anything more to the table than those two elementary
features? Please educate us.
4. Would it make sense to provide "packages" alongside the existing
mechanisms that accomplish approximately the same thing. That could be
debated, in case we agree that they are approximately the same thing.
--
Peter Eisentraut peter_e@gmx.net http://funkturm.homeip.net/~peter
--
Bruce Momjian | http://candle.pha.pa.us
pgman@candle.pha.pa.us | (610) 853-3000
+ If your life is a hard drive, | 830 Blythe Avenue
+ Christ can be your backup. | Drexel Hill, Pennsylvania 19026