Data type removal

Started by Brandon Ibach on 24 Mar 1998 (29 messages)
#1 Brandon Ibach
bibach@infomansol.com

I, for one, am in favor of converting some of the type packages to
loadable modules. Having those in the backend when they aren't being
used is much like compiling extra modules into my Apache web server
because they're "kinda neat", even though they won't be used. Also,
if we follow the idea that we should have as many unique features in
the backend as possible, we could end up with all sorts of features
that are only used by a subset of users. For instance, I don't use
the geometric types, but I do use a soundex type which I created. Why
isn't the soundex type a standard part of the backend? I, personally,
am glad it's not, because I like the version of this type that I
created a lot better than the one that's in contrib.
As far as the whole performance-improvement issue, I can say that
if the backend is, say, 50K smaller due to the removal of those
functions, that's just that much less swapping and that much more
memory that's available for the OS buffer cache. Isn't that an
improvement worth considering?
How about this as a compromise. We make these packages available
in the contrib or other such area as loadable modules as well as
making them available right in the main backend code, but setup
configure options to enable/disable them, so when I compile, I can say
"--without-geometry" to compile without those types and functions. If
I want to add them back in later, I can compile the loadable module
version and add them in.

-Brandon :)

#2 The Hermit Hacker
scrappy@hub.org
In reply to: Brandon Ibach (#1)
Re: [HACKERS] Data type removal

On Tue, 24 Mar 1998, Brandon Ibach wrote:

I, for one, am in favor of converting some of the type packages to
loadable modules. Having those in the backend when they aren't being
used is much like compiling extra modules into my Apache web server
because they're "kinda neat", even though they won't be used. Also,

I don't know about Apache, but is there any noticeable performance
difference between having extra modules installed or not installed? It
makes the binary slightly larger, but does it change performance?

if we follow the idea that we should have as many unique features in the
backend as possible, we could end up with all sorts of features that are
only used by a subset of users. For instance, I don't use the geometric
types, but I do use a soundex type which I created. Why isn't the
soundex type a standard part of the backend? I, personally, am glad
it's not, because I like the version of this type that I created a lot
better than the one that's in contrib.

If yours is an improvement over what we have in contrib, why not
submit it?

As far as the whole performance-improvement issue, I can say that
if the backend is, say, 50K smaller due to the removal of those
functions, that's just that much less swapping and that much more
memory that's available for the OS buffer cache. Isn't that an
improvement worth considering?

Not if it removes the Postgres from PostgreSQL...I don't have the
ip_and_mac contrib stuff loaded, because I never think of it. I know I
can use it, mind you, just never think of adding it in...

How about this as a compromise. We make these packages available in
the contrib or other such area as loadable modules as well as making
them available right in the main backend code, but setup configure
options to enable/disable them, so when I compile, I can say
"--without-geometry" to compile without those types and functions. If I
want to add them back in later, I can compile the loadable module
version and add them in.

As I stated earlier, if someone wants to add a
'--without-geometry' option to configure that removes it, I have no
problem with that...but it will only be to remove the feature, not add it
in. Hell, I'm probably one that would even make use of it, since, right
now, I don't use the geometric types either...but the default is to have
everything included. I don't want to have to think about it someday when
I decide I want to use those geometric types...

#3 Darren King
darrenk@insightdist.com
In reply to: The Hermit Hacker (#2)
Re: [HACKERS] Data type removal

How about this as a compromise. We make these packages available in
the contrib or other such area as loadable modules as well as making
them available right in the main backend code, but setup configure
options to enable/disable them, so when I compile, I can say
"--without-geometry" to compile without those types and functions. If I
want to add them back in later, I can compile the loadable module
version and add them in.

As I stated earlier, if someone wants to add a
'--without-geometry' option to configure that removes it, I have no problem
with that...but it will only be to remove the feature, not add it in.

I can live with this. Everything is "--with-xxx" by default, but can not
be built in by using "--without-xxx".

Would it be acceptable to move the code for these to a new directory, say,
src/modules? Something along the lines of...

src/modules/geometric
src/modules/ip_and_mac

This would allow for each type to have a pg_proc.h, pg_type.h, etc. Much
cleaner than #define'ing the heck out of the existing include files. The
geometric/pg_proc.h would contain the entries from pg_proc.h. Then there
would also be a .sql file that contains the necessary commands to load
the module if it was not compiled in or was just needed in one database.
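
[Sketch for illustration, not part of the original message: roughly what such a .sql load script could contain, using the CREATE FUNCTION/CREATE TYPE commands that already exist. The path and size are hypothetical, and the example pretends the box type were not already built in.]

    -- hypothetical geometric module load script
    CREATE FUNCTION box_in(opaque) RETURNS box
        AS '/usr/local/pgsql/modules/geometric.so' LANGUAGE 'c';
    CREATE FUNCTION box_out(opaque) RETURNS opaque
        AS '/usr/local/pgsql/modules/geometric.so' LANGUAGE 'c';
    CREATE TYPE box (
        internallength = 32,   -- two points, each two float8 coordinates
        input = box_in,
        output = box_out
    );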

Forcing someone to re-compile to use module would seem to go completely
against the extensibility side of postgres.

I think once there is one thing there as a module, it will serve as an
example for others. A simple example would be the cash/money code. Add
an indexing method to it and it would be a brief but complete example.

darrenk

#4 Noname
geek+@cmu.edu
In reply to: The Hermit Hacker (#2)
Re: [HACKERS] Data type removal


Speaking of data type removal, I was wondering if there were a better
way to handle arrays of types. From looking in the catalog, it
appears that for each type, there is also declared a similar type,
which is the array version. It seems that arrays should be considered
more flags on a field, than a field type in themselves. Does this
make sense to anybody else?

--
=====================================================================
| JAVA must have been developed in the wilds of West Virginia. |
| After all, why else would it support only single inheritance?? |
=====================================================================
| Finger geek@andrew.cmu.edu for my public key. |
=====================================================================


#5 The Hermit Hacker
scrappy@hub.org
In reply to: Darren King (#3)
Re: [HACKERS] Data type removal

On Tue, 24 Mar 1998, Darren King wrote:

How about this as a compromise. We make these packages available in
the contrib or other such area as loadable modules as well as making
them available right in the main backend code, but setup configure
options to enable/disable them, so when I compile, I can say
"--without-geometry" to compile without those types and functions. If I
want to add them back in later, I can compile the loadable module
version and add them in.

As I stated earlier, if someone wants to add a
'--without-geometry' option to configure that removes it, I have no problem
with that...but it will only be to remove the feature, not add it in.

I can live with this. Everything is "--with-xxx" by default, but can not
be built in by using "--without-xxx".

Would it be acceptable to move the code for these to a new directory, say,
src/modules? Something along the lines of...

src/modules/geometric
src/modules/ip_and_mac

This would allow for each type to have a pg_proc.h, pg_type.h, etc. Much
cleaner than #define'ing the heck out of the existing include files. The
geometric/pg_proc.h would contain the entries from pg_proc.h. Then there
would also be a .sql file that contains the necessary commands to load
the module if it was not compiled in or was just needed in one database.

Forcing someone to re-compile to use module would seem to go completely
against the extensibility side of postgres.

I think once there is one thing there as a module, it will serve as an
example for others. A simple example would be the cash/money code. Add
an indexing method to it and it would be a brief but complete example.

Why don't we start with the ip_and_mac stuff...take that,
integrate it into the "core", build it so that --without-ip_and_mac
disables it, and let's see how that one works.

What I'm curious about right now is what difference it's going to
make. Does having ip_and_mac in the core, even if I don't use it, reduce
performance, or make no difference? How much does it increase the
footprint of the binary? If it makes a negligible difference, then it isn't
worth doing; you may as well just leave everything in there...

But, build us a sample with the ip_and_mac stuff, as to what you
are thinking and let's go from that...but ignore the *core* stuff for
now...

#6 Brandon Ibach
bibach@infomansol.com
In reply to: The Hermit Hacker (#2)
Re: [HACKERS] Data type removal

The Hermit Hacker said:

On Tue, 24 Mar 1998, Brandon Ibach wrote:

I, for one, am in favor of converting some of the type packages to
loadable modules. Having those in the backend when they aren't being
used is much like compiling extra modules into my Apache web server
because they're "kinda neat", even though they won't be used. Also,

I don't know about Apache, but is there any noticeable performance
difference between having extra modules installed or not installed? It
makes the binary slightly larger, but does it change performance?

I can't say for sure what effect this would have on performance.
However, what example can you give of a monolithic piece of software
that is superior to a slim, well-designed core with an architecture
for expansion? In fact, if you want to talk about what puts the
Postgres in PostgreSQL, I think the ability to dynamically add types
and functions is a big part of that. I would say that a large part of
the reason that packages like MySQL exist is so that people have an
option of a light-weight, simple SQL database. Why can't we serve
that same purpose? Never consider unneeded resource consumption
lightly, as it can be a very important factor in someone's choice of a
database package.

if we follow the idea that we should have as many unique features in the
backend as possible, we could end up with all sorts of features that are
only used by a subset of users. For instance, I don't use the geometric
types, but I do use a soundex type which I created. Why isn't the
soundex type a standard part of the backend? I, personally, am glad
it's not, because I like the version of this type that I created a lot
better than the one that's in contrib.

If yours is an improvement over what we have in contrib, why not
submit it?

Perhaps I will, but my point is that I had a *choice* to do my own
implementation, rather than being forced to use one that didn't suit
my needs as well.

As far as the whole performance-improvement issue, I can say that
if the backend is, say, 50K smaller due to the removal of those
functions, that's just that much less swapping and that much more
memory that's available for the OS buffer cache. Isn't that an
improvement worth considering?

Not if it removes the Postgres from PostgreSQL...I don't have the
ip_and_mac contrib stuff loaded, because I never think of it. I know I
can use it, mind you, just never think of adding it in...

Since when do geometric types put the Postgres in PostgreSQL? Or
IP and MAC types, for that matter? I can use the improvements in 6.3,
but I haven't upgraded yet because it will be a bit of a job. Is
there something we can throw in to solve that, or is it maybe
something I have to do if I want the benefits?

How about this as a compromise. We make these packages available in
the contrib or other such area as loadable modules as well as making
them available right in the main backend code, but setup configure
options to enable/disable them, so when I compile, I can say
"--without-geometry" to compile without those types and functions. If I
want to add them back in later, I can compile the loadable module
version and add them in.

As I stated earlier, if someone wants to add a
'--without-geometry' option to configure that removes it, I have no
problem with that...but it will only be to remove the feature, not add it
in. Hell, I'm probably one that would even make use of it, since, right
now, I don't use the geometric types either...but the default is to have
everything included. I don't want to have to think about it someday when
I decide I want to use those geometric types...

You may notice that the option I suggested was one that could be
invoked to remove the feature. In other words, the default is to have
the feature included unless the user asks for it to not be. All I'm
saying is, don't force the users to run a Postgres which has unneeded
baggage.

-Brandon :)

#7 David Gould
dg@illustra.com
In reply to: Brandon Ibach (#1)
Re: [HACKERS] Data type removal

Brandon writes:

I, for one, am in favor of converting some of the type packages to
loadable modules. Having those in the backend when they aren't being

...

As far as the whole performance-improvement issue, I can say that
if the backend is, say, 50K smaller due to the removal of those
functions, that's just that much less swapping and that much more
memory that's available for the OS buffer cache. Isn't that an
improvement worth considering?

No. If the functions are not used, the pages of the executable
that contain them are never loaded. So there is no effective memory impact
from having these where they are.

As far as disk space, disk costs $0.03 per megabyte these days so saving
50Kb is completely insignificant.

It may be a good idea to make the system more modular, but there are
thousands of good ideas for improving it. I think we would be better
served by spending our time on something that makes a difference, not
just mindlessly pushing code around for no particular benefit.

How about this as a compromise. We make these packages available
in the contrib or other such area as loadable modules as well as
making them available right in the main backend code, but setup
configure options to enable/disable them, so when I compile, I can say
"--without-geometry" to compile without those types and functions. If
I want to add them back in later, I can compile the loadable module
version and add them in.

I don't particularly like this model of adding extensions either. It implies
that you have to rerun configure and build the whole system to add a module.
In fact, all that is needed is to build the module and run the setup
scripts and the running database picks up the new functionality.

-dg

David Gould dg@illustra.com 510.628.3783 or 510.305.9468
Informix Software (No, really) 300 Lakeside Drive Oakland, CA 94612
- Linux. Not because it is free. Because it is better.

#8 David Gould
dg@illustra.com
In reply to: Noname (#4)
Re: [HACKERS] Data type removal

Speaking of data type removal, I was wondering if there were a better
way to handle arrays of types. From looking in the catalog, it
appears that for each type, there is also declared a similar type,
which is the array version. It seems that arrays should be considered
more flags on a field, than a field type in themselves. Does this
make sense to anybody else?

Is an array the same thing as a scalar?

What would be the benefit of making arrays "flags on a field" instead of
a "field type in themselves". Seriously, how would this improve _anything_?

The postgres type system is very flexible and powerful as is. What is the
problem this is trying to solve?

What is the motivation for data type removal?

-dg

David Gould dg@illustra.com 510.628.3783 or 510.305.9468
Informix Software (No, really) 300 Lakeside Drive Oakland, CA 94612
- Linux. Not because it is free. Because it is better.

#9 Noname
geek+@cmu.edu
In reply to: David Gould (#8)
Re: [HACKERS] Data type removal


From: dg@illustra.com (David Gould)

Speaking of data type removal, I was wondering if there were a better
way to handle arrays of types. From looking in the catalog, it
appears that for each type, there is also declared a similar type,
which is the array version. It seems that arrays should be considered
more flags on a field, than a field type in themselves. Does this
make sense to anybody else?

Is an array the same thing as a scalar?

Uhm, I don't believe so.

What would be the benefit of making arrays "flags on a field" instead of
a "field type in themselves". Seriously, how would this improve _anything_?

In particular, I was thinking of the PostgreSQL module for Python. It
has a nice interface, but needs intimate knowledge of the data types.
Somehow, it seems to be excessively kludgy to have to have e.g. an
int4 type, and an int4[] type. When I query a table, if one of the
fields is an array of integers, then I either want to know that it's
an array, or it's integer. If the returned type is "array," then I
have to magically know that the array is filled with integers. If
it's an integer, then I just have to recognize that it's actually a
series of integers. With the current setup, there has to be separate
type handlers for char(2), char(2)[], int4, int4[], int2, int2[],
float8, float8[], etcetera. I'd rather have handlers for the base
type, and an iterator that's used for arrays.
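
[Sketch for illustration, not part of the original message: the catalog does record the element type. An array type's pg_type row carries a typelem column pointing at its element type, and array types are conventionally named with a leading underscore, so a driver can ask a question like this instead of keeping per-type knowledge.]

    -- find the element type behind the int4 array type
    SELECT a.typname AS array_type, e.typname AS element_type
    FROM pg_type a, pg_type e
    WHERE a.typname = '_int4'
      AND a.typelem = e.oid;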

What is the motivation for data type removal?

In this case, I don't want to *remove* the data type, just change the
identification method. An array of int4 should really be recognized
as both (type)int4 and (type)array. Urhm, since this is PostgreSQL, I
guess I'm arguing for type composition (how appropriate for an
object-relational database).

--
=====================================================================
| JAVA must have been developed in the wilds of West Virginia. |
| After all, why else would it support only single inheritance?? |
=====================================================================
| Finger geek@andrew.cmu.edu for my public key. |
=====================================================================


#10 Darren King
darrenk@insightdist.com
In reply to: Noname (#9)
Re: [HACKERS] Data type removal

What would be the benefit of making arrays "flags on a field" instead of
a "field type in themselves". Seriously, how would this improve _anything_?

In particular, I was thinking of the PostgreSQL module for Python. It
has a nice interface, but needs intimate knowledge of the data types.
Somehow, it seems to be excessively kludgy to have to have e.g. an
int4 type, and an int4[] type. When I query a table, if one of the
fields is an array of integers, then I either want to know that it's
an array, or it's integer. If the returned type is "array," then I
have to magically know that the array is filled with integers. If
it's an integer, then I just have to recognize that it's actually a
series of integers. With the current setup, there has to be separate
type handlers for char(2), char(2)[], int4, int4[], int2, int2[],
float8, float8[], etcetera. I'd rather have handlers for the base
type, and an iterator that's used for arrays.

FWIW,

Haven't followed this too closely, but there is some array_iterator
code in the contrib directory courtesy of Massimo.

darrenk

#11 The Hermit Hacker
scrappy@hub.org
In reply to: Brandon Ibach (#6)
Re: [HACKERS] Data type removal

On Tue, 24 Mar 1998, Brandon Ibach wrote:

The Hermit Hacker said:

On Tue, 24 Mar 1998, Brandon Ibach wrote:

I, for one, am in favor of converting some of the type packages to
loadable modules. Having those in the backend when they aren't being
used is much like compiling extra modules into my Apache web server
because they're "kinda neat", even though they won't be used. Also,

I don't know about Apache, but is there any noticeable performance
difference between having extra modules installed or not installed? It
makes the binary slightly larger, but does it change performance?

I can't say for sure what effect this would have on performance.
However, what example can you give of a monolithic piece of software
that is superior to a slim, well-designed core with an architecture
for expansion?

PostgreSQL?

Marc G. Fournier
Systems Administrator @ hub.org
primary: scrappy@hub.org secondary: scrappy@{freebsd|postgresql}.org

#12 The Hermit Hacker
scrappy@hub.org
In reply to: David Gould (#8)
Re: [HACKERS] Data type removal

On Tue, 24 Mar 1998, David Gould wrote:

What is the motivation for data type removal?

There appear to be two "motivations" from what I'm hearing:

1. reduced memory footprint
2. reduced "disk" footprint

From what I've been able to gather, I believe that ppl are
expecting that the result is a faster server, based on less "perceived"
fat...nobody has yet been able to prove that though, which I believe
Darren is working on.

Part of this does have me thinking though, in that one of the
*big* features that I perceived PostgreSQL to have was the fact that it
was/is very easy to extend our datatypes and functions without having to
recompile the server.

So, right now, I'm kinda sitting on the fence in that I'm curious
if we *are* taking anything away by moving the geometrics from inside of
the server to a loadable module, given that the fact that we *are* able to
do that is what appears to be one of our major points of "uniqueness"...

Right now, there has been a lot of speculation as to whether or not
moving the geometrics into a loadable module would gain (or lose,
for that matter) us anything...

Let's say we have a $prefix/modules directory, where we move the
geometrics stuff into. And, let's say we have a 'loadmod' command added
that, if someone did 'loadmod geometrics;' from psql, it loaded up the
geometrics module...would we have the *same* functionality as if it were
compiled in, like it is now? Or we would lose something?

To me, this is one of the *biggest* features of PostgreSQL, the
fact that we can easily add in new features without recompiling...but
without seeing actual numbers (performance as well as "disk"), we can
discuss this till we are blue in the face.

Darren...if you want to provide us with a working model of this,
using the geometrics, so be it...I won't say that it will or won't be
integrated without seeing this in operation, and without seeing some
*actual* benefit to it...as the saying goes "Talk is Cheap..."

So, I say, let's leave this as a "to be researched" and wait for
something concrete to look at...

Marc G. Fournier
Systems Administrator @ hub.org
primary: scrappy@hub.org secondary: scrappy@{freebsd|postgresql}.org

#13 Jackson, DeJuan
djackson@cpsgroup.com
In reply to: The Hermit Hacker (#12)
RE: [HACKERS] Data type removal

On Tue, 24 Mar 1998, David Gould wrote:

What is the motivation for data type removal?

There appear to be two "motivations" from what I'm hearing:

1. reduced memory footprint
2. reduced "disk" footprint

3. modularization (In my experience, that in and of itself is a
benefit.)
3a. Encapsulation
3b. Re-Organization (isn't always a bad thing)

I haven't looked at enough of the postgres code to be, in even any small
way, an expert, but I have found that having someone approach a problem
from a new perspective, when viewed with an open mind, is never
detrimental. So I for one would like to see what Darren comes up with; it
could be very useful even if it isn't for what he originally intends.

Just my $0.01 ( I can no longer afford a full $0.02)
-DEJ

#14 Bruce Momjian
maillist@candle.pha.pa.us
In reply to: Brandon Ibach (#1)
Re: [HACKERS] Data type removal

As far as the whole performance-improvement issue, I can say that
if the backend is, say, 50K smaller due to the removal of those
functions, that's just that much less swapping and that much more
memory that's available for the OS buffer cache. Isn't that an
improvement worth considering?

Well, it is really only 50k per installation, not 50k per backend
because they all share the same executable in the buffer cache.

The problem with breaking it up, as I see it, is that all of a sudden
the regression tests and the nice descriptions of \do, \df, etc are much
harder to implement.

Maybe in six months, when we will have resolved all of the
performance/SQL92/feature issues, this will be a good use of our time,
but at this point, trying to keep all that straight while working on
much more important issues to end-users is hard to justify.

-- 
Bruce Momjian                          |  830 Blythe Avenue
maillist@candle.pha.pa.us              |  Drexel Hill, Pennsylvania 19026
  +  If your life is a hard drive,     |  (610) 353-9879(w)
  +  Christ can be your backup.        |  (610) 853-3000(h)

#15 wward
wward@gwi.net
In reply to: Jackson, DeJuan (#13)
RE: [HACKERS] Data type removal

1. reduced memory footprint
2. reduced "disk" footprint

3. modularization (In my experience, that in and of itself is a
benefit.)

THIS IS RIGHT ON !!!

Wayne


3a. Encapsulation
3b. Re-Organization (isn't always a bad thing)

I haven't looked at enough of the postgres code to be, in even any small
way, an expert, but I have found that having someone approach a problem
from a new perspective, when viewed with an open mind, is never
detrimental. So I for one would like to see what Darren comes up with; it
could be very useful even if it isn't for what he originally intends.

Just my $0.01 ( I can no longer afford a full $0.02)
-DEJ

#16 Thomas G. Lockhart
lockhart@alumni.caltech.edu
In reply to: David Gould (#8)
Re: [HACKERS] Data type removal

The postgres type system is very flexible and powerful as is. What is
the problem this is trying to solve?

What is the motivation for data type removal?

There are many motivations involved here. I brought it up originally
because the char2-16 types are not supported and do not provide any
functionality over the char(),varchar(),text string types.

Others suggested that since they do not care about the geometric types
that those should be removed too.

I regret bringing it up. Postgres has many unique features, and
stripping it to become a plain vanilla SQL92 machine is a waste of time
imo.

If any restructuring happens which removes, or makes optional, some of
the fundamental types, it should be accomplished so that the types can
be added in transparently, from a single set of source code, during
build time or after. OIDs would have to be assigned, presumably, and the
hardcoding of the function lookups for builtin types must somehow be
done incrementally. Probably needs more than this to be done right, and
without careful planning and implementation we will be taking a big step
backwards.

With the amount of time being spent in discussion, _still without any
quantitative estimates for performance improvement_, it seems like
someone should do some measurements. Even without them though it is
pretty clear that it won't benefit a large database, since the fraction
of time spent constructing a query will be small compared to the time it
takes to traverse the tables in the query.

Seems to me that Postgres' niche is at the high end of size and
capability, not at the lightweight end competing for design wins against
systems which don't even have transactions.

- Tom

#17 Bruce Momjian
maillist@candle.pha.pa.us
In reply to: Thomas G. Lockhart (#16)
Re: [HACKERS] Data type removal

If any restructuring happens which removes, or makes optional, some of
the fundamental types, it should be accomplished so that the types can
be added in transparently, from a single set of source code, during
build time or after. OIDs would have to be assigned, presumably, and the
hardcoding of the function lookups for builtin types must somehow be
done incrementally. Probably needs more than this to be done right, and
without careful planning and implementation we will be taking a big step
backwards.

This whole discussion reminds me of someone who thinks he is getting low
gas mileage because his tire pressure is low, when in fact, he has a
hole in his gas tank. We have some major holes to plug, but they are
hard to see. People see the tire pressure/pg_class table, and think the
database can be improved. You could double the size of the system
tables, or the binary, and see no performance change. Make them
10 times bigger, and see if you find a difference.

Aren't there enough serious issues on the TODO list for people? These
are actual complaints/bugs or requests from users, not pie-in-the-sky
wouldn't-it-be-nice-if-we-had ideas.

I had a private e-mail conversation with someone, and they mentioned
that we seem to be mired in micro-optimizations, that really are not
going to do us any significant good. I see his point.

Look at the Tcl/Tk mess I got into just trying to get that to work, and
removing the stuff that checked for specific Tcl/Tk version numbers.
Can you imagine the effort we will go through ripping out types?

Another thing. It is the type extensibility that makes us more complex,
not the types themselves.

-- 
Bruce Momjian                          |  830 Blythe Avenue
maillist@candle.pha.pa.us              |  Drexel Hill, Pennsylvania 19026
  +  If your life is a hard drive,     |  (610) 353-9879(w)
  +  Christ can be your backup.        |  (610) 853-3000(h)

#18 Mattias Kregert
matti@algonet.se
In reply to: David Gould (#7)
Dynamically loadable modules

David Gould wrote:

How about this as a compromise. We make these packages available
in the contrib or other such area as loadable modules as well as
making them available right in the main backend code, but setup
configure options to enable/disable them, so when I compile, I can say
"--without-geometry" to compile without those types and functions. If
I want to add them back in later, I can compile the loadable module
version and add them in.

I don't particularly like this model of adding extensions either. It implies
that you have to rerun configure and build the whole system to add a module.
In fact, all that is needed is to build the module and run the setup
scripts and the running database picks up the new functionality.

That's the nice thing with Linux. You can throw out the sound driver
any time and insert a new one whenever you want to. If I decide to
use the MIDI port, I can rmmod the old driver and insmod the new
one with midi support...
If I connect a Wingdogs PC to my network, I can insmod smbfs and
then I can mount the windows file resources from my Linux box. No need
to upgrade kernel or reboot.
If someone comes up with a new super-NFS, I just unmount my nfs's,
insmod super-nfs, and mount -a -t nfs...

With loadable modules in PostgreSQL, you could upgrade the ORACLE
compatibility module by typing "make modules modules-install".

No need to kill all backends and postmaster, deal with angry users,
make backups, install new binaries, dump-and-restore, track down bugs,
and so on.

It would also be easier to contribute to PostgreSQL by creating
modules, than having to submit patches to the server core...
Let's say I have made a storage manager for tapes. It would be easier
to make a module which used a standard set of "modules" functions
to register the new manager in the core, than submitting patches
to be included in the core...
And if something causes troubles, just remove that module! Then it
would not be a big problem if a new feature was buggy. Perhaps it
would be easier to find bugs then (binary search - remove module
by module).

Just imagine Linux if people had to submit patches to the kernel
instead of just writing their drivers as modules!!!

/* m */

#19 Mattias Kregert
matti@algonet.se
In reply to: David Gould (#8)
Re: [HACKERS] Data type removal

Thomas G. Lockhart wrote:

With the amount of time being spent in discussion, _still without any
quantitative estimates for performance improvement_, it seems like
someone should do some measurements. Even without them though it is
pretty clear that it won't benefit a large database, since the fraction
of time spent constructing a query will be small compared to the time it
takes to traverse the tables in the query.

I don't think the main reason for moving things out from the
core would be performance, but for making it easier to manage
extensions.

/* m */

#20 The Hermit Hacker
scrappy@hub.org
In reply to: Mattias Kregert (#19)
Re: [HACKERS] Data type removal

On Wed, 25 Mar 1998, Mattias Kregert wrote:

Thomas G. Lockhart wrote:

With the amount of time being spent in discussion, _still without any
quantitative estimates for performance improvement_, it seems like
someone should do some measurements. Even without them though it is
pretty clear that it won't benefit a large database, since the fraction
of time spent constructing a query will be small compared to the time it
takes to traverse the tables in the query.

I don't think the main reason for moving things out from the
core would be performance, but for making it easier to manage
extensions.

"Easier to manage extensions"??

#21 Thomas G. Lockhart
lockhart@alumni.caltech.edu
In reply to: David Gould (#8)
Re: [HACKERS] Data type removal

I don't think the main reason for moving things out from the
core would be performance, but for making it easier to manage
extensions.

??

#22 David Gould
dg@illustra.com
In reply to: Thomas G. Lockhart (#16)
Re: [HACKERS] Data type removal

The postgres type system is very flexible and powerful as is. What is
the problem this is trying to solve?

What is the motivation for data type removal?

There are many motivations involved here. I brought it up originally
because the char2-16 types are not supported and do not provide any
functionality over the char(),varchar(),text string types.

Others suggested that since they do not care about the geometric types
that those should be removed too.

I regret bringing it up. Postgres has many unique features, and
stripping it to become a plain vanilla SQL92 machine is a waste of time
imo.

I agree completely. This was the point I was trying to make by asking
for the motivation. If there was a clearcut proven performance gain to be
had, I would support it. But as it is just speculation, it seems kinda
pointless to push a bunch of code around (risking breaking it) for no
very good reason.

If any restructuring happens which removes, or makes optional, some of
the fundamental types, it should be accomplished so that the types can
be added in transparently, from a single set of source code, during
build time or after. OIDs would have to be assigned, presumably, and the
hardcoding of the function lookups for builtin types must somehow be
done incrementally. Probably needs more than this to be done right, and
without careful planning and implementation we will be taking a big step
backwards.

Exactly. Right now modules get installed by building the .so files and
then creating all the types, functions, rules, tables, indexes etc. This
is a bit more complicated than the Linux kernel 'insmod' operation. We could
easily make the situation worse through careless "whacking".

Seems to me that Postgres' niche is at the high end of size and
capability, not at the lightweight end competing for design wins against
systems which don't even have transactions.

And, there are already a couple of perfectly good 'toy' database systems.
What is the point of having another one? Postgres should move toward
becoming an "industrial strength" solution.

- Tom

Thank you for bringing some sense to this discussion.

-dg

David Gould dg@illustra.com 510.628.3783 or 510.305.9468
Informix Software (No, really) 300 Lakeside Drive Oakland, CA 94612
- Linux. Not because it is free. Because it is better.

#23 David Gould
dg@illustra.com
In reply to: Bruce Momjian (#17)
Re: [HACKERS] Data type removal

Bruce Momjian:

Aren't there enough serious issues on the TODO list for people? These
are actual complaints/bugs or requests from users, not pie-in-the-sky
wouldn't-it-be-nice-if-we-had ideas.

I had a private e-mail conversation with someone, and they mentioned
that we seem to be mired in micro-optimizations, that really are not
going to do us any significant good. I see his point.

Please feel free to quote me if I am the "someone" (I suspect I was).

Another thing. It is the type extensibility that makes us more complex,
not the types themselves.

It is also the type extensibility that makes us more valuable. There are
lots of "plain old SQL" systems out there. There are very few extendable
systems.

Perhaps we might want to start encouraging people to use the
extensibility and write a few useful application modules. I know of a
few that have been pretty successful in the commercial world:

- Html generation from inside the database. Store your pages in the db
with embedded <SQL ... /SQL> tags to dynamically generate web content
based on queries.

- Integrated full text search engines. Store documents and be able to
query " select abstract from papers
where contains(fulltext, 'database', 'transaction', 'recovery')".

And so on.

-dg

David Gould dg@illustra.com 510.628.3783 or 510.305.9468
Informix Software (No, really) 300 Lakeside Drive Oakland, CA 94612
- Linux. Not because it is free. Because it is better.

#24 Darren King
darrenk@insightdist.com
In reply to: David Gould (#23)
Re: [HACKERS] Data type removal

If any restructuring happens which removes, or makes optional, some of
the fundamental types, it should be accomplished so that the types can
be added in transparently, from a single set of source code, during
build time or after. OIDs would have to be assigned, presumably, and the
hardcoding of the function lookups for builtin types must somehow be
done incrementally. Probably needs more than this to be done right, and
without careful planning and implementation we will be taking a big step
backwards.

Exactly. Right now modules get installed by building the .so files and
then creating all the types, functions, rules, tables, indexes etc. This
is a bit more complicated than the Linux kernel 'insmod' operation. We could
easily make the situation worse through careless "whacking".

Geez, Louise. What I'm proposing will _SHOWCASE_ the extensibility. I'm not
looking to remove it and hardcode everything.

Seems to me that Postgres' niche is at the high end of size and
capability, not at the lightweight end competing for design wins against
systems which don't even have transactions.

And, there are already a couple of perfectly good 'toy' database systems.
What is the point of having another one? Postgres should move toward
becoming an "industrial strength" solution.

Making some of the _mostly_unused_ data types loadable instead of always
compiled in will NOT make postgres into a 'toy'. Does "industrial strength"
imply having every possible data type compiled in? Regardless of use?

I think the opposite is true. Putting some of these extra types into modules
will show people the greatest feature that separates us from the 'toys'.

I realize there might not be a performance hit _now_, but if someone doesn't
start this "loadable module" initiative, every Tom, Dick and Harry will want
their types in the backend and eventually there _will_ be a performance hit.
Then the problem would be big enough to be a major chore to convert the many,
many types to loadable instead of only doing a couple now.

I'm not trying to cry "Wolf" or proposing to do this to just push around some
code. I really think there are benefits to it, if not now, in the future.

And I know there are other areas that are broken or could be written better.
We all do what we can...I'm not real familiar with the workings of the cache,
indices, etc., but working on AIX has given me a great understanding of how
to make/load modules.

There, my spleen feels _much_ better now. :)

darrenk

#25 The Hermit Hacker
scrappy@hub.org
In reply to: Darren King (#24)
Re: [HACKERS] Data type removal

On Fri, 27 Mar 1998, Darren King wrote:

And I know there are other areas that are broken or could be written better.
We all do what we can...I'm not real familiar with the workings of the cache,
indices, etc., but working on AIX has given me a great understanding of how
to make/load modules.

This whole discussion has, IMHO, gone dry...Darren, if you can cleanly and
easily build a module for the ip_and_mac contrib types to use as a model
to work from, please do so...

I think the concept of modularization for types is a good idea, and agree
with your perspective that it *proves* our extensibility...

But, this has to be added in *perfectly* cleanly, such that there is no
extra work on anyone's part in order to make use of those types we already
have existing.

FreeBSD uses something called 'LKM's (loadable kernel modules) for doing
this in the kernel, and Linux does something similar, with the benefit
being, in most cases, that you can unload an older version and load in a
newer one relatively seamlessly...

Until a demo is produced as to how this can work, *please* kill this
thread...it's gotten into a circular loop, and, quite frankly, isn't moving
anywhere...

if this is to be workable, the module has to be built when the system is
compiled, initially, for the base types, and installed into a directory
that the server can see and load from...the *base* modules have to be
transparent to the end user/administrator...

#26 David Gould
dg@illustra.com
In reply to: Darren King (#24)
Re: [HACKERS] Data type removal

Darren writes:

If any restructuring happens which removes, or makes optional, some of
the fundamental types, it should be accomplished so that the types can
be added in transparently, from a single set of source code, during
build time or after. OIDs would have to be assigned, presumably, and the
hardcoding of the function lookups for builtin types must somehow be
done incrementally. Probably needs more than this to be done right, and
without careful planning and implementation we will be taking a big step
backwards.

Exactly. Right now modules get installed by building the .so files and
then creating all the types, functions, rules, tables, indexes etc. This
is a bit more complicated than the Linux kernel 'insmod' operation. We could
easily make the situation worse through careless "whacking".

Geez, Louise. What I'm proposing will _SHOWCASE_ the extensibility. I'm not
looking to remove it and hardcode everything.

Apparently I was not clear. I am sorry that you find my comment upsetting, it
was not meant to be.

What I meant is that adding a type and related functions to a database is a
much more complicated job than loading a module into a Unix kernel.

To load a module into a kernel all you need to do is read the code in,
resolve the symbols, and maybe call an initialization routine. This is
merely a variation on loading a shared object (.so) file into a program.

To add a type and related stuff to a database is really a much harder problem.

You need to be able to:
- add one or more type descriptions (types table)
- add input and output functions (types, functions tables)
- add cast functions (casts, functions tables)
- add any datatype-specific behavior functions (functions table)
- add access method operators (maybe) (amops, functions tables)
- add aggregate operators (aggregates, functions tables)
- add operators (operators, functions tables)
- provide statistics functions
- provide destroy operators
- provide .so files for C functions, SQL for SQL functions
  (note this is the part needed for a unix kernel module)
- do all the above within a particular schema

You may also need to create and populate data tables, rules, defaults, etc
required by the implementation of the new type.

And of course, a "module" may really implement dozens of types and hundreds
of functions.
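
[Sketch for illustration, not part of the original message: one item from the list above, expressed in the SQL a module's install script would carry. The complex type and complex_add function are assumed to have been created earlier in the same script.]

    -- register an operator on the module's type (one of possibly hundreds)
    CREATE OPERATOR + (
        leftarg = complex,
        rightarg = complex,
        procedure = complex_add,
        commutator = +
    );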

To unload a type requires undoing all the above. But there is a wrinkle: first
you have to check if there are any dependencies. That is, if the user has
created a table with one of the new types, you have to drop that table
(including column defs, indexes, rules, triggers, defaults etc) before
you can drop the type. Of course the user may not want to drop their tables,
which brings us to the next problem.

When this gets really hard is when it is time to upgrade an existing database
to a new version. Suppose you add a new column to a type in the new version.
How does a user with lots of data in dozens of tables using the old type
install the new module?

What about restoring a dump from an old version into a system with the new
version installed?

Or how about migrating to a different platform? Can we move data from
a little endian platform (x86) to a big endian platform (sparc)? Obviously
the .so files will be different, but what about copying the data out and
reloading it?

This is really the same problem as "schema evolution" which is (in the general
case) an open research topic.

I realize there might not be a performance hit _now_, but if someone doesn't
start this "loadable module" initiative, every Tom, Dick and Harry will want
their types in the backend and eventually there _will_ be a performance hit.
Then the problem would be big enough to be a major chore to convert the many,
many types to loadable instead of only doing a couple now.

I agree that we might want to work on making installing new functionality
easier. I have no objection to this, I just don't want to see the problem
approached without some understanding of the real issues.

I'm not trying to cry "Wolf" or proposing to do this to just push around some
code. I really think there are benefits to it, if not now, in the future.

And I know there are other areas that are broken or could be written better.
We all do what we can...I'm not real familiar with the workings of the cache,
indices, etc., but working on AIX has given me a great understanding of how
to make/load modules.

Just to belabor this, it is perfectly reasonable to add a set of types and
functions that have no 'C' implementation. The 'loadable module' analogy
misses a lot of the real requirements.
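
[Sketch for illustration, not part of the original message: a function added to a database purely in SQL, with no .so file behind it at all, the kind of extension an insmod-style analogy does not cover. The function name and threshold are made up.]

    -- no C implementation anywhere; the body is itself SQL
    CREATE FUNCTION overpaid(int4) RETURNS bool
        AS 'SELECT $1 > 100000' LANGUAGE 'sql';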

There, my spleen feels _much_ better now. :)

It looks better too... ;-)

darrenk

-dg

David Gould dg@illustra.com 510.628.3783 or 510.305.9468
Informix Software (No, really) 300 Lakeside Drive Oakland, CA 94612
- Linux. Not because it is free. Because it is better.

#27 Mattias Kregert
matti@algonet.se
In reply to: David Gould (#26)
Modules

David Gould wrote:

To load a module into a kernel all you need to do is read the code in,
resolve the symbols, and maybe call an initialization routine. This is
merely a variation on loading a shared object (.so) file into a program.

To add a type and related stuff to a database is really a much harder problem.

I don't agree.

You need to be able to:
- add one or more type descriptions (types table)
- add input and output functions (types, functions tables)
- add cast functions (casts, functions tables)
- add any datatype-specific behavior functions (functions table)
- add access method operators (maybe) (amops, functions tables)
- add aggregate operators (aggregates, functions tables)
- add operators (operators, functions tables)
- provide statistics functions
- provide destroy operators
- provide .so files for C functions, SQL for SQL functions
  (note this is the part needed for a unix kernel module)
- do all the above within a particular schema

You may also need to create and populate data tables, rules, defaults, etc
required by the implementation of the new type.

All this would be done by the init function in the module you load.
What we need is a set of functions callable by modules, like:
    module_register_type(name, descr, func*, textin*, textout*, whatever...)
    module_register_smgr(name, descr, ...)
    module_register_command(...)
Casts would be done by converting to a common format (text) and then to
the desired type. Use textin/textout. No special cast functions would
have to exist. Why doesn't it work this way already??? Would not that
solve all casting problems?

To unload a type requires undoing all the above. But there is a wrinkle: first
you have to check if there are any dependencies. That is, if the user has
created a table with one of the new types, you have to drop that table
(including column defs, indexes, rules, triggers, defaults etc) before
you can drop the type. Of course the user may not want to drop their tables,
which brings us to the next problem.

Dependencies are checked by the OS kernel when you try to unload modules.
You cannot unload slhc without first unloading ppp, for example. What's the
difference?
If you have Mod4X running with /dev/dsp opened, then you can't unload
the sound driver, because it is in use, and you cannot unload a.out module
if you have a non-ELF program running, and you can see the refcount on all
modules and so on... This would not be different in a SQL server.
If you have a cursor open, accessing IP types, then you cannot unload
the IP-types module. Close the cursor, and you can unload the module if
you want to.
You don't have to drop tables containing new types just because you unload
the module. If you want to SELECT from it, then that module would be loaded
automagically when it is needed.

When this gets really hard is when it is time to upgrade an existing database
to a new version. Suppose you add a new column to a type in the new version.
How does a user with lots of data in dozens of tables using the old type
install the new module?

What about restoring a dump from an old version into a system with the new
version installed?

Suppose you change TIMESTAMP to 64 bits time and 16 bits userid... how do you
solve that problem? You would probably have to make the textin/textout
functions for the type recognize the old format and make the appropriate
conversions. Perhaps add zero userid, or default to postmaster userid?
This would not be any different if TIMESTAMP was in a separate module.

For the internal storage format, every type could have its own way
of recognizing different versions of the data. For example, say you have
an IPv4 module and insert millions of IP addresses, then you upgrade
to an IPv6 module. It would then be able to look at the data and see if
it is an IPv4 or IPv6 address. Of course, you would have problems if you
tried to downgrade and had lots of IPv6 addresses inserted.
MyOwnType could use the first few bits of the data to decide which
version it is, and later releases of the MyOwnType module would be able
to recognize the older formats.
This way, types could be upgraded without a dump-and-load procedure.

Or how about migrating to a different platform? Can we move data from
a little endian platform (x86) to a big endian platform (sparc)? Obviously
the .so files will be different, but what about the copying the data out and
reloading it?

Is this a problem right now? Dump and reload, how can it fail?

Just to belabor this, it is perfectly reasonable to add a set of types and
functions that have no 'C' implementation. The 'loadable module' analogy
misses a lot of the real requirements.

Why would someone want a type without an implementation?
Ok, let the module's init function register a type marked as
"non-existent"? Null function?

/* m */

#28 David Gould
dg@illustra.com
In reply to: Mattias Kregert (#27)
Re: [HACKERS] Modules

Mattias Kregert writes:

David Gould wrote:

...

You need to be able to
- add one or more type descriptions types table

[big list of stuff to do deleted ]

- do all the above within a particular schema

You may also need to create and populate data tables, rules, defaults, etc
required by the implementation of the new type.

All this would be done by the init function in the module you load.
What we need is a set of functions callable by modules, like
module_register_type(name, descr, func*, textin*, textout*, whatever
...)
module_register_smgr(name, descr, .....)
module_register_command(....

Ok, now you are requiring the module to handle all this in C. How does it
register a type, a default, a rule, a column, functions, etc?

Having thought about that, consider that currently all this can already be
done using SQL. PostgreSQL is a relational database system. One of the
prime attributes of relational systems is that they are reflexive. That is,
you can use SQL to query and update the system catalogs that define the
characteristics of the system.

By and large, all the tasks I mentioned previously can be done using SQL and
taking advantage of the high semantic level and power of the complete SQL
system. Given that we already have a perfectly good high level mechanism, I
just don't see any advantage to adding a bunch of low level APIs to
duplicate existing functionality.
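
[Sketch for illustration, not part of the original message: the same reflexivity at work. A type and its input function are ordinary rows in pg_type and pg_proc, so plain SQL can inspect what a module has registered.]

    -- the catalogs are just tables; no special module API is needed
    SELECT t.typname, p.proname AS input_function
    FROM pg_type t, pg_proc p
    WHERE t.typname = 'point'
      AND t.typinput = p.oid;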

Casts would be done by converting to a common format (text) and then to
the desired type. Use textin/textout. No special cast functions would
have to exist. Why doesn't it work this way already??? Would not that
solve all casting problems?

No. It is usable in some cases as an implementation of a defined cast, but
you still need defined casts. Think about these problems:

First, there is a significant performance penalty. Think about a query
like:

select * from huge_table where account_balance > 1000.

The textout -> textin approach would be far slower than the current direct
int to float cast.

Second, how do you restrict the system to sensible casts, or enforce a
meaningful order of attempted casts?

create type yen based float;
create type centigrade based float;

Would you allow?

select yen * centigrade from market_data, weather_data
where market_data.date = weather_data.date;

Even though the types 'yen' and 'centigrade' are implemented by float this
leaves open a few important questions:

- what is the type of the result?
- what could the result possibly mean?

Third you still can't do casts for many types:

create type motion_picture (arrayof jpeg) ...

select motion_picture * 10 from films...

There is no useful cast possible here.

To unload a type requires undoing all the above. But there is a wrinkle: first
you have to check if there are any dependencies. That is, if the user has
created a table with one of the new types, you have to drop that table
(including column defs, indexes, rules, triggers, defaults etc) before
you can drop the type. Of course the user may not want to drop their tables,
which brings us to the next problem.

Dependencies are checked by the OS kernel when you try to unload modules.
You cannot unload slhc without first unloading ppp, for example. What's the
difference?

I could have several million objects that might use that type. I cannot
do anything with them without the type definition. Not even delete them.

If you have Mod4X running with /dev/dsp opened, then you can't unload
the sound driver, because it is in use, and you cannot unload a.out module
if you have a non-ELF program running, and you can see the refcount on all
modules and so on... This would not be different in a SQL server.

But it is very different. SQL servers are much more complex than OS kernels.
Having spent a number of years maintaining the OS kernel in a SQL engine
that was originally intended to run on bare hardware, I can tell you that
that kernel was less than 10% of the complete SQL engine.

If you have a cursor open, accessing IP types, then you cannot unload
the IP-types module. Close the cursor, and you can unload the module if
you want to.
You don't have to drop tables containing new types just because you unload
the module. If you want to SELECT from it, then that module would be loaded
automagically when it is needed.

Ahha, I start to understand. You are saying 'module' and meaning 'loadable
object file of functions'. Given that this is what you mean, we already
handle this.

What I took you to mean by 'module' was the SCHEMA defined to make the
functions useful, and the functions.

Just to belabor this, it is perfectly reasonable to add a set of types and
functions that have no 'C' implementation. The 'loadable module' analogy
misses a lot of the real requirements.

Why would someone want a type without implementation?

Why should a type with no C functions fail to have an implementation? Right
now every table is also a type.

Many types are based on a extending an existing type, or are composites. Is
there some reason not to define the implementation (if any) in SQL?

--

I understand that modularity is good. I am asserting that PostgreSQL is a
very modular and extendable system right now. There are mechanisms to add
just about any sort of extension you want. Very little is hard coded in
the core.

I think this discussion got started because someone wanted to remove the
ip and mac and money types. This is a mere matter of the current packaging,
as there is no reason for them to be in or out except that historically
the system used some of these types before the extendibility was finished,
so they went in the core code.

I don't think it matters much whether any particular type is part of the
core or not, so feel free to pull them out. Do package them up to
install in the normal way that extensions are supposed to install, and
retest everything. Don't _add_ a whole new way to do the same kinds
of extensibility that we _already_ do. Just use the mechanisms that already
exist.

This discussion has gone on too long as others are starting to point out,
so I am happy to take it to private mail if you wish to continue.

-dg

David Gould dg@illustra.com 510.628.3783 or 510.305.9468
Informix Software (No, really) 300 Lakeside Drive Oakland, CA 94612
"Of course, someone who knows more about this will correct me if I'm wrong,
and someone who knows less will correct me if I'm right."
--David Palmer (palmer@tybalt.caltech.edu)

#29 Thomas G. Lockhart
lockhart@alumni.caltech.edu
In reply to: David Gould (#26)
Re: [HACKERS] Modules

All this would be done by the init function in the module you load.
What we need is a set of functions callable by modules, like
module_register_type(name, descr, func*, textin*, textout*, whatever
...)
module_register_smgr(name, descr, .....)
module_register_command(....
Casts would be done by converting to a common format (text) and then to
the desired type. Use textin/textout. No special cast functions would
have to exist. Why doesn't it work this way already??? Would not that
solve all casting problems?

It does work this way already, at least in some cases. It definitely
does not solve casting problems, for several reasons, two of which are:
- textout->textin is inefficient compared to binary conversions.
- type conversion also may require value and format manipulation far
beyond what you would accept for an input function for a specific type.
That is, to convert a type to another type may require a conversion
which would not be acceptable in any case other than a conversion from
that specific type to the target type. You need to call a specialized
routine to do this.

Dependencies are checked by the OS kernel when you try to unload
modules. You cannot unload slhc without first unloading ppp, for
example. What's the difference?

Granularity. If, for example, we had a package of 6 types and 250
functions, you would need to check each of these for dependencies. David
was just pointing out that it isn't as easy, not that it is impossible.
I thought his list of issues was fairly complete, and any solution would
address these somehow...
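
[Sketch for illustration, not part of the original message: even one dependency check is a catalog query in its own right, for example finding every table column that still uses a type before it could be dropped.]

    -- tables that would block dropping the 'box' type
    SELECT c.relname, a.attname
    FROM pg_class c, pg_attribute a, pg_type t
    WHERE t.typname = 'box'
      AND a.atttypid = t.oid
      AND a.attrelid = c.oid;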

- Tom