Would like to contribute a section to docs for 9.3. Where to start?

Started by Chris Traversover 13 years ago13 messagesdocs
Jump to latest
#1Chris Travers
chris.travers@gmail.com

Hi all;

I would like to contribute a "What is an Object Relational database?"
section to the documentation for 9.3. Where is the best place to start
tools and community-process-wise?

My thinking is that since people are often confused by this label, it would
be worth describing what it means, and describing in brief detail
object-relational features in PostgreSQL.

My thinking is to cover the following features briefly:

* Table inheritance
* Type Extensibility
* Tuples as Types, casting tuples to various other types.

I am thinking of skipping over things that may be seen as misfeatures, such
as class.function syntax although this could be useful in the case of
simulating calculated fields. What do people think there? Is this a
feature or a misfeature?

Best Wishes,
Chris Travers

#2Tom Lane
tgl@sss.pgh.pa.us
In reply to: Chris Travers (#1)
Re: Would like to contribute a section to docs for 9.3. Where to start?

Chris Travers <chris.travers@gmail.com> writes:

I would like to contribute a "What is an Object Relational database?"
section to the documentation for 9.3. Where is the best place to start
tools and community-process-wise?

My thinking is that since people are often confused by this label, it would
be worth describing what it means, and describing in brief detail
object-relational features in PostgreSQL.

I think there's a discussion that has to happen before that one, which
is whether we should continue pushing that term for Postgres. It was
originally applied by the Berkeley guys, well over twenty years ago, to
code that didn't even speak the same language as now (PostQUEL vs SQL).
So it's fair to ask whether the vision of the project is still the same
as then. Simon for one thinks differently:
http://database-explorer.blogspot.com/2012/08/postgresql-multi-model-database-server.html

My thinking is to cover the following features briefly:

* Table inheritance
* Type Extensibility
* Tuples as Types, casting tuples to various other types.

I think PG's type extensibility features come out of the
abstract-data-type culture more than than the object culture. In
particular, PG data types generally don't have any notion of "IsA"
subclass relationships, though the rowtypes of inherited tables do have
that. (Well, I guess you could claim that a domain IsA subclass of its
base type, but SQL's domain feature is so impoverished that any object
hacker would laugh at you.)

So really the argument for calling PG object-relational comes down to
table inheritance and the IsA relationship between tuples of inherited
tables. Which is something I think few people even use anymore ...
it definitely doesn't seem like a key selling point.

I am thinking of skipping over things that may be seen as misfeatures, such
as class.function syntax although this could be useful in the case of
simulating calculated fields.

Agreed, that's not a major feature; it's just a notational detail that
people have got varying opinions about.

regards, tom lane

#3Chris Travers
chris.travers@gmail.com
In reply to: Tom Lane (#2)
Re: Would like to contribute a section to docs for 9.3. Where to start?

On Mon, Aug 13, 2012 at 7:41 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Chris Travers <chris.travers@gmail.com> writes:

I would like to contribute a "What is an Object Relational database?"
section to the documentation for 9.3. Where is the best place to start
tools and community-process-wise?

My thinking is that since people are often confused by this label, it

would

be worth describing what it means, and describing in brief detail
object-relational features in PostgreSQL.

I think there's a discussion that has to happen before that one, which
is whether we should continue pushing that term for Postgres. It was
originally applied by the Berkeley guys, well over twenty years ago, to
code that didn't even speak the same language as now (PostQUEL vs SQL).
So it's fair to ask whether the vision of the project is still the same
as then. Simon for one thinks differently:

http://database-explorer.blogspot.com/2012/08/postgresql-multi-model-database-server.html

Agreed, and actually I came here after discussing this on -advocacy, and I
recognize that there is still some controversy but everyone seems to agree
that the way the term is currently used is confusing, and PostgreSQL
doesn't really resemble the wikipedia article on Object-Relational
databases.

However, it is hard to have a discussion regarding how to position
PostgreSQL if we don't have a bunch of good alternatives, so I think it
would still be worth offering even if the community ultimately decides to
move a different direction. And of course these are not mutually exclusive
either so if nothing else we have the ability of community members to
position the database in other ways.

My thinking is to cover the following features briefly:

* Table inheritance
* Type Extensibility
* Tuples as Types, casting tuples to various other types.

I think PG's type extensibility features come out of the
abstract-data-type culture more than than the object culture.

That's probably worth noting.

In
particular, PG data types generally don't have any notion of "IsA"
subclass relationships, though the rowtypes of inherited tables do have
that.

So I noticed. You can still do some sorts of inheritance, just like you
can do object-oriented programming in C....

(Well, I guess you could claim that a domain IsA subclass of its
base type, but SQL's domain feature is so impoverished that any object
hacker would laugh at you.)

So really the argument for calling PG object-relational comes down to
table inheritance and the IsA relationship between tuples of inherited
tables. Which is something I think few people even use anymore ...
it definitely doesn't seem like a key selling point.

I was looking at it differently, namely that there are a bunch of features
that you can use together to build O-R systems. The complex types may not
support inheritance, but with casts you can get some limited polymorphism.
Moreover the fact that relations are classes means that you can create
casts of tuples to other types. For example:

create table foo (
bar text,
baz int
); -- simple union type

insert into foo (bar, baz) values ('test', '1');

create function foo_to_int (foo) returns int as
$$ select $1.baz $$ language sql;

create cast (foo as int) with function foo_to_int(foo) as implicit;

select foo + 1 as value from foo;

value
-------
2
(1 row)

This is a trivial example and I would probably include it only by
description, but the point is that the combination of casts, functions, and
tables as classes allows you to create some degree of polymorphism. For
example we could take an employee table and add a name function that
concatenates the first and last name together according to some logic. We
could then index the output of that function for full text searching.

While you can't do inheritance easily with complex types, these can still
be used to create abstract interfaces and the use of explicit casts might
give you something like it though you'd have a fair bit of work to
implement such a system.

I am thinking of skipping over things that may be seen as misfeatures,

such

as class.function syntax although this could be useful in the case of
simulating calculated fields.

Agreed, that's not a major feature; it's just a notational detail that
people have got varying opinions about.

Heck, I have varying opinions about it and my opinion on this feature is
rather fluid at any given poitn. However, I am thinking that maybe
mentioning it up front would mean fewer people get taken by surprise by it.

Best Wishes,
Chris Travers

#4Jeff Davis
pgsql@j-davis.com
In reply to: Chris Travers (#1)
Re: Would like to contribute a section to docs for 9.3. Where to start?

On Mon, 2012-08-13 at 03:41 -0700, Chris Travers wrote:

Hi all;

I would like to contribute a "What is an Object Relational database?"
section to the documentation for 9.3. Where is the best place to
start tools and community-process-wise?

My thinking is that since people are often confused by this label, it
would be worth describing what it means, and describing in brief
detail object-relational features in PostgreSQL.

A concrete example of the confusion caused by our current branding (with
no supporting documentation) recently appeared on -novice:

http://archives.postgresql.org/message-id/21709bce-0308-4e6f-9e1c-b9dc95360fe4@googlegroups.com

That's exactly the kind of (potential) user we should target with this
document.

Regards,
Jeff Davis

#5Chris Travers
chris.travers@gmail.com
In reply to: Tom Lane (#2)
Re: Would like to contribute a section to docs for 9.3. Where to start?

As a note here, I think one of the fundamental difficulties in figuring out
how to position PostgreSQL (whether using Simon's multi-model idea or
Object-Relational, something else entirely, or some combination) is that
PostgreSQL is an extraordinarily competent and full-featured database
management system. I have a very rough draft of how I'd explain it I will
send here for some feedback in terms of general message and accuracy before
I look at adapting it as a patch against the docs.

However, while I was going through this and asking "how would I build
something utilizing object-oriented approaches in PostgreSQL?" I realized
how few of the features of this sort I was currently using. I have been
using PostgreSQL since 1999, and been seriously been trying to use advanced
features for six, and I realized I have barely begun to scratch the
surface. It's really refreshing to look at this and realize that even
after 12-13 years of becoming familiar with a piece of software, a little
exercise like this provides all sorts of features that would simplify your
life.

The fact is that what PostgreSQL really is, inside the box, is a
transactional development environment where operations occur in a
relational-native way and this is largely how I am approaching it.
Object-relational in terms of PostgreSQL seems to mean "relational along
with a bunch of tools useful for building object interfaces." I think a
lot of the multi-model features that Simon talks about can be understood in
these terms as well. If I was going to coin a term to call this, I would
call it a "Transactional/relational development environment." Just as you
can do object-oriented programming in C, PostgreSQL lets you do this in SQL.

Also in my tests, I found that inherited relations do not inherit casts.
Is this intentional? Is there a reason I should be putting into the
documentation? Or is it just a gotcha that should be listed as a caveat?

Best Wishes,
Chris Travers

#6Tom Lane
tgl@sss.pgh.pa.us
In reply to: Chris Travers (#5)
Re: Would like to contribute a section to docs for 9.3. Where to start?

Chris Travers <chris.travers@gmail.com> writes:

Also in my tests, I found that inherited relations do not inherit casts.
Is this intentional?

Um, in what way exactly? It's true that we don't consider that a cast
from X to Y and a cast from Y to Z should automatically give you a cast
from X to Z, but that's not specific to inherited relations. Or did
you have something else in mind?

regards, tom lane

#7Chris Travers
chris.travers@gmail.com
In reply to: Chris Travers (#1)
Re: Would like to contribute a section to docs for 9.3. Where to start?

So here is a very rough draft. I would be interested in feedback as to
inaccuracies or omissions. I would like to get the technical side right
before going into an editorial phase.

Any feedback on the technical side?

Best Wishes,
Chris Travers

How is PostgreSQL "Object-Relational?"

The term Object-Relational has been applied to databases which attempt to
bridge the relational and object-oriented worlds with varying degrees of
success. Bridging this gap is typically seen as desirable because
object-oriented and relational models are very different paradigms and
programmers often do not want to switch between them. There are, however,
fundamental differences that make this a very hard thing to do well. The
best way to think of PostgreSQL in this way is as a relational database
management system with some object-oriented features.

PostgreSQL is a development platform in a box. It supports stored
procedures written in entirely procedural languages like PL/PGSQL or Perl
without loaded modules, and more object-oriented languages like Python or
Java, often through third party modules. To be sure you can't write a
graphical interface inside PostgreSQL, and it would not be a good idea to
write additional network servers, such as web servers, directly inside the
database. However the environment allows you to create sophisticated
interfaces for managing and transforming your data. Because it is a
platform in a box the various components need to be understood as different
and yet interoperable. In fact the primary concerns of object-oriented
programming are all supported by PostgreSQL, but this is done in a way that
is almost, but not quite, entirely unlike traditional object oriented
programming. For this reason the "object-relational" label tends to be a
frequent source of confusion.

Data storage in PostgreSQL is entirely relational, although this can be
degraded using types which are not atomic, such as arrays, XML, JSON, and
hstore. Before delving into object-oriented approaches, it is important to
master the relational model of databases. For the novice, this section is
therefore entirely informational. For the advanced developer, however, it
is hoped that it will prove inspirational.

In object-oriented terms, very relation is a class, but not every class is
a relation. Operations are performed on sets of objects (an object being a
row), and new row structures can be created ad-hoc. PostgreSQL is,
however, a strictly typed environment and so in many cases, polymorphism
requires some work.

Data Abstraction and Encapsulation in PostgreSQL

The relational model itself provides some tools for data abstraction and
encapsulation, and these features are taken to quite some length in
PostgreSQL. Taken together these are very powerful tools and allow for
things like calculated fields to be simulated in relations and even indexed
for high performance.

Views are the primary tool here. With views, you can create an API for
your data which is abstracted from the physical storage. Using the rules
system, you can redirect inserts, updates, and deletes from the view into
underlying relations, preferably using user defined functions. Being
relations, views are also classes.

A second important tool here is the ability to define what appear to be
calculated fields using stored procedures. If I create a table called
"employee" with three fields (first_name, middle_name, last_name) among
others, and create a function called "name" which accepts a single employee
argument and concatenates these together as "last_name, first_name
middle_name" then if I submit a query which says:

select e.name from employee e;

it will transform this into:

select name(e) from employee e;

This gives you a way to do calculated fields in PostgreSQL without
resorting to views. Note that these can be done on views as well because
views are relations. These are not real fields though. Without the
relation reference, it will not do the transformation (so SELECT name from
employee will not have the same effect).

Messaging and Class API's in PostgreSQL

A relation is a class. The class is accessed using SQL which defines a new
data structure in its output. This data structure unless defined elsewhere
in a relation or a complex type cannot have methods attached to it and
therefore can not be used with the class.method syntax described above.
There are exceptions to this rule, of course, but they are beyond the
scope of this introduction. In general it is safest to assume that the
output of one query, particularly one with named output fields, cannot
safely be used as the input to another.

A second messaging aparatus in PostgreSQL is the LISTEN/NOTIFY framework
which can be used along with triggers to issue notifications to other
processes when a transaction commits. This approach allows you to create
queue tables, use triggers to move data into these tables (creating
'objects' in the process) and then issuing a notification to another
process when the data commits and becomes visible. This allows for very
complex and and interactive environments to be built from modular pieces.

Polymorphism in PostgreSQL

PostgreSQL is very extensible in terms of all sorts of aspects of the
database. Not only can types be created and defined, but also operators
can be defined or overloaded.

A more important polymorphism feature is the ability to cast one data type
as another. Casts can be implicit or explicit. Implicit casts, which have
largely been removed from many areas of PostgreSQL, allow for PostgreSQL to
cast data types when necessary to find functions or operators that are
applicable. Implicit casting can be dangerous because it can lead to
unexpected behavior because minor errors can lead to unexpected results.
'2012-05-31' is not 2012-05-31. The latter is an integer expression that
reduces to 1976. If you create an implicit cast that turns an integer into
a date being the first of the year, the lack of quoting will insert
incorrect dates into your database without raising an error ('1976-01-01'
instead of the intended '2012-05-31'). Implicit casts can still have some
uses.

Inheritance in PostgreSQL

In PostgreSQL tables can inherit from other tables. Their methods are
inherited but their castes are not, nor are their indexes. This allows you
develop object inheritance hierarchies in PostgreSQL. Multiple inheritance
is possible.

Table inheritance is an advanced concept and has many gotchas. Please
refer to the proper sections of the manual for more on this topic. On the
whole it is probably best to work with table inheritance first in areas
where it is more typically used, such as table partitioning, and later look
at it in terms of object-relational capabilities.

Overall the best way to look at PostgreSQL as an object-relational database
is a database which provides very good relational capabilities plus some
advanced features that allows one do create object-relational systems on
top of it. These systems can then move freely between object-oriented and
relational worldviews but are still more relational than object-oriented.
At any rate they bear little resemblance to object-oriented programming
environments today. With PostgreSQL this is very much a toolkit approach
for object-relational databases building on a solid relational foundation.
This means that these are advanced functions which are powerful in the
hands of experienced architects, but may be skipped over at first.

#8Peter Eisentraut
peter_e@gmx.net
In reply to: Chris Travers (#7)
Re: Would like to contribute a section to docs for 9.3. Where to start?

On 8/15/12 5:33 AM, Chris Travers wrote:

So here is a very rough draft. I would be interested in feedback as to
inaccuracies or omissions. I would like to get the technical side right
before going into an editorial phase.

Any feedback on the technical side?

[citation needed]

Seriously, if we are trying to justify our use of seemingly standard
academic terms, we should have some references to where those are
defined or at least discussed. Otherwise we are just begging the
question: PostgreSQL is object-relational because we say so.

#9Chris Travers
chris.travers@gmail.com
In reply to: Peter Eisentraut (#8)
Re: Would like to contribute a section to docs for 9.3. Where to start?

On Fri, Aug 17, 2012 at 1:03 PM, Peter Eisentraut <peter_e@gmx.net> wrote:

On 8/15/12 5:33 AM, Chris Travers wrote:

So here is a very rough draft. I would be interested in feedback as to
inaccuracies or omissions. I would like to get the technical side right
before going into an editorial phase.

Any feedback on the technical side?

[citation needed]

Seriously, if we are trying to justify our use of seemingly standard
academic terms, we should have some references to where those are defined
or at least discussed. Otherwise we are just begging the question:
PostgreSQL is object-relational because we say so.

Good point.

I found two interesting resources quickly which seem on point:

http://infolab.usc.edu/csci585/Spring2010/den_ar/ordb.pdf which appears to
be chapter 1 of
http://www.amazon.com/Object-Relational-Database-Development-Plumbers-CD-ROM/dp/0130194603

and

http://db.cs.berkeley.edu/papers/Informix/www.informix.com/informix/corpinfo/zines/whitpprs/illuswp/wave.htm

But this doesn't really get us beyond the "because we say so" given the
connection between Informix and PostgreSQL.

It really looks to me like Postges was given the name Object-Relational by
Stonebreaker as a way of saying "here's what I am trying to play around
with" and the databases which describe themselves in these terms seem
either inspired by or forks of Postgres ;-).

Best Wishes,
Chris Travers

#10Jeff Davis
pgsql@j-davis.com
In reply to: Peter Eisentraut (#8)
Re: Would like to contribute a section to docs for 9.3. Where to start?

On Fri, 2012-08-17 at 16:03 -0400, Peter Eisentraut wrote:

On 8/15/12 5:33 AM, Chris Travers wrote:

So here is a very rough draft. I would be interested in feedback as to
inaccuracies or omissions. I would like to get the technical side right
before going into an editorial phase.

Any feedback on the technical side?

[citation needed]

Seriously, if we are trying to justify our use of seemingly standard
academic terms, we should have some references to where those are
defined or at least discussed. Otherwise we are just begging the
question: PostgreSQL is object-relational because we say so.

I feel like the bar is becoming pretty high for this document. It must:

1. Settle on an accepted criteria for ORDBMS
2. Describe how postgres meets that criteria in a way that's:
a. compelling to users
b. connects with OOP so the users don't feel like it's a
bait-and-switch or get confused by starting with the
wrong expectation

I feel like making #1 compatible with 2(a) requires some creativity; and
#1 might be incompatible with 2(b) entirely.

Regards,
Jeff Davis

#11Chris Travers
chris.travers@gmail.com
In reply to: Jeff Davis (#10)
Re: Would like to contribute a section to docs for 9.3. Where to start?

On Sat, Aug 18, 2012 at 12:12 PM, Jeff Davis <pgsql@j-davis.com> wrote:

On Fri, 2012-08-17 at 16:03 -0400, Peter Eisentraut wrote:

On 8/15/12 5:33 AM, Chris Travers wrote:

So here is a very rough draft. I would be interested in feedback as to
inaccuracies or omissions. I would like to get the technical side

right

before going into an editorial phase.

Any feedback on the technical side?

[citation needed]

Seriously, if we are trying to justify our use of seemingly standard
academic terms, we should have some references to where those are
defined or at least discussed. Otherwise we are just begging the
question: PostgreSQL is object-relational because we say so.

I feel like the bar is becoming pretty high for this document. It must:

1. Settle on an accepted criteria for ORDBMS

Probably Mike Stonebreaker's paper can be referred to here. Also it looks
like Oracle used to have a document describing "object-relational" features
in Oracle 10. Reading through other people's views, I think Oracle might
actually be ahead of us here, but... The problem here is relatively complex
and I am afraid if I go and re-iterate everything I will end up with
another book >:-D

Not that this would be a bad thing. I did find Oracle's somewhat short
book (of 200 pages) on the subject at
http://docs.oracle.com/cd/B19306_01/appdev.102/b14260.pdf

However if I am doing a book by myself I am either going to publish it or
release it myself. A document of that scope is a little wider-range than I
would like to just hand off to the community.

I think it will be worth pointing out that Oracle is an ORDBMS as well and
is really the major non-Pg-descended ORDBMS I can find on the market today.

2. Describe how postgres meets that criteria in a way that's:
a. compelling to users
b. connects with OOP so the users don't feel like it's a
bait-and-switch or get confused by starting with the
wrong expectation

I feel like making #1 compatible with 2(a) requires some creativity; and
#1 might be incompatible with 2(b) entirely.

The more I work with this and am trying to figure out how to apply these in
my own work the more I am convinced that this does connect with OOP just,
as I said, in a way that is almost but not entirely unlike normal OOP.

The way I would describe it in simple terms is that a standard RDBMS
operates on sets of tuples. An ORDBMS operates on sets of objects. Those
objects may have methods, may be polymorphic, and may be encapsulated
behind interfaces. As Stonebreaker said in his paper, this is a marriage
between the set-oriented relational database and the primitives of object
oriented programming. Consequently the way to look at it is that you have
a relational database with object oriented features which makes this sort
of operation possible (and that is, as best as I can see, how Oracle
actually positions their product as well).

But more to the point, what do people think would be a valuable role for
this document? I was thinking initially of a *brief* description of what
was meant so that people didn't get too confused. Maybe it would be better
to save the brief description for later and write a longer document first
that could be incorporated into the brief document by reference? Maybe a
book entitled "Object-Relational Programming in PostgreSQL" since this is
something I have started to delve deeply into for LedgerSMB. Maybe by that
point we can figure out whether we are pushing Object-Relational features
as a subset of a multi-model approach or vice versa. Indeed ontologically
speaking, I am not sure what the difference between multi-model and
object-relational is since Simon seems to think that object-relational is a
subset of multi-model and I think multi-model is a feature of
object-relational ;-). This being said, of course there may be marketing
reasons to push one or the other as a primary term.

Best wishes,
Chris Travers

#12Jeff Davis
pgsql@j-davis.com
In reply to: Chris Travers (#11)
Re: Would like to contribute a section to docs for 9.3. Where to start?

On Sat, 2012-08-18 at 18:56 -0700, Chris Travers wrote:

I was thinking initially of a *brief* description of what was meant so
that people didn't get too confused.

+1.

I was imagining two contrasting examples, one using the relational
mindset and one using the O-R mindset.

Inheritance and composite types are easy enough to understand. Perhaps
those could make for a short example on the O-R side while still looking
different enough from a traditional relational approach.

Regards,
Jeff Davis

#13Chris Travers
chris.travers@gmail.com
In reply to: Jeff Davis (#12)
Re: Would like to contribute a section to docs for 9.3. Where to start?

Given the (generally helpful) feedback here what I have decided to do for
now is to run a blog series developing the use cases for O-R functionality
in PostgreSQL, and return after that with a new draft, possibly distilled
from that.

I will post again, perhaps, in a couple months.

Best Wishes,
Chris Travers