generic modelling of data models; enforcing constraints dynamically...

Started by InterRob · over 16 years ago · 31 messages · general
#1InterRob
rob.marjot@gmail.com

Dear List,
I am trying to implement the following:

In a database I wish to implement a GENERIC data model, thus on a meta-level.
All RELATIONS (part of a MODEL) will be a view on some base (being a table)
JOINed with (an) extra column(s). Thus, this view consists of a number of
FIELDS. I wish to make this view editable (INSERT, UPDATE) using the RULE
system. Some constraints will apply; enforcing these is the problem I am
trying to solve by modelling these rules with a table "RELATION_CONSTRAINTS"
(see below).

Tables:

BASE(col1, col2, col3)

MODELS(name)

RELATIONS(modelname, name)

FIELDS(modelname, relation_name, name, datatype)

RELATION_CONSTRAINTS(modelname, relation_name, constraint_name,
constraining_expression)

I was thinking of implementing this using a FUNCTION that takes a
polymorphic record parameter (and the relation name), then checking this
record against the applicable constraint expressions.
This whole idea may sound like a DBMS-in-a-DBMS kind of thing... What I am
trying to do is to write as little table/view-specific code as necessary,
while still collecting all base data in one central table...
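
To make the idea concrete, roughly the kind of thing I have in mind (only a
sketch: the function name is a placeholder, the row is passed as parallel
key/value arrays rather than as a true polymorphic record, and each
constraining_expression is assumed to be a boolean SQL expression over the
relation's fields):

CREATE TABLE RELATION_CONSTRAINTS (
    modelname                TEXT NOT NULL,
    relation_name            TEXT NOT NULL,
    constraint_name          TEXT NOT NULL,
    constraining_expression  TEXT NOT NULL,  -- e.g. 'col2 > 0 AND col3 IS NOT NULL'
    PRIMARY KEY (modelname, relation_name, constraint_name)
);

CREATE OR REPLACE FUNCTION check_constraints(_model TEXT, _relation TEXT,
                                             _keys TEXT[], _vals TEXT[])
RETURNS void AS $$
DECLARE
    c   RECORD;
    q   TEXT;
    i   INT;
    ok  BOOLEAN;
BEGIN
    FOR c IN SELECT constraint_name, constraining_expression
               FROM RELATION_CONSTRAINTS
              WHERE modelname = _model AND relation_name = _relation
    LOOP
        q := c.constraining_expression;
        -- naive textual substitution of field names by the supplied values
        FOR i IN 1 .. coalesce(array_upper(_keys, 1), 0) LOOP
            q := replace(q, _keys[i], quote_literal(_vals[i]));
        END LOOP;
        EXECUTE 'SELECT ' || q INTO ok;
        IF ok IS NOT TRUE THEN
            RAISE EXCEPTION 'constraint % violated on %.%',
                            c.constraint_name, _model, _relation;
        END IF;
    END LOOP;
END;
$$ LANGUAGE plpgsql;

An INSERT/UPDATE rule on each view could then call this function with the
NEW values before writing to the base table.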

All suggestions are very much appreciated,
regards,

Rob

#2Ben
bench@silentmedia.com
In reply to: InterRob (#1)
Re: generic modelling of data models; enforcing constraints dynamically...

InterRob wrote:

Dear List,

I am trying to implement the following:

[snip]

All suggestions are very much appreciated,
regards,

Rob

It's not clear to me what you're asking, but I suspect the suggestion
you need is the same as if you had asked how best to implement an
Entity-Attribute-Value scheme: don't do it. While it may be possible,
performance is going to go into the toilet, constraints are going to be
difficult to enforce, and maintenance will be difficult at best.
Spending the effort upfront to define a schema will have drastic
long-term payoffs. It can be tempting to believe an application can
define the appropriate schema for itself at runtime if you just give it
a big enough sandbox, but this rarely works out well.

#3Sam Mason
sam@samason.me.uk
In reply to: InterRob (#1)
Re: generic modelling of data models; enforcing constraints dynamically...

On Thu, Sep 24, 2009 at 06:28:28PM +0200, InterRob wrote:

I am trying to implement the following:

In a database I wish to implement a GENERIC datamodel, thus on a meta-level.

Sounds like you're describing an EAV design:

http://en.wikipedia.org/wiki/Entity-attribute-value_model

Designs like this tend to result in you getting very little support
from the database and get awkward as they grow. If your problem really
is suited to this then go for it, but surprisingly few actually are.
I'd highly recommend using a more traditional design until you've been
through at least one big revision and then you'll know whether EAV
really fits.

--
Sam http://samason.me.uk/

#4Rob Marjot
rob@marjot-multisoft.com
In reply to: Ben (#2)
Re: generic modelling of data models; enforcing constraints dynamically...

Thank you, Ben. Well, I'm afraid you got the basic idea... I intend to
implement a hybrid between a fixed schema and an Entity-Attribute-Value
scheme. The schema will be able to cover 90% of the data needs; in other
cases (specific projects) additional fields (and/or tables/relations) will
be needed; including their constraints...
I'm experimenting now with an idea that just came up: passing a set of
key/value pairs to a function that will test the new row; on insert /
update the following could then be checked (as part of a RULE set):

SELECT doesComply('relationname', keyValues.*) FROM (VALUES ('col1',
CAST(col1 AS TEXT)), ('col2', CAST(col2 AS TEXT))) AS
keyValues(the_key, the_value);

The function "doesComply()" will then process the CONSTRAINTS table and
raise an error if the new / updated row does not fit...
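
For illustration, the check might be wired into the view's rewrite rule
along these lines (a sketch only: the view, column and base-table names are
made up, this variant passes the key/value pairs as two arrays, and
doesComply() is assumed to raise an exception when a constraint fails):

CREATE RULE model1_insert AS ON INSERT TO model1_view
DO INSTEAD
    -- the WHERE clause only forces the constraint check; doesComply()
    -- raises on violation and returns true otherwise
    INSERT INTO base (col1, col2)
    SELECT NEW.col1, NEW.col2
     WHERE doesComply('MODEL1',
                      ARRAY['col1', 'col2'],
                      ARRAY[CAST(NEW.col1 AS TEXT), CAST(NEW.col2 AS TEXT)]);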

Any thoughts?

Rob

#5Ben
bench@silentmedia.com
In reply to: Rob Marjot (#4)
Re: generic modelling of data models; enforcing constraints dynamically...

Rob Marjot wrote:

Thank you, Ben. Well, I'm afraid you got the basic idea... I intend to
implement a hybrid between a fixed schema and an
Entity-Attribute-Value scheme. The schema will be able to cover 90% of
the data needs; in other cases (specific projects) additional fields
(and/or tables/relations) will be needed; including their constraints...

If you absolutely must have a dynamic schema like this, and can't have a
DBA simply add tables as needed, then I think it would be less work,
overall, to create a schema that your application has DDL rights to, and
then let it create and modify normal tables with normal constraints there.
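
For example, the one-off setup could be as small as this (all names here
are placeholders):

-- done once by the DBA: a sandbox schema the application role may create
-- objects in
CREATE ROLE fieldwork_app LOGIN PASSWORD 'changeme';
CREATE SCHEMA project_extensions AUTHORIZATION fieldwork_app;

-- the application then issues ordinary DDL with ordinary constraints, e.g.:
CREATE TABLE project_extensions.finds_extra (
    find_id   INTEGER PRIMARY KEY,
    munsell   TEXT,
    weight_g  NUMERIC CHECK (weight_g >= 0)
);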

There certainly are some cases where an EAV solution is the proper one,
and yours may be one of them. But most aren't.

#6Sam Mason
sam@samason.me.uk
In reply to: Rob Marjot (#4)
Re: generic modelling of data models; enforcing constraints dynamically...

On Thu, Sep 24, 2009 at 09:23:35PM +0200, Rob Marjot wrote:

SELECT doesComply('relationname', keyValues.*) FROM (VALUES ('col1',
CAST(col1 AS TEXT)), ('col2', CAST(col2 AS TEXT))) AS
keyValues(the_key, the_value);

The function "doesComply()" will then process the CONSTRAINTS table and
raise an Error if the new / updated row does not fit...

I'd have a set of doesComply functions, the first two parameters
as you have them but overload a set to support different datatypes
specifically. Something like:

CREATE FUNCTION doesComply(_rel TEXT, _key TEXT, _val INT) ...
CREATE FUNCTION doesComply(_rel TEXT, _key TEXT, _val DATE) ...
CREATE FUNCTION doesComply(_rel TEXT, _key TEXT, _val TEXT) ...
CREATE FUNCTION doesComply(_rel TEXT, _key TEXT, _val NUMERIC) ...

And then have a set of "attribute" tables (one for each datatype) to
store the actual values in. At least PG can do some type checking for
you that way. Either that, or just leave them all as text to text
mappings in the database and only attempt to type things out in the
client code.
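
Such per-type attribute tables might look something like this (table and
column names are placeholders):

CREATE TABLE attr_int (
    object_id  INTEGER NOT NULL,
    attr_name  TEXT    NOT NULL,
    val        INTEGER NOT NULL,
    PRIMARY KEY (object_id, attr_name)
);

CREATE TABLE attr_date (
    object_id  INTEGER NOT NULL,
    attr_name  TEXT    NOT NULL,
    val        DATE    NOT NULL,
    PRIMARY KEY (object_id, attr_name)
);

-- ...and likewise attr_text and attr_numeric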

Not sure why you're doing the VALUES contortions as well, why not just:

SELECT doesComply('relationname', 'col1', col2);

?

--
Sam http://samason.me.uk/

#7InterRob
rob.marjot@gmail.com
In reply to: Ben (#5)
Re: generic modelling of data models; enforcing constraints dynamically...

Thank you, Ben. Well, I'm afraid you got the basic idea... I intend to
implement a hybrid between a fixed schema and an Entity-Attribute-Value
scheme. The schema will be able to cover 90% of the data needs; in other
cases (specific projects) additional fields (and/or tables/relations) will
be needed; including their constraints...
I'm experimenting now with an idea that just came up: passing a set of
key/value pairs to a function that will test the new row; on insert /
update the following could then be checked (as part of a RULE set):

SELECT doesComply('relationname', keyValues.*) FROM (VALUES ('col1',
CAST(col1 AS TEXT)), ('col2', CAST(col2 AS TEXT))) AS
keyValues(the_key, the_value);

The function "doesComply()" will then process the CONSTRAINTS table and
raise an error if the new / updated row does not fit...

Any thoughts?

Rob

#8InterRob
rob.marjot@gmail.com
In reply to: Sam Mason (#6)
Re: generic modelling of data models; enforcing constraints dynamically...

Sam, thanks for thinking along.
The thing is that a SINGLE constraint might apply to MULTIPLE fields;
therefore it seems best to build a set of key/value pairs... Multiple
doesComply()s won't do the job :(

BY THE WAY:
I came to think of another option: putting additional columns (that is,
additional to the default set of fields) into XML, in a column that is part
of the row (= object) it belongs to.
Has anybody done so before? Does anybody have experience with XML Schema
validation within PostgreSQL?
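
For illustration, what I have in mind would look roughly like this (only a
sketch: the table, the <depth> element and the model name are made up, and
the check uses xpath() in a trigger rather than real XML Schema validation):

CREATE TABLE finds (
    id          SERIAL PRIMARY KEY,
    model       TEXT NOT NULL,
    extra_data  XML             -- the "additional columns", serialized
);

CREATE OR REPLACE FUNCTION check_extra_data() RETURNS trigger AS $$
DECLARE
    depth TEXT;
BEGIN
    -- example rule: MODEL1 rows must carry a numeric <depth> element
    IF NEW.model = 'MODEL1' THEN
        depth := (xpath('/extra/depth/text()', NEW.extra_data))[1]::text;
        IF depth IS NULL OR depth !~ '^[0-9.]+$' THEN
            RAISE EXCEPTION 'MODEL1 rows need a numeric <depth> element';
        END IF;
    END IF;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER finds_extra_check
BEFORE INSERT OR UPDATE ON finds
FOR EACH ROW EXECUTE PROCEDURE check_extra_data();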

Cheerz,

Rob

#9Sam Mason
sam@samason.me.uk
In reply to: InterRob (#8)
Re: generic modelling of data models; enforcing constraints dynamically...

On Thu, Sep 24, 2009 at 10:33:37PM +0200, InterRob wrote:

I came to think of another option: putting additional columns (that is,
additional to the default set of fields) into XML, in a column that is part
of the row (= object) it belongs to.
Has anybody done so before? Does anybody have experience with XML Schema
validation within PostgreSQL?

Sorry, but this was sounding a little over-engineered before, and it seems
to be blowing out of proportion now. By whom, and how, are these (immensely
complicated) rule sets going to be maintained? How is using XML going to
make this any easier than using the tools native to the database?

If they're validated inside the database then it's going to be done by a
DB admin anyway, or am I missing something? And if they're done by the DB
admin, isn't it easier to just use the tools they're used to?

--
Sam http://samason.me.uk/

#10InterRob
rob.marjot@gmail.com
In reply to: Sam Mason (#9)
Re: generic modelling of data models; enforcing constraints dynamically...

I guess it IS quite over-engineered indeed...
What I'm trying to do is to facilitate different fieldwork methodologies for
archaeological research (on a project basis); there is no final agreement on
data structure and semantics; however, on a meta-level all choices are
rational and can be modelled... In fact, all models can be related to each
other: that's where the "hybrid" part comes in: I wish to implement the
common denominator (90%) and then further extend this, enabling specific
data model implementations -- including checks for data integrity.

As soon as that works, it becomes possible to record changes at row level --
providing access to data snapshots in time. Furthermore, it becomes possible
to build, on top of this central database, automated tools for management
and filing of information and different modes of data entry (including
web-based)...

The thing is: altering table structures (changes should be limited to adding
columns) is required on an ad hoc basis, and the end user should be able to
do so... I guess that requires some over-engineering... ?

Rob

#11Oliver Kohll - Mailing Lists
oliver.lists@gtwm.co.uk
In reply to: Sam Mason (#9)
Re: generic modelling of data models; enforcing constraints dynamically...

On 25 Sep 2009, at 07:22, InterRob <rob.marjot@gmail.com> wrote:

I guess it IS quite over-engineered indeed...

What I'm trying to do is to facilitate different fieldwork
methodologies for archaeological research (on a project basis); there
is no final agreement on data structure and semantics; however, on a
meta-level all choices are rational and can be modelled... In fact,
all models can be related to each other: that's where the "hybrid"
part comes in: I wish to implement the common denominator (90%) and
then further extend this, enabling specific data model
implementations -- including checks for data integrity.

Hi Rob,

Just wondering if you've considered rapid prototyping of the core of
it to try and gain consensus by giving people something they can see
and talk about, as an alternative to doing a lot of design work up
front?

Regards
Oliver Kohll
www.gtwm.co.uk - company
www.gtportalbase.com - product

#12Johan Nel
johan.nel@xsinet.co.za
In reply to: InterRob (#1)
Re: generic modelling of data models; enforcing constraints dynamically...

Hi Rob,

In a database I wish to implement a GENERIC datamodel, thus on a
meta-level. All RELATIONS (part of a MODEL) will be a view on some base
(being a table) JOINed with (an) extra column(s). Thus, this view
consists of a number of FIELDS. I whish to make this view editable
(INSERT, UPDATE) using the RULE system. Some constraints will apply;
enforcing these is the problem I am trying to solve by modeling these
rules with a table "RELATION_CONSTRAINTS" (see below).

Although many people will shoot it down, I follow a very similar approach:
a data-driven application framework with exactly what you have described.
In short, my application reads the "metadata" at runtime and "builds"
the application, so to speak, on the fly.

For this I use the following two table approach:

meta_master:
master_no SERIAL NOT NULL PRIMARY KEY,
master_type VARCHAR(30) NOT NULL REFERENCES master_type(master_type),
master_id VARCHAR(30) NOT NULL,
master_property TEXT,
UNIQUE (master_type, master_id)

meta_link:
link_no SERIAL NOT NULL PRIMARY KEY,
master_no INTEGER REFERENCES meta_master(master_no),
link_type VARCHAR(30) NOT NULL REFERENCES master_type(master_type),
member_no INTEGER NOT NULL REFERENCES meta_master(master_no),
member_property TEXT,
UNIQUE (master_no, member_no)

Some explanation:
master_type and link_type have values like database, table, column etc.

Thus, at runtime a method FormLoad(nID) will make use of a recursive query
to load everything that is needed to build the "Form" at runtime and
associate it with the correct database, table, etc.

It is necessary to recurse over all members via meta_master.master_no =
meta_link.master_no and meta_link.member_no = meta_master.master_no (use
contrib's connectby() before 8.4, or a WITH RECURSIVE query from 8.4 on).
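
On 8.4 the recursive lookup could be written roughly like this (a sketch;
42 stands for the master_no of the form being loaded, and the metadata
graph is assumed to be acyclic):

WITH RECURSIVE members AS (
    SELECT m.master_no, m.master_type, m.master_id, m.master_property,
           0 AS depth
      FROM meta_master m
     WHERE m.master_no = 42
    UNION ALL
    SELECT c.master_no, c.master_type, c.master_id, c.master_property,
           p.depth + 1
      FROM members p
      JOIN meta_link   l ON l.master_no = p.master_no
      JOIN meta_master c ON c.master_no = l.member_no
)
SELECT * FROM members ORDER BY depth;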

Where applicable the %_property columns are used to define additional
information in the format 'name=value;nextname=value;' etc.
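
A property string in that format can be unpacked into name/value rows with
something like the following (8.4 syntax; the example string is made up):

SELECT split_part(t.kv, '=', 1) AS name,
       split_part(t.kv, '=', 2) AS value
  FROM unnest(string_to_array('dbname=prod;owner=johan;', ';')) AS t(kv)
 WHERE t.kv <> '';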

I was thinking of implementing this using a FUNCTION that takes a
polymorphic record parameter (and the relation name), then checking this
record against the applicable constraint expressions.
This whole idea may sound like a DBMS-in-a-DBMS kind of thing... What I
am trying to do is to write as little table/view-specific code as
necessary, while still collecting all base data in one central table...

I take the %_property column even further: in my business data I have a
property column again, in tables where additional columns can be defined on
the fly based on the metadata, until such time as users have a clear
picture of what exactly they need. This is also used in tables where
multi-company differences would otherwise leave more than 50% of a table's
columns redundant.

If you need more info regarding this approach, feel free to contact me in
private.

Johan Nel
Pretoria, South Africa.

#13Sam Mason
sam@samason.me.uk
In reply to: InterRob (#10)
Re: generic modelling of data models; enforcing constraints dynamically...

On Thu, Sep 24, 2009 at 11:07:31PM +0200, InterRob wrote:

What I'm trying to do is to facilitate different fieldwork methodologies for
archaeological research (on a project basis); there is no final agreement on
data structure and semantics; however, on a meta-level all choices are
rational and can be modelled... In fact, all models can be related to each
other: that's where the "hybrid" part comes in: I wish to implement the
common denominator (90%) and then further extend this, enabling specific
data model implementations -- including checks for data integrity.

In my experience it depends on how technically competent your users
are. Most of mine are fine working with "their own" data files/formats
and only want the data in the database to keep track of the larger
structural stuff.

I therefore tend to just leave their data as opaque blobs (stored in
large objects, as they're reasonably small) of data and only pull out
the parts of it that are needed to keep the other parts of the projects
happy. That way I can make sure the bits the database takes care of can
be appropriately designed and the users get to keep their data exactly as
they want.

To support this I've written various bits of code that get automatically
run when users insert their data files to pull them out into the
appropriate tables. The detailed bits of the structure are of course
missed, but most of the time this data isn't needed and when it is they
want the rest of the original (normally proprietary binary file formats
that I've had to reverse engineer) file so that their program can figure
out what's going on.

It all depends on the problem domain of course, but this seems to work
OK for us! I really want to hack Samba around so that the users can
view the files directly from inside the database, but I'm not sure how
good an idea this really is.

The bigger datasets (1GB+) tend to be nicely structured, so they get
handled specially.

As soon as that works, it becomes possible to record changes at row-level --
providing access to data-snapshots in time.

I think these are what my blobs are...

Furthermore, it becomes possible
to build upon this central database automated tools for management and
filing of information and different modes of data entry (including
webbased)...

...and this is what I'd call my structural bits.

The thing is: altering table structures (changes should be limited to adding
columns) is required on an ad hoc basis, and the end user should be able to
do so...

I generally find it's easier if I'm involved in that. Maybe it's just
my users!

I guess that requires some over engineering... ?

By "over engineering" I was meaning that you seem to be trying to solve
a more complicated problem than is necessary. There will be some
essential complexity inherent in any problem, but it's the job of every
engineer (software or hardware) to ensure that only minimal amounts of
incidental complexity are introduced.

In my case the "important" thing is to make sure that we know the
state of what's going on in the projects. I can do this by getting a
combination of data from the user (through traditional means) and by
pulling apart their data files. The "incidental complexity" I've added,
that of writing fiddly little programs to interpret their files, seems
to be better than getting the users to input the data twice; once in
their programs and once into the database.

In your case you've introduced this strange new EAV-style design and the
constraint system on top of it. The benefits of this design may well
outweigh the costs of developing it, but I have a feeling it may be
easier to "side-step" the problem somehow.

That all got a bit longer than I was expecting, but I hope it's useful!

--
Sam http://samason.me.uk/

#14Ron Mayer
rm_pg@cheapcomplexdevices.com
In reply to: Sam Mason (#13)
Re: generic modelling of data models; enforcing constraints dynamically...

Sam Mason wrote:

It all depends on the problem domain of course, but this seems to work
OK for us! I really want to hack Samba around so that the users can
view the files directly from inside the database, but I'm not sure how
good an idea this really is.

"hack Samba"? Wouldn't it be easier to use one of the database-as-a
filesystem FUSE bindings and run stock samba over that?

The perl Fuse::DBI module's example sounds pretty similar to the
system you described, where the "file" seems to be a column in a table.
http://www.rot13.org/~dpavlin/fuse_dbi.html

If that doesn't suit your needs there are more elaborate ones (libferris)
that seem more complex and more flexible, and simpler ones (dumbofs) that
seem to be meant more as example code you could hack for your purposes:
http://lwn.net/Articles/306860/
http://yiannnos.com/dumbofs

And then you could use unmodified samba out-of-the-box exporting
that to whatever the heck speaks SMB/CIFS these days.

#15Sam Mason
sam@samason.me.uk
In reply to: Ron Mayer (#14)
Re: generic modelling of data models; enforcing constraints dynamically...

On Fri, Sep 25, 2009 at 11:01:02AM -0700, Ron Mayer wrote:

Sam Mason wrote:

It all depends on the problem domain of course, but this seems to work
OK for us! I really want to hack Samba around so that the users can
view the files directly from inside the database, but I'm not sure how
good an idea this really is.

"hack Samba"? Wouldn't it be easier to use one of the database-as-a
filesystem FUSE bindings and run stock samba over that?

Huh, that would indeed be much easier. I hadn't thought about this for
a while and Rob's post reminded me. I don't think FUSE existed when I
started thinking about it and as all our clients are Windows boxes it
didn't matter at the time.

The perl Fuse::DBI module's example sounds pretty similar to the
system you described where he "file" seems to be a column in a table.
http://www.rot13.org/~dpavlin/fuse_dbi.html

FUSE looks pretty easy to get going, and I think I'd want more control
over how files were presented than this gives, so I'd probably end up
rolling my own code. Thanks for pointing out FUSE though; not sure
why I'd not thought of it before. I'll probably still never get around
to it, but maybe I will!

--
Sam http://samason.me.uk/

#16Ron Mayer
rm_pg@cheapcomplexdevices.com
In reply to: Sam Mason (#15)
Re: generic modelling of data models; enforcing constraints dynamically...

Drifting off topic so I'm no longer ccing the lists.

Sam Mason wrote:

The perl Fuse::DBI module's example sounds pretty similar to the
system you described where he "file" seems to be a column in a table.
http://www.rot13.org/~dpavlin/fuse_dbi.html

FUSE looks pretty easy to get going and I think I'd want more control
over how files were presented than this gives so I'd probably end up
rolling my own code. Thanks for pointing out that FUSE though, not sure

FUSE is really more the framework that wraps around your code.

There are applications using FUSE that expose Gmail as a filesystem.

Here's a simple example that uses FUSE to expose a
perl HASH and a few hello-world-like perl functions:
http://cpansearch.perl.org/src/NOSEYNICK/Fuse-Simple-1.00/README

why I'd not thought of it before. I'll probably still never get around
to it, but maybe I will!

It's actually easy enough that I wouldn't be surprised if you
try it, and get it working just for fun even if no one uses it.

#17Erik Jones
ejones@engineyard.com
In reply to: InterRob (#10)
Re: generic modelling of data models; enforcing constraints dynamically...

On Sep 24, 2009, at 2:07 PM, InterRob wrote:

I guess it IS quite overengineered indeed...

What I'm trying to do is to facilitate different fieldwork
methodologies for archaeological research (on project basis); there
is no final agreement on data structure and semantics; however, on a
meta-level all choices are rational and can be modelled... Infact,
all models can be related to each other: that's where the "hybrid"
part comes in: I wish to implement the common denominator (90%) and
then further extend this, enabing specific data model
implementations -- including checks for data integrity.

Have you considered a non-relational, "schema-less" database such as
MongoDB or Cassandra? You're pretty much throwing out the relational
features of this database anyway, so it seems that it would make sense
to use something more geared to that kind of work.

Erik Jones, Database Administrator
Engine Yard
Support, Scalability, Reliability
866.518.9273 x 260
Location: US/Pacific
IRC: mage2k

#18David Fetter
david@fetter.org
In reply to: InterRob (#1)
Re: generic modelling of data models; enforcing constraints dynamically...

On Thu, Sep 24, 2009 at 06:28:28PM +0200, InterRob wrote:

Dear List,
I am trying to implement the following:

In a database I wish to implement a GENERIC datamodel, thus on a
meta-level.

That's not a very bright idea, even though it seems so when you first
think of it.

Relational database management explicitly trades flexibility for size
and speed. You won't be able to constrain the things you think you'll
be constraining, and the query complexity will go up like O(n!).

Instead of going down this dead-end road, get the stakeholders
together, try a few prototypes of your schema, get them together
again, etc.

Cheers,
David.
--
David Fetter <david@fetter.org> http://fetter.org/
Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter
Skype: davidfetter XMPP: david.fetter@gmail.com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

#19InterRob
rob.marjot@gmail.com
In reply to: David Fetter (#18)
Re: generic modelling of data models; enforcing constraints dynamically...

Dear David, dear all,
I very well understand what you are saying... However, the solution won't be
found in the direction you are suggesting: the system I am designing will be
used by archaeologists involved in archaeological research (fieldwork).
Their research strategy (and with it their methodology and techniques) may
vary during research, depending on their findings and understanding of the
past that is reconstructed on-site... Along with these methodologies and
techniques, differing data models may need to be introduced...

Relationships between these models may be formalised (that's what I will
put stakeholders together for); in fact this is what I am trying to model --
trying to develop a system that will centralize and version the data
gathered. On a supra-project level, in fact.

Meanwhile, I made some progress; in another mail I sent to this same list, I
described the technological challenge at hand as follows:
-----
What I am trying to do is build views on a base table, extended by one
or more columns extracted (hence the name of the function
"deserialize()") from a SINGLE column (XML) **that is in this same base
table** (see below). Instructions for deserialization (that is, which
'fields' to look for) reside in some other table. There are MULTIPLE base
tables; they basically look like this:

[table definition:]
BASETABLE(ID INT, model TEXT, field1 some_type, field2 some_type, ... fieldN
some_type, serialized_data XML)

So, I wish to define multiple VIEWs based on a BASETABLE, one for each
"model" (as stated in the above table definition, "model" is a property of
each row). This QUERY would look like this (producing a VIEW for "MODEL1";
the query below is invalid, unfortunately):

SELECT base_t.*, deserialized.*
  FROM "BASETABLE" base_t,
       deserialize('MODEL1', base_t) AS deserialized(fieldX some_type,
                                                     fieldY some_type)
 WHERE base_t.model = 'MODEL1';

I have no problem with the requirement to supply the table type in the
query; in fact this is logical. Still, this query is impossible, obviously,
because "base_t" as a target is not known in the context of the FROM clause,
where I wish to use it in calling "deserialize(...)". Of course, I could
write a "deserialize()" function for each base table (e.g.
"deserialize_base1(...)"), but I wish it to perform its action on only rows
that will actually be part of the result set; thus I want the WHERE clause
to apply to the function's seq scan also. When provided, I wish to
incorporate the user's WHERE clause as well; this is done by the PostgreSQL
RULE system...

Alternatively, the VIEW could be defined by the following query:

SELECT base_t.*,
       (deserialize('MODEL1', base_t) AS temp(fieldX some_type,
                                              fieldY some_type)).*
  FROM "BASETABLE" base_t
 WHERE base_t.model = 'MODEL1';

This approach does not work either: deserialize(...) will return its set of
fields as ONE field (comma-separated, wrapped in parentheses); expressions
within a SELECT list seem to be allowed to result in only ONE column, except
for the * shorthand...

** So, the question is: how can I feed my "deserialize()" function with a
record subset (e.g. filtered by a WHERE clause) of the BASETABLE, while
still returning a record?!? **

I tried the following approach also:

SELECT (SELECT fieldX FROM deserialize(base_t)
            AS deserialized(fieldX some_type, fieldY some_type)) AS "fieldX",
       (SELECT fieldY FROM deserialize(base_t)
            AS deserialized(fieldX some_type, fieldY some_type)) AS "fieldY"
  FROM "BASETABLE" base_t
 WHERE model = 'MODEL1';

This in fact worked, but caused the function to get invoked TWICE FOR EACH
ROW (even when marked IMMUTABLE STRICT; or did I have to flush cached query
plans in psql?).
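
A variation that might avoid the double invocation (an untested sketch:
deserialize_model1() stands for a per-model function returning a declared
composite type) is to compute the composite value once in a subquery and
keep the planner from flattening it by adding OFFSET 0:

SELECT s.id, s.field1, s.field2,
       (s.d).fieldX, (s.d).fieldY
  FROM (SELECT base_t.*, deserialize_model1(base_t) AS d
          FROM "BASETABLE" base_t
         WHERE base_t.model = 'MODEL1'
        OFFSET 0) s;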

Another approach would be to put all key/value pairs into a separate table
(as one would do when implementing an EAV model within an RDBMS), which is
then to be joined (and joined again... and possibly again (!), in case of
MULTIPLE additional fields -- depending on the definition of the VIEW) onto
the BASETABLE, rather than deserializing from XML stored within the same
record... How does this approach then actually translate in terms of table
scans? Will they be limited by the filter on the BASETABLE, as the available
values to join on will be limited? At any rate, this approach will be more
difficult to implement / maintain in the case of EDITABLE VIEWS (insert,
update, delete)...

I hope some of you have useful thoughts on this... It appears to me that
updating the additional (virtual) fields in the BASETABLE is much easier:
the "serialize()" function can be fed a list of key/value pairs, producing
some XML that can be stored in the xml field "serialized_data", part of
this same base table...
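
A sketch of what such a serialize() function could look like (assuming,
purely for illustration, an <extra> wrapper with one <field name="...">
element per pair):

CREATE OR REPLACE FUNCTION serialize(keys TEXT[], vals TEXT[]) RETURNS xml AS $$
    SELECT xmlelement(name extra,
               xmlagg(xmlelement(name field,
                                 xmlattributes($1[i] AS name),
                                 $2[i])))
      FROM generate_subscripts($1, 1) AS i;
$$ LANGUAGE sql IMMUTABLE;

-- e.g. serialize(ARRAY['depth','colour'], ARRAY['12.5','red']) gives
-- <extra><field name="depth">12.5</field><field name="colour">red</field></extra>
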
All this needs to be implemented fully in the database back-end; client
applications will not know they are talking to VIEWS rather than tables...
Thus, the hosted database must appear to provide various tables, whereas
these are in fact stored in a limited number of base tables.
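
For reading, an alternative I am still considering is to skip the
record-passing problem entirely and let each model view pull its extra
fields straight out of the XML with xpath(), for example (the field names
and the XML layout are made up, matching the serialize() sketch above):

CREATE VIEW model1 AS
SELECT b.id, b.field1, b.field2,
       ((xpath('/extra/field[@name="depth"]/text()',
               b.serialized_data))[1])::text::numeric AS depth,
       ((xpath('/extra/field[@name="colour"]/text()',
               b.serialized_data))[1])::text AS colour
  FROM "BASETABLE" b
 WHERE b.model = 'MODEL1';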

----

Thanks in advance, you guys out there!

Rob

#20InterRob
rob.marjot@gmail.com
In reply to: Erik Jones (#17)
Re: generic modelling of data models; enforcing constraints dynamically...

In fact, I considered doing so, yes... But no luck: to complicate things, I
will need support for spatial datatypes, as implemented by the PostGIS
extension... Moreover, various applications that will make up the front end
will only be able to talk to mainstream or ODBC-compatible databases :((

Rob

#21David Fetter
david@fetter.org
In reply to: InterRob (#19)
#22Peter Hunsberger
peter.hunsberger@gmail.com
In reply to: David Fetter (#21)
#23InterRob
rob.marjot@gmail.com
In reply to: Peter Hunsberger (#22)
#24Oliver Kohll - Mailing Lists
oliver.lists@gtwm.co.uk
In reply to: Erik Jones (#17)
#25InterRob
rob.marjot@gmail.com
In reply to: Oliver Kohll - Mailing Lists (#24)
#26Scott Marlowe
scott.marlowe@gmail.com
In reply to: InterRob (#25)
#27Johan Nel
johan.nel@xsinet.co.za
In reply to: InterRob (#25)
#28Oleg Bartunov
oleg@sai.msu.su
In reply to: InterRob (#23)
#29InterRob
rob.marjot@gmail.com
In reply to: Oleg Bartunov (#28)
#30InterRob
rob.marjot@gmail.com
In reply to: InterRob (#29)
#31Oleg Bartunov
oleg@sai.msu.su
In reply to: InterRob (#30)