Dynamic data model, locks and performance
Hello people of the PostgreSQL world!
I am wondering if PostgreSQL would be a great choice for my database needs.
I would like to create a db with dynamic data model. I would like to be able
to add tables and columns to existing tables while other queries are
running.
Will PostgreSQL be able to manage the locks correctly? Am I pushing
PostgreSQL too far by wanting to do something like this? If so, what do you
suggest instead?
Also, I would like to be able to have millions of tables in my database. As
far as I know, PostgreSQL uses one file per table. Would that cause
performance problems related to how the files are mapped onto the file
system? Is PostgreSQL able to handle this kind of load?
--
A+
-------------
Pierre
On 29/07/10 07:06, Pierre Thibault wrote:
Hello people of the PostgreSQL world!
I am wondering if PostgreSQL would be a great choice for my database needs.
I would like to create a db with dynamic data model. I would like to be
able to add tables and columns to existing tables while other queries
are running. Will PostgreSQL be able to manage the locks correctly? Am I
pushing PostgreSQL too far by wanting to do something like this? If so,
what do you suggest instead?
Also, I would like to be able to have millions of tables in my database.
As far as I know, PostgreSQL uses one file per table. Would that cause
performance problems related to how the files are mapped onto the file
system? Is PostgreSQL able to handle this kind of load?
From what you say, I really doubt that PostgreSQL will be well suited to
your needs. You might be able to make it work, but I doubt it'll be
great. At heart Pg is a relational database, and you want to do things
like dynamic table structures that relational databases are typically
not great at.
People here might be able to suggest alternate approaches if you can
explain *why* you want to do what you want, what your goals are, etc. Pg
might be quite suitable if you can rethink how you're doing things a
little - or it might just be a bad choice.
I doubt anyone can make any useful recommendations without a more
complete explanation of what you're trying to achieve and why you want
to do what you have described.
Re Pg-specific stuff, you should look into hstore. It might offer a way
to avoid the need for dynamic table re-structuring.
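As a rough sketch of what that might look like (assuming the hstore contrib
module is installed; the table and column names here are only illustrative):

```sql
-- User-defined attributes live in an hstore column instead of
-- requiring ALTER TABLE for every new field.
CREATE TABLE ad (
    id    serial PRIMARY KEY,
    model text NOT NULL,                  -- e.g. 'car'
    attrs hstore NOT NULL DEFAULT ''
);

-- Store a car ad without touching the schema:
INSERT INTO ad (model, attrs)
VALUES ('car', 'color => "red", year => "2008"');

-- Search on a dynamic field; a GiST or GIN index on attrs can
-- make this kind of lookup efficient:
SELECT id FROM ad WHERE attrs -> 'color' = 'red';
```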
--
Craig Ringer
Tech-related writing: http://soapyfrogs.blogspot.com/
2010/7/28 Craig Ringer <craig@postnewspapers.com.au>
On 29/07/10 07:06, Pierre Thibault wrote:
I doubt anyone can make any useful recommendations without a more
complete explanation of what you're trying to achieve and why you want
to do what you have described.
Thank you Craig,
Yes, I was not very specific. I have an idea in mind. Everyone knows about
Craigslist and the hype about NoSQL databases. They seem to be cool for doing
full-text searches over a lot of data. But what about doing more formal
searches based on specific criteria like dates, numbers or other kinds of
values on specific fields? This is where an SQL database shines.
What I would like to do is enable users to create their own data model.
Enable them to create a model and make it evolve. For example, it would be
cool to create a model to represent car ads. Then, the buyers can search
ads using the previous model with specific criteria like the color of the
car they want.
I thought about using a table where each row would represent a field in the
dynamic model. The row would contain a field for each possible data type
supported, but only one of them would be used. This would waste a lot of
space and it would also be hard to query efficiently, but I would not have
to change the schema as users create new models. Otherwise, I would create a
standard data model and apply the changes as users update their models.
I would also like to support inheritance so we can have a simple model for
regular ads and more specific models with more fields for car ads, for example.
I am having a really hard time figuring out how to implement my idea
efficiently. So the advice of experienced database developers is what I am
looking for.
Re Pg-specific stuff, you should look into hstore. It might offer a way
to avoid the need for dynamic table re-structuring.
I thought about using maps for the entities of my domain model. hstore seems
to be an interesting avenue.
--
A+
-------------
Pierre
On Wed, Jul 28, 2010 at 10:15 PM, Pierre Thibault
<pierre.thibault1@gmail.com> wrote:
What I would like to do is enable users to create their own data model.
Enable them to create a model and make it evolve. For example, it would be
cool to create a model to represent car ads. Then, the buyers can search ads
using the previous model with specific criteria like the color of the car
they want.
Sounds like a standard parts explosion or component / subcomponent
hierarchy? The most general form has two pieces:
1) a tree model of some form. I like the set / subset model that Joe
Celko popularized (search the archives for other references to it from
me);
2) an extensible set of containers to hang off the tree. Here I like
to use a strongly typed version of the Entity Attribute Value (EAV)
type model;
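One possible reading of such a strongly typed EAV container, sketched in SQL
(all names below are invented for illustration):

```sql
-- One row per user-defined attribute value, with one column per
-- supported type and a CHECK enforcing that exactly one is used.
CREATE TABLE attribute_value (
    entity_id integer NOT NULL,   -- the tree node the value hangs off
    attr_name text    NOT NULL,
    text_val  text,
    num_val   numeric,
    date_val  date,
    PRIMARY KEY (entity_id, attr_name),
    CHECK (
        (text_val IS NOT NULL)::int
      + (num_val  IS NOT NULL)::int
      + (date_val IS NOT NULL)::int = 1
    )
);

-- "color = red" for entity 42:
INSERT INTO attribute_value (entity_id, attr_name, text_val)
VALUES (42, 'color', 'red');
```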
Those two pieces on their own are so general that you can't really
scale the results very far. However, if you've got some specific
goals in mind I might be able to provide some hints. If you're an
IEEE member I can point you at a presentation I did on the basics (I
think)...
--
Peter Hunsberger
On 29/07/10 11:15, Pierre Thibault wrote:
What I would like to do is enable users to create their own data model.
Then, really, SQL databases aren't wonderful for your needs. You can use
them for dynamic, user-defined schema, but you'll always be swimming
uphill.
I thought about using a table where each row would represent a field in
the dynamic model. The row would contain a field for each possible data
type supported, but only one of them would be used.
Google "EAV"
This has known problems. It works, but it's kind of like bolting floats
onto a car and trying to use it as a boat. Not very nice.
Really, I'd be researching dynamic schema databases, object stores,
key/value set stores, etc. Such things are outside my personal
experience, though, and I can't really advise you much on technologies.
Beware of the latest No-SQL fad platforms, though; you might land up
being better off with something older and more stable even if it's less
fashionable than CouchDB or whatever is currently "in" today.
--
Craig Ringer
Tech-related writing: http://soapyfrogs.blogspot.com/
Pierre Thibault wrote:
Hello people of the Postgresql world!
I am wondering if PostgreSQL would be a great choice for my database needs.
I would like to create a db with dynamic data model. I would like to
be able to add tables and columns to existing tables while other
queries are running.
It sounds a bit like the (commercial) ChronicDB add-on that was on the
announce list a while ago, http://chronicdb.com/
regards,
Yeb Havinga
On 29/07/10 22:36, Pierre Thibault wrote:
Why so? This is something expected of a database used in a continuous
integration environment. Maybe I did not express myself very well. Users
are not changing their models all the time. They create new models, which
means creating new tables, and from time to time they add new fields to
existing tables, just like developers do when they update a schema for new
application functionality.
Ah, OK. I read your initial mail as implying much more frequent changes,
especially combined with "millions" of tables.
In my last job, I was working with continuous database integration. We
created DDL scripts to add new fields and tables live on a SQL Server
database in production. Most scripts were executed during the night to
reduce the impact on the db. In practice, this may mean that a running
query will have to wait maybe half a second to get a lock because of such
an update. Usually, not really more than that. Can I expect similar
performance with PostgreSQL?
With a few caveats, yes.
The main one: For columns you want to be NOT NULL, you should add new
columns as nullable. Then UPDATE the new column to hold any desired
default, before issuing an
ALTER TABLE ... ALTER COLUMN ... SET NOT NULL.
That's because an ALTER TABLE to add a nullable column doesn't have to
rewrite the table. An ALTER TABLE to add a NOT NULL column with a DEFAULT
has to immediately rewrite the table to store the default in every record.
This is slow, and during this operation ALTER TABLE holds an exclusive lock.
By contrast, if you ALTER TABLE to add a nullable column (brief
exclusive lock), UPDATE (long much lower-order lock that doesn't
conflict with SELECT, INSERT, or UPDATE to unaffected rows) and then
finally ALTER TABLE again to add the constraint (a further brief lock)
you have greatly reduced lock times.
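Concretely, that three-step pattern looks roughly like this (table and
column names are invented for the example):

```sql
-- Step 1: add the column as nullable (brief exclusive lock,
-- no table rewrite):
ALTER TABLE my_table ADD COLUMN new_col integer;

-- Step 2: backfill the desired default (ordinary row locks; does not
-- block SELECTs or writes to unaffected rows):
UPDATE my_table SET new_col = 0 WHERE new_col IS NULL;

-- Step 3: add the constraint; this scans the table under a brief
-- exclusive lock but does not rewrite it:
ALTER TABLE my_table ALTER COLUMN new_col SET NOT NULL;
```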
Really, I'd be researching dynamic schema databases, object stores,
key/value set stores, etc. Such things are outside my personal
experience, though, and I can't really advise you much on technologies.
Beware of the latest No-SQL fad platforms, though; you might land up
being better off with something older and more stable even if it's less
fashionable than CouchDB or whatever is currently "in" today.
Maybe, but, as I said, using a SQL database with the one-table-per-class
hierarchy strategy seems to be the way to go for me. I'll take a look at
these options too.
Just beware of huge table counts. People have reported issues on the
list with truly huge numbers of tables. It's not something that turns up
in most regular relational designs, and there are a few things in Pg
(like, AFAIK, autovacuum's scanning of tables to vacuum) that scale
linearly with table counts.
I'm sure it's workable, it just might not be ideal.
--
Craig Ringer
On 07/29/2010 05:15 AM, Pierre Thibault wrote:
You might want to take a look at TopicMaps, http://www.topicmaps.org/
One of the greatest topicmap-engines is opensource: http://www.ontopia.net/
It uses, if you want, an RDBMS (PostgreSQL is fully supported) underneath.
--
Andreas Joseph Krogh<andreak@officenet.no>
Senior Software Developer / CTO
------------------------+---------------------------------------------+
OfficeNet AS | The most difficult thing in the world is to |
Rosenholmveien 25 | know how to do a thing and to watch |
1414 Trollåsen | somebody else doing it wrong, without |
NORWAY | comment. |
| |
Tlf: +47 24 15 38 90 | |
Fax: +47 24 15 38 91 | |
Mobile: +47 909 56 963 | |
------------------------+---------------------------------------------+
On 07/30/10 1:08 PM, Andreas Joseph Krogh wrote:
You might want to take a look at TopicMaps, http://www.topicmaps.org/
One of the greatest topicmap-engines is opensource:
http://www.ontopia.net/
Huh, visited both those sites, and I still have no idea what TopicMaps
actually are.
On 07/30/2010 10:20 PM, John R Pierce wrote:
On 07/30/10 1:08 PM, Andreas Joseph Krogh wrote:
You might want to take a look at TopicMaps, http://www.topicmaps.org/
One of the greatest topicmap-engines is opensource:
http://www.ontopia.net/
Huh, visited both those sites, and I still have no idea what TopicMaps
actually are.
http://www.ontopia.net/topicmaps/materials/tao.html
--
Andreas Joseph Krogh<andreak@officenet.no>
Senior Software Developer / CTO
------------------------+---------------------------------------------+
OfficeNet AS | The most difficult thing in the world is to |
Rosenholmveien 25 | know how to do a thing and to watch |
1414 Trollåsen | somebody else doing it wrong, without |
NORWAY | comment. |
| |
Tlf: +47 24 15 38 90 | |
Fax: +47 24 15 38 91 | |
Mobile: +47 909 56 963 | |
------------------------+---------------------------------------------+
This example is certainly a workable situation. However it does require
understanding the constraints of an ALTER TABLE statement and manually
developing appropriate scripts.
The update model offered by ChronicDB accounts for schema changes of
considerable complexity, such as merging fields, partitioning, renaming,
or moving fields from one table to another, as well as for reversing
schema changes live if a bug is discovered in the application.