PostgreSQL as a triple store

Started by Jimmy Thrasibuleover 11 years ago11 messagesgeneral
Jump to latest
#1Jimmy Thrasibule
thrasibule.jimmy@gmail.com

Hello,

I'm looking into gathering information about geographical locations:

- Coordinates
- Place name
- Pictures
- etc.

Since a place can be anything, a specific place may need any kind of data
type to be described.

I'm therefore looking into using the same method as the semantic Web and
trying to describe a place with triples.

1. The "triples" table will reference a subject, attribute and value.
2. The "attributes" table will have an attribute name, a type and mauve a
table name.
3. For each attribute type, a new table is created to store the values.
4. A "resources" table list all the available resources (location, picture,
...).

Using this, I can easily add a new resource and describe it by adding new
triples. Every item is identified using UUIDs as primary key.

This seems too simple to be true so I falling back to you in case you see
any pitfalls here.

The "triples" table will grow very fast in this setup and may become a
bottleneck. However, I don't see any other way to store information about
something as wide as places around the world.

Regards,
Jimmy

#2Chris Travers
chris.travers@gmail.com
In reply to: Jimmy Thrasibule (#1)
Re: PostgreSQL as a triple store

Is there a reason why hstore or json is not an option? That may work a lot
better than this approach.

Another approach I have had is a set of additional catalog tables and
dynamically adding/removing columns from an extended attributes table.

On Tue, Aug 12, 2014 at 6:19 AM, Jimmy Thrasibule <
thrasibule.jimmy@gmail.com> wrote:

Hello,

I'm looking into gathering information about geographical locations:

- Coordinates
- Place name
- Pictures
- etc.

Since a place can be anything, a specific place may need any kind of data
type to be described.

I'm therefore looking into using the same method as the semantic Web and
trying to describe a place with triples.

1. The "triples" table will reference a subject, attribute and value.
2. The "attributes" table will have an attribute name, a type and mauve a
table name.
3. For each attribute type, a new table is created to store the values.
4. A "resources" table list all the available resources (location,
picture, ...).

Using this, I can easily add a new resource and describe it by adding new
triples. Every item is identified using UUIDs as primary key.

This seems too simple to be true so I falling back to you in case you see
any pitfalls here.

The "triples" table will grow very fast in this setup and may become a
bottleneck. However, I don't see any other way to store information about
something as wide as places around the world.

Regards,
Jimmy

--
Best Wishes,
Chris Travers

Efficito: Hosted Accounting and ERP. Robust and Flexible. No vendor
lock-in.
http://www.efficito.com/learn_more

#3Jonas Xie
jonas@jxie.de
In reply to: Jimmy Thrasibule (#1)
Re: PostgreSQL as a triple store

We are currently working on in-database semantic reasoning.
However, if you do not need any T-Box or A-Box reasoning, maybe JSON is
a better choice in your case.

Am 12.08.2014 15:19, schrieb Jimmy Thrasibule:

Hello,

I'm looking into gathering information about geographical locations:

- Coordinates
- Place name
- Pictures
- etc.

Since a place can be anything, a specific place may need any kind of
data type to be described.

I'm therefore looking into using the same method as the semantic Web
and trying to describe a place with triples.

1. The "triples" table will reference a subject, attribute and value.
2. The "attributes" table will have an attribute name, a type and
mauve a table name.
3. For each attribute type, a new table is created to store the values.
4. A "resources" table list all the available resources (location,
picture, ...).

Using this, I can easily add a new resource and describe it by adding
new triples. Every item is identified using UUIDs as primary key.

This seems too simple to be true so I falling back to you in case you
see any pitfalls here.

The "triples" table will grow very fast in this setup and may become a
bottleneck. However, I don't see any other way to store information
about something as wide as places around the world.

Regards,
Jimmy

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#4Jimmy Thrasibule
thrasibule.jimmy@gmail.com
In reply to: Chris Travers (#2)
Re: PostgreSQL as a triple store

Is there a reason why hstore or json is not an option? That may work a lot
better than this approach.

I don't want to move away from SQL common features so I can have a
test environment using SQLite and deploy on PostgreSQL. This approach
looks elegant and simple to me. Using a new table per attribute type
or even per attribute for the values, I can easily index them for
quick lookups.

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

In reply to: Jimmy Thrasibule (#4)
Re: PostgreSQL as a triple store

On 12/08/2014 15:57, Jimmy Thrasibule wrote:

Is there a reason why hstore or json is not an option? That may work a lot
better than this approach.

I don't want to move away from SQL common features so I can have a
test environment using SQLite and deploy on PostgreSQL. This approach
looks elegant and simple to me. Using a new table per attribute type
or even per attribute for the values, I can easily index them for
quick lookups.

What is your test environment? Postgres works quite happily on my rather
low-powered Windows 7 laptop, and so I have all the bells and whistles
available for development.

Ray.

--
Raymond O'Donnell :: Galway :: Ireland
rod@iol.ie

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#6Bill Moran
wmoran@potentialtech.com
In reply to: Jimmy Thrasibule (#4)
Re: PostgreSQL as a triple store

On Tue, 12 Aug 2014 16:57:32 +0200
Jimmy Thrasibule <thrasibule.jimmy@gmail.com> wrote:

Is there a reason why hstore or json is not an option? That may work a lot
better than this approach.

I don't want to move away from SQL common features so I can have a
test environment using SQLite and deploy on PostgreSQL. This approach
looks elegant and simple to me. Using a new table per attribute type
or even per attribute for the values, I can easily index them for
quick lookups.

Couple of things to keep in mind:
* SQLLite and PostgreSQL are not 100% compatable. It's cheap and easy to
set up PostgreSQL for testing/development, so it seems silly (to me) to test
on something that might behave differently than the production environment.

* Any setup where the application can execute DDL is a potential security
concern. If the code can manipulate tables then a bug in the code can
allow an accidental or malicious user to quickly and easily destroy data.

--
Bill Moran
I need your help to succeed:
http://gamesbybill.com

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#7John R Pierce
pierce@hogranch.com
In reply to: Jimmy Thrasibule (#1)
Re: PostgreSQL as a triple store

On 8/12/2014 6:19 AM, Jimmy Thrasibule wrote:

I'm looking into gathering information about geographical locations:

- Coordinates
- Place name
- Pictures
- etc.

Since a place can be anything, a specific place may need any kind of
data type to be described.

I'm therefore looking into using the same method as the semantic Web
and trying to describe a place with triples.

1. The "triples" table will reference a subject, attribute and value.
2. The "attributes" table will have an attribute name, a type and
mauve a table name.
3. For each attribute type, a new table is created to store the values.
4. A "resources" table list all the available resources (location,
picture, ...).

Using this, I can easily add a new resource and describe it by adding
new triples. Every item is identified using UUIDs as primary key.

that is a variation of an anti-pattern known as EAV
(Entity-Attribute-Value).

its impossible to implement well-performing queries with this as you'll
need to make several sequential queries each time, since table names
can't be 'dynamic' in a sql query by design. (query triples join
attributes, case on the attribute type, query appropriate attribute
table and possibly resources)

http://mikesmithers.wordpress.com/2013/12/22/the-anti-pattern-eavil-database-design/

--
john r pierce 37N 122W
somewhere on the middle of the left coast

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#8Chris Travers
chris.travers@gmail.com
In reply to: Bill Moran (#6)
Re: PostgreSQL as a triple store

On Tue, Aug 12, 2014 at 8:33 AM, Bill Moran <wmoran@potentialtech.com>
wrote:

On Tue, 12 Aug 2014 16:57:32 +0200
Jimmy Thrasibule <thrasibule.jimmy@gmail.com> wrote:

Is there a reason why hstore or json is not an option? That may work

a lot

better than this approach.

I don't want to move away from SQL common features so I can have a
test environment using SQLite and deploy on PostgreSQL. This approach
looks elegant and simple to me. Using a new table per attribute type
or even per attribute for the values, I can easily index them for
quick lookups.

Couple of things to keep in mind:
* SQLLite and PostgreSQL are not 100% compatable. It's cheap and easy to
set up PostgreSQL for testing/development, so it seems silly (to me) to
test
on something that might behave differently than the production environment.

* Any setup where the application can execute DDL is a potential security
concern. If the code can manipulate tables then a bug in the code can
allow an accidental or malicious user to quickly and easily destroy data.

This is a good point I probably should have mentioned specifically. Making
my solution secure means restricting access to the catalog and catalog to
DDL features to a very small subset of trusted users (ideally not through a
standard application connection). In this case, one might as well restrict
it to those who are already db superusers unless there is a clear need to
broaden it.

BTW, a specific case in point... DDL can't be parameterized since it
doesn't have a query plan. This means you are doing string concatenation
to create your ddl queries. If you aren't careful someone can add an
attribute like:

'; DROP TABLE things; --

In practice this means a great deal of manual review and restriction on
which users can access this feature.

Best Wishes,
Chris Travers

--
Bill Moran
I need your help to succeed:
http://gamesbybill.com

--
Best Wishes,
Chris Travers

Efficito: Hosted Accounting and ERP. Robust and Flexible. No vendor
lock-in.
http://www.efficito.com/learn_more

#9Seref Arikan
serefarikan@gmail.com
In reply to: Jimmy Thrasibule (#1)
Re: PostgreSQL as a triple store

Hi Jimmy,
I think you're going to need to perform experiments and decide if the
flexibility you get is worth the performance you're going to lose.
As you can see from John R Pierce's response it is only matter of time
before someone (correctly) warns you about the performance issues with EAV
derivatives, but then the conversation becomes too abstract, at least in my
opinion.
Why? Because we don't know how much data you're going to process with this
design, with what kind of hardware.

I've developed solutions where I've used EAV along with a more relational
design, they were delivered on time, performed at the expected/acceptable
level, so no problems. Then again, I've tried to use EAV bluntly in a
healthcare related project and it died on me quite quickly when the data
grew.

I always use on demand Amazon instances to test performance of the db
design to test performance using automatically generated data. It is a very
good way of testing your design as data grows and I suggest you consider
the same approach.

Also, your requirements sounds related to some projects that use PostGis,
have you taken a look at that?

Best regards
Seref

On Tue, Aug 12, 2014 at 2:19 PM, Jimmy Thrasibule <
thrasibule.jimmy@gmail.com> wrote:

Show quoted text

Hello,

I'm looking into gathering information about geographical locations:

- Coordinates
- Place name
- Pictures
- etc.

Since a place can be anything, a specific place may need any kind of data
type to be described.

I'm therefore looking into using the same method as the semantic Web and
trying to describe a place with triples.

1. The "triples" table will reference a subject, attribute and value.
2. The "attributes" table will have an attribute name, a type and mauve a
table name.
3. For each attribute type, a new table is created to store the values.
4. A "resources" table list all the available resources (location,
picture, ...).

Using this, I can easily add a new resource and describe it by adding new
triples. Every item is identified using UUIDs as primary key.

This seems too simple to be true so I falling back to you in case you see
any pitfalls here.

The "triples" table will grow very fast in this setup and may become a
bottleneck. However, I don't see any other way to store information about
something as wide as places around the world.

Regards,
Jimmy

#10Chris Travers
chris.travers@gmail.com
In reply to: Seref Arikan (#9)
Re: PostgreSQL as a triple store

On Wed, Aug 13, 2014 at 12:43 AM, Seref Arikan <serefarikan@gmail.com>
wrote:

Hi Jimmy,
I think you're going to need to perform experiments and decide if the
flexibility you get is worth the performance you're going to lose.
As you can see from John R Pierce's response it is only matter of time
before someone (correctly) warns you about the performance issues with EAV
derivatives, but then the conversation becomes too abstract, at least in my
opinion.
Why? Because we don't know how much data you're going to process with this
design, with what kind of hardware.

We also don't know how the data will be processed.

Where EAV dies a screaming horrible death is when you try to search on
multiple attributes. If this data is just never going to be searched on
attributes, EAV isn't that bad (I use it for some things where that is
clearly the case).

--
Best Wishes,
Chris Travers

#11Jimmy Thrasibule
jimmy.thrasibule@gmail.com
In reply to: Seref Arikan (#9)
Re: PostgreSQL as a triple store

I think you're going to need to perform experiments and decide if the

flexibility you get is worth the performance you're going to lose.

As you can see from John R Pierce's response it is only matter of time

before someone (correctly) warns you about the performance issues with EAV
derivatives, but then the conversation becomes too abstract, at least in my
opinion.

Why? Because we don't know how much data you're going to process with

this design, with what kind of hardware.

My data model is not completely ready yet and is still being worked on. But
yes, I think I have first to think more about how I will be using the data
before looking at a storage model.

I will indeed run some experiments before but using this model, I was
planning to but some complexity in the code.

Also, your requirements sounds related to some projects that use PostGis,

have you taken a look at that?

Never heard of it, I'll give it a look.

---
Jimmy