tables with lots of columns - what alternative from performance point of view?

Started by hubert depesz lubaczewskiover 20 years ago3 messagesgeneral
Jump to latest

hi
jus recently there were some thread on postgresql list with people asying :
i have 700 columns, i have 1000 columns and so on.
some people, imediatelly responded: change your schema.
this is what forced to me ask:

i have a situation where i ahve to store a number of "objects" in database.
all objects have 3 specific attributes (which go into objects table), and
may have a lot of "custom fields".
basically - lsit of accessible custom fields for object depends on which
object-category this object belongs to.
now.
i know, i could have written it in this way:

create table object_custom_fields (id serial primary key, object_id int8,
field_id int8, field_value text);
but:
this approach has two very big drawbacks (for me):
1. the table cannot differentiate between custom fields of type "date",
"number" and so on. - everything is stored as text.
2. it is rather slow. i have to do a non-unique index scan over
object_custom_fields, get all records, and pivot it (on the client side of
curse) to make it usable.

i did it differently, definitelly not nicely, but i dont see any other way
to get this performance with unknown list of custom fields:
1. create table cf_types (id serial, codename text, representation text);
2. create table cf_definitions (id serial, category_id int8, type_id int8,
field-number int4);
3. create table cf_values (id serial, object_id int8 (unique),
...................................................);

where
cf_types store information like this:
id | codename | representation
----+------------+----------------
1 | bool | boolean
2 | integer | integer
3 | number | number
4 | text | text
5 | note | text
6 | date | date
...
basically - there might be many "types" with the same representation.
then
cf_values have a lot of (128 at the moment) fields for all possible
representations.
basically it looks like:
id, object_id, boolean_1 ... boolean_128, integer_1..integer_128, ...
the datatypes of this fields relate to their content (integer_* fields have
datatype int8, and so on).

now.
in cf_definitions i specify, category, field_type_id, and a field-number -
which relates to _NUMBER in fields in cf_values.

what i did achive is *very* fast retrieval of data for any given object.
the schema of cf_values table is absolutelly awful, and i will never say
differently.
my point is - if somebody (tom lane for example) says - redesign your schema
- whenever he reads about table with 700 column (i have more :) - then i
must have missed something absolutelyl simple, fast and elegant. what is
this?

depesz

#2Oleg Bartunov
oleg@sai.msu.su
In reply to: hubert depesz lubaczewski (#1)
Re: tables with lots of columns - what alternative from

contrib/hstore will save you.
See http://www.sai.msu.su/~megera/postgres/gist/hstore/README.hstore
for details.

Oleg
On Wed, 7 Dec 2005, hubert depesz lubaczewski wrote:

hi
jus recently there were some thread on postgresql list with people asying :
i have 700 columns, i have 1000 columns and so on.
some people, imediatelly responded: change your schema.
this is what forced to me ask:

i have a situation where i ahve to store a number of "objects" in database.
all objects have 3 specific attributes (which go into objects table), and
may have a lot of "custom fields".
basically - lsit of accessible custom fields for object depends on which
object-category this object belongs to.
now.
i know, i could have written it in this way:

create table object_custom_fields (id serial primary key, object_id int8,
field_id int8, field_value text);
but:
this approach has two very big drawbacks (for me):
1. the table cannot differentiate between custom fields of type "date",
"number" and so on. - everything is stored as text.
2. it is rather slow. i have to do a non-unique index scan over
object_custom_fields, get all records, and pivot it (on the client side of
curse) to make it usable.

i did it differently, definitelly not nicely, but i dont see any other way
to get this performance with unknown list of custom fields:
1. create table cf_types (id serial, codename text, representation text);
2. create table cf_definitions (id serial, category_id int8, type_id int8,
field-number int4);
3. create table cf_values (id serial, object_id int8 (unique),
...................................................);

where
cf_types store information like this:
id | codename | representation
----+------------+----------------
1 | bool | boolean
2 | integer | integer
3 | number | number
4 | text | text
5 | note | text
6 | date | date
...
basically - there might be many "types" with the same representation.
then
cf_values have a lot of (128 at the moment) fields for all possible
representations.
basically it looks like:
id, object_id, boolean_1 ... boolean_128, integer_1..integer_128, ...
the datatypes of this fields relate to their content (integer_* fields have
datatype int8, and so on).

now.
in cf_definitions i specify, category, field_type_id, and a field-number -
which relates to _NUMBER in fields in cf_values.

what i did achive is *very* fast retrieval of data for any given object.
the schema of cf_values table is absolutelly awful, and i will never say
differently.
my point is - if somebody (tom lane for example) says - redesign your schema
- whenever he reads about table with 700 column (i have more :) - then i
must have missed something absolutelyl simple, fast and elegant. what is
this?

depesz

Regards,
Oleg
_____________________________________________________________
Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
Sternberg Astronomical Institute, Moscow University (Russia)
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(095)939-16-83, +007(095)939-23-83

In reply to: Oleg Bartunov (#2)
Re: tables with lots of columns - what alternative from performance point of view?

On 12/7/05, Oleg Bartunov <oleg@sai.msu.su> wrote:

contrib/hstore will save you.
See http://www.sai.msu.su/~megera/postgres/gist/hstore/README.hstore
for details.

thanks. i didn't know about it, and it looks great. but i'm not sure if we
will be able to use it - my developers use java + hibernate, and they say it
cannot work with any "fancy" datatypes (including such a base things like
"INTERVAL").
i will definitelly use is though in my other (not hibernate-dependant)
projects.

best regards

depesz