dynamic crosstab

Started by SunWuKungabout 18 years ago47 messagesgeneral
Jump to latest
#1SunWuKung
Balazs.Klein@t-online.hu

Hi,

I found this to create dynamic crosstabs (where the resulting columns
are not known beforehand): http://www.ledscripts.com/tech/article/view/5.html
(Thanks for Denis Bitouzé on
http://www.postgresonline.com/journal/index.php?/archives/14-CrossTab-Queries-in-PostgreSQL-using-tablefunc-contrib.html
for pointing it out.).
This is basically dynamically generating an SQL string with CASE ...
WHEN that will create a view.
This could work although for hundreds of columns it looks a bit scary
for me.
Isn't there a more elegant way to achieve this with tablefunc crosstab
and if there isn't don't you think it could/should be there?
There is a syntax where you could specify the columns with a SELECT
DISTINCT statement - couldn't it also generate the enumeration string
eg. presuming that all returning colums are stored as text?
Or if that is not possible instead of the enumeration part wouldn't it
be better to put a name of the view that could be created/recreated?

I know that most db people don't care much about pivot/crosstab in the
db but imagine this situation:
I am storing questionnaire results on people. Since the questionnaires
are created by users I have no other way than using an EAV model like
personID, questionID, responseValue to store responses. Now this table
gets long 300 question per questionnaire, 3000 people and we have 1m
row. Now whenever I need to download this data in my case 2/3rd of it
would be redundant if I could pivot it first - and in a 20MB csv its
significant (I know its a tradeoff between processing and storage).
Moreover my users can't do anything with this dataformat - they need
to pivot it offline anyway, which is not easy (Excel cant do it,
Access cant do it, numberGo cant do it for different reasons).
Although the application could do it I think this is a generic
functionality that the database is more suited for.

Please let me know if you know of a good db based way to create a
dynamic crosstab in Postgres - or why there shouldn't be one.
Thanks and regards.
SWK

#2Masse Jacques
jacques.masse@bordeaux.cemagref.fr
In reply to: SunWuKung (#1)
Re: dynamic crosstab

I found this to create dynamic crosstabs (where the resulting
columns are not known beforehand):
http://www.ledscripts.com/tech/article/view/5.html
(Thanks for Denis Bitouzé on
http://www.postgresonline.com/journal/index.php?/archives/14-C
rossTab-Queries-in-PostgreSQL-using-tablefunc-contrib.html
for pointing it out.).
This is basically dynamically generating an SQL string with CASE ...
WHEN that will create a view.
This could work although for hundreds of columns it looks a
bit scary for me.
Isn't there a more elegant way to achieve this with tablefunc
crosstab and if there isn't don't you think it could/should be there?
There is a syntax where you could specify the columns with a
SELECT DISTINCT statement - couldn't it also generate the
enumeration string eg. presuming that all returning colums
are stored as text?
Or if that is not possible instead of the enumeration part
wouldn't it be better to put a name of the view that could be
created/recreated?

I know that most db people don't care much about
pivot/crosstab in the db but imagine this situation:
I am storing questionnaire results on people. Since the
questionnaires are created by users I have no other way than
using an EAV model like personID, questionID, responseValue
to store responses. Now this table gets long 300 question per
questionnaire, 3000 people and we have 1m row. Now whenever I
need to download this data in my case 2/3rd of it would be
redundant if I could pivot it first - and in a 20MB csv its
significant (I know its a tradeoff between processing and storage).
Moreover my users can't do anything with this dataformat -
they need to pivot it offline anyway, which is not easy
(Excel cant do it, Access cant do it, numberGo cant do it for
different reasons).
Although the application could do it I think this is a
generic functionality that the database is more suited for.

Please let me know if you know of a good db based way to
create a dynamic crosstab in Postgres - or why there shouldn't be one.
Thanks and regards.
SWK

Have you tried this crosstab?

http://www.postgresql.org/docs/8.3/interactive/tablefunc.html

#3Tino Wildenhain
tino@wildenhain.de
In reply to: SunWuKung (#1)
Re: dynamic crosstab

Hi,

SunWuKung wrote:

Hi,

I found this to create dynamic crosstabs (where the resulting columns

...

This could work although for hundreds of columns it looks a bit scary
for me.

Well I'd say hundreds of columns are always scary, no matter how you do
it :-)

...

I know that most db people don't care much about pivot/crosstab in the
db but imagine this situation:
I am storing questionnaire results on people. Since the questionnaires
are created by users I have no other way than using an EAV model like
personID, questionID, responseValue to store responses. Now this table
gets long 300 question per questionnaire, 3000 people and we have 1m
row. Now whenever I need to download this data in my case 2/3rd of it
would be redundant if I could pivot it first - and in a 20MB csv its
significant (I know its a tradeoff between processing and storage).
Moreover my users can't do anything with this dataformat - they need
to pivot it offline anyway, which is not easy (Excel cant do it,
Access cant do it, numberGo cant do it for different reasons).

What about not pivoting it? You can run your analysis directly
against your database.

Although the application could do it I think this is a generic
functionality that the database is more suited for.

Well after all you want a CSV not a table. You could shortcut this
with a generic query which creates array out of your "columns"
and join them to a CSV line. This would just be outputted as
one single column from database.

Please let me know if you know of a good db based way to create a
dynamic crosstab in Postgres - or why there shouldn't be one.

See above :-)

Regards
Tino

#4Reece Hart
reece@harts.net
In reply to: Tino Wildenhain (#3)
Re: dynamic crosstab

On Wed, 2008-02-13 at 14:04 +0100, Tino Wildenhain wrote:

Well after all you want a CSV not a table. You could shortcut this
with a generic query which creates array out of your "columns"
and join them to a CSV line. This would just be outputted as
one single column from database.

Depending on your use case, this may be a better way:

In psql:
=> \copy (select col1,col2,col3 from data) TO data.csv CSV HEADER

or on the command line:
$ psql -c '\copy (select col1,col2,col3 from data) TO data.csv CSV HEADER'

Strictly speaking, the CSV formatting isn't being done in the database
but rather by psql.

-Reece

--
Reece Hart, http://harts.net/reece/, GPG:0x25EC91A0

#5SunWuKung
Balazs.Klein@t-online.hu
In reply to: Masse Jacques (#2)
Re: dynamic crosstab

Yes, thanks.
The problem with those function is that they all have an AS (columname type,
...) part or equivalent.

-----Original Message-----
From: Masse Jacques [mailto:jacques.masse@bordeaux.cemagref.fr]
Sent: Wednesday, February 13, 2008 10:20 AM
To: SunWuKung; pgsql-general@postgresql.org
Subject: RE: [GENERAL] dynamic crosstab

I found this to create dynamic crosstabs (where the resulting
columns are not known beforehand):
http://www.ledscripts.com/tech/article/view/5.html
(Thanks for Denis Bitouzé on
http://www.postgresonline.com/journal/index.php?/archives/14-C
rossTab-Queries-in-PostgreSQL-using-tablefunc-contrib.html
for pointing it out.).
This is basically dynamically generating an SQL string with CASE ...
WHEN that will create a view.
This could work although for hundreds of columns it looks a
bit scary for me.
Isn't there a more elegant way to achieve this with tablefunc
crosstab and if there isn't don't you think it could/should be there?
There is a syntax where you could specify the columns with a
SELECT DISTINCT statement - couldn't it also generate the
enumeration string eg. presuming that all returning colums
are stored as text?
Or if that is not possible instead of the enumeration part
wouldn't it be better to put a name of the view that could be
created/recreated?

I know that most db people don't care much about
pivot/crosstab in the db but imagine this situation:
I am storing questionnaire results on people. Since the
questionnaires are created by users I have no other way than
using an EAV model like personID, questionID, responseValue
to store responses. Now this table gets long 300 question per
questionnaire, 3000 people and we have 1m row. Now whenever I
need to download this data in my case 2/3rd of it would be
redundant if I could pivot it first - and in a 20MB csv its
significant (I know its a tradeoff between processing and storage).
Moreover my users can't do anything with this dataformat -
they need to pivot it offline anyway, which is not easy
(Excel cant do it, Access cant do it, numberGo cant do it for
different reasons).
Although the application could do it I think this is a
generic functionality that the database is more suited for.

Please let me know if you know of a good db based way to
create a dynamic crosstab in Postgres - or why there shouldn't be one.
Thanks and regards.
SWK

Have you tried this crosstab?

http://www.postgresql.org/docs/8.3/interactive/tablefunc.html

#6Balázs Klein
bklein@t-online.hu
In reply to: SunWuKung (#5)
Re: dynamic crosstab

Hi,
Yes I know that SPSS can do this - in fact that is the only way I could solve this so far, but that is a very expensive workaround for anybody not currently owning SPSS.

Thanks.
SWK

-----Original Message-----
From: jr [mailto:jorg.raskowski@tailorware.org.uk]
Sent: Wednesday, February 13, 2008 1:31 PM
To: SunWuKung
Subject: Re: dynamic crosstab

hi SWK

SunWuKung wrote:

I know that most db people don't care much about pivot/crosstab in the
db but imagine this situation:
I am storing questionnaire results on people. Since the questionnaires
are created by users I have no other way than using an EAV model like

are you using the right tool for this task?

Moreover my users can't do anything with this dataformat - they need
to pivot it offline anyway, which is not easy (Excel cant do it,
Access cant do it, numberGo cant do it for different reasons).

back at college we used SPSS - the Statistical Package for Social Sciences.

Please let me know if you know of a good db based way to create a
dynamic crosstab in Postgres - or why there shouldn't be one.

to be honest I don't; I think that a specialised product (such as SPSS)
will solve both problems in one stroke.

--

regards, jr. (jr@tailorware.org.uk)

#7Balázs Klein
bklein@t-online.hu
In reply to: Tino Wildenhain (#3)
Re: dynamic crosstab

Hi,
ye, hundreds of columns - but there is no helping it, that’s the way many questionnaire are and the representation of the responses (when not in a database) is always one person per row. I would need this for exporting, but also to show results online.

Although it’s a good idea I am afraid that an array could only help me when the info I store about all the persons in the query are exactly the same (there wouldn’t be empty cells in a crosstab) - it’s very useful for some cases but in general that sounds like a dangerous presumption for me.

I think this is a generic shortcoming of Postgres - whenever you are forced to create an EAV (Entity-Attribute-Value) model you have no generic or way of going back to the usual one entity per row model. This is something that Access has been able to do (up to 255 columns) as far as I can remember. When I google about this topic I find that the majority of people are still referring to that solution as the easiest for this purpose. Tablefunc crosstab is so close to a good solution for this with the syntax where you could specify the columns with a query - the only shortcoming is that you still have to enumerate the columns and their datatype. I always hope that somebody might have something similar but generic - eg. create those columns automatically and just treat them all as text.

Regards,
SWK

-----Original Message-----
From: Tino Wildenhain [mailto:tino@wildenhain.de]
Sent: Wednesday, February 13, 2008 2:05 PM
To: SunWuKung
Cc: pgsql-general@postgresql.org
Subject: Re: [GENERAL] dynamic crosstab

Hi,

SunWuKung wrote:

Hi,

I found this to create dynamic crosstabs (where the resulting columns

...

This could work although for hundreds of columns it looks a bit scary
for me.

Well I'd say hundreds of columns are always scary, no matter how you do
it :-)

...

I know that most db people don't care much about pivot/crosstab in the
db but imagine this situation:
I am storing questionnaire results on people. Since the questionnaires
are created by users I have no other way than using an EAV model like
personID, questionID, responseValue to store responses. Now this table
gets long 300 question per questionnaire, 3000 people and we have 1m
row. Now whenever I need to download this data in my case 2/3rd of it
would be redundant if I could pivot it first - and in a 20MB csv its
significant (I know its a tradeoff between processing and storage).
Moreover my users can't do anything with this dataformat - they need
to pivot it offline anyway, which is not easy (Excel cant do it,
Access cant do it, numberGo cant do it for different reasons).

What about not pivoting it? You can run your analysis directly
against your database.

Although the application could do it I think this is a generic
functionality that the database is more suited for.

Well after all you want a CSV not a table. You could shortcut this
with a generic query which creates array out of your "columns"
and join them to a CSV line. This would just be outputted as
one single column from database.

Please let me know if you know of a good db based way to create a
dynamic crosstab in Postgres - or why there shouldn't be one.

See above :-)

Regards
Tino

Internal Virus Database is out-of-date.
Checked by AVG Free Edition.
Version: 7.5.503 / Virus Database: 269.16.0/1137 - Release Date: 11/18/2007 5:15 PM

#8Balázs Klein
bklein@t-online.hu
In reply to: Reece Hart (#4)
Re: dynamic crosstab

Yes, once I have the select outputting it to CSV is not a problem. As you say PG handles that nicely.

Thx
SWK

-----Original Message-----
From: Reece Hart [mailto:reece@harts.net]
Sent: Wednesday, February 13, 2008 9:39 PM
To: Tino Wildenhain
Cc: SunWuKung; pgsql-general@postgresql.org
Subject: Re: [GENERAL] dynamic crosstab

On Wed, 2008-02-13 at 14:04 +0100, Tino Wildenhain wrote:

Well after all you want a CSV not a table. You could shortcut this
with a generic query which creates array out of your "columns"
and join them to a CSV line. This would just be outputted as
one single column from database.

Depending on your use case, this may be a better way:

In psql:
=> \copy (select col1,col2,col3 from data) TO data.csv CSV HEADER

or on the command line:
$ psql -c '\copy (select col1,col2,col3 from data) TO data.csv CSV HEADER'

Strictly speaking, the CSV formatting isn't being done in the database
but rather by psql.

-Reece

--
Reece Hart, http://harts.net/reece/, GPG:0x25EC91A0

#9Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Balázs Klein (#6)
Re: dynamic crosstab

Bal�zs Klein wrote:

Hi,
Yes I know that SPSS can do this - in fact that is the only way I
could solve this so far, but that is a very expensive workaround for
anybody not currently owning SPSS.

Huh, perhaps you could try with PSPP ... (I don't know if it can do it,
but I know it is supposed to be a replacement to SPSS).

--
Alvaro Herrera http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

#10Erik Jones
erik@myemma.com
In reply to: Balázs Klein (#7)
Re: dynamic crosstab

On Feb 14, 2008, at 2:04 AM, Balázs Klein wrote:

Hi,
ye, hundreds of columns - but there is no helping it, that’s the
way many questionnaire are and the representation of the responses
(when not in a database) is always one person per row. I would need
this for exporting, but also to show results online.

Although it’s a good idea I am afraid that an array could only help
me when the info I store about all the persons in the query are
exactly the same (there wouldn’t be empty cells in a crosstab) -
it’s very useful for some cases but in general that sounds like a
dangerous presumption for me.

As of versions >= 8.2 you can store NULL values in arrays. Perhaps
you could have a Question -> Index table and then use an array per
person for their answers.

I think this is a generic shortcoming of Postgres - whenever you
are forced to create an EAV (Entity-Attribute-Value) model you have
no generic or way of going back to the usual one entity per row
model. This is something that Access has been able to do (up to 255
columns) as far as I can remember. When I google about this topic I
find that the majority of people are still referring to that
solution as the easiest for this purpose. Tablefunc crosstab is so
close to a good solution for this with the syntax where you could
specify the columns with a query - the only shortcoming is that you
still have to enumerate the columns and their datatype. I always
hope that somebody might have something similar but generic - eg.
create those columns automatically and just treat them all as text.

Have a look at http://www.varlena.com/varlena/GeneralBits/110.php for
a totally different approach to questionnaires.

Erik Jones

DBA | Emma®
erik@myemma.com
800.595.4401 or 615.292.5888
615.292.0777 (fax)

Emma helps organizations everywhere communicate & market in style.
Visit us online at http://www.myemma.com

#11SunWuKung
Balazs.Klein@t-online.hu
In reply to: Erik Jones (#10)
Re: dynamic crosstab

Hi,
the part that I don't know is how to put those NULLs in.
It could well be doable I just can't do it myself.

How does the query look like that produces from this input:
PersonID AttributeID Value
1 1 aaa
1 2 bbb
1 3 ccc
2 1 ddd
2 3 eee

this output, without manually enumerating the attributeids:
1 (aaa,bbb,ccc)
2 (ddd,NULL,eee)

Thx.
B.

-----Original Message-----
From: Erik Jones [mailto:erik@myemma.com]
Sent: Thursday, February 14, 2008 5:15 PM
To: Balázs Klein
Cc: 'Tino Wildenhain'; 'SunWuKung'; pgsql-general@postgresql.org
Subject: Re: [GENERAL] dynamic crosstab

On Feb 14, 2008, at 2:04 AM, Balázs Klein wrote:

Hi,
ye, hundreds of columns - but there is no helping it, that’s the
way many questionnaire are and the representation of the responses
(when not in a database) is always one person per row. I would need
this for exporting, but also to show results online.

Although it’s a good idea I am afraid that an array could only help
me when the info I store about all the persons in the query are
exactly the same (there wouldn’t be empty cells in a crosstab) -
it’s very useful for some cases but in general that sounds like a
dangerous presumption for me.

As of versions >= 8.2 you can store NULL values in arrays. Perhaps
you could have a Question -> Index table and then use an array per
person for their answers.

I think this is a generic shortcoming of Postgres - whenever you
are forced to create an EAV (Entity-Attribute-Value) model you have
no generic or way of going back to the usual one entity per row
model. This is something that Access has been able to do (up to 255
columns) as far as I can remember. When I google about this topic I
find that the majority of people are still referring to that
solution as the easiest for this purpose. Tablefunc crosstab is so
close to a good solution for this with the syntax where you could
specify the columns with a query - the only shortcoming is that you
still have to enumerate the columns and their datatype. I always
hope that somebody might have something similar but generic - eg.
create those columns automatically and just treat them all as text.

Have a look at http://www.varlena.com/varlena/GeneralBits/110.php for
a totally different approach to questionnaires.

Erik Jones

DBA | Emma®
erik@myemma.com
800.595.4401 or 615.292.5888
615.292.0777 (fax)

Emma helps organizations everywhere communicate & market in style.
Visit us online at http://www.myemma.com

#12Erik Jones
erik@myemma.com
In reply to: SunWuKung (#11)
Re: dynamic crosstab

On Feb 14, 2008, at 10:56 AM, Balázs Klein wrote:

-----Original Message-----
From: Erik Jones [mailto:erik@myemma.com]
Sent: Thursday, February 14, 2008 5:15 PM
To: Balázs Klein
Cc: 'Tino Wildenhain'; 'SunWuKung'; pgsql-general@postgresql.org
Subject: Re: [GENERAL] dynamic crosstab

On Feb 14, 2008, at 2:04 AM, Balázs Klein wrote:

Hi,
ye, hundreds of columns - but there is no helping it, that’s the
way many questionnaire are and the representation of the responses
(when not in a database) is always one person per row. I would need
this for exporting, but also to show results online.

Although it’s a good idea I am afraid that an array could only help
me when the info I store about all the persons in the query are
exactly the same (there wouldn’t be empty cells in a crosstab) -
it’s very useful for some cases but in general that sounds like a
dangerous presumption for me.

As of versions >= 8.2 you can store NULL values in arrays. Perhaps
you could have a Question -> Index table and then use an array per
person for their answers.

I think this is a generic shortcoming of Postgres - whenever you
are forced to create an EAV (Entity-Attribute-Value) model you have
no generic or way of going back to the usual one entity per row
model. This is something that Access has been able to do (up to 255
columns) as far as I can remember. When I google about this topic I
find that the majority of people are still referring to that
solution as the easiest for this purpose. Tablefunc crosstab is so
close to a good solution for this with the syntax where you could
specify the columns with a query - the only shortcoming is that you
still have to enumerate the columns and their datatype. I always
hope that somebody might have something similar but generic - eg.
create those columns automatically and just treat them all as text.

Have a look at http://www.varlena.com/varlena/GeneralBits/110.php for
a totally different approach to questionnaires.

Erik Jones

Hi,
the part that I don't know is how to put those NULLs in.
It could well be doable I just can't do it myself.

How does the query look like that produces from this input:
PersonID AttributeID Value
1 1 aaa
1 2 bbb
1 3 ccc
2 1 ddd
2 3 eee

this output, without manually enumerating the attributeids:
1 (aaa,bbb,ccc)
2 (ddd,NULL,eee)

Thx.
B.

My point was to get rid of the the EAV setup. Something like:

CREATE TABLE questions (
question_id serial primary key,
question text not null
);

CREATE TABLE people (
person_id serial primary key,
....
);

CREATE TABLE answers (
person_id integer references people,
answers text[]
);

where the indexes into answers are ids from questions. You don't get
any easy foreign keys for those indexes into the questions table,
which you definitely don't have with the EAV setup anyway, but with
this you don't need any kind of pivot/crosstab functionality.

Erik Jones

DBA | Emma®
erik@myemma.com
800.595.4401 or 615.292.5888
615.292.0777 (fax)

Emma helps organizations everywhere communicate & market in style.
Visit us online at http://www.myemma.com

#13SunWuKung
Balazs.Klein@t-online.hu
In reply to: Erik Jones (#12)
Re: dynamic crosstab

I can't imagine how I could store data directly that way (beside the usual
thing that whenever I can I aim to store scalar value in a column).

To do what you suggest I could have this:
1 (aaa,bbb,ccc)
2 (ddd,NULL,eee)
but for this I would need to store a NULL for a person for all the questions
he/she didn't answer. Now answers may come from all sorts of questionnaires
so most people will only have responses on a subset, so this does not seem
feasible.

Or this:
1 (aaa,bbb,ccc)
2 (ddd,eee)
but this would be loosing the purpose - there is no longer a meaningful way
to compare the same info at different people.

So directly storing the info in this structure does not seem to be the way
for me. On the other hand a query may be able to generate the proper array
without the usual problem of outputting unknown number of columns.

thx
B.

-----Original Message-----
From: Erik Jones [mailto:erik@myemma.com]
Sent: Thursday, February 14, 2008 6:14 PM
To: Balázs Klein
Cc: 'Tino Wildenhain'; pgsql-general@postgresql.org
Subject: Re: [GENERAL] dynamic crosstab

On Feb 14, 2008, at 10:56 AM, Balázs Klein wrote:

-----Original Message-----
From: Erik Jones [mailto:erik@myemma.com]
Sent: Thursday, February 14, 2008 5:15 PM
To: Balázs Klein
Cc: 'Tino Wildenhain'; 'SunWuKung'; pgsql-general@postgresql.org
Subject: Re: [GENERAL] dynamic crosstab

On Feb 14, 2008, at 2:04 AM, Balázs Klein wrote:

Hi,
ye, hundreds of columns - but there is no helping it, that’s the
way many questionnaire are and the representation of the responses
(when not in a database) is always one person per row. I would need
this for exporting, but also to show results online.

Although it’s a good idea I am afraid that an array could only help
me when the info I store about all the persons in the query are
exactly the same (there wouldn’t be empty cells in a crosstab) -
it’s very useful for some cases but in general that sounds like a
dangerous presumption for me.

As of versions >= 8.2 you can store NULL values in arrays. Perhaps
you could have a Question -> Index table and then use an array per
person for their answers.

I think this is a generic shortcoming of Postgres - whenever you
are forced to create an EAV (Entity-Attribute-Value) model you have
no generic or way of going back to the usual one entity per row
model. This is something that Access has been able to do (up to 255
columns) as far as I can remember. When I google about this topic I
find that the majority of people are still referring to that
solution as the easiest for this purpose. Tablefunc crosstab is so
close to a good solution for this with the syntax where you could
specify the columns with a query - the only shortcoming is that you
still have to enumerate the columns and their datatype. I always
hope that somebody might have something similar but generic - eg.
create those columns automatically and just treat them all as text.

Have a look at http://www.varlena.com/varlena/GeneralBits/110.php for
a totally different approach to questionnaires.

Erik Jones

Hi,
the part that I don't know is how to put those NULLs in.
It could well be doable I just can't do it myself.

How does the query look like that produces from this input:
PersonID AttributeID Value
1 1 aaa
1 2 bbb
1 3 ccc
2 1 ddd
2 3 eee

this output, without manually enumerating the attributeids:
1 (aaa,bbb,ccc)
2 (ddd,NULL,eee)

Thx.
B.

My point was to get rid of the the EAV setup. Something like:

CREATE TABLE questions (
question_id serial primary key,
question text not null
);

CREATE TABLE people (
person_id serial primary key,
....
);

CREATE TABLE answers (
person_id integer references people,
answers text[]
);

where the indexes into answers are ids from questions. You don't get
any easy foreign keys for those indexes into the questions table,
which you definitely don't have with the EAV setup anyway, but with
this you don't need any kind of pivot/crosstab functionality.

Erik Jones

DBA | Emma®
erik@myemma.com
800.595.4401 or 615.292.5888
615.292.0777 (fax)

Emma helps organizations everywhere communicate & market in style.
Visit us online at http://www.myemma.com

#14Erik Jones
erik@myemma.com
In reply to: SunWuKung (#13)
Re: dynamic crosstab

On Feb 14, 2008, at 3:49 PM, Klein Balazs wrote:

My point was to get rid of the the EAV setup. Something like:

CREATE TABLE questions (
question_id serial primary key,
question text not null
);

CREATE TABLE people (
person_id serial primary key,
....
);

CREATE TABLE answers (
person_id integer references people,
answers text[]
);

where the indexes into answers are ids from questions. You don't get
any easy foreign keys for those indexes into the questions table,
which you definitely don't have with the EAV setup anyway, but with
this you don't need any kind of pivot/crosstab functionality.

I can't imagine how I could store data directly that way (beside
the usual
thing that whenever I can I aim to store scalar value in a column).

To do what you suggest I could have this:
1 (aaa,bbb,ccc)
2 (ddd,NULL,eee)
but for this I would need to store a NULL for a person for all the
questions
he/she didn't answer. Now answers may come from all sorts of
questionnaires
so most people will only have responses on a subset, so this does
not seem
feasible.

Or this:
1 (aaa,bbb,ccc)
2 (ddd,eee)
but this would be loosing the purpose - there is no longer a
meaningful way
to compare the same info at different people.

So directly storing the info in this structure does not seem to be
the way
for me. On the other hand a query may be able to generate the
proper array
without the usual problem of outputting unknown number of columns.

First, please stop top-posting. It makes it difficult for both me
and others to know to whom/what you are replying.

Now on to the meat of the topic! When using arrays you do not need
to manually store NULLS -- they are implied by gaps in array
indices. Observe:

CREATE TABLE questions (
question_id INTEGER PRIMARY KEY,
question_text TEXT NOT NULL
);

CREATE TABLE people (
person_id SERIAL PRIMARY KEY,
answers TEXT[]
);

INSERT INTO questions (question_id, question_text) VALUES (2, 'Will
arrays work?'), (5, 'Can pigs fly?');

INSERT INTO people (person_id) VALUES (1), (2);

UPDATE people
SET answers[2] = 'yep!',
answers[5] = 'nope!',
answers[7] = 'this shouldn''t be here!'
where person_id=1;

UPDATE people
SET answers[5]='if only they had wings'
where person_id=2;

SELECT * FROM people;

  person_id |                           answers
--------------- 
+-------------------------------------------------------------
1                  | [2:7]={yep!,NULL,NULL,nope!,NULL,"this shouldn't  
be here!"}
2                  | [5:5]={"if only they had wings"}

See how postgres handles filling the NULLs for you? What you'd
really want to do with this would be to define some functions for
setting and getting a person's answers to a given question or set of
questions so that you could implement some kind of data integrity
with regards to question ids and indices into the answers arrays such
as in the example above you'd want to prevent an entry at index 7
when there is no entry in the questions table for question_id=7.
This whole thing is still wide open for adding extra layers such as
question groupings for separate questionnaires, etc.

Erik Jones

DBA | Emma®
erik@myemma.com
800.595.4401 or 615.292.5888
615.292.0777 (fax)

Emma helps organizations everywhere communicate & market in style.
Visit us online at http://www.myemma.com

#15Joe Conway
mail@joeconway.com
In reply to: Erik Jones (#14)
Re: dynamic crosstab

Erik Jones wrote:

See how postgres handles filling the NULLs for you? What you'd really
want to do with this would be to define some functions for setting and
getting a person's answers to a given question or set of questions so
that you could implement some kind of data integrity with regards to
question ids and indices into the answers arrays such as in the example
above you'd want to prevent an entry at index 7 when there is no entry
in the questions table for question_id=7.

It occurs to me that it shouldn't be terribly difficult to make an
alternate version of crosstab() that returns an array rather than tuples
(back when crosstab() was first written, Postgres didn't support NULL
array elements). Is this worth considering for 8.4?

Joe

#16SunWuKung
Balazs.Klein@t-online.hu
In reply to: Joe Conway (#15)
Re: dynamic crosstab

Joe wrote

It occurs to me that it shouldn't be terribly difficult to make an
alternate version of crosstab() that returns an array rather than tuples
(back when crosstab() was first written, Postgres didn't support NULL
array elements). Is this worth considering for 8.4?

I think there should be a generic way in Postgres to return from an EAV model. Although I have no evidence on that I keep thinking that the db must be more effective at that than the application would be.

I was hoping that now with PG supporting plan invalidation it would be possible to return a recordset. If there is no generic way to return a recordset than being able to return an array is much better than nothing.

B.

#17SunWuKung
Balazs.Klein@t-online.hu
In reply to: Erik Jones (#14)
Re: dynamic crosstab

Erik Jones wrote:

First, please stop top-posting. It makes it difficult for both me
and others to know to whom/what you are replying.

Sorry, I don't know much about mailing list customs - I had to look up what top-posting is. I will behave now ...

I would prefer to keep the complications for when I retrieve the data rather then when I store it.

I could imagine something like this though to create a crosstab as an array, but I am afraid that there is no assurance that the resulting array would contain the values in the same order for each focus:

tbl(eID, aID, value)

Select eID, array_accum(value) from
(
(Select Distinct eID from tbl) e
CROSS JOIN
(Select Distinct aID from tbl) a
) ea
LEFT OUTER JOIN
tbl USING (eID, aID)
GROUP BY eID

B.

#18Joe Conway
mail@joeconway.com
In reply to: SunWuKung (#16)
Re: dynamic crosstab

Balázs Klein wrote:

I was hoping that now with PG supporting plan invalidation it would
be possible to return a recordset.

Plan invalidation has nothing to do with it. In Postgres a returned
recordset can be used as a row source in the FROM clause -- this
requires data type information to be known at parse time.

Joe

#19Erik Jones
erik@myemma.com
In reply to: SunWuKung (#17)
Re: dynamic crosstab

On Feb 15, 2008, at 6:29 AM, Balázs Klein wrote:

Erik Jones wrote:

First, please stop top-posting. It makes it difficult for both me
and others to know to whom/what you are replying.

Sorry, I don't know much about mailing list customs - I had to look
up what top-posting is. I will behave now ...

It's cool, now you know :)

I would prefer to keep the complications for when I retrieve the
data rather then when I store it.

Really? When do you think users notice performance hits the most?
I'd think, given that answers for a questionnaire are stored as a
batch, people running reports on will be the ones to notice, i.e. at
retrieval time.

I could imagine something like this though to create a crosstab as
an array, but I am afraid that there is no assurance that the
resulting array would contain the values in the same order for each
focus:

tbl(eID, aID, value)

Select eID, array_accum(value) from
(
(Select Distinct eID from tbl) e
CROSS JOIN
(Select Distinct aID from tbl) a
) ea
LEFT OUTER JOIN
tbl USING (eID, aID)
GROUP BY eID

That's cool. I still don't see why you're so set on an EAV, but it's
your setup. Watch out, though, big questionnaires will turn into
queries with an inordinate amount of joins and performance on those
will suck. If you just used arrays directly you could pull all of
the answers for a given person and/or questionnaire with pretty
simple query.

Erik Jones

DBA | Emma®
erik@myemma.com
800.595.4401 or 615.292.5888
615.292.0777 (fax)

Emma helps organizations everywhere communicate & market in style.
Visit us online at http://www.myemma.com

#20SunWuKung
Balazs.Klein@t-online.hu
In reply to: Erik Jones (#19)
Re: dynamic crosstab

given that answers for a questionnaire are stored as a
batch

Not in our setup - for all sorts of reasons (preserving responses on a connection failure or restart, monitoring response latency in real time, creating adaptive/branching questionnaires) we send each response separately.

people running reports on will be the ones to notice, i.e. at
retrieval time.

I am not sure - different responses are aggregated into different attributes in different ways - those properties need to be retrieved during scoring/report generation, so being able to create a join directly on a response is a good thing for me. But report generation - in our case it must be a DTP quality PDF - is such a beast anyway that db times dwarf compared to pdf generation.

The problem comes when I need to present the responses themselves in a human-friendly way - as an export or display or report. Do you think there is a way to ensure that the order of the values in the array below is the same for each person?

tbl(eID, aID, value)

Select eID, array_accum(value) from
(
(Select Distinct eID from tbl) e
CROSS JOIN
(Select Distinct aID from tbl) a
) ea
LEFT OUTER JOIN
tbl USING (eID, aID)
GROUP BY eID

Thx for the help.
B.

#21Erik Jones
erik@myemma.com
In reply to: Joe Conway (#15)
#22SunWuKung
Balazs.Klein@t-online.hu
In reply to: Joe Conway (#18)
#23Erik Jones
erik@myemma.com
In reply to: SunWuKung (#20)
#24SunWuKung
Balazs.Klein@t-online.hu
In reply to: Erik Jones (#23)
#25Scott Marlowe
scott.marlowe@gmail.com
In reply to: SunWuKung (#20)
#26Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Joe Conway (#15)
#27Joe Conway
mail@joeconway.com
In reply to: Alvaro Herrera (#26)
#28David Fetter
david@fetter.org
In reply to: Alvaro Herrera (#26)
#29Webb Sprague
webb.sprague@gmail.com
In reply to: David Fetter (#28)
#30SunWuKung
Balazs.Klein@t-online.hu
In reply to: Balázs Klein (#7)
#31Pierre Chevalier
pierre.chevalier1967@free.fr
In reply to: SunWuKung (#5)
#32Pavel Stehule
pavel.stehule@gmail.com
In reply to: Pierre Chevalier (#31)
#33Pierre Chevalier
pierre.chevalier1967@free.fr
In reply to: Pavel Stehule (#32)
#34Pavel Stehule
pavel.stehule@gmail.com
In reply to: Pierre Chevalier (#33)
#35Pierre Chevalier
pierre.chevalier1967@free.fr
In reply to: Pavel Stehule (#34)
#36Pavel Stehule
pavel.stehule@gmail.com
In reply to: Pierre Chevalier (#35)
#37Pierre Chevalier
pierre.chevalier1967@free.fr
In reply to: Pavel Stehule (#36)
#38Pavel Stehule
pavel.stehule@gmail.com
In reply to: Pierre Chevalier (#37)
#39Scott Marlowe
scott.marlowe@gmail.com
In reply to: Pavel Stehule (#34)
#40Andy Colson
andy@squeakycode.net
In reply to: Pierre Chevalier (#35)
#41Andy Colson
andy@squeakycode.net
In reply to: Andy Colson (#40)
#42Joe Conway
mail@joeconway.com
In reply to: Andy Colson (#41)
#43Pierre Chevalier
pierre.chevalier1967@free.fr
In reply to: Andy Colson (#40)
#44Pierre Chevalier
pierre.chevalier1967@free.fr
In reply to: Andy Colson (#41)
#45Pierre Chevalier
pierre.chevalier1967@free.fr
In reply to: Pavel Stehule (#38)
#46Pavel Stehule
pavel.stehule@gmail.com
In reply to: Pierre Chevalier (#45)
#47Andy Colson
andy@squeakycode.net
In reply to: Pierre Chevalier (#44)