CROSSTAB( .. only one column has values... )

Started by Adam Tauno Williamsover 5 years ago9 messagesgeneral

awilliam@whitemice.org

over 5 years ago

I'm using the crosstab feature and do not understand why I am only
getting values in the first column.

The query:
SELECT
date_trunc('month', t2.value_date) AS invoice_date,
t1.value_string AS invoice_type
COUNT(*)
FROM document d
LEFT OUTER JOIN obj_property t1
ON (t1.obj_id = d.document_id
AND t1.namespace_prefix = 'http://www.example.com/ctabs'
AND t1.value_key = 'invoiceType')
LEFT OUTER JOIN obj_property t2
ON (t2.obj_id = d.document_id
AND t2.namespace_prefix = 'http://www.example.com/ctabs'
AND t2.value_key = 'invoiceDate')
WHERE ((d.project_id = 1134510213) AND (d.is_folder = 0))
GROUP BY 1,2

- has results like -

invoice_date invoice_type count
------------ ------------ -----
2013-02 service 3454
2013-03 service 3512
2013-03 parts 5366
2013-04 parts 5657
2013-04 service 4612
2013-05 service 4946
2013-05 parts 5508
...

So I put this in as a crosstab:

SELECT * FROM crosstab(
$$
SELECT
SUBSTR(t2.value_string, 1, 7) AS invoice_date,
t1.value_string AS invoice_type,
COUNT(*)
FROM document d
LEFT OUTER JOIN obj_property t1
ON (t1.obj_id = d.document_id
AND t1.namespace_prefix = 'http://www.example.com/ctabs'
AND t1.value_key = 'invoiceType')
LEFT OUTER JOIN obj_property t2
ON (t2.obj_id = d.document_id
AND t2.namespace_prefix = 'http://www.example.com/ctabs'
AND t2.value_key = 'invoiceDate')
WHERE ((d.project_id = 1134510213) AND (d.is_folder = 0))
GROUP BY 1,2
$$) AS final_result (invoice_date VARCHAR, "parts" BIGINT, "rental"
BIGINT, "sales" BIGINT, "service" BIGINT);

- and I get the results of -

invoice_date parts rental sales service
------------ ----- ------ ------ -------
2001-09 1 (null) (null) (null)
2007-07 1 (null) (null) (null)
2013-02 5353 (null) (null) (null)
2013-02 3454 (null) (null) (null)
2013-03 3512 (null) (null) (null)
2013-03 5366 (null) (null) (null)
...

Only the first column has values, all the rest are NULL.
I assume I am missing something stupid.

--
Adam Tauno Williams <mailto:awilliam@whitemice.org> GPG D95ED383
Systems Administrator, Python Developer, LPI / NCLA

David G. Johnston

david.g.johnston@gmail.com

over 5 years ago

In reply to: Adam Tauno Williams (#1)

Re: CROSSTAB( .. only one column has values... )

On Tue, Jan 5, 2021 at 8:46 AM Adam Tauno Williams <awilliam@whitemice.org>
wrote:

Only the first column has values, all the rest are NULL.
I assume I am missing something stupid.

I think you are assigning the function more intelligence/effort than it
puts out.

From the documentation:
"""
The crosstab function produces one output row for each consecutive group of
input rows with the same row_name value. It fills the output value columns,
left to right, with the value fields from these rows. If there are fewer
rows in a group than there are output value columns, the extra output
columns are filled with nulls; if there are more rows, the extra input rows
are skipped.

In practice the SQL query should always specify ORDER BY 1,2 to ensure that
the input rows are properly ordered, that is, values with the same row_name
are brought together and correctly ordered within the row. Notice that
crosstab itself does not pay any attention to the second column of the
query result; it's just there to be ordered by, to control the order in
which the third-column values appear across the page.
"""

The fact you don't have an order by, and that there are not an equal number
of records per date, suggests to me that you are expecting the function to
fill in the blanks when the documentation says it doesn't do that.

David J.

Tim Clarke

tim.clarke@minerva.info

over 5 years ago

In reply to: David G. Johnston (#2)

Re: CROSSTAB( .. only one column has values... )

On 05/01/2021 16:12, David G. Johnston wrote:
On Tue, Jan 5, 2021 at 8:46 AM Adam Tauno Williams <awilliam@whitemice.org<mailto:awilliam@whitemice.org>> wrote:
Only the first column has values, all the rest are NULL.
I assume I am missing something stupid.

I think you are assigning the function more intelligence/effort than it puts out.

From the documentation:
"""
The crosstab function produces one output row for each consecutive group of input rows with the same row_name value. It fills the output value columns, left to right, with the value fields from these rows. If there are fewer rows in a group than there are output value columns, the extra output columns are filled with nulls; if there are more rows, the extra input rows are skipped.

In practice the SQL query should always specify ORDER BY 1,2 to ensure that the input rows are properly ordered, that is, values with the same row_name are brought together and correctly ordered within the row. Notice that crosstab itself does not pay any attention to the second column of the query result; it's just there to be ordered by, to control the order in which the third-column values appear across the page.
"""

The fact you don't have an order by, and that there are not an equal number of records per date, suggests to me that you are expecting the function to fill in the blanks when the documentation says it doesn't do that.

David J.

imho the crosstab() function isn't a good implementation. The biggest failure it has is that you must know exactly how many output columns you will have in the result prior to running it

Tim Clarke MBCS
IT Director
Direct: +44 (0)1376 504510 | Mobile: +44 (0)7887 563420

Telephone: Witham: +44(0)1376 503500 | London: +44 (0)20 3009 0853 | Frankfurt: +49 (0)69 7191 6000 | Hong Kong: +852 5803 1687 | Toronto: +1 647 503 2848
Web: https://www.manifest.co.uk/

________________________________

Copyright: This e-mail may contain confidential or legally privileged information. If you are not the named addressee you must not use or disclose such information, instead please report it to admin@minerva.info<mailto:admin@minerva.info>
Legal: Minerva Analytics is the trading name of: Minerva Analytics Ltd: Registered in England Number 11260966 & The Manifest Voting Agency Ltd: Registered in England Number 2920820 Registered Office at above address. Please Click Here https://www.manifest.co.uk/legal/ for further information.

Thomas Kellerer

shammat@gmx.net

over 5 years ago

In reply to: Adam Tauno Williams (#1)

Re: CROSSTAB( .. only one column has values... )

Adam Tauno Williams schrieb am 05.01.2021 um 16:46:

I'm using the crosstab feature and do not understand why I am only
getting values in the first column.

The query:
SELECT
date_trunc('month', t2.value_date) AS invoice_date,
t1.value_string AS invoice_type
COUNT(*)
FROM document d
LEFT OUTER JOIN obj_property t1
ON (t1.obj_id = d.document_id
AND t1.namespace_prefix = 'http://www.example.com/ctabs'
AND t1.value_key = 'invoiceType')
LEFT OUTER JOIN obj_property t2
ON (t2.obj_id = d.document_id
AND t2.namespace_prefix = 'http://www.example.com/ctabs'
AND t2.value_key = 'invoiceDate')
WHERE ((d.project_id = 1134510213) AND (d.is_folder = 0))
GROUP BY 1,2

- has results like -

invoice_date invoice_type count
------------ ------------ -----
2013-02 service 3454
2013-03 service 3512
2013-03 parts 5366
2013-04 parts 5657
2013-04 service 4612
2013-05 service 4946
2013-05 parts 5508
...

So I put this in as a crosstab:

I find using filtered aggregation to be way easier and more flexible than using crosstab():

SELECT date_trunc('month', t2.value_date) AS invoice_date,
count(*) filter (where value_string = 'rental') as rental,
count(*) filter (where value_string = 'sales') as sales,
count(*) filter (where value_string = 'service') as service
FROM document d
LEFT OUTER JOIN obj_property t1
ON (t1.obj_id = d.document_id
AND t1.namespace_prefix = 'http://www.example.com/ctabs'
AND t1.value_key = 'invoiceType')
LEFT OUTER JOIN obj_property t2
ON (t2.obj_id = d.document_id
AND t2.namespace_prefix = 'http://www.example.com/ctabs'
AND t2.value_key = 'invoiceDate')
WHERE ((d.project_id = 1134510213) AND (d.is_folder = 0))
GROUP BY 1

Thomas

Joe Conway

mail@joeconway.com

over 5 years ago

In reply to: Tim Clarke (#3)

Re: CROSSTAB( .. only one column has values... )

On 1/5/21 12:11 PM, Tim Clarke wrote:

imho the crosstab() function isn't a good implementation. The biggest failure it
has is that you must know exactly how many output columns you will have in the
result /_prior to running it

*That* is a function of how Postgres set returning functions work, and not
specific to crosstab(). It is not easily fixed. Patches to fix that would be
welcomed!

Joe

--
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development

Joe Conway

mail@joeconway.com

over 5 years ago

In reply to: Adam Tauno Williams (#1)

Re: CROSSTAB( .. only one column has values... )

On 1/5/21 10:46 AM, Adam Tauno Williams wrote:

I'm using the crosstab feature and do not understand why I am only
getting values in the first column.

<snip>

So I put this in as a crosstab:

SELECT * FROM crosstab(
$$

<snip>

$$) AS final_result (invoice_date VARCHAR, "parts" BIGINT, "rental"
BIGINT, "sales" BIGINT, "service" BIGINT);

- and I get the results of -

invoice_date parts rental sales service
------------ ----- ------ ------ -------
2001-09 1 (null) (null) (null)
2007-07 1 (null) (null) (null)
2013-02 5353 (null) (null) (null)
2013-02 3454 (null) (null) (null)
2013-03 3512 (null) (null) (null)
2013-03 5366 (null) (null) (null)
...

Only the first column has values, all the rest are NULL.
I assume I am missing something stupid.

Seems you are using the wrong form of the crosstab() function. See

https://www.postgresql.org/docs/current/tablefunc.html#id-1.11.7.47.5.7.2

"The main limitation of the single-parameter form of crosstab is that it treats
all values in a group alike, inserting each value into the first available
column. If you want the value columns to correspond to specific categories of
data, and some groups might not have data for some of the categories, that
doesn't work well. The two-parameter form of crosstab handles this case by
providing an explicit list of the categories corresponding to the output columns."

HTH,

Joe

--
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development

Pavel Stehule

pavel.stehule@gmail.com

over 5 years ago

In reply to: Joe Conway (#5)

Re: CROSSTAB( .. only one column has values... )

út 5. 1. 2021 v 19:45 odesílatel Joe Conway <mail@joeconway.com> napsal:

On 1/5/21 12:11 PM, Tim Clarke wrote:

imho the crosstab() function isn't a good implementation. The biggest

failure it

has is that you must know exactly how many output columns you will have

in the

result /_prior to running it

*That* is a function of how Postgres set returning functions work, and not
specific to crosstab(). It is not easily fixed. Patches to fix that would
be
welcomed!

/messages/by-id/CAFj8pRC+hNzpH+0bPRCnqNncUCGjEvpwX+0nbhb1F7gwjYZZNg@mail.gmail.com

Regards

Pavel

Show quoted text

Joe

--
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development

Daniel Verite

daniel@manitou-mail.org

over 5 years ago

In reply to: Pavel Stehule (#7)

Re: CROSSTAB( .. only one column has values... )

(resent to the list, previous post was rejected)

Pavel Stehule wrote:

*That* is a function of how Postgres set returning functions work, and not
specific to crosstab(). It is not easily fixed. Patches to fix that would
be
welcomed!

/messages/by-id/CAFj8pRC+hNzpH+0bPRCnqNncUCGjEvpwX+0nbhb1F7gwjYZZNg@mail.gmail.com

Oracle has Polymorphic Table Function but still it doesn't work
seamlessly for dynamic pivots.
This blog post illustrates why:
https://blog.sqlora.com/en/dynamic-pivot-with-polymorphic-table-function/
The "related posts" at the end are also interesting about this feature.

Even before considering more dynamic SRFs in postgres, having a
variable number of columns for a DML statement is problematic
in general.

When an SQL statement is parsed/prepared, it's not supposed to
change its output structure, unless a DDL statement intervenes.
The dynamic pivot must break this promise, since a simple INSERT or
UPDATE or DELETE in the pivoting rows may cause the number of
output columns to change across invocations of the same statement.
That would mean that PREPARE would be basically unusable or
unreliable for such statements.

I think the query parser is also not supposed to read data outside
of the catalogs to determine the output structure of a query.
This promise would also need to be broken by dynamic pivots
implemented as a single-pass DML query.

On the other hand, dynamic pivots can be done by generating the SQL
dynamically and getting at the results in a second step, or returning a
resultset embedded in a scalar (json).
Speaking of features that don't exist but might someday, ISTM that
CALL crosstab_dynamic(...) was more plausible than
SELECT * FROM crosstab_dynamic(...), since CALL
doesn't have the same contingencies as SELECT.

Best regards,
--
Daniel Vérité
PostgreSQL-powered mailer: https://www.manitou-mail.org
Twitter: @DanielVerite

Pavel Stehule

pavel.stehule@gmail.com

over 5 years ago

In reply to: Pavel Stehule (#7)

Re: CROSSTAB( .. only one column has values... )

st 6. 1. 2021 v 21:47 odesílatel Daniel Verite <daniel@manitou-mail.org>
napsal:

Pavel Stehule wrote:

*That* is a function of how Postgres set returning functions work, and

not

specific to crosstab(). It is not easily fixed. Patches to fix that

would

be
welcomed!

/messages/by-id/CAFj8pRC+hNzpH+0bPRCnqNncUCGjEvpwX+0nbhb1F7gwjYZZNg@mail.gmail.com

Oracle has Polymorphic Table Function but still it doesn't work
seamlessly for dynamic pivots.
This blog post illustrates why:
https://blog.sqlora.com/en/dynamic-pivot-with-polymorphic-table-function/
The "related posts" at the end are also interesting about this feature.

Even before considering more dynamic SRFs in postgres, having a
variable number of columns for a DML statement is problematic
in general.

When an SQL statement is parsed/prepared, it's not supposed to
change its output structure, unless a DDL statement intervenes.
The dynamic pivot must break this promise, since a simple INSERT or
UPDATE or DELETE in the pivoting rows may cause the number of
output columns to change across invocations of the same statement.
That would mean that PREPARE would be basically unusable or
unreliable for such statements.

yes

polymorphic functions need two steps. First step returns structure, second
data.

The prepared statements can be supported, but there should be a recheck if
the result has expected structure. And maybe in future, the prepared
statements can be more dynamic, and can be able to do replaning when it
will be necessary.

I think the query parser is also not supposed to read data outside
of the catalogs to determine the output structure of a query.
This promise would also need to be broken by dynamic pivots
implemented as a single-pass DML query.

On the other hand, dynamic pivots can be done by generating the SQL
dynamically and getting at the results in a second step, or returning a
resultset embedded in a scalar (json).
Speaking of features that don't exist but might someday, ISTM that
CALL crosstab_dynamic(...) was more plausible than
SELECT * FROM crosstab_dynamic(...), since CALL
doesn't have the same contingencies as SELECT.

Yes, it is the Sybase way and it can be useful. But you cannot work with
the returned result more.

For users it isn't too important if they have to use polymorphic functions
or dynamic recordset or a PIVOT clause. Important is performance and
ergometry. Polymorphic functions can be used for more tasks than pivoting
(reading some external sources, ...). Dynamic recordsets or multi
recordsets can be nice features too. I like multi recordsets for reporting.
And Oracle's PIVOT clause is just handy and doesn't require programming.

Regards

Pavel

Show quoted text

Best regards,
--
Daniel Vérité
PostgreSQL-powered mailer: https://www.manitou-mail.org
Twitter: @DanielVerite

Import Notes

Reply to msg id not found: b815f878-3c73-4030-9c5f-d2d72d5b1b58@manitou-mail.org