CROSSTAB( .. only one column has values... )
I'm using the crosstab feature and do not understand why I am only
getting values in the first column.
The query:
SELECT
date_trunc('month', t2.value_date) AS invoice_date,
t1.value_string AS invoice_type
COUNT(*)
FROM document d
LEFT OUTER JOIN obj_property t1
ON (t1.obj_id = d.document_id
AND t1.namespace_prefix = 'http://www.example.com/ctabs'
AND t1.value_key = 'invoiceType')
LEFT OUTER JOIN obj_property t2
ON (t2.obj_id = d.document_id
AND t2.namespace_prefix = 'http://www.example.com/ctabs'
AND t2.value_key = 'invoiceDate')
WHERE ((d.project_id = 1134510213) AND (d.is_folder = 0))
GROUP BY 1,2
- has results like -
invoice_date invoice_type count
------------ ------------ -----
2013-02 service 3454
2013-03 service 3512
2013-03 parts 5366
2013-04 parts 5657
2013-04 service 4612
2013-05 service 4946
2013-05 parts 5508
...
So I put this in as a crosstab:
SELECT * FROM crosstab(
$$
SELECT
SUBSTR(t2.value_string, 1, 7) AS invoice_date,
t1.value_string AS invoice_type,
COUNT(*)
FROM document d
LEFT OUTER JOIN obj_property t1
ON (t1.obj_id = d.document_id
AND t1.namespace_prefix = 'http://www.example.com/ctabs'
AND t1.value_key = 'invoiceType')
LEFT OUTER JOIN obj_property t2
ON (t2.obj_id = d.document_id
AND t2.namespace_prefix = 'http://www.example.com/ctabs'
AND t2.value_key = 'invoiceDate')
WHERE ((d.project_id = 1134510213) AND (d.is_folder = 0))
GROUP BY 1,2
$$) AS final_result (invoice_date VARCHAR, "parts" BIGINT, "rental"
BIGINT, "sales" BIGINT, "service" BIGINT);
- and I get the results of -
invoice_date parts rental sales service
------------ ----- ------ ------ -------
2001-09 1 (null) (null) (null)
2007-07 1 (null) (null) (null)
2013-02 5353 (null) (null) (null)
2013-02 3454 (null) (null) (null)
2013-03 3512 (null) (null) (null)
2013-03 5366 (null) (null) (null)
...
Only the first column has values, all the rest are NULL.
I assume I am missing something stupid.
--
Adam Tauno Williams <mailto:awilliam@whitemice.org> GPG D95ED383
Systems Administrator, Python Developer, LPI / NCLA
On Tue, Jan 5, 2021 at 8:46 AM Adam Tauno Williams <awilliam@whitemice.org>
wrote:
Only the first column has values, all the rest are NULL.
I assume I am missing something stupid.
I think you are assigning the function more intelligence/effort than it
puts out.
From the documentation:
"""
The crosstab function produces one output row for each consecutive group of
input rows with the same row_name value. It fills the output value columns,
left to right, with the value fields from these rows. If there are fewer
rows in a group than there are output value columns, the extra output
columns are filled with nulls; if there are more rows, the extra input rows
are skipped.
In practice the SQL query should always specify ORDER BY 1,2 to ensure that
the input rows are properly ordered, that is, values with the same row_name
are brought together and correctly ordered within the row. Notice that
crosstab itself does not pay any attention to the second column of the
query result; it's just there to be ordered by, to control the order in
which the third-column values appear across the page.
"""
The fact you don't have an order by, and that there are not an equal number
of records per date, suggests to me that you are expecting the function to
fill in the blanks when the documentation says it doesn't do that.
David J.
On 05/01/2021 16:12, David G. Johnston wrote:
On Tue, Jan 5, 2021 at 8:46 AM Adam Tauno Williams <awilliam@whitemice.org<mailto:awilliam@whitemice.org>> wrote:
Only the first column has values, all the rest are NULL.
I assume I am missing something stupid.
I think you are assigning the function more intelligence/effort than it puts out.
From the documentation:
"""
The crosstab function produces one output row for each consecutive group of input rows with the same row_name value. It fills the output value columns, left to right, with the value fields from these rows. If there are fewer rows in a group than there are output value columns, the extra output columns are filled with nulls; if there are more rows, the extra input rows are skipped.
In practice the SQL query should always specify ORDER BY 1,2 to ensure that the input rows are properly ordered, that is, values with the same row_name are brought together and correctly ordered within the row. Notice that crosstab itself does not pay any attention to the second column of the query result; it's just there to be ordered by, to control the order in which the third-column values appear across the page.
"""
The fact you don't have an order by, and that there are not an equal number of records per date, suggests to me that you are expecting the function to fill in the blanks when the documentation says it doesn't do that.
David J.
+1
imho the crosstab() function isn't a good implementation. The biggest failure it has is that you must know exactly how many output columns you will have in the result prior to running it
Tim Clarke MBCS
IT Director
Direct: +44 (0)1376 504510 | Mobile: +44 (0)7887 563420
Telephone: Witham: +44(0)1376 503500 | London: +44 (0)20 3009 0853 | Frankfurt: +49 (0)69 7191 6000 | Hong Kong: +852 5803 1687 | Toronto: +1 647 503 2848
Web: https://www.manifest.co.uk/
Minerva Analytics Ltd - A Solactive Company
9 Freebournes Court | Newland Street | Witham | Essex | CM8 2BL | United Kingdom
________________________________
Copyright: This e-mail may contain confidential or legally privileged information. If you are not the named addressee you must not use or disclose such information, instead please report it to admin@minerva.info<mailto:admin@minerva.info>
Legal: Minerva Analytics is the trading name of: Minerva Analytics Ltd: Registered in England Number 11260966 & The Manifest Voting Agency Ltd: Registered in England Number 2920820 Registered Office at above address. Please Click Here https://www.manifest.co.uk/legal/ for further information.
Adam Tauno Williams schrieb am 05.01.2021 um 16:46:
I'm using the crosstab feature and do not understand why I am only
getting values in the first column.The query:
SELECT
date_trunc('month', t2.value_date) AS invoice_date,
t1.value_string AS invoice_type
COUNT(*)
FROM document d
LEFT OUTER JOIN obj_property t1
ON (t1.obj_id = d.document_id
AND t1.namespace_prefix = 'http://www.example.com/ctabs'
AND t1.value_key = 'invoiceType')
LEFT OUTER JOIN obj_property t2
ON (t2.obj_id = d.document_id
AND t2.namespace_prefix = 'http://www.example.com/ctabs'
AND t2.value_key = 'invoiceDate')
WHERE ((d.project_id = 1134510213) AND (d.is_folder = 0))
GROUP BY 1,2- has results like -
invoice_date invoice_type count
------------ ------------ -----
2013-02 service 3454
2013-03 service 3512
2013-03 parts 5366
2013-04 parts 5657
2013-04 service 4612
2013-05 service 4946
2013-05 parts 5508
...So I put this in as a crosstab:
I find using filtered aggregation to be way easier and more flexible than using crosstab():
SELECT date_trunc('month', t2.value_date) AS invoice_date,
count(*) filter (where value_string = 'rental') as rental,
count(*) filter (where value_string = 'sales') as sales,
count(*) filter (where value_string = 'service') as service
FROM document d
LEFT OUTER JOIN obj_property t1
ON (t1.obj_id = d.document_id
AND t1.namespace_prefix = 'http://www.example.com/ctabs'
AND t1.value_key = 'invoiceType')
LEFT OUTER JOIN obj_property t2
ON (t2.obj_id = d.document_id
AND t2.namespace_prefix = 'http://www.example.com/ctabs'
AND t2.value_key = 'invoiceDate')
WHERE ((d.project_id = 1134510213) AND (d.is_folder = 0))
GROUP BY 1
Thomas
On 1/5/21 12:11 PM, Tim Clarke wrote:
imho the crosstab() function isn't a good implementation. The biggest failure it
has is that you must know exactly how many output columns you will have in the
result /_prior to running it
*That* is a function of how Postgres set returning functions work, and not
specific to crosstab(). It is not easily fixed. Patches to fix that would be
welcomed!
Joe
--
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development
On 1/5/21 10:46 AM, Adam Tauno Williams wrote:
I'm using the crosstab feature and do not understand why I am only
getting values in the first column.
<snip>
So I put this in as a crosstab:
SELECT * FROM crosstab(
$$
<snip>
$$) AS final_result (invoice_date VARCHAR, "parts" BIGINT, "rental"
BIGINT, "sales" BIGINT, "service" BIGINT);- and I get the results of -
invoice_date parts rental sales service
------------ ----- ------ ------ -------
2001-09 1 (null) (null) (null)
2007-07 1 (null) (null) (null)
2013-02 5353 (null) (null) (null)
2013-02 3454 (null) (null) (null)
2013-03 3512 (null) (null) (null)
2013-03 5366 (null) (null) (null)
...Only the first column has values, all the rest are NULL.
I assume I am missing something stupid.
Seems you are using the wrong form of the crosstab() function. See
https://www.postgresql.org/docs/current/tablefunc.html#id-1.11.7.47.5.7.2
"The main limitation of the single-parameter form of crosstab is that it treats
all values in a group alike, inserting each value into the first available
column. If you want the value columns to correspond to specific categories of
data, and some groups might not have data for some of the categories, that
doesn't work well. The two-parameter form of crosstab handles this case by
providing an explicit list of the categories corresponding to the output columns."
HTH,
Joe
--
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development
Hi
út 5. 1. 2021 v 19:45 odesílatel Joe Conway <mail@joeconway.com> napsal:
On 1/5/21 12:11 PM, Tim Clarke wrote:
imho the crosstab() function isn't a good implementation. The biggest
failure it
has is that you must know exactly how many output columns you will have
in the
result /_prior to running it
*That* is a function of how Postgres set returning functions work, and not
specific to crosstab(). It is not easily fixed. Patches to fix that would
be
welcomed!
/messages/by-id/CAFj8pRC+hNzpH+0bPRCnqNncUCGjEvpwX+0nbhb1F7gwjYZZNg@mail.gmail.com
Regards
Pavel
Show quoted text
Joe
--
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development
(resent to the list, previous post was rejected)
Pavel Stehule wrote:
*That* is a function of how Postgres set returning functions work, and not
specific to crosstab(). It is not easily fixed. Patches to fix that would
be
welcomed!/messages/by-id/CAFj8pRC+hNzpH+0bPRCnqNncUCGjEvpwX+0nbhb1F7gwjYZZNg@mail.gmail.com
Oracle has Polymorphic Table Function but still it doesn't work
seamlessly for dynamic pivots.
This blog post illustrates why:
https://blog.sqlora.com/en/dynamic-pivot-with-polymorphic-table-function/
The "related posts" at the end are also interesting about this feature.
Even before considering more dynamic SRFs in postgres, having a
variable number of columns for a DML statement is problematic
in general.
When an SQL statement is parsed/prepared, it's not supposed to
change its output structure, unless a DDL statement intervenes.
The dynamic pivot must break this promise, since a simple INSERT or
UPDATE or DELETE in the pivoting rows may cause the number of
output columns to change across invocations of the same statement.
That would mean that PREPARE would be basically unusable or
unreliable for such statements.
I think the query parser is also not supposed to read data outside
of the catalogs to determine the output structure of a query.
This promise would also need to be broken by dynamic pivots
implemented as a single-pass DML query.
On the other hand, dynamic pivots can be done by generating the SQL
dynamically and getting at the results in a second step, or returning a
resultset embedded in a scalar (json).
Speaking of features that don't exist but might someday, ISTM that
CALL crosstab_dynamic(...) was more plausible than
SELECT * FROM crosstab_dynamic(...), since CALL
doesn't have the same contingencies as SELECT.
Best regards,
--
Daniel Vérité
PostgreSQL-powered mailer: https://www.manitou-mail.org
Twitter: @DanielVerite
Hi
st 6. 1. 2021 v 21:47 odesílatel Daniel Verite <daniel@manitou-mail.org>
napsal:
Pavel Stehule wrote:
*That* is a function of how Postgres set returning functions work, and
not
specific to crosstab(). It is not easily fixed. Patches to fix that
would
be
welcomed!/messages/by-id/CAFj8pRC+hNzpH+0bPRCnqNncUCGjEvpwX+0nbhb1F7gwjYZZNg@mail.gmail.com
Oracle has Polymorphic Table Function but still it doesn't work
seamlessly for dynamic pivots.
This blog post illustrates why:
https://blog.sqlora.com/en/dynamic-pivot-with-polymorphic-table-function/
The "related posts" at the end are also interesting about this feature.Even before considering more dynamic SRFs in postgres, having a
variable number of columns for a DML statement is problematic
in general.When an SQL statement is parsed/prepared, it's not supposed to
change its output structure, unless a DDL statement intervenes.
The dynamic pivot must break this promise, since a simple INSERT or
UPDATE or DELETE in the pivoting rows may cause the number of
output columns to change across invocations of the same statement.
That would mean that PREPARE would be basically unusable or
unreliable for such statements.
yes
polymorphic functions need two steps. First step returns structure, second
data.
The prepared statements can be supported, but there should be a recheck if
the result has expected structure. And maybe in future, the prepared
statements can be more dynamic, and can be able to do replaning when it
will be necessary.
I think the query parser is also not supposed to read data outside
of the catalogs to determine the output structure of a query.
This promise would also need to be broken by dynamic pivots
implemented as a single-pass DML query.On the other hand, dynamic pivots can be done by generating the SQL
dynamically and getting at the results in a second step, or returning a
resultset embedded in a scalar (json).
Speaking of features that don't exist but might someday, ISTM that
CALL crosstab_dynamic(...) was more plausible than
SELECT * FROM crosstab_dynamic(...), since CALL
doesn't have the same contingencies as SELECT.
Yes, it is the Sybase way and it can be useful. But you cannot work with
the returned result more.
For users it isn't too important if they have to use polymorphic functions
or dynamic recordset or a PIVOT clause. Important is performance and
ergometry. Polymorphic functions can be used for more tasks than pivoting
(reading some external sources, ...). Dynamic recordsets or multi
recordsets can be nice features too. I like multi recordsets for reporting.
And Oracle's PIVOT clause is just handy and doesn't require programming.
Regards
Pavel
Show quoted text
Best regards,
--
Daniel Vérité
PostgreSQL-powered mailer: https://www.manitou-mail.org
Twitter: @DanielVerite
Import Notes
Reply to msg id not found: b815f878-3c73-4030-9c5f-d2d72d5b1b58@manitou-mail.org