Count of non-null values per table column

Started by David Nelsonover 10 years ago9 messagesgeneral
Jump to latest
#1David Nelson
dlnelson77808@outlook.com

Hello list,<br><br>Apologies if this has been asked before. My search only turned up ways to list the total non-null values for all columns as a single number. I want the count for each column by column.<br><br>I have inherited a database consisting of two related huge monolithic tables that lack referential integrity between them, or even basic data constraints. One of the problems these tables have is every single non-PK column is NULLable. I am trying to understand the information that is actually stored and used so that I can implement a (hopefully) better design. Towards that end I would like to know the count of non-null values in each column per column. In other words I would like to get the following output from a table (the numbers are totally made up):<br><br>column_name | num_values<br>------------+-----------<br>col1&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; |&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 5787<br>------------+-----------<br>col2&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; |&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 17<br>------------+-----------<br>col3&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; |&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 567<br>------------+-----------<br>col4&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; |&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 5787<br>------------+-----------<br>col5&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; |&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 143<br>------------+-----------<br>col6&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; |&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 1<br>------------+-----------<br>...<br><br>Is this possible through one or more of the system views, or will I need to write a function to do this? Obviously I can just issue multiple SELECT COUNT(column)... statements, but I'd rather not.<br><br>Thanks,<br>David

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#2David Nelson
dlnelson77808@outlook.com
In reply to: David Nelson (#1)
Re: Count of non-null values per table column

Well it is certainly nice to see that my choice to send my question using plain text was honored by this email service. Apologies for that mess. The output I am looking for is a series of rows with two columns, one the name of the table column, and the other the count of non-null values in a table's column of that same name, for all column names in the table.<br><br>Thanks<br><br>----------------------------------------<br>&gt; From: dlnelson77808@outlook.com<br>&gt; To: pgsql-general@postgresql.org<br>&gt; Subject: [GENERAL] Count of non-null values per table column<br>&gt; Date: Fri, 14 Aug 2015 14:32:36 +0000<br>&gt;<br>&gt; Hello list,&lt;br&gt;&lt;br&gt;Apologies if this has been asked before. My search only turned up ways to list the total non-null values for all columns as a single number. I want the count for each column by column.&lt;br&gt;&lt;br&gt;I have inherited a database consisting of two related huge monolithic tables that lack referential integrity between them, or even basic data constraints. One of the problems these tables have is every single non-PK column is NULLable. I am trying to understand the information that is actually stored and used so that I can implement a (hopefully) better design. Towards that end I would like to know the count of non-null values in each column per column. In other words I would like to get the following output from a table (the numbers are totally made up):&lt;br&gt;&lt;br&gt;column_name | num_values&lt;br&gt;------------+-----------&lt;br&gt;col1&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; |&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 5787&lt;br&gt;------------+-----------&lt;br&gt;col2&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; |&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 17&lt;br&gt;------------+-----------&lt;br&gt;col3&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; |&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 567&lt;br&gt;------------+-----------&lt;br&gt;col4&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; |&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 5787&lt;br&gt;------------+-----------&lt;br&gt;col5&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; |&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 143&lt;br&gt;------------+-----------&lt;br&gt;col6&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; |&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 1&lt;br&gt;------------+-----------&lt;br&gt;...&lt;br&gt;&lt;br&gt;Is this possible through one or more of the system views, or will I need to write a function to do this? Obviously I can just issue multiple SELECT COUNT(column)... statements, but I'd rather not.&lt;br&gt;&lt;br&gt;Thanks,&lt;br&gt;David<br>&gt;<br>&gt; --<br>&gt; Sent via pgsql-general mailing list (pgsql-general@postgresql.org)<br>&gt; To make changes to your subscription:<br>&gt; http://www.postgresql.org/mailpref/pgsql-general&lt;br&gt;

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#3David Rowley
dgrowleyml@gmail.com
In reply to: David Nelson (#1)
Re: Count of non-null values per table column

On 15 August 2015 at 02:32, David Nelson <dlnelson77808@outlook.com> wrote:

Hello list,<br><br>Apologies if this has been asked before. My search only
turned up ways to list the total non-null values for all columns as a
single number. I want the count for each column by column.<br><br>I have
inherited a database consisting of two related huge monolithic tables that
lack referential integrity between them, or even basic data constraints.
One of the problems these tables have is every single non-PK column is
NULLable. I am trying to understand the information that is actually stored
and used so that I can implement a (hopefully) better design. Towards that
end I would like to know the count of non-null values in each column per
column. In other words I would like to get the following output from a
table (the numbers are totally made up):<br><br>column_name |
num_values<br>------------+-----------<br>col1&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
|&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
5787<br>------------+-----------<br>col2&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
|&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
17<br>------------+-----------<br>col3&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
|&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
567<br>------------+-----------<br>col4&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
|&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
5787<br>------------+-----------<br>col5&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
|&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
143<br>------------+-----------<br>col6&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
|&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
1<br>------------+-----------<br>...<br><br>Is this possible through one or
more of the system views, or will I need to write a function to do this?
Obviously I can just issue multiple SELECT COUNT(column)... statements, but
I'd rather not.<br><br>Thanks,<br>David

I didn't quite catch a Postgres version number in that mess :)

I assume the tables are quite large if you don't want to just issue a:
SELECT COUNT(a) FILTER(WHERE a IS NULL),COUNT(b) FILTER(WHERE b IS NULL)
... (assuming you're on a version new enough to support agg FILTER)

On the other hand if you were happy with just an approximation then you
could look at pg_stats;

create table abc(a int, b int,c int);
insert into abc values(1, 1, NULL),(1, NULL, NULL),(NULL, NULL, NULL);
analyze abc;

select attname, null_frac from pg_stats where tablename = 'abc';

attname | null_frac
---------+-----------
a | 0.333333
b | 0.666667
c | 1

Keep in mind though that this is an *approximation* and possibly could be
inaccurate. If you want exact do the COUNT(col).

If you're not on a version new enough for COUNT(col) FILTER WHERE ..., you
could just SUM(CASE WHEN col IS NULL THEN 1 ELSE 0 END)

Regards

David Rowley

--
David Rowley http://www.2ndQuadrant.com/
<http://www.2ndquadrant.com/&gt;
PostgreSQL Development, 24x7 Support, Training & Services

#4John McKown
john.archie.mckown@gmail.com
In reply to: David Nelson (#2)
Re: Count of non-null values per table column

David,

It still came through as junk. But I reconstructed it below

=== original message ===
Apologies if this has been asked before. My search only turned up ways to
list the total non-null values for all columns as a single number. I want
the count for each column by column.

I have inherited a database consisting of two related huge monolithic
tables that lack referential integrity between them, or even basic data
constraints. One of the problems these tables have is every single non-PK
column is NULLable. I am trying to understand the information that is
actually stored and used so that I can implement a (hopefully) better
design. Towards that end I would like to know the count of non-null values
in each column per column. In other words I would like to get the following
output from a table (the numbers are totally made up):

column_name | num_values
------------+-----------
col1 | 5787
------------+-----------
col2 | 17
------------+-----------
col3 | 567
------------+-----------
col4 | 5787
------------+-----------
col5 | 143
------------+-----------
col6 | 1
------------+-----------

====

​I assume what "num_values" contains is the _distinctly different_ number
of values in "column_name" for each and every column name in a table.​ E.g.
if "col1" contains value 'x' twice and 'y' ten times,then "num_values"
would be 2, not 12. Or do you really want the 12? I'm unsure.

--

Schrodinger's backup: The condition of any backup is unknown until a
restore is attempted.

Yoda of Borg, we are. Futile, resistance is, yes. Assimilated, you will be.

He's about as useful as a wax frying pan.

10 to the 12th power microphones = 1 Megaphone

Maranatha! <><
John McKown

#5Tom Lane
tgl@sss.pgh.pa.us
In reply to: David Rowley (#3)
Re: Count of non-null values per table column

David Rowley <david.rowley@2ndquadrant.com> writes:

On 15 August 2015 at 02:32, David Nelson <dlnelson77808@outlook.com> wrote:

Hello list,<br><br>Apologies if this has been asked before. My search only
turned up ways to list the total non-null values for all columns as a
single number. I want the count for each column by column.

I assume the tables are quite large if you don't want to just issue a:
SELECT COUNT(a) FILTER(WHERE a IS NULL),COUNT(b) FILTER(WHERE b IS NULL)
... (assuming you're on a version new enough to support agg FILTER)

AFAIK this should work in any version, or indeed any SQL-compliant DBMS:

select count(col1), count(col2), ... from table;

COUNT with an argument counts the non-null values of that argument.

On the other hand if you were happy with just an approximation then you
could look at pg_stats;

Yeah; you might want to ANALYZE the table first to be sure the stats are
up to date.

regards, tom lane

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#6David Nelson
dnelson77808@gmail.com
In reply to: John McKown (#4)
Re: Count of non-null values per table column

On Fri, Aug 14, 2015 at 9:59 AM, John McKown <john.archie.mckown@gmail.com>
wrote:

David,

It still came through as junk. But I reconstructed it below

=== original message ===
Apologies if this has been asked before. My search only turned up ways to

list the total non-null values for all columns as a single number. I want
the count for each column by column.

I have inherited a database consisting of two related huge monolithic

tables that lack referential integrity between them, or even basic data
constraints. One of the problems these tables have is every single non-PK
column is NULLable. I am trying to understand the information that is
actually stored and used so that I can implement a (hopefully) better
design. Towards that end I would like to know the count of non-null values
in each column per column. In other words I would like to get the following
output from a table (the numbers are totally made up):

column_name | num_values
------------+-----------
col1 | 5787
------------+-----------
col2 | 17
------------+-----------
col3 | 567
------------+-----------
col4 | 5787
------------+-----------
col5 | 143
------------+-----------
col6 | 1
------------+-----------

====

I assume what "num_values" contains is the _distinctly different_ number

of values in "column_name" for each and every column name in a table. E.g.
if "col1" contains value 'x' twice and 'y' ten times,then "num_values"
would be 2, not 12. Or do you really want the 12? I'm unsure.

Thanks John for fixing that which microsoft screwed up, and I've switched
to my gmail account for this.

For my purposes 12 would be fine. I just want to know of the total number of
rows in the table, how many have a value in each column. I guess I'm
actually trying to get the complement of the number of NULLs per column in
the end.

BTW, aplologies for omitting basic info:
SELECT version();
version

----------------------------------------------------------------------------------------------------------------
PostgreSQL 9.1.18 on x86_64-unknown-linux-gnu, compiled by gcc (GCC) 4.4.7
20120313 (Red Hat 4.4.7-11), 64-bit

--

Schrodinger's backup: The condition of any backup is unknown until a

restore is attempted.

Yoda of Borg, we are. Futile, resistance is, yes. Assimilated, you will

be.

Show quoted text

He's about as useful as a wax frying pan.

10 to the 12th power microphones = 1 Megaphone

Maranatha! <><
John McKown

#7David Nelson
dnelson77808@gmail.com
In reply to: Tom Lane (#5)
Re: Count of non-null values per table column

On Fri, Aug 14, 2015 at 10:00 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

David Rowley <david.rowley@2ndquadrant.com> writes:

On 15 August 2015 at 02:32, David Nelson <dlnelson77808@outlook.com>

wrote:

Hello list,<br><br>Apologies if this has been asked before. My search

only

turned up ways to list the total non-null values for all columns as a
single number. I want the count for each column by column.

I assume the tables are quite large if you don't want to just issue a:
SELECT COUNT(a) FILTER(WHERE a IS NULL),COUNT(b) FILTER(WHERE b IS NULL)
... (assuming you're on a version new enough to support agg FILTER)

AFAIK this should work in any version, or indeed any SQL-compliant DBMS:

select count(col1), count(col2), ... from table;

Thanks Tom (I've switched to my gmail account for this conversation). Tthat
is the way I would do it for a table with a small number of columns, but
these have several dozen so this would get tedious. Although I just
realized I could output the list of column names from the pg_stat view to a
file and whip up some vi find and replace to create the entire statement
pretty quickly. I was just wondering if that was the only way or not.

BTW, aplologies for omitting basic info:
SELECT version();
version

----------------------------------------------------------------------------------------------------------------
PostgreSQL 9.1.18 on x86_64-unknown-linux-gnu, compiled by gcc (GCC) 4.4.7
20120313 (Red Hat 4.4.7-11), 64-bit

Show quoted text

COUNT with an argument counts the non-null values of that argument.

On the other hand if you were happy with just an approximation then you
could look at pg_stats;

Yeah; you might want to ANALYZE the table first to be sure the stats are
up to date.

regards, tom lane

#8Ken Tanzer
ken.tanzer@gmail.com
In reply to: David Nelson (#7)
Re: Count of non-null values per table column

On Fri, Aug 14, 2015 at 6:35 PM, David Nelson <dnelson77808@gmail.com>
wrote:

On Fri, Aug 14, 2015 at 10:00 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

David Rowley <david.rowley@2ndquadrant.com> writes:

Tthat is the way I would do it for a table with a small number of columns,
but these have several dozen so this would get tedious. Although I just
realized I could output the list of column names from the pg_stat view to a
file and whip up some vi find and replace to create the entire statement
pretty quickly. I was just wondering if that was the only way or not.

You could use this to generate the SQL:

\set my_table my_real_table_name

SELECT 'SELECT COUNT(*) AS total_rows, '||array_to_string(array(SELECT
'COUNT('||column_name::text ||') AS ' || column_name::text FROM
information_schema.columns WHERE table_name=:'my_table'),E',\n') || ' FROM
' || :'my_table' || ';';

Cheers,
Ken
--
AGENCY Software
A Free Software data system
By and for non-profits
*http://agency-software.org/ <http://agency-software.org/&gt;*
*https://agency-software.org/demo/client
<https://agency-software.org/demo/client&gt;*
ken.tanzer@agency-software.org
(253) 245-3801

Subscribe to the mailing list
<agency-general-request@lists.sourceforge.net?body=subscribe> to
learn more about AGENCY or
follow the discussion.

#9David Nelson
dnelson77808@gmail.com
In reply to: Ken Tanzer (#8)
Re: Count of non-null values per table column

On Fri, Aug 14, 2015 at 9:17 PM, Ken Tanzer <ken.tanzer@gmail.com> wrote:

On Fri, Aug 14, 2015 at 6:35 PM, David Nelson <dnelson77808@gmail.com>
wrote:

On Fri, Aug 14, 2015 at 10:00 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

David Rowley <david.rowley@2ndquadrant.com> writes:

Tthat is the way I would do it for a table with a small number of
columns, but these have several dozen so this would get tedious. Although I
just realized I could output the list of column names from the pg_stat view
to a file and whip up some vi find and replace to create the entire
statement pretty quickly. I was just wondering if that was the only way or
not.

You could use this to generate the SQL:

\set my_table my_real_table_name

SELECT 'SELECT COUNT(*) AS total_rows, '||array_to_string(array(SELECT
'COUNT('||column_name::text ||') AS ' || column_name::text FROM
information_schema.columns WHERE table_name=:'my_table'),E',\n') || ' FROM
' || :'my_table' || ';';

Oh yeah, SQL to generate SQL. Done it many times to create ETL scripts.
Don't know why I didn't think of that here. Definitely a viable option.

Thanks Ken.

Show quoted text

Cheers,
Ken
--
AGENCY Software
A Free Software data system
By and for non-profits
*http://agency-software.org/ <http://agency-software.org/&gt;*
*https://agency-software.org/demo/client
<https://agency-software.org/demo/client&gt;*
ken.tanzer@agency-software.org
(253) 245-3801

Subscribe to the mailing list
<agency-general-request@lists.sourceforge.net?body=subscribe> to
learn more about AGENCY or
follow the discussion.