array_agg and/or =ANY doesn't appear to be functioning as I expect

Started by Rhys A.D. Stewartabout 8 years ago3 messagesgeneral
Jump to latest
#1Rhys A.D. Stewart
rhys.stewart@gmail.com

Greetings All,
I'm having an issue which is very perplexing. The having clause in a
query doesn't appear to be working as I expect it. Either that or my
understanding of array_agg() is flawed.

I'm using PostgreSQL 10.1, compiled by Visual C++ build 1800, 64-bit.
The table in question looks like:

CREATE TABLE confounded.dataset
(
seq integer,
path_seq integer,
start_vid bigint,
end_vid bigint,
node bigint,
edge bigint,
cost double precision,
agg_cost double precision,
st_length double precision,
truth boolean
);

The following query returns 3 rows:

with listing as (
select start_vid, end_vid, array_agg(node order by path_seq)
node_array, array_agg(edge order by path_seq) edge_array
from confounded.dataset
group by start_vid,end_vid
having true =ALL (array_agg(truth))
)

select * from listing;

The issue is, if I unnest the node_array column from the listing cte
and do a subselect on confounded.dataset I get back rows where truth =
false.

with listing as (
select start_vid, end_vid, array_agg(node order by path_seq)
node_array, array_agg(edge order by path_seq) edge_array
from confounded.dataset
group by start_vid,end_vid
having true =ALL (array_agg(truth))
)
select count(*) from confounded.dataset
where node in (select distinct unnest(node_array) from listing) and
truth = false;

I would expect the above query to return 0 rows.

the dataset can be found at
https://gist.github.com/rhysallister/59239c76e8ec265b81777038bf272879

Rhys
Peace & Love|Live Long & Prosper

#2Hannes Erven
hannes@erven.at
In reply to: Rhys A.D. Stewart (#1)
Re: array_agg and/or =ANY doesn't appear to be functioning as I expect

Hi Rhys,

Am 2018-01-21 um 02:42 schrieb Rhys A.D. Stewart:

Greetings All,
I'm having an issue which is very perplexing. The having clause in a
query doesn't appear to be working as I expect it. Either that or my
understanding of array_agg() is flawed.

[...]

with listing as (
select start_vid, end_vid, array_agg(node order by path_seq)
node_array, array_agg(edge order by path_seq) edge_array
from confounded.dataset
group by start_vid,end_vid
having true =ALL (array_agg(truth))
)
select count(*) from confounded.dataset
where node in (select distinct unnest(node_array) from listing) and
truth = false;

I would expect the above query to return 0 rows.

the answer is in your data: "node" is not a UNIQUE field, and there are
node values with multiple rows.
e.g. node=977 has one row with truth=true and one with truth=false.

So what your second query really does is "select all node values from
listing for which another entry with truth=false exists in the dataset".

Presuming that "seq" is a primary key [although not declared], you
probably meant to restrict your query on that.

Best regards,

-hannes

#3Rhys A.D. Stewart
rhys.stewart@gmail.com
In reply to: Hannes Erven (#2)
Re: array_agg and/or =ANY doesn't appear to be functioning as I expect

Hannes,

Thanks for your observations...... Will take a look at the data.

Regards,

Rhys

On Jan 20, 2018 11:00 PM, "Hannes Erven" <hannes@erven.at> wrote:

Hi Rhys,

Am 2018-01-21 um 02:42 schrieb Rhys A.D. Stewart:

Greetings All,
I'm having an issue which is very perplexing. The having clause in a
query doesn't appear to be working as I expect it. Either that or my
understanding of array_agg() is flawed.

[...]

with listing as (
select start_vid, end_vid, array_agg(node order by path_seq)
node_array, array_agg(edge order by path_seq) edge_array
from confounded.dataset
group by start_vid,end_vid
having true =ALL (array_agg(truth))
)
select count(*) from confounded.dataset
where node in (select distinct unnest(node_array) from listing) and
truth = false;

I would expect the above query to return 0 rows.

the answer is in your data: "node" is not a UNIQUE field, and there are
node values with multiple rows.
e.g. node=977 has one row with truth=true and one with truth=false.

So what your second query really does is "select all node values from
listing for which another entry with truth=false exists in the dataset".

Presuming that "seq" is a primary key [although not declared], you probably
meant to restrict your query on that.

Best regards,

-hannes