How to eliminate extra "NOT EXISTS"-query here?

Started by Andreas Joseph Kroghover 2 years ago6 messagesgeneral

Andreas Joseph Krogh

andreas@visena.com

over 2 years ago

Hi, I'm testing if some dataset contains an array of elements and want to
return all “not containing the specified array”, including entries in master
table not being referenced.

I have the following schema:

drop table if exists stuff;
drop table if exists test;
CREATE TABLE test(
id varchar primary key
);

create table stuff(
id serial primary key,
test_id varchar NOT NULL REFERENCES test(id),
v varchar not null,
unique (test_id, v)
);

INSERT INTO test(id) values ('a');
INSERT INTO test(id) values ('b');
INSERT INTO test(id) values ('c');
INSERT INTO test(id) values ('d');

INSERT INTO stuff(test_id, v)
values ('a', 'x')
;

INSERT INTO stuff(test_id, v)
values ('b', 'x')
, ('b', 'y')
;

INSERT INTO stuff(test_id, v)
values ('c', 'x')
, ('c', 'y')
, ('c', 'z')
;

select * from test t
WHERE NOT ARRAY['x']::varchar[] <@ (select array_agg(s.v) from stuff s WHERE s.
test_id= t.id)
;

select * from test t
WHERE NOT ARRAY['x', 'y']::varchar[] <@ (select array_agg(s.v) from stuff s
WHERE s.test_id = t.id)
;

select * from test t
WHERE NOT ARRAY['x', 'y', 'z']::varchar[] <@ (select array_agg(s.v) from stuff
s WHERE s.test_id = t.id)
;

select * from test t
WHERE NOT ARRAY ['x', 'y', 'z', 't']::varchar[] <@ (select array_agg(s.v) from
stuffs WHERE s.test_id = t.id)
;

-- This works, but I'd rather not do the extra EXISTS
select * from test t
WHERE (NOT ARRAY ['x', 'y', 'z', 't']::varchar[] <@ (select array_agg(s.v) from
stuffs WHERE s.test_id = t.id)
OR NOT EXISTS (
select * from stuff s where s.test_id = t.id
)
)
;

So, I want to return all entries in test not having any of ARRAY ['x', 'y',
'z', 't'] referenced in the table stuff, and I'd like to have test.id="d"
returned as well, but in order to do that I need to execute the “or not
exists”-query. Is it possible to avoid that?

--
Andreas Joseph Krogh
CTO / Partner - Visena AS
Mobile: +47 909 56 963
andreas@visena.com <mailto:andreas@visena.com>
www.visena.com <https://www.visena.com>
<https://www.visena.com>

tgl@sss.pgh.pa.us

over 2 years ago

In reply to: Andreas Joseph Krogh (#1)

Re: How to eliminate extra "NOT EXISTS"-query here?

Andreas Joseph Krogh <andreas@visena.com> writes:

-- This works, but I'd rather not do the extra EXISTS
select * from test t
WHERE (NOT ARRAY ['x', 'y', 'z', 't']::varchar[] <@ (select array_agg(s.v) from
stuffs WHERE s.test_id = t.id)
OR NOT EXISTS (
select * from stuff s where s.test_id = t.id
)
)
;

So, I want to return all entries in test not having any of ARRAY ['x', 'y',
'z', 't'] referenced in the table stuff, and I'd like to have test.id="d"
returned as well, but in order to do that I need to execute the “or not
exists”-query. Is it possible to avoid that?

Probably not directly, but perhaps you could improve the performance of
this query by converting the sub-selects into a left join:

select * from test t
left join
(select s.test_id, array_agg(s.v) as arr from stuffs group by s.test_id) ss
on ss.test_id = t.id
WHERE (NOT ARRAY ['x', 'y', 'z', 't']::varchar[] <@ ss.arr)
OR ss.test_id IS NULL;

Another possibility is

...
WHERE (ARRAY ['x', 'y', 'z', 't']::varchar[] <@ ss.arr) IS NOT TRUE

but I don't think that's more readable really, and it will save little.

In either case, this would result in computing array_agg once for
each group of test_id values in "stuffs", while your original computes
a similar aggregate for each row in "test". So whether this is better
depends on the relative sizes of the tables, although my proposal
avoids random access to "stuffs" so it will have some advantage.

regards, tom lane

hector.vass@gmail.com

over 2 years ago

In reply to: Tom Lane (#2)

Re: How to eliminate extra "NOT EXISTS"-query here?

Not sure you need to use array why not simple table joins, so a table with
your criteria x y z t joined to stuff to give you candidates that do match,
then left join with coalesce to add the 'd'

select

--a.id,b.test_id,

coalesce(a.id,b.test_id) as finalresult

from test a

left join (

select

test_id

from stuff a

inner join (values ('x'),('y'),('z'),('t')) b (v) using(v)

group by 1

)b on(a.id=b.test_id);

Regards
Hector Vass

On Sat, Nov 25, 2023 at 4:08 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

Show quoted text

Andreas Joseph Krogh <andreas@visena.com> writes:

-- This works, but I'd rather not do the extra EXISTS
select * from test t
WHERE (NOT ARRAY ['x', 'y', 'z', 't']::varchar[] <@ (select

array_agg(s.v) from

stuffs WHERE s.test_id = t.id)
OR NOT EXISTS (
select * from stuff s where s.test_id = t.id
)
)
;

So, I want to return all entries in test not having any of ARRAY ['x',

'y',

'z', 't'] referenced in the table stuff, and I'd like to have test.id="d"

returned as well, but in order to do that I need to execute the “or not
exists”-query. Is it possible to avoid that?

Probably not directly, but perhaps you could improve the performance of
this query by converting the sub-selects into a left join:

select * from test t
left join
(select s.test_id, array_agg(s.v) as arr from stuffs group by
s.test_id) ss
on ss.test_id = t.id
WHERE (NOT ARRAY ['x', 'y', 'z', 't']::varchar[] <@ ss.arr)
OR ss.test_id IS NULL;

Another possibility is

...
WHERE (ARRAY ['x', 'y', 'z', 't']::varchar[] <@ ss.arr) IS NOT TRUE

but I don't think that's more readable really, and it will save little.

In either case, this would result in computing array_agg once for
each group of test_id values in "stuffs", while your original computes
a similar aggregate for each row in "test". So whether this is better
depends on the relative sizes of the tables, although my proposal
avoids random access to "stuffs" so it will have some advantage.

regards, tom lane

Andreas Joseph Krogh

andreas@visena.com

over 2 years ago

In reply to: Tom Lane (#2)

Re: How to eliminate extra "NOT EXISTS"-query here?

På lørdag 25. november 2023 kl. 17:08:28, skrev Tom Lane <tgl@sss.pgh.pa.us
<mailto:tgl@sss.pgh.pa.us>>:
Andreas Joseph Krogh <andreas@visena.com> writes:

-- This works, but I'd rather not do the extra EXISTS
select * from test t
WHERE (NOT ARRAY ['x', 'y', 'z', 't']::varchar[] <@ (select array_agg(s.v)

from

stuffs WHERE s.test_id = t.id)
OR NOT EXISTS (
select * from stuff s where s.test_id = t.id
)
)
;

So, I want to return all entries in test not having any of ARRAY ['x', 'y',
'z', 't'] referenced in the table stuff, and I'd like to have test.id="d"
returned as well, but in order to do that I need to execute the “or not
exists”-query. Is it possible to avoid that?

Probably not directly, but perhaps you could improve the performance of
this query by converting the sub-selects into a left join:

select * from test t
left join
(select s.test_id, array_agg(s.v) as arr from stuffs group by s.test_id) ss
on ss.test_id = t.id
WHERE (NOT ARRAY ['x', 'y', 'z', 't']::varchar[] <@ ss.arr)
OR ss.test_id IS NULL;

Another possibility is

...
WHERE (ARRAY ['x', 'y', 'z', 't']::varchar[] <@ ss.arr) IS NOT TRUE

but I don't think that's more readable really, and it will save little.

In either case, this would result in computing array_agg once for
each group of test_id values in "stuffs", while your original computes
a similar aggregate for each row in "test". So whether this is better
depends on the relative sizes of the tables, although my proposal
avoids random access to "stuffs" so it will have some advantage.

regards, tom lane
Excellent, thanks!

--
Andreas Joseph Krogh
CTO / Partner - Visena AS
Mobile: +47 909 56 963
andreas@visena.com <mailto:andreas@visena.com>
www.visena.com <https://www.visena.com>
<https://www.visena.com>

Dominique Devienne

ddevienne@gmail.com

over 2 years ago

In reply to: hector vass (#3)

Re: How to eliminate extra "NOT EXISTS"-query here?

On Sat, Nov 25, 2023 at 5:53 PM hector vass <hector.vass@gmail.com> wrote:

Not sure you need to use array why not simple table joins, so a table with
your criteria x y z t joined to stuff to give you candidates that do match,
then left join with coalesce to add the 'd'

select

--a.id,b.test_id,

coalesce(a.id,b.test_id) as finalresult

from test a

left join (

select

test_id

from stuff a

inner join (values ('x'),('y'),('z'),('t')) b (v) using(v)

group by 1

)b on(a.id=b.test_id);

Hi Hector. Hopefully this is not a stupid question...

How is that equivalent from the `NOT ARRAY ... <@ ...` though?
The inner-join-distinct above will return test_id's on any match, but you
can't know if all array values are matches. Which is different from

Is the first array contained by the second

from the <@ operator, no?
I'm unfamiliar with these operators, so am I missing something?
Just trying to understand the logic here. Thanks, --DD

hector.vass@gmail.com

over 2 years ago

In reply to: Dominique Devienne (#5)

Re: How to eliminate extra "NOT EXISTS"-query here?

Not equivalent to the use of NOT ARRAY and entirely possible I have
misunderstood the requirement ...do you have some more test cases the non
array solution does not work for

Regards
Hector Vass
07773 352559

On Mon, Nov 27, 2023 at 9:29 AM Dominique Devienne <ddevienne@gmail.com>
wrote:

Show quoted text

On Sat, Nov 25, 2023 at 5:53 PM hector vass <hector.vass@gmail.com> wrote:

Not sure you need to use array why not simple table joins, so a table
with your criteria x y z t joined to stuff to give you candidates that do
match, then left join with coalesce to add the 'd'

select

--a.id,b.test_id,

coalesce(a.id,b.test_id) as finalresult

from test a

left join (

select

test_id

from stuff a

inner join (values ('x'),('y'),('z'),('t')) b (v) using(v)

group by 1

)b on(a.id=b.test_id);

Hi Hector. Hopefully this is not a stupid question...

How is that equivalent from the `NOT ARRAY ... <@ ...` though?
The inner-join-distinct above will return test_id's on any match, but you
can't know if all array values are matches. Which is different from

Is the first array contained by the second

from the <@ operator, no?
I'm unfamiliar with these operators, so am I missing something?
Just trying to understand the logic here. Thanks, --DD