Array intersection

Started by Josh Trutwinover 18 years ago19 messagesgeneral
Jump to latest
#1Josh Trutwin
josh@trutwins.homeip.net

Hi,

Is it possible to find the intersection of two array values?

a = '{1,2,3}'
b = '{2,3,4}'

a intersect b = '{2,3}'

Assume I need to write a pl/pgsql function to do this.

Josh

#2Josh Trutwin
josh@trutwins.homeip.net
In reply to: Josh Trutwin (#1)
Re: Array intersection

On Wed, 17 Oct 2007 10:19:43 -0500
Josh Trutwin <josh@trutwins.homeip.net> wrote:

Hi,

Is it possible to find the intersection of two array values?

a = '{1,2,3}'
b = '{2,3,4}'

a intersect b = '{2,3}'

Assume I need to write a pl/pgsql function to do this.

nm - I just wrote a function - though curious if this is the most
effecient way:

CREATE OR REPLACE FUNCTION array_has_intersect (array1 INTEGER[],
array2 INTEGER[]) RETURNS BOOLEAN
AS $$
BEGIN
IF array1 IS NULL OR array2 IS NULL THEN
RETURN FALSE;
END IF;

FOR i IN ARRAY_LOWER(array1,1) .. ARRAY_UPPER(array1,1) LOOP
FOR j IN ARRAY_LOWER(array2,1) .. ARRAY_UPPER(array2,1) LOOP
IF (array1[i] = array2[j]) THEN
RETURN TRUE;
END IF;
END LOOP;
END LOOP;

RETURN FALSE;
END;
$$ LANGUAGE PLPGSQL;

psql=> select array_has_intersect('{1,2,3}', '{1,3,4}');
array_has_intersect
---------------------
t

psql=> select array_has_intersect('{1,2,3}', '{21,23,24}');
array_has_intersect
---------------------
f

It doesn't return the actual intersection, but could easily be
modified to do so.

Josh

#3Sam Mason
sam@samason.me.uk
In reply to: Josh Trutwin (#2)
Re: Array intersection

On Wed, Oct 17, 2007 at 10:37:23AM -0500, Josh Trutwin wrote:

On Wed, 17 Oct 2007 10:19:43 -0500
Josh Trutwin <josh@trutwins.homeip.net> wrote:

Hi,

Is it possible to find the intersection of two array values?

a = '{1,2,3}'
b = '{2,3,4}'

a intersect b = '{2,3}'

Assume I need to write a pl/pgsql function to do this.

nm - I just wrote a function - though curious if this is the most
effecient way:

CREATE OR REPLACE FUNCTION array_has_intersect (array1 INTEGER[],
array2 INTEGER[]) RETURNS BOOLEAN
AS $$
BEGIN
IF array1 IS NULL OR array2 IS NULL THEN
RETURN FALSE;
END IF;

FOR i IN ARRAY_LOWER(array1,1) .. ARRAY_UPPER(array1,1) LOOP
FOR j IN ARRAY_LOWER(array2,1) .. ARRAY_UPPER(array2,1) LOOP
IF (array1[i] = array2[j]) THEN
RETURN TRUE;
END IF;
END LOOP;
END LOOP;

RETURN FALSE;
END;
$$ LANGUAGE PLPGSQL;

This is only going to work for one-dimensional arrays (I'm not sure how
you would ever fix that with the support postgres has for arrays) but
the (computational) complexity of having an embedded FOR loops looks bad
for performance. As you can already use '=ANY' syntax to search inside
an array, you may as well use that---it's probably a bit more faster
than the plpgsql work-alike. Leading to the following implementation of
intersect:

CREATE OR REPLACE FUNCTION array_intersect (array1 INTEGER[],array2 INTEGER[]) RETURNS INTEGER[]
AS $$
DECLARE
out INTEGER[];
BEGIN
IF array1 IS NULL OR array2 IS NULL THEN
RETURN '[]';
END IF;
FOR i IN ARRAY_LOWER(array1,1) .. ARRAY_UPPER(array1,1) LOOP
IF (array1[i] =ANY (array2)) THEN
out := array_append(out,array1[i]);
END IF;
END LOOP;
RETURN out;
END;
$$ LANGUAGE PLPGSQL;

It seems to work for me, but as a side effect will leave the array sorted
in the same order as the first parameter and with any duplicates it has.
Even more annoyingly if there is no intersection it will return NULL
instead of an empty array, how do I fix this?

Sam

p.s. plpgsql is badly in need of parametric polymorphism, the silly
anyarray/anyelement it's got at the moment means it's impossible to
declare an array of the same type as the type of its parameters.

#4Rodrigo De León
rdeleonp@gmail.com
In reply to: Josh Trutwin (#2)
Re: Array intersection

On 10/17/07, Josh Trutwin <josh@trutwins.homeip.net> wrote:

nm - I just wrote a function - though curious if this is the most
effecient way:

If you only want TRUE or FALSE, you can use '&&':

t=# SELECT '{1,2}'::INT[] && '{2,3}'::INT[];
?column?
----------
t
(1 row)

#5Josh Trutwin
josh@trutwins.homeip.net
In reply to: Rodrigo De León (#4)
Re: Array intersection

On Wed, 17 Oct 2007 11:12:27 -0500
"Rodrigo De León" <rdeleonp@gmail.com> wrote:

On 10/17/07, Josh Trutwin <josh@trutwins.homeip.net> wrote:

nm - I just wrote a function - though curious if this is the most
effecient way:

If you only want TRUE or FALSE, you can use '&&':

t=# SELECT '{1,2}'::INT[] && '{2,3}'::INT[];
?column?
----------
t
(1 row)

Is that 8.2 or above? I'm on 8.1.10:

psql=> SELECT '{1,2}'::INT[] && '{2,3}'::INT[];
ERROR: operator does not exist: integer[] && integer[]
HINT: No operator matches the given name and argument type(s). You
may need to add explicit type casts.

Josh

#6Josh Trutwin
josh@trutwins.homeip.net
In reply to: Sam Mason (#3)
Re: Array intersection

This is only going to work for one-dimensional arrays (I'm not sure
how you would ever fix that with the support postgres has for
arrays) but the (computational) complexity of having an embedded
FOR loops looks bad for performance. As you can already use '=ANY'
syntax to search inside an array, you may as well use that---it's
probably a bit more faster than the plpgsql work-alike. Leading to
the following implementation of intersect:

Thanks for the pointers.

It seems to work for me, but as a side effect will leave the array
sorted in the same order as the first parameter and with any
duplicates it has. Even more annoyingly if there is no intersection
it will return NULL instead of an empty array, how do I fix this?

It's inelegant, but I just did this:

CREATE OR REPLACE FUNCTION array_intersect (array1 INTEGER[],array2
INTEGER[]) RETURNS INTEGER[]
AS $$
DECLARE
out INTEGER[];
return_empty BOOLEAN := TRUE;
BEGIN
IF array1 IS NULL OR array2 IS NULL THEN
RETURN '[]';
END IF;
FOR i IN ARRAY_LOWER(array1,1) .. ARRAY_UPPER(array1,1) LOOP
IF (array1[i] =ANY (array2)) THEN
out := array_append(out,array1[i]);
return_empty := FALSE;
END IF;
END LOOP;

IF return_empty THEN
RETURN '{}';
END IF;

RETURN out;
END;
$$ LANGUAGE PLPGSQL;

psql=> select array_intersect('{1,2,3}', '{6,7,8}');
array_intersect
-----------------
{}
(1 row)

Josh

#7Josh Trutwin
josh@trutwins.homeip.net
In reply to: Sam Mason (#3)
Re: Array intersection

On Wed, 17 Oct 2007 17:08:06 +0100
Sam Mason <sam@samason.me.uk> wrote:

CREATE OR REPLACE FUNCTION array_intersect (array1
INTEGER[],array2 INTEGER[]) RETURNS INTEGER[] AS $$
DECLARE
out INTEGER[];
BEGIN
IF array1 IS NULL OR array2 IS NULL THEN
RETURN '[]';
END IF;
FOR i IN ARRAY_LOWER(array1,1) .. ARRAY_UPPER(array1,1) LOOP
IF (array1[i] =ANY (array2)) THEN
out := array_append(out,array1[i]);
END IF;
END LOOP;
RETURN out;
END;
$$ LANGUAGE PLPGSQL;

Is the =ANY specific to PG 8.2 or higher? On 8.1.10:

psql=> select array_intersect('{1,2,3}', '{1,2,6,7,8}');
array_intersect
-----------------

(1 row)

Also, I think the first return needs to be:

RETURN '{}';

instead of:

RETURN '[]';

Josh

#8Merlin Moncure
mmoncure@gmail.com
In reply to: Josh Trutwin (#5)
Re: Array intersection

On 10/17/07, Josh Trutwin <josh@trutwins.homeip.net> wrote:

On Wed, 17 Oct 2007 11:12:27 -0500
"Rodrigo De León" <rdeleonp@gmail.com> wrote:

On 10/17/07, Josh Trutwin <josh@trutwins.homeip.net> wrote:

nm - I just wrote a function - though curious if this is the most
effecient way:

If you only want TRUE or FALSE, you can use '&&':

t=# SELECT '{1,2}'::INT[] && '{2,3}'::INT[];
?column?
----------
t
(1 row)

Is that 8.2 or above? I'm on 8.1.10:

introduced in 8.2

merlin

#9Josh Trutwin
josh@trutwins.homeip.net
In reply to: Merlin Moncure (#8)
Re: Array intersection

On Wed, 17 Oct 2007 12:33:13 -0400
"Merlin Moncure" <mmoncure@gmail.com> wrote:

On 10/17/07, Josh Trutwin <josh@trutwins.homeip.net> wrote:

On Wed, 17 Oct 2007 11:12:27 -0500
"Rodrigo De León" <rdeleonp@gmail.com> wrote:

On 10/17/07, Josh Trutwin <josh@trutwins.homeip.net> wrote:

nm - I just wrote a function - though curious if this is the
most effecient way:

If you only want TRUE or FALSE, you can use '&&':

t=# SELECT '{1,2}'::INT[] && '{2,3}'::INT[];
?column?
----------
t
(1 row)

Is that 8.2 or above? I'm on 8.1.10:

introduced in 8.2

http://www.postgresql.org/docs/8.2/static/functions-array.html

Thanks - yet another reason to upgrade - waiting till 8.3 tho.

Josh

#10Sam Mason
sam@samason.me.uk
In reply to: Josh Trutwin (#6)
Re: Array intersection

On Wed, Oct 17, 2007 at 11:28:31AM -0500, Josh Trutwin wrote:

It's inelegant, but I just did this:

IF return_empty THEN
RETURN '{}';
END IF;

humm, why didn't that seem to work for me... ah well. Next version
fixes a problem that I didn't test of the inputs being NULL. '[]' isn't
semantically correct, let alone the correct syntax. I've also fixed the
problem with items in the first array appearing twice. Try:

CREATE OR REPLACE FUNCTION array_intersect (array1 INTEGER[],array2 INTEGER[]) RETURNS INTEGER[]
AS $$
DECLARE
out INTEGER[];
BEGIN
out := '{}'::INTEGER[];
IF array1 IS NULL OR array2 IS NULL THEN
RETURN NULL;
END IF;
FOR i IN array_lower(array1,1) .. array_upper(array1,1) LOOP
IF (array1[i] = ANY (array2)) AND NOT array1[i] = ANY (out) THEN
out := array_append(out,array1[i]);
END IF;
END LOOP;
RETURN out;
END;
$$ LANGUAGE PLPGSQL;

Sam

#11Sam Mason
sam@samason.me.uk
In reply to: Josh Trutwin (#7)
Re: Array intersection

On Wed, Oct 17, 2007 at 11:31:51AM -0500, Josh Trutwin wrote:

Is the =ANY specific to PG 8.2 or higher? On 8.1.10:

It appears (according to [1]http://www.postgresql.org/docs/8.1/static/functions-comparisons.html#AEN13394 and [2]http://www.postgresql.org/docs/8.2/static/functions-comparisons.html#AEN14115) that you may be able to just
remove the '=' to get it working with 8.1.x. i.e.

v ANY ('{1,2}')

is correct in 8.1.x but in 8.2.x it's

v = ANY ('{1,2}')

I don't have an 8.1 box so I can't test unfortunatly.

Sam

[1]: http://www.postgresql.org/docs/8.1/static/functions-comparisons.html#AEN13394
[2]: http://www.postgresql.org/docs/8.2/static/functions-comparisons.html#AEN14115

#12David Fetter
david@fetter.org
In reply to: Josh Trutwin (#1)
Re: Array intersection

On Wed, Oct 17, 2007 at 10:19:43AM -0500, Josh Trutwin wrote:

Hi,

Is it possible to find the intersection of two array values?

a = '{1,2,3}'
b = '{2,3,4}'

a intersect b = '{2,3}'

Assume I need to write a pl/pgsql function to do this.

You can use an SQL function, which has the advantage of always being
available :)

It's up to you to ensure that the input arrays have the same type.

Cheers,
David.

CREATE OR REPLACE FUNCTION array_intersect(ANYARRAY, ANYARRAY)
RETURNS ANYARRAY
LANGUAGE SQL
AS $$
SELECT ARRAY(
SELECT $1[i] AS "the_intersection"
FROM generate_series(
array_lower($1,1),
array_upper($1,1)
) AS i
INTERSECT
SELECT $2[j] AS "the_intersection"
FROM generate_series(
array_lower($2,1),
array_upper($2,1)
) AS j
);
$$;
--
David Fetter <david@fetter.org> http://fetter.org/
Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter
Skype: davidfetter XMPP: david.fetter@gmail.com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

#13Josh Trutwin
josh@trutwins.homeip.net
In reply to: Sam Mason (#10)
Re: Array intersection

On Wed, 17 Oct 2007 17:42:21 +0100
Sam Mason <sam@samason.me.uk> wrote:

<snip>

CREATE OR REPLACE FUNCTION array_intersect (array1
INTEGER[],array2 INTEGER[]) RETURNS INTEGER[] AS $$
DECLARE
out INTEGER[];
BEGIN
out := '{}'::INTEGER[];
IF array1 IS NULL OR array2 IS NULL THEN
RETURN NULL;
END IF;
FOR i IN array_lower(array1,1) .. array_upper(array1,1) LOOP
IF (array1[i] = ANY (array2)) AND NOT array1[i] = ANY
(out) THEN out := array_append(out,array1[i]);
END IF;
END LOOP;
RETURN out;
END;
$$ LANGUAGE PLPGSQL;

Works like a champ on 8.1.

Thanks!

Did you see David Fetter's reply to the original post? He has
an interesting alternative.

Josh

#14Josh Trutwin
josh@trutwins.homeip.net
In reply to: Sam Mason (#11)
Re: Array intersection

On Wed, 17 Oct 2007 17:49:01 +0100
Sam Mason <sam@samason.me.uk> wrote:

On Wed, Oct 17, 2007 at 11:31:51AM -0500, Josh Trutwin wrote:

Is the =ANY specific to PG 8.2 or higher? On 8.1.10:

It appears (according to [1] and [2]) that you may be able to just
remove the '=' to get it working with 8.1.x. i.e.

v ANY ('{1,2}')

is correct in 8.1.x but in 8.2.x it's

v = ANY ('{1,2}')

I don't have an 8.1 box so I can't test unfortunatly.

= ANY appears to work in 8.1 - at least your function from the other
reply worked just fine on 8.1 and I have used = ANY in some other
queries.

Thanks,

Josh

#15Josh Trutwin
josh@trutwins.homeip.net
In reply to: David Fetter (#12)
Re: Array intersection

On Wed, 17 Oct 2007 10:04:21 -0700
David Fetter <david@fetter.org> wrote:

<snip>

CREATE OR REPLACE FUNCTION array_intersect(ANYARRAY, ANYARRAY)
RETURNS ANYARRAY
LANGUAGE SQL
AS $$
SELECT ARRAY(
SELECT $1[i] AS "the_intersection"
FROM generate_series(
array_lower($1,1),
array_upper($1,1)
) AS i
INTERSECT
SELECT $2[j] AS "the_intersection"
FROM generate_series(
array_lower($2,1),
array_upper($2,1)
) AS j
);
$$;

Doesn't appear to work on 8.1:

psql=> select array_intersect('{1,2,3}', '{2,3,4}');
ERROR: could not determine anyarray/anyelement type because input
has type "unknown"

Hopefully upgrading to 8.3 soon so I'll have another look then.

Thanks,

Josh

#16David Fetter
david@fetter.org
In reply to: Josh Trutwin (#15)
Re: Array intersection

On Wed, Oct 17, 2007 at 01:06:35PM -0500, Josh Trutwin wrote:

On Wed, 17 Oct 2007 10:04:21 -0700
David Fetter <david@fetter.org> wrote:

<snip>

CREATE OR REPLACE FUNCTION array_intersect(ANYARRAY, ANYARRAY)
RETURNS ANYARRAY
LANGUAGE SQL
AS $$
SELECT ARRAY(
SELECT $1[i] AS "the_intersection"
FROM generate_series(
array_lower($1,1),
array_upper($1,1)
) AS i
INTERSECT
SELECT $2[j] AS "the_intersection"
FROM generate_series(
array_lower($2,1),
array_upper($2,1)
) AS j
);
$$;

Doesn't appear to work on 8.1:

psql=> select array_intersect('{1,2,3}', '{2,3,4}');
ERROR: could not determine anyarray/anyelement type because input
has type "unknown"

As mentioned in the "release notes" ;), it's up to you to ensure that
the arrays have the right types, so you'd write the above as:

SELECT array_intersect('{1,2,3}'::int[], '{2,3,4}'::int[]);

Cheers,
David.
--
David Fetter <david@fetter.org> http://fetter.org/
Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter
Skype: davidfetter XMPP: david.fetter@gmail.com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

#17Josh Trutwin
josh@trutwins.homeip.net
In reply to: David Fetter (#16)
Re: Array intersection

On Wed, 17 Oct 2007 11:26:05 -0700
David Fetter <david@fetter.org> wrote:

Doesn't appear to work on 8.1:

psql=> select array_intersect('{1,2,3}', '{2,3,4}');
ERROR: could not determine anyarray/anyelement type because input
has type "unknown"

As mentioned in the "release notes" ;), it's up to you to ensure
that the arrays have the right types, so you'd write the above as:

SELECT array_intersect('{1,2,3}'::int[], '{2,3,4}'::int[]);

Ah - I saw that but I figured that just meant you couldn't do
something like this:

SELECT array_intersect('{1,2,3}', '{"a","b","c"}');

So basically with this array intersect function you always need to
cast the inputs.

Josh

#18Pavel Stehule
pavel.stehule@gmail.com
In reply to: David Fetter (#16)
Re: Array intersection

<snip>

CREATE OR REPLACE FUNCTION array_intersect(ANYARRAY, ANYARRAY)
RETURNS ANYARRAY
LANGUAGE SQL
AS $$
SELECT ARRAY(
SELECT $1[i] AS "the_intersection"
FROM generate_series(
array_lower($1,1),
array_upper($1,1)
) AS i
INTERSECT
SELECT $2[j] AS "the_intersection"
FROM generate_series(
array_lower($2,1),
array_upper($2,1)
) AS j
);
$$;

nice :)

Maybe we can add function "generate_iterator"

CREATE OR REPLACE FUNCTION generate_iterator(ANYARRAY)
RETURNS SETOF integer AS
$$
SELECT i
FROM generate_series(array_lower($1, 1), array_upper($1,1)) AS i
$$ LANGUAGE SQL;

then
CREATE OR REPLACE FUNCTION array_intersect(ANYARRAY, ANYARRAY)
RETURNS ANYARRAY
LANGUAGE SQL AS
$$
SELECT ARRAY(
SELECT $1[i]
FROM genarate_iterator($1) i
INTERSECT
SELECT $2[j]
FROM generate_iterator($2) j
)
$$ ;

Regars
Pavel

#19Sam Mason
sam@samason.me.uk
In reply to: Josh Trutwin (#13)
Re: Array intersection

On Wed, Oct 17, 2007 at 01:00:48PM -0500, Josh Trutwin wrote:

Works like a champ on 8.1.

Thanks!

That's OK, it was a good learning experience.

Did you see David Fetter's reply to the original post? He has
an interesting alternative.

I did, it's much more elegant. I've never seen generate_series used
like that, it's a very nice way of doing things. If your arrays are
ever more than a few elements long his is probably going to be much
faster. Generally for less than 6 or 7 elements it's better doing the
naive thing, not sure if that applies here though.

Sam