Differences between = ANY and IN?

Started by Maciek Sakrejdaover 2 years ago3 messages
#1Maciek Sakrejda
m.sakrejda@gmail.com

Hello,

My colleague has been working on submitting a patch [1]https://github.com/rails/rails/pull/49388 to the Ruby
Rails framework to address some of the problems discussed in [2]/messages/by-id/20230209172651.cfgrebpyyr72h7fv@alvherre.pgsql.
Regardless of whether that change lands, the change in Rails would be
useful since people will be running Postgres versions without this
patch for a while.

My colleague's patch changes SQL generated from Ruby expressions like
`where(id: [1, 2])` . This is currently translated to roughly `WHERE
id IN (1, 2)` and would be changed to `id = ANY('{1,2}')`.

As far as we know, the expressions are equivalent, but we wanted to
double-check: are there any edge cases to consider here (other than
the pg_stat_statements behavior, of course)?

Thanks,
Maciek

[1]: https://github.com/rails/rails/pull/49388
[2]: /messages/by-id/20230209172651.cfgrebpyyr72h7fv@alvherre.pgsql

#2Tom Lane
tgl@sss.pgh.pa.us
In reply to: Maciek Sakrejda (#1)
Re: Differences between = ANY and IN?

Maciek Sakrejda <m.sakrejda@gmail.com> writes:

My colleague's patch changes SQL generated from Ruby expressions like
`where(id: [1, 2])` . This is currently translated to roughly `WHERE
id IN (1, 2)` and would be changed to `id = ANY('{1,2}')`.

As far as we know, the expressions are equivalent, but we wanted to
double-check: are there any edge cases to consider here (other than
the pg_stat_statements behavior, of course)?

You would find it profitable to read transformAExprIn() in parse_expr.c.
The most important points are in this comment:

* We try to generate a ScalarArrayOpExpr from IN/NOT IN, but this is only
* possible if there is a suitable array type available. If not, we fall
* back to a boolean condition tree with multiple copies of the lefthand
* expression. Also, any IN-list items that contain Vars are handled as
* separate boolean conditions, because that gives the planner more scope
* for optimization on such clauses.

If all the values in the IN form were being sent to the backend as
constants of the same datatype, I think you're okay to consider it
as exactly equivalent to =ANY. It would likely be a good idea to
provide an explicit cast `id = ANY('{1,2}'::int[])` rather than just
hoping an unadorned literal will be taken as the type you want
(see transformAExprOpAny and thence make_scalar_array_op).

regards, tom lane

#3Maciek Sakrejda
m.sakrejda@gmail.com
In reply to: Tom Lane (#2)
Re: Differences between = ANY and IN?

Great, thanks for the guidance!